(For those who have taken the courses and want to submit for evaluation, please read the instructions linked on the table of contents page.)
- In building phylogenetic trees, distances between sequences can be (a) ultrametric, (b) additive, (c) metric, or (d) none of these. Create four distance matrices, each for four sequences, so that each matrix belongs to a different one of the four classes (i.e., one matrix that is ultrametric, another that is additive but not ultrametric, and so forth).
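To check a candidate matrix against these classes, a small script can help. The following is a minimal sketch, assuming the usual textbook definitions: the three-point condition for ultrametric distances (in every triple, the two largest distances are equal) and the four-point condition for additive distances (in every quadruple, the two largest of the three pairwise sums are equal). The function names and the tolerance `TOL` are hypothetical choices made for illustration.

```python
from itertools import combinations, permutations

TOL = 1e-9  # assumed numerical tolerance for comparing distances


def is_metric(d):
    """Zero diagonal, non-negative, symmetric, and triangle inequality."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > TOL:
            return False
    for i, j in combinations(range(n), 2):
        if d[i][j] < -TOL or abs(d[i][j] - d[j][i]) > TOL:
            return False
    for i, j, k in permutations(range(n), 3):
        if d[i][j] > d[i][k] + d[k][j] + TOL:
            return False
    return True


def satisfies_three_point(d):
    """In every triple, the two largest distances must be equal."""
    for i, j, k in combinations(range(len(d)), 3):
        a, b, c = sorted([d[i][j], d[i][k], d[j][k]])
        if c - b > TOL:
            return False
    return True


def satisfies_four_point(d):
    """In every quadruple, the two largest pairwise sums must be equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if sums[2] - sums[1] > TOL:
            return False
    return True


def classify(d):
    """Return the most specific class that the matrix belongs to."""
    if not is_metric(d):
        return "none"
    if satisfies_three_point(d):
        return "ultrametric"
    if satisfies_four_point(d):
        return "additive"
    return "metric"


# Trivial case: all four sequences are equally distant from one another.
equal = [[0, 2, 2, 2], [2, 0, 2, 2], [2, 2, 0, 2], [2, 2, 2, 0]]
print(classify(equal))  # ultrametric
```

The checks are ordered from most to least specific because every ultrametric matrix is also additive, and every additive matrix is also metric.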
- We commonly assume that one amino acid change occurs in a million years. However, evolution does not always tick as a clock. Provide some biological examples which deviate from this assumption for the following cases:
(a) Much slower change than one amino acid per million years
(b) Much more rapid change than one amino acid per million years
- Write three sequences, each 5 nucleotides long, whose pairwise alignment scores do not obey the triangle inequality. Assume that a match scores 1 and a mismatch scores 0. Also, use a constant gap penalty of 1 and ignore end gaps. For example,
Now, let us consider the edit distance, with a cost of 1 each for replacement, insertion, and deletion. Using this edit distance, can you make up 3 sequences (not necessarily of the same length) that do not obey the triangle inequality? Explain why edit distance is a more useful measure of the distance between sequences in evolutionary studies.
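To experiment with candidate sequences, the unit-cost edit distance described above can be computed with the standard dynamic-programming recurrence. This is a minimal sketch; `edit_distance` and `obeys_triangle_inequality` are hypothetical helper names, not part of any library.

```python
def edit_distance(a, b):
    """Unit-cost edit distance (replacement, insertion, deletion each cost 1)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # replace ca with cb, or match
            ))
        prev = cur
    return prev[-1]


def obeys_triangle_inequality(x, y, z, dist=edit_distance):
    """Check all three triangle inequalities for one triple of sequences."""
    dxy, dyz, dxz = dist(x, y), dist(y, z), dist(x, z)
    return dxy <= dyz + dxz and dyz <= dxy + dxz and dxz <= dxy + dyz


print(edit_distance("ACGT", "AGT"))  # 1 (delete the C)
print(obeys_triangle_inequality("ACGT", "AGT", "AG"))
```

Passing a different `dist` function, such as one built from alignment scores, lets the same check be reused for the first part of the question.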
- Explain why the UPGMA method should not be used for data that do not obey the triangle inequality. What is the key step in the neighbor joining method that overcomes this weakness of UPGMA?
- Explain why the neighbor joining method is better than the UPGMA method for building phylogenetic trees.
- What is the most parsimonious unrooted tree for the following multiple alignment under the maximum parsimony method?
Sequence 1. AAGAGTTCAG
Sequence 2. AGCCGATCTC
Sequence 3. AGATATCCAG
Sequence 4. AGAGAACCTC
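One way to compare candidate topologies is to count the minimum number of substitutions each one requires, column by column, with Fitch's algorithm. The sketch below assumes a tree written as nested 2-tuples of leaf names, with the unrooted four-taxon tree rooted on its internal edge (the root position does not change the Fitch score). The names `fitch_score` and `_fitch_column`, and the toy alignment, are hypothetical and not taken from the exercise.

```python
def _fitch_column(node, column):
    """Return (candidate state set, substitution count) for one alignment column."""
    if isinstance(node, str):  # leaf: its observed nucleotide, no changes yet
        return {column[node]}, 0
    left, right = node
    left_states, left_cost = _fitch_column(left, column)
    right_states, right_cost = _fitch_column(right, column)
    common = left_states & right_states
    if common:  # children agree on at least one state: no extra change
        return common, left_cost + right_cost
    # children disagree: take the union and count one substitution
    return left_states | right_states, left_cost + right_cost + 1


def fitch_score(tree, seqs):
    """Total parsimony score of `tree` over all columns of the alignment."""
    length = len(next(iter(seqs.values())))
    total = 0
    for pos in range(length):
        column = {name: seq[pos] for name, seq in seqs.items()}
        total += _fitch_column(tree, column)[1]
    return total


# Toy alignment (not the one from the exercise).
toy = {"w": "AC", "x": "AC", "y": "GT", "z": "GT"}
print(fitch_score((("w", "x"), ("y", "z")), toy))  # 2
print(fitch_score((("w", "y"), ("x", "z")), toy))  # 4
```

With four sequences there are only three unrooted topologies, so scoring each one and keeping the lowest total is enough.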
- k-means clustering is another powerful clustering algorithm, but its result is not well suited to phylogenetic tree construction. Why is this so?