(If you have taken the courses and want to submit your answers for evaluation, please read the instructions linked from the table of contents page.)
- Zipf’s law can be observed in many different situations. Using your favorite tools (R, Perl, gnuplot, etc.), (a) plot the Zipf’s law (rank-frequency) distribution of oligonucleotide frequencies for a genomic sequence of your choice from GenBank. Make plots for monomers through octamers. (b) What can you say about them? (c) If you redraw the plots on log-log axes, as is usual for a power-law distribution, what are the values of the exponents?
- Obtain one of the following sets of sequences and write a program to calculate the information content of each individual site.
  - human exon-intron boundary sequences
  - human RNA polymerase II promoter sequences
  - any other set of sequences that you are interested in
  Also, create a sequence logo from the same sequences using the web service at http://weblogo.berkeley.edu.
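As a possible starting point for the information-content program, here is a minimal sketch in Python. It assumes the sequences are already aligned and of equal length, uses a uniform background over A, C, G, T (so the maximum is 2 bits per site), and applies no small-sample correction. The function name `site_information` and the example sequences are made up for illustration; they are not real splice-site data.

```python
import math
from collections import Counter

def site_information(seqs, alphabet="ACGT"):
    """Return the information content (in bits) of each aligned column.

    Assumes a uniform background, so I = log2(|alphabet|) - H(column).
    Characters outside the alphabet (e.g. N or gaps) are ignored.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    h_max = math.log2(len(alphabet))
    info = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            # Column contains no usable symbols; report zero information.
            info.append(0.0)
            continue
        entropy = 0.0
        for b in alphabet:
            if counts[b]:
                p = counts[b] / total
                entropy -= p * math.log2(p)
        info.append(h_max - entropy)
    return info

# Toy input (hypothetical sequences, for illustration only).
sites = ["AGGTAAGT", "CGGTGAGT", "AGGTAAGA", "TGGTGAGT"]
for pos, bits in enumerate(site_information(sites), start=1):
    print(pos, round(bits, 3))
```

A fully conserved column scores 2 bits and a column with all four bases equally frequent scores 0 bits, which matches the letter-stack heights in a sequence logo (apart from any small-sample correction the logo tool may apply).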
- ClustalW is the most widely used software for multiple alignment. Describe how ClustalW overcomes each of the following important problems in multiple alignment (within 7 lines each).
(1) The problem that the multidimensional extension of the Smith-Waterman algorithm is not computationally practical.
(2) The problem that the order in which sequences are added matters in progressive pairwise alignment.
(3) The sequence weighting problem in multiple alignment.
(4) The gap propagation problem in multiple alignment.
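For the Zipf’s law exercise at the top of this list, the rank-frequency data and the log-log exponent could be computed along the following lines. This is only a sketch under stated assumptions: the helper names (`kmer_counts`, `rank_frequency`, `fit_exponent`) are hypothetical, k-mers containing characters other than A, C, G, T are skipped, and the exponent is estimated by ordinary least squares on log(frequency) versus log(rank), which is one simple choice among several. The toy sequence stands in for a real GenBank record.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping any that contain non-ACGT characters."""
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts

def rank_frequency(counts):
    """Return frequencies sorted in decreasing order (index 0 is rank 1)."""
    return sorted(counts.values(), reverse=True)

def fit_exponent(freqs):
    """Estimate a in freq ~ C * rank**(-a) by least squares on the log-log data."""
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope is negative, so the exponent is its negation

# Toy sequence (placeholder for a real genomic sequence).
toy = "ACGTTGCAAGCTTAGGCTAACGT" * 3
for k in range(1, 4):
    freqs = rank_frequency(kmer_counts(toy, k))
    print(k, len(freqs), fit_exponent(freqs))
```

The sorted frequencies can be written out as rank and frequency columns and plotted with any tool, such as R or gnuplot, on both linear and log-log axes.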