Bioinformatics Exercises
  A. Problems. Gene Finding
(For those who have taken the courses and want to submit for evaluation, please read the instructions linked on the table of contents page.)
  1. List possible reasons for the characteristic nucleotide patterns of protein-coding regions. (The 7-line limit does not apply to this part.) A single set of Markov models can recognize most of the protein-coding regions in a prokaryotic species. (We are not talking about overfitting or memorization here.) Why is this possible? (Answer within 7 lines.)
  2. Usually, about 90% of a prokaryotic genome is occupied by protein-coding regions. That makes prokaryotic gene finding a relatively easy problem compared with eukaryotic gene finding. The computational models for prokaryotic gene finding are commonly based on the Markovian property of protein-coding regions. (a) What biological reasons give rise to the Markovian property of protein-coding regions? (b) The GLIMMER algorithm is one of the commonly used tools for prokaryotic gene finding. The so-called Zipf's-law distribution of oligonucleotide frequencies plays a role in the algorithm. Briefly explain that role.
    (Within 10 lines for each part.)
  3. A phenomenon similar to Zipf's law can also be observed in DNA sequences. Briefly explain how this fact has improved the power of prokaryotic gene-finding methods.
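
  Readers who want to look at the rank-frequency pattern of oligonucleotides referred to in Problems 2(b) and 3 may find the following minimal Python sketch useful. It only counts overlapping k-mers and orders them by frequency. The function name kmer_rank_frequency and the toy sequence are hypothetical and chosen for illustration; this is not GLIMMER code and it is not a model answer to the problems.

```python
from collections import Counter


def kmer_rank_frequency(sequence, k):
    """Return (rank, k-mer, count) tuples, most frequent k-mer first.

    Hypothetical helper for illustration only; not part of any
    gene-finding tool.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid  # skip windows with ambiguous bases
    )
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, kmer, count)
            for rank, (kmer, count) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Made-up toy sequence, not real genomic data.
    toy = "ATGAAAGCGCTGAAAGCGATGAAATAA"
    for rank, kmer, count in kmer_rank_frequency(toy, 3)[:5]:
        print(rank, kmer, count)
```

  Plotting log(count) against log(rank) for a real genome and for several values of k is one way to judge how far the frequent oligonucleotides stand out from the long tail of rare ones.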