Bioinformatics Exercises
  A. Problems. Genome Sequencing
  Writer: Seyeon Weon   Updated: 10-14

(For those who have taken the courses and want to submit for evaluation, please read the instructions linked on the table of contents page.)

  1. According to the Poisson calculation, what percentage of the genome is sequenced at 10-fold coverage in shotgun sequencing? In practice, the actual percentage sequenced is lower than this theoretical value. What are the possible reasons?
  2. What is the theoretical number of gaps (using what was explained in the lecture) for shotgun sequencing with a genome size of 2 Mbp, a read length of 500 bp, and 5-fold coverage?
  3. Sketch plots for the following relationships between the variables in shot-gun sequencing and explain the shape of each plot.

    (a) the fraction of genome covered vs. the total length of sequencing reads
    (b) the number of contigs vs. the total length of sequencing reads

    Based on the above plots,
    (c) Explain why the commonly used shotgun sequencing strategy is designed the way it is.
    (d) Explain the discrepancies between the plots drawn from the mathematical formulas (i.e., theoretically derived for an ideal situation) and the plots drawn from real experimental data, for both plots (a) and (b).
    (Within 10 lines for each)
  4. Draw a Hamiltonian path for the following shot-gun sequencing reads:
    sequence 1: TAGG
    sequence 2: TTTTA
    sequence 3: GTTT
    sequence 4: TTTA
    sequence 5: ACGT
    sequence 6: TTAGG
    sequence 7: TTTAG
  5. A region of a sequencing read shows peaks that are as noisy (or as clean) as the regions of the standard data that contain 5 errors per 1000 nt. What is the Phred score for that region?
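
The quantities behind problems 1, 2, and 5 can be explored numerically. The following is a minimal Python sketch, not the lecture's own derivation. It assumes the standard idealized model (uniformly random read starts, Poisson-distributed depth, no minimum overlap requirement) and the usual Phred definition Q = -10 log10(P). The function names are hypothetical, the lecture's gap formula may differ in detail, and the example values are deliberately different from those in the problems.

```python
import math


def covered_fraction(coverage: float) -> float:
    """Expected fraction of the genome covered by at least one read.

    Assumption: read depth at a base is Poisson-distributed with mean
    `coverage`, so the probability of zero reads is exp(-coverage).
    """
    return 1.0 - math.exp(-coverage)


def expected_contigs(genome_size: float, read_length: float, coverage: float) -> float:
    """Approximate expected number of contigs (and hence of gaps).

    Assumption: a read ends a contig when no other read starts within it,
    which happens with probability exp(-coverage). The minimum overlap
    needed to join two reads is ignored in this sketch.
    """
    n_reads = coverage * genome_size / read_length
    return n_reads * math.exp(-coverage)


def phred_score(error_probability: float) -> float:
    """Phred quality score for a given per-base error probability."""
    return -10.0 * math.log10(error_probability)


if __name__ == "__main__":
    # Illustrative values only; these are not the values from the problems.
    print(f"fraction covered at 1x: {covered_fraction(1.0):.4f}")
    print(f"contigs for 1 Mbp, 1000 bp reads, 3x: "
          f"{expected_contigs(1_000_000, 1000, 3.0):.1f}")
    print(f"Phred score for 1 error in 100 nt: {phred_score(1 / 100):.1f}")
```

Evaluating `covered_fraction` and `expected_contigs` over a range of coverage values gives the theoretical curves for plots (a) and (b) in problem 3, which can then be compared with real data.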