(For those who have taken the courses and want to submit for evaluation, please read the instructions linked on the table of contents page.)
- According to the Poisson calculation, what percentage of a genome is sequenced at 10-fold coverage in shotgun sequencing? In practice, the percentage actually sequenced is lower than this theoretical value. What are the possible reasons?
- What is the theoretical number of gaps (using what was explained in the lecture) for shotgun sequencing with a genome size of 2 Mbp, a read length of 500 bp, and 5-fold coverage?
- Sketch plots of the following relationships between variables in shotgun sequencing, and explain the shape of each plot.
(a) the fraction of genome covered vs. the total length of sequencing reads
(b) the number of contigs vs. the total length of sequencing reads
Based on the above plots,
(c) Explain why the commonly used shotgun sequencing strategy is chosen the way it is.
(d) Explain the discrepancies, for both plots (a) and (b), between the curves given by the mathematical formula (i.e., derived theoretically for an ideal situation) and those obtained from real experimental data.
(Answer each part in 10 lines or fewer.)
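As a starting point for the Poisson questions above, the two quantities can be sketched in Python. This is a minimal sketch, not the lecture's own derivation: it assumes the standard Poisson (Lander-Waterman style) model, in which a base is missed with probability e^(-c) at coverage c, and the expected number of contigs (and hence gaps, to within one) is N * e^(-c) for N = c * G / L reads. The function names are hypothetical, and the minimum overlap needed to join two reads is ignored. If the lecture used a different formula, that one should be used instead.

```python
import math


def fraction_covered(coverage: float) -> float:
    """Expected fraction of the genome hit by at least one read.

    Assumes reads land uniformly at random, so the number of reads
    covering a given base is Poisson-distributed with mean `coverage`.
    """
    return 1.0 - math.exp(-coverage)


def expected_contigs(genome_size: float, read_length: float, coverage: float) -> float:
    """Expected number of contigs under the same idealized model.

    N = coverage * genome_size / read_length is the number of reads;
    each read ends a contig with probability exp(-coverage).
    Assumption: any overlap, however short, joins two reads.
    """
    n_reads = coverage * genome_size / read_length
    return n_reads * math.exp(-coverage)


# Tabulate both curves for a hypothetical 1 Mbp genome and 500 bp reads.
for c in (0.25, 0.5, 1, 2, 4, 8):
    print(c, round(fraction_covered(c), 4), round(expected_contigs(1_000_000, 500, c), 1))
```

Plotting these two functions against coverage gives the idealized versions of plots (a) and (b); under this model the contig count rises, peaks at 1-fold coverage, and then falls.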
- Draw a Hamiltonian path through the overlap graph of the following shotgun sequencing reads:
sequence 1: TAGG
sequence 2: TTTTA
sequence 3: GTTT
sequence 4: TTTA
sequence 5: ACGT
sequence 6: TTAGG
sequence 7: TTTAG
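To check a hand-drawn path, the overlap graph can be searched by brute force. The sketch below is illustrative only: the helper names `overlap` and `hamiltonian_paths` are hypothetical, the minimum overlap length is a free parameter that the question does not fix, and a read fully contained in another is treated as an ordinary overlap. It tries every ordering of the reads, which is practical only for a handful of reads.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def hamiltonian_paths(reads, min_len: int = 1):
    """All orderings that visit every read exactly once along overlap edges."""
    paths = []
    for order in permutations(reads):
        # Every consecutive pair in the ordering must be joined by an edge.
        if all(overlap(x, y, min_len) > 0 for x, y in zip(order, order[1:])):
            paths.append(order)
    return paths


# Toy reads (not the exercise data), requiring overlaps of at least 3 bases.
toy_reads = ["CGTA", "ACGT", "GTAC"]
print(hamiltonian_paths(toy_reads, min_len=3))  # [('ACGT', 'CGTA', 'GTAC')]
```

The reads from the question can be passed in the same way; the number of paths found depends on the minimum overlap chosen.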
- A region of a sequencing read shows peaks that are as noisy (or as clean) as regions of the standard data containing 5 errors per 1000 nt. What is the Phred score for this region?
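For reference, the Phred quality score is defined from the error probability P as Q = -10 * log10(P). The short sketch below applies that definition; the function names are hypothetical, and estimating P as the observed errors divided by the number of bases is an assumption.

```python
import math


def phred_score(error_probability: float) -> float:
    """Phred quality score Q = -10 * log10(P) for error probability P."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def phred_from_errors(errors: int, bases: int) -> float:
    """Phred score when P is estimated as errors / bases (an assumption)."""
    return phred_score(errors / bases)


# One error in 100 bases corresponds to Q20, one in 1000 to Q30.
print(round(phred_from_errors(1, 100), 2))
print(round(phred_from_errors(1, 1000), 2))
```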