Bioinformatics Exercises
  B.Problems. Sequence Analysis III
  Writer : Seyeon Weon   Updated : 10-14
(If you have taken the course and want to submit your work for evaluation, please read the instructions linked on the table of contents page.)
 
Write Perl scripts for the following common tasks:
(Do not submit the entire results. Submit only what each problem asks for, which is mostly the Perl scripts.)
  1. Table Reading and UPGMA Clustering:

    Write a Perl script that performs UPGMA clustering using BioPerl. Write your own code for reading the input table.

  2. More Table Reading:

    Write another Perl script that collects the data from the 9 files in this directory. Submit the Perl script and the resulting file. Since the data are actually trinucleotide counts, you could use a list or a hash with a fixed length of 64. However, do not do it this way. Write the script as if you were counting 15-mers or so, where the possible keys (4^15, about a billion) are far too many to enumerate in advance.
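
    The point of the "15-mers" hint is that keys should be created only as they are seen, which is what a Perl hash does naturally. The following is a minimal sketch, not a model answer. It assumes each input line has the form "<kmer> <count>" separated by whitespace; the real layout of the 9 files is not given here. The names collect_counts and format_table are hypothetical.

```perl
use strict;
use warnings;

# Merge per-file k-mer counts into one table keyed by k-mer.
# ASSUMPTION: each input line is "<kmer> <count>", whitespace-separated.
sub collect_counts {
    my (%sources) = @_;    # label => array ref of lines from that file
    my %table;             # $table{$kmer}{$label} = count
    for my $label (sort keys %sources) {
        for my $line (@{ $sources{$label} }) {
            next if $line =~ /^\s*(?:#|$)/;    # skip blank lines and comments
            my ($kmer, $count) = split ' ', $line;
            $table{ uc $kmer }{$label} += $count;    # key is created on first sight
        }
    }
    return \%table;
}

# Render the merged table as tab-separated text, one column per label.
# K-mers missing from a source are reported as 0.
sub format_table {
    my ($table, @labels) = @_;
    my @rows = join("\t", 'kmer', @labels);
    for my $kmer (sort keys %$table) {
        push @rows, join("\t", $kmer, map { $table->{$kmer}{$_} // 0 } @labels);
    }
    return join("\n", @rows) . "\n";
}

# In-memory demo; a real script would read each file into an array of lines.
my $table = collect_counts(
    sample1 => ["AAA 3\n", "AAC 1\n"],
    sample2 => ["AAA 2\n", "TTT 5\n"],
);
print format_table($table, 'sample1', 'sample2');
```

    Nothing in the sketch depends on the k-mer length, so the same code would handle trinucleotides or 15-mers alike.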

  3. File Management:

    This file is the result of running 64 samples on an extremely poor DNA sequencer. Not only does the machine have a poor read length, it also has a poor computer interface: it writes its readouts to a plain text file of bases, one line per sample. Write a Perl script that splits the file into separate files, one per sample. The user can provide the name of the first file, and the names of the remaining files are generated automatically following a meaningful scheme. If the user does not provide the name of the first file, the names should continue from the name of the last file in the directory. If the directory is empty and no first name is provided, report an error to the user. Also, check whether any generated name collides with an existing file in the directory.
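
    One possible "meaningful scheme" is to increment the last number in the file name while keeping its zero padding. The sketch below illustrates only that naming step, under the assumption that the name contains a number and that the extension itself contains no digits. The names next_name and generate_names are hypothetical, and reading the directory and writing the files are left out.

```perl
use strict;
use warnings;

# Return the name that follows $name by incrementing its last run of digits,
# keeping the zero padding (sample_09.txt -> sample_10.txt).
sub next_name {
    my ($name) = @_;
    # The lazy (.*?) plus (\D*)$ forces (\d+) to be the whole last run of digits.
    my ($head, $digits, $tail) = $name =~ /^(.*?)(\d+)(\D*)$/
        or die "no number in '$name' to increment\n";
    return sprintf '%s%0*d%s', $head, length($digits), $digits + 1, $tail;
}

# Generate $n names starting at $first; die if one is already taken.
# $existing is a hash ref whose keys are the file names already present.
sub generate_names {
    my ($first, $n, $existing) = @_;
    my @names;
    my $name = $first;
    for (1 .. $n) {
        die "'$name' already exists\n" if $existing->{$name};
        push @names, $name;
        $name = next_name($name);
    }
    return @names;
}

print join(',', generate_names('sample_08.txt', 3, {})), "\n";
# prints sample_08.txt,sample_09.txt,sample_10.txt
```

    When the user gives no first name, one reading of "the last file in the directory" is the last name in sorted order; passing that name through next_name would then give the starting point. That interpretation is an assumption.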

  4. Calling Other Programs:

    This file contains 3,935 observations under 5 different conditions. We want to find out whether the conditions are correlated with one another. Create scatter plots for all 10 pairwise combinations by calling gnuplot directly from the Perl script. Submit the Perl script and the resulting plots in PostScript format.
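
    Five conditions give 5 x 4 / 2 = 10 unordered pairs, which a nested loop over column indices enumerates. The sketch below only builds the gnuplot commands as text. It assumes the data file holds whitespace-separated numeric columns, one per condition; the file name conditions.dat, the output names scatter_I_J.ps, and the function name gnuplot_script are all hypothetical.

```perl
use strict;
use warnings;

# Build gnuplot commands for one scatter plot per pair of columns.
# ASSUMPTION: $datafile has whitespace-separated numeric columns 1..$ncols.
sub gnuplot_script {
    my ($datafile, $ncols) = @_;
    my @cmds = ('set terminal postscript', 'unset key');
    for my $i (1 .. $ncols - 1) {
        for my $j ($i + 1 .. $ncols) {    # j > i, so each pair appears once
            push @cmds,
                "set output 'scatter_${i}_${j}.ps'",
                "set xlabel 'condition ${i}'",
                "set ylabel 'condition ${j}'",
                "plot '${datafile}' using ${i}:${j} with points";
        }
    }
    return join("\n", @cmds) . "\n";
}

# To actually plot (requires gnuplot on the PATH), pipe the commands to it:
#   open my $gp, '|-', 'gnuplot' or die "cannot start gnuplot: $!";
#   print {$gp} gnuplot_script('conditions.dat', 5);
#   close $gp or die "gnuplot failed\n";
```

    Keeping the command text separate from the pipe makes it possible to inspect or save the commands before gnuplot ever runs.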

  5. Correlation Coefficient:

    Calculate the Pearson correlation coefficient for each of the 10 pairwise combinations in Problem 4. Also, remove outliers. Submit the Perl script as well as the results.
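
    For reference, the Pearson coefficient of paired values (x_i, y_i) is r = sum((x_i - mean_x)(y_i - mean_y)) / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2)). A minimal sketch of that formula follows; the function name pearson is hypothetical. Outlier removal is not shown, because the problem does not specify a criterion.

```perl
use strict;
use warnings;

# Pearson correlation coefficient of two equal-length lists (array refs).
sub pearson {
    my ($x, $y) = @_;
    my $n = @$x;
    die "need two equal-length lists of at least 2 values\n"
        unless $n >= 2 && $n == @$y;

    # Means of both lists.
    my ($sum_x, $sum_y) = (0, 0);
    $sum_x += $_ for @$x;
    $sum_y += $_ for @$y;
    my ($mean_x, $mean_y) = ($sum_x / $n, $sum_y / $n);

    # Sums of products of deviations from the means.
    my ($sxy, $sxx, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my $dx = $x->[$i] - $mean_x;
        my $dy = $y->[$i] - $mean_y;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    die "zero variance: r is undefined\n" if $sxx == 0 || $syy == 0;
    return $sxy / sqrt($sxx * $syy);
}

printf "%.3f\n", pearson([1, 2, 3], [2, 4, 6]);    # prints 1.000
```

    Calling pearson once per pair of columns, using the same nested loop over column indices as for the scatter plots, would give the 10 coefficients.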
