Bioinformatics Exercises
  A.Problems. DNA Microarrays
  Writer : Seyeon Weon   Updated : 10-19
(For those who have taken the courses and want to submit for evaluation, please read the instructions linked on the table of contents page.)

(More basic questions on DNA microarray data analysis are in "A.Questions. DNA Microarrays".)

  1. Several expression profiling methods are in use, and the DNA microarray is the most popular among them. A DNA microarray can measure the amounts of a large portion of the mRNAs in a sample. However, measuring the amounts of mRNAs can tell us only so much. It does not directly answer the questions you are actually asking, such as how cells work to perform their functions, what went wrong in a disease, and so on. With this in mind, answer the following general questions about DNA microarrays: (Within 10 lines for each)
    1. What are the major reasons for its popularity despite its limitations?
    2. Briefly summarize the methods currently in common use for gaining useful knowledge from the data obtained in DNA microarray experiments.
    3. The amounts of mRNAs for transcription factor genes are relatively small compared to those of other genes, such as the genes involved in the glycolysis pathway. How, then, would these regulatory genes appear in DNA microarray experiments?
  2. One typical but naive way of thinking about a DNA microarray experiment is “finding genes highly expressed in this condition.” (a) Explain what is wrong with it. By doing experiments with several samples or by doing time-course experiments, we can obtain correlations of mRNA amounts between genes across different samples, conditions, or time points. This kind of “finding genes behaving similarly in these conditions” can tell us only so much, although it is currently the most popular way of doing DNA microarray experiments. (b) Briefly explain the limitation. (Within 10 lines for each)
  3. Your three samples came from three different conditions of one cell type. Since there are three pairwise combinations among them, you decided to do differential hybridization using three DNA microarrays (of the same kind), one for each combination. Now you have just obtained the results of your first ever DNA microarray experiment, and you are frustrated to find that the data are not interpretable at all. What do you think you did wrong? In fact, DNA microarrays should never be used in this way. Explain why.
  4. Now you have just obtained new data from your second DNA microarray experiment. This time you have done a bit better. The same 3 conditions as in problem 3 were applied, this time to 2 different types of cells. One cell type is the one in problem 3, and the other is actually a mixture of several cell types from the organism. Differential hybridizations were conducted between these two cell types and repeated for the 3 conditions. (Therefore, the number of DNA microarrays used is still the same as in problem 3.) After image analysis and some necessary preprocessing, there are 7 genes with conspicuous differences in their expression levels across conditions A, B, and C. The data are in the following table:

                Condition A    Condition B    Condition C
    Gene #1
    Gene #2
    Gene #3
    Gene #4
    Gene #5
    Gene #6
    Gene #7

    (plain text file of above table)

    Find the clusters of genes that show similar behavior. Submit the result of the UPGMA method. You can use any tool you like to do the calculation.
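    For readers who have not met UPGMA before, the procedure can be sketched in a few lines of pure Python. This is only an illustration: the values in `profiles` are made-up placeholders and not the table from this problem, the gene names are hypothetical, and Euclidean distance is an assumed choice (a correlation-based distance is equally common for expression data).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def upgma(profiles):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    Each cluster is a tuple of gene names.
    """
    names = sorted(profiles)
    # Pairwise distances between individual genes (the leaves).
    leaf_dist = {
        (a, b): euclidean(profiles[a], profiles[b])
        for a in names for b in names if a < b
    }

    def cluster_dist(c1, c2):
        # UPGMA: unweighted mean of all leaf-to-leaf distances between clusters.
        total = sum(leaf_dist[tuple(sorted((a, b)))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        d, i, j = min(
            (cluster_dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical log-ratios for conditions A, B, C (placeholder values only).
profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.1, 2.1, 2.9],
    "g3": [-1.0, -2.0, -3.0],
    "g4": [-0.9, -2.2, -3.1],
}

merges = upgma(profiles)
for left, right, dist in merges:
    print(left, "+", right, f"at distance {dist:.3f}")
```

    The merge history printed at the end is the information a dendrogram draws: which clusters were joined, and at what average distance.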
  5. One thing we can do beyond simple grouping is to draw graphs or networks. Commonly, genes become the nodes of these graphs, and edges are drawn by assigning some type of relation based on the expression levels of mRNAs. Besides the fact that the amount of mRNA is not always correlated with the level of gene expression, of which every biologist must be aware, what other weaknesses do these kinds of attempts have? (Within 20 lines)
  6. After wet-lab experiments with DNA microarray, spot quantitation is important since it is the step which gives us the numbers for further analysis. Briefly answer the following questions about the spot quantitation step: (Within 10 lines for each)
    1. Combining both intensity segmentation and spatial segmentation can give us better results. Briefly explain why.
    2. Why is median used instead of mean?
    3. Briefly explain how we can deal with the background intensity.
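    As a small numerical illustration of the quantities involved in spot quantitation, the sketch below compares the mean and the median of the pixels in one spot and subtracts a local background estimate. All pixel values are hypothetical, and median-minus-median with a floor at zero is just one assumed way of handling background, not a prescribed method.

```python
from statistics import mean, median

# Hypothetical pixel intensities inside one segmented spot; the two very
# bright values stand in for an artifact such as dust (an assumption).
spot_pixels = [510, 495, 502, 498, 505, 4000, 3900, 500]

# Hypothetical pixels from the local background region around the spot.
background_pixels = [98, 102, 100, 97, 103, 101]

def spot_signal(spot, background):
    """Median foreground minus median local background, floored at zero."""
    return max(median(spot) - median(background), 0)

print("mean  :", mean(spot_pixels))
print("median:", median(spot_pixels))
print("signal:", spot_signal(spot_pixels, background_pixels))
```

    With these numbers the two artifact pixels pull the mean far above the typical pixel value, while the median stays close to it.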
  7. The spot intensities of a DNA microarray form something similar to a log-normal, Benford’s law, or power-law distribution. Since we do not understand the phenomena in detail, some of the questions below have no single set of correct answers. Still, briefly write your own opinions about them. (Within 10 lines for each)
    1. Do you think the amounts of mRNAs in a cell also follow a similar distribution?
    2. Enumerate the steps or apparatus that can alter the distribution of the intensities in DNA microarray experiments.
    3. Can you think of any possible reason why the spot intensities would follow a Benford’s law distribution?
    4. The tail of the distribution (i.e., bright spots) seems to form a power-law distribution. Can you think of any possible reason?
  8. Enumerate the cases that may give rise to missing values in DNA microarray data. Also, briefly describe how to deal with missing values. (Within 20 lines)
  9. The so-called two-color experiment with a DNA microarray is a case of the paired comparison design of experiments, which is in turn a special case of a more general type of design called the randomized block design. Statistical design of experiments has two main purposes (noise reducing and volume increasing), and the randomized block design is the most common type of experimental design for achieving noise reduction. Explain the advantage of the two-color experiment in a statistical sense so that your slow biologist colleagues, who have not yet converted themselves to statistics and mathematics, can understand what is going on. Also, could you devise an experiment with DNA microarrays that gives us a volume increasing design? (Within 30 lines)
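    The idea of blocking can be illustrated with a toy calculation. The sketch below assumes an idealized model in which each array adds one shared, purely additive effect to both channels on the log scale; the array effects and the true log-expression values are hypothetical numbers chosen only for illustration.

```python
from statistics import pvariance

# Hypothetical array-specific effects (spot size, probe amount, ...) that
# shift BOTH channels of the same array equally. This is an idealization.
array_effect = [0.8, -0.5, 0.3, -0.9, 0.4]

# Hypothetical true log-expression of one gene in the two samples.
true_red, true_green = 10.0, 11.0

red = [true_red + e for e in array_effect]
green = [true_green + e for e in array_effect]

# Paired comparison: take the difference within each array (the block).
paired_diffs = [g - r for r, g in zip(red, green)]

print("variance of red channel alone :", pvariance(red))
print("variance of paired differences:", pvariance(paired_diffs))
```

    Under this idealized model the shared array effect cancels in each within-array difference, so the differences vary far less than either channel does on its own. Real data also carry channel-specific noise that does not cancel.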
  10. For the lowess normalization of DNA microarray data, the so-called MA plot is used, in which log(R/G) of each gene (M) is plotted against the combined intensity of both colors for that gene (A, conventionally the average of log R and log G). (a) Even though we do not yet understand it fully, could you provide a possible explanation of why there is a systematic variation with the combined intensity of both colors? (b) Write how you would explain to your biologist colleagues, as in the problem above, why normalization is necessary. (Within 10 lines for each)
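    For reference, the two coordinates of an MA plot can be computed as in the following minimal sketch. It uses the common convention M = log2(R/G) and A = (log2 R + log2 G)/2; the intensities in `spots` are hypothetical, and the function name `ma_values` is only illustrative.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities."""
    m = math.log2(red / green)        # log ratio of the two channels
    a = 0.5 * math.log2(red * green)  # average log intensity
    return m, a

# Hypothetical background-corrected intensities (red, green) per spot.
spots = [(1200.0, 1000.0), (150.0, 300.0), (8000.0, 8000.0)]

for red, green in spots:
    m, a = ma_values(red, green)
    print(f"R={red:7.1f}  G={green:7.1f}  M={m:+.3f}  A={a:.3f}")
```

    A lowess fit of M against A then gives an intensity-dependent trend that is subtracted from M.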
  11. There is a test for a disease that is known to occur in 1 per 10,000 people in the Korean population. The test is known to produce positive results for 98% of the people with the disease, but it is also known to produce positive results for 1% of the people without the disease. Suppose that you have just received a positive test result. What is the probability that you actually have the disease? In this context, explain why current DNA microarrays are not yet suitable for such a purpose.
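    The calculation in this problem is an application of Bayes' theorem, which can be set up as in the sketch below. The function and parameter names are illustrative only; the three numbers are the ones stated in the problem.

```python
def posterior_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence                   # diseased and positive
    false_pos = false_positive_rate * (1.0 - prevalence)  # healthy but positive
    return true_pos / (true_pos + false_pos)

p = posterior_positive(
    prevalence=1 / 10_000,     # 1 per 10,000 people
    sensitivity=0.98,          # positive for 98% of people with the disease
    false_positive_rate=0.01,  # positive for 1% of people without it
)
print(f"P(disease | positive) = {p:.4%}")
```

    Varying `false_positive_rate` while keeping the prevalence fixed shows how strongly the answer depends on the false positive rate when the condition is rare.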
  12. The SOM (Self-Organizing Map) has a unique advantage over other clustering methods, especially over k-means clustering. Briefly explain what it is.