Bioinformatics Exercises
  C.Problems. Computer Science
  Writer : Seyeon Weon   Updated : 10-14   Hit : 2236   Updates 
(For those who have taken the courses and want to submit for evaluation, please read the instructions linked on the table of contents page.)
  1. Assume that you are given a sorted array, A1, and trying to find out an item in the array. One obvious method is searching the array from the top toward the bottom, which has the computational complexity of O(n), where n is the size of the array. Devise other method that has better asymptotic complexity than O(n). If a new array, A2, is an unsorted one, what can be the best asymptotic complexity? If you have to search the array, A2, many times, then what should you do to obtain a better overall performance?
  2. Often you have to deal with oligomers of nucleotide sequence or protein sequence. However, you need some smarter methods here since a simple array is not always suitable to represent oligomers. For example, if you want represent all combinations of 3mers of nucleotide, you can use an one dimensional array with 64 elements. But, a problem arises when accessing each oligomer in the array. Since you often want to access it sequentially, you'd better use a multidimensional array like a[4][4][4] instead of one dimensional array a[64]. Also, often the length becomes long, such as 25mer or 30mer, and then only a small portion of the possible combinations are actually relevant. Furthermore, if the length becomes much longer, you cannot statically create the multidimensional array for the oligomer because of the memory space limitation. In this situation, what kind of data structure would you use? Explain some detail, but no more than 7 lines in total as usual.
  3. Ethernet is perhaps the RNA of Internet. One of the major reasons for its success is efficiency. However, it is not wise to use ethernet as the computer network for a mission critical job (e.g., control mechanism for an airplane). Explain the reason.
  4. Whenever it can be applied, a greedy algorithm is a good choice since you can accomplish lower computational complexity than other methods such as dynamic programming. There are two general requirements to be able to apply greedy algorithm; greedy-choice property and optimal substructure. Greedy-choice property is that a globally optimal solution can be arrived at by making a locally optimal (greedy) choice. Also, a problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems. The latter property is common to both dynamic programming and greedy algorithm. But, the former one is only for greedy algorithm (among these two methods). Explain Smith-Waterman algorithm in this context. In other word, answer the question that why we cannot use greedy algorithm for sequence alignment?
  5. The infinite sequence of digits in an irrational number, such as the square root of 2, can be served as a good source for random numbers. What is the problem when using it in practice?
  6. You are trying to write your own parser for GenBank flat file. Write a set of grammatical rules that can be used for it. The parser should recognize fields (i.e., LOCUS, KEYWORDS, etc), and don't try to write more than that at least for this assignment since you will be doing too much. Of course, make your own one. (No 7 lines limitation.)