  A Bioinformatics and Linux Course
  Writer: Seyeon Weon     Date: 10-19
Bioinformatics and Linux Course (210007) 2007/2008


2007-06-20: Home page for 2007/2008 launched. Some adjustments may still follow. Check out the practical information as well as the schedule. Read the report requirements very carefully!


Today, even fairly small labs can generate substantial amounts of biological sequence data. However, using these data in further experimental analysis, and gaining insight into biological problems, requires efficient, task-specific computational tools. Although some of these tools are available online, they are often limited in resources and require much manual intervention. This is true even of many otherwise user-friendly commercial bioinformatics packages: they automate some analysis tasks well, but make the automation of other tasks substantially more complicated and demanding than necessary.

The Linux (Unix) operating system is well suited to the bioinformatic challenges of designing methods and specific tools. In addition, new tools can be implemented, and existing programs integrated, using the scripting language Perl.

The motivation, however, is not just to apply tools, but also to use and develop them to gain insight into molecular mechanisms. Finally, computational skills can save time, as Alan Bleasby put it (The Biochemist, Oct. 1997):
Two months in the lab can easily save an afternoon on the computer.

The aim of the course is to introduce basic concepts of bioinformatics, along with basic concepts of Linux and the Perl scripting language, so that students are able to construct the tools needed for automated analysis of biological sequences. Students should also acquire basic skills in navigating a bioinformatics command-line environment.

Intended audience
The course is mainly intended for students with little or no experience with Linux and Perl. We also recommend the course to researchers interested in improving their skills in handling large-scale data.

Lectures and Exercises
Although the lectures will mainly provide an overview of relevant topics (and go into detail on the difficult ones), students are expected to take an active part in the problems discussed in the lectures. Not all reading material will be covered during the lectures, but it is still necessary for the exercises. In the exercises, students will work on a number of practical bioinformatics problems. Exercises not completed within the available time are to be considered homework and should be completed before the next week's exercises. Note that exercises may build on previously solved exercises. This year, the exercises will use bootable CDs with a complete Linux installation. Bring a USB memory stick, or use secure copy (scp), to store your data from the exercises.

The main topics in the course:

    * Introduction to basic concepts in bioinformatics.
    * Introduction to Linux (the command line, etc.).
    * Perl and shell scripts.
    * Conversion between data formats (incl. processing of BLAST output).
    * Implementing alignment algorithms.
    * Applying bioinformatics algorithms.
    * Integrating bioinformatics software.

Required skills
(Course numbers to be updated.)
Basic computer knowledge, and the KVL courses "matematisk grundkursus" (basic mathematics) and "statistisk grundkursus" (basic statistics), or equivalent. In addition, one of the courses "Molekylær genetik" (molecular genetics), "Molekylær cellebiologi" (molecular cell biology), "Cellebiologi" (cell biology), "Genetik og husdyravl" (genetics and animal breeding), or equivalent is recommended.

To pass the course, each student must carry out an individual mini-project, implementing a bioinformatics algorithm and writing a report. See the details here. In addition, the report will contain a detailed description of a topic from one of the lectures. Further:

    * Reports, and thereby the course, will be graded on the Danish 7-point grading scale (the "7-scale").
    * Points: 7.5 ECTS.

Course type
Common to both Bachelor and Master of Science programmes.

Reading material
The course will be based on the following books, but other material may also be handed out:

    * Developing Bioinformatics Computer Skills. Cynthia Gibas and Per Jambeck.
      (Check out the errata for the book.)
      O'Reilly & Associates Inc. Sebastopol, CA, USA.
      (We refer to this book in the schedule as the bioinf-book.)
      Order it on Amazon.
    * Learning Perl, 4th Edition. Randal L. Schwartz, Tom Phoenix and brian d foy.
      (Update: check out the errata for the book.)
      O'Reilly & Associates Inc. Sebastopol, CA, USA.
      (We refer to this book in the schedule as the perl-book.)
      Order it on Amazon.

Supplementary material

    * Beginning Perl for Bioinformatics. James Tisdall.
      O'Reilly & Associates Inc. Sebastopol, CA, USA.

Practical information
How to find the course at KVL:
The course has KVL number: 210007.
Time and Place
The course takes place in the period November 12th, 2007 to January 25th, 2008. Lectures and (computer) exercises will be from November 12th, 2007 to December 19th, 2007.
Lectures: Monday 8:30-10:00 Room to be announced.
Wednesday 9:30-11:30 Room to be announced.
Exercises: Monday 10:00-12:00 Room to be announced.
Wednesday 13:00-16:30 Room to be announced.

Reports carried out
Reports are written from November 21st, 2007, and are due January 25th, 2008, at 2pm. See details here.

Teachers
    * Jan Gorodkin (JG), course responsible.
    * Jacob Engelbrecht (JE).
    * Karsten Scheibye-Knudsen (KSK), teaching assistant.
    * Elfar Torarinsson (ET), teaching assistant.
    * Stefan Seemann (SS), teaching assistant.

When will this course be available again?
The course is next planned for fall 2008 (November-January).

The schedule is preliminary and will be updated as the course progresses, so changes are very likely. Links to lecture slides remain empty until just before or after the lectures. You need a password to download the lecture slides.

Week 1: Introduction
    Lecture (Monday 2007-11-12): Welcome / Introduction to bioinformatics and Linux (JG)
    Reading: Bioinf-book, chapters: 1, 2, + 4 (pages: 3-44 + 64-86).
    Lecture (Wednesday 2007-11-14): Intro continued and the Unix shell (JG)
    Reading: Bioinf-book, chapter: 5 (pages: 87-130)
    Exercise keywords:
    What is bioinformatics; Linux; the command line; the file system.
    Running bioinformatics programs; Redirecting data to files; piping data to other programs.
    Lecture slides: 2007-11-12.pdf 2007-11-14.pdf.

Week 2: Introduction to Perl
    Lecture (Monday 2007-11-19): Your first(?) Perl script (JG)
    Reading: Perl-book, chapter: 2 (pages: 18-37).
    Lecture (Wednesday 2007-11-21): Storing data in Perl scripts (JG)
    Reading: Perl-book, chapters: 3 + 6 (pages: 38-52 + 88-99).
    Exercise keywords:
    Read/write DNA and protein sequence data; strings; basic functions.
    Storing a DNA and protein sequence.
    Lecture slides: 2006-11-19.pdf 2006-11-21.pdf.
    2007-11-21: Presentation and selection of project topics.
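
The exercise keywords above (reading, writing and storing DNA and protein sequences) can be illustrated with a small FASTA reader and writer. The course itself uses Perl. This sketch is in Python purely for illustration. The names `read_fasta` and `format_fasta`, and the default 60-column line width, are hypothetical choices and not part of the course material.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into a {header: sequence} dict (hypothetical helper).

    Each record starts with a '>' header line followed by one or more
    sequence lines; blank lines are skipped and sequences are upper-cased.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)  # store the last record
    return records


def format_fasta(header: str, seq: str, width: int = 60) -> str:
    """Return one FASTA record with the sequence wrapped at `width` characters."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{header}\n{body}\n"


example = """>dna1 toy DNA sequence
acgtacgt
ACGT
>prot1 toy protein
MKV
"""
seqs = read_fasta(example.splitlines())
print(seqs)  # {'dna1 toy DNA sequence': 'ACGTACGTACGT', 'prot1 toy protein': 'MKV'}
print(format_fasta("dna1", seqs["dna1 toy DNA sequence"], width=5), end="")
```

Because `read_fasta` accepts any iterable of lines, an open file handle or `sys.stdin` can be passed directly. This suits the redirection and piping style introduced in week 1.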

Week 3: Manipulating data
    Lecture (Monday 2007-11-26): Read, compute, and write on sequence data (JE)
    Reading: Perl-book, chapters: 4 + 5 (pages: 54-67 + 68-86).
    Lecture (Wednesday 2007-11-28): Sequence data by regular expression (JE)
    Reading: Perl-book, chapters: 7, 8, + 9 (pages: 100-106, 107-119 + 121-134).
    Exercise keywords:
    Compute alignment scores; sequence identity in alignments; codon translator; reverse complement.
    Converting GenBank format to FASTA format (gb2fasta script).
    Lecture slides: 2007-11-26.pdf 2007-11-28.pdf.
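
This is a minimal Python sketch of three of this week's exercise tasks: reverse complement, codon translation, and scoring a given pairwise alignment. It makes several assumptions:

* The standard genetic code (NCBI translation table 1) applies.
* Only reading frame 1 is translated.
* The scoring scheme is hypothetical: +1 for a match, -1 for a mismatch and -2 per gap position. The exercises may use different parameters.
* Percent identity is taken as identical columns divided by alignment length. This is only one of several conventions.

```python
from itertools import product
from typing import Tuple

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Complement table; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the string."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate_dna(dna: str) -> str:
    """Translate reading frame 1; '*' marks stop codons, 'X' unknown codons."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def alignment_stats(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> Tuple[int, float]:
    """Score two equal-length aligned strings ('-' = gap).

    Returns (score, percent identity over all alignment columns).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0, 0.0
    score = identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
            identical += 1
        else:
            score += mismatch
    return score, 100.0 * identical / len(a)


print(reverse_complement("ATGCGT"))         # ACGCAT
print(translate_dna("ATGGCCTAA"))           # MA*
print(alignment_stats("ACGT-A", "ACCTGA"))  # score 1, about 66.7 % identity
```

Building the codon table from the TCAG enumeration avoids typing 64 entries by hand. The same trick works with a Perl hash.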

Week 4: More perl, more bioinformatics introduction and course evaluation
    Lecture (Monday 2007-12-03): Course evaluation and supplementary Perl concepts (JE)
    Reading: Perl-book, chapter: 10 (pages: 135-152) and Pearson WR: Protein Sequence Comparison and Protein Evolution, ISMB2000 tutorial, 53 pages (read pages 1-36).
    Lecture (Wednesday 2007-12-05): More sequence data by regular expression (JE)
    Reading: Bioinf-book, chapter: 7 (pages: 159-190), and Tisdall: Beginning Perl for Bioinformatics, pages 274-290.
    Exercise keywords:
    Pairwise alignments, dynamic programming, database search, expectation value.
    Parsing and processing BLAST output (see also the BLAST course).
    Lecture slides: 2007-12-03.pdf 2007-12-05.pdf.
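
Parsing BLAST output is much simpler with the tabular report format than with the default text report. This sketch assumes the 12-column tabular format (`blastall -m 8`, or `-outfmt 6` in BLAST+). It keeps the best hit per query, by bit score, after applying an e-value cutoff. The 1e-5 threshold and the `Hit` record are hypothetical choices, and the three report lines are invented toy data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Column order of the 12-column tabular BLAST report.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_tabular_blast(lines: Iterable[str]) -> List[Hit]:
    """Turn tab-separated report lines into Hit records, skipping '#' comments."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        row = dict(zip(FIELDS, cols))
        hits.append(Hit(row["qseqid"], row["sseqid"], float(row["pident"]),
                        int(row["length"]), float(row["evalue"]),
                        float(row["bitscore"])))
    return hits


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Keep the highest-bit-score hit per query among hits passing the cutoff."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


toy_report = [
    "# toy comment line, as produced by -m 9 / -outfmt 7",
    "\t".join(["q1", "s1", "98.50", "200", "3", "0", "1", "200", "10", "209", "1e-100", "370"]),
    "\t".join(["q1", "s2", "90.00", "150", "15", "0", "1", "150", "5", "154", "1e-40", "180"]),
    "\t".join(["q2", "s3", "80.00", "60", "12", "0", "1", "60", "1", "60", "0.5", "30.0"]),
]
best = best_hits(parse_tabular_blast(toy_report))
print(best)  # only q1 survives the cutoff; its best subject is s1
```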

Week 5: Integrating and creating bioinformatic tools
    Lecture (Monday 2007-12-10): Using XML and web based technologies in bioinformatics (JE)
    Reading: Material from A. Møller and M. Schwartzbach: An Introduction to XML and Web Technologies.
    Lecture (Wednesday 2007-12-12): Algorithms for pairwise sequence alignments and SVG graphics presentation (JE)
    Reading: Brush up on at least Section 2 of Pearson's notes and the bioinformatics book, Chapter 7. Material on SVG.
    Exercise keywords:
    Implementation of Needleman-Wunsch and Smith-Waterman alignments.
    Lecture slides: 2007-12-10.pdf 2007-12-12.pdf.
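
This is a compact sketch of the two alignment algorithms. It assumes a linear gap penalty and the same hypothetical scoring scheme as above (+1 match, -1 mismatch, -2 gap). The exercises may instead use substitution matrices such as BLOSUM and affine gaps.

* With `local=False`, the function fills the Needleman-Wunsch matrix from gap-penalised borders and traces back from the bottom-right cell.
* With `local=True`, it applies the Smith-Waterman zero floor and traces back from the highest-scoring cell until the score reaches zero.

```python
from typing import List, Tuple


def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2,
          local: bool = False) -> Tuple[int, str, str]:
    """Needleman-Wunsch (global) or, with local=True, Smith-Waterman (local)
    alignment with a linear gap penalty. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for a[i-1] versus b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # F[i][j]: best score for a[:i] vs b[:j] (local: best score ending at i, j).
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
            if local:
                F[i][j] = max(F[i][j], 0)

    # Global alignments end in the bottom-right cell, local ones at the best cell.
    if local:
        i, j = max(((x, y) for x in range(n + 1) for y in range(m + 1)),
                   key=lambda p: F[p[0]][p[1]])
    else:
        i, j = n, m
    score = F[i][j]

    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 or j > 0:
        if local and F[i][j] == 0:
            break  # a local alignment stops where the score drops to zero
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score, "".join(reversed(out_a)), "".join(reversed(out_b))


print(align("ACGT", "ACT"))                          # (1, 'ACGT', 'AC-T')
print(align("TTACGTTT", "GGACGTGG", local=True))     # (4, 'ACGT', 'ACGT')
```

Ties during traceback are broken in favour of the diagonal move. Only one of possibly several optimal alignments is returned.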

Week 6: Multiple sequence alignments and RNA structure and folding
    Lecture (Monday 2007-12-17): Multiple alignments (JG)
    Reading: Bioinf-book, chapter 8.
    Lecture (Wednesday 2007-12-19): Basic RNA folding algorithms (JG)
    Reading: Computational Genomics of Noncoding RNA Genes, S. R. Eddy, Cell, Vol. 109, 137-140, 19 April 2002; Non-Coding RNA genes and the modern RNA World, S. R. Eddy, Nature Reviews Genetics, 919-929, 2001; pages 260-272 in "Biological Sequence Analysis" by R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Cambridge University Press, 1998.
    Exercise keywords:
    ClustalW; Weight matrices; information content of alignments; motif search.
    RNA structure, non-coding RNA genes, folding, dynamic programming, the Nussinov algorithm.
    Lecture slides: 2007-12-17.pdf 2007-12-19.pdf.
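
The Nussinov algorithm maximises the number of base pairs by dynamic programming over subsequences. This sketch makes two assumptions: Watson-Crick pairs plus G-U wobble pairs are allowed, and a hairpin loop must contain at least `min_loop` unpaired bases. `min_loop` is a hypothetical parameter (default 3); setting it to 0 gives the plain recurrence. The function returns the pair count together with one optimal structure in dot-bracket notation.

```python
from typing import List, Tuple

# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Return (maximum number of base pairs, one optimal dot-bracket structure)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # N[i][j]: maximum number of pairs in seq[i..j]; subsequences of at most
    # min_loop + 1 bases cannot close a hairpin and stay 0.
    N: List[List[int]] = [[0] * n for _ in range(n)]
    for length in range(min_loop + 1, n):
        for i in range(n - length):
            j = i + length
            best = max(N[i + 1][j], N[i][j - 1])          # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, N[i + 1][j - 1] + 1)     # i pairs with j
            for k in range(i + 1, j):                     # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best

    # Traceback with an explicit stack, recovering one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if N[i][j] == N[i + 1][j]:
            stack.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and N[i][j] == N[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return N[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Maximising the pair count is a crude energy model. Its value in the course is as a first example of RNA folding by dynamic programming, and it shares the matrix fill and traceback structure of the alignment algorithms from week 5.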

Week 9: January 25th, 2pm: Reports handed in.

Comments, questions, etc., email