A practical class in programming for biologists

MolGen Class “A Practical Course in Programming for Biologists”

Philip M. Kim and Gary D. Bader

This course is designed to teach experimental biologists the basics of bioinformatics programming and to give them hands-on experience with it. In today's world, most graduate students in Molecular Genetics will encounter situations where they have to use computational tools and deal with large amounts of data. The main objective of this class is to give students the power of automation via bioinformatics programming. The class teaches by example and makes students comfortable with doing basic programming in Perl, adapting existing programs to their needs, and interfacing with standard bioinformatics software such as BLAST or ClustalW. High-level overviews of BioPerl and R are covered as well.

The class is a standard MolGen module covering six weeks. Each week there will be a total of 2.5 hours of class, split into two 1.25-hour sessions, and roughly one homework assignment will be handed out. Completing the assignments is the best way to learn programming; they are an integral component of the class, and a large portion of the grade will be based on them. We emphasize that this will be a relatively time-consuming class, but the effort will be worth it.

Each 45-minute lecture is preceded by a 30-minute recitation section, during which the solution to the homework assignment will be discussed. Both the lecture and the recitation section are meant to be highly interactive, which is why the class is initially limited to 15 students. There will be in-class labs and exercises, so each student is expected to bring a laptop computer.

Users of Mac OS X or Linux computers should ensure that a terminal application (e.g., "Terminal" for OS X or "xterm" for Linux) is installed. Users of Windows computers are asked to install the Cygwin environment. All students should make sure that Perl is installed on their computers (typing "perl -h" in a terminal window prints Perl's usage summary if it is installed) and should install a text editor. Recommended choices are Komodo Edit or GNU Emacs.
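As a rough illustration of the Perl check described above, the following shell sketch tests whether a perl executable can be found and prints its version banner if so. It is only one possible way to do the check; the variable name perl_status is a hypothetical name chosen for this example.

```shell
#!/bin/sh
# Check whether a perl executable is available on the PATH.
# "perl_status" is a hypothetical variable name used only in this illustration.
if command -v perl >/dev/null 2>&1; then
    perl_status="installed"
    # Print the version banner of the perl that was found.
    perl -v
else
    perl_status="missing"
    echo "perl was not found on your PATH; please install it before the first class."
fi
echo "perl is $perl_status"
```

The same commands can be typed in Terminal, xterm, or a Cygwin window.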

We recommend the textbook "Learning Perl, 5th Edition". However, a number of very good (and free) online resources also exist, such as perldoc.perl.org and Beginning Perl. A number of handy reference tables exist as well, such as this little cheatsheet.

Final Project:
Every student is expected to complete a final project, to be started after the last class. Ideally, students choose their own project, related to their research. It should be a data analysis or programming project that requires writing a program of more than 100 lines of code and uses a large fraction of the material covered in class. Every student should submit a short (one-paragraph) description of the proposed project to the instructors for approval by Nov 21 at the latest. Students who do not submit a proposal will be assigned a project by the instructors. The final project will make up 40% of the grade.

Date and Time:
October 17th - December 2nd, 2011
Mondays, 2:30-3:45pm and Fridays, 1:30-2:45pm
Donnelly Centre, 6th Floor meeting room

Instructor contact info:
Philip M. Kim, Rm 606, Donnelly Centre, pm.kim (at) utoronto
Gary D. Bader, Rm 602, Donnelly Centre, gary.bader (at) utoronto

TA contact info: TBA

General course email (email your homework assignments here): progclass@kimlab.org

Syllabus:

Week 1: Getting comfortable with basic tools

Lecture 1

Introduction to programming. Why programming? Typical problems.
Getting comfortable with the UNIX shell (command line) and basic shell commands. Exercises using core utilities such as grep, cut, head and tail, and redirecting output.
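To give a flavour of the utilities named above, here is a minimal sketch that applies grep, cut, head, tail and output redirection to a tiny table. The file names (genes.tsv and so on) and the table contents are made up for this illustration and are not the class1 data; the script works in a temporary directory so that no existing files are touched.

```shell
#!/bin/sh
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small tab-separated table: gene name, chromosome, length.
# All values are invented for the purpose of this example.
printf 'geneA\tchr1\t1200\n'  > genes.tsv
printf 'geneB\tchr2\t800\n'  >> genes.tsv
printf 'geneC\tchr1\t450\n'  >> genes.tsv
printf 'geneD\tchr3\t2300\n' >> genes.tsv

# grep: keep only the lines that mention chr1; ">" redirects the output to a file.
grep 'chr1' genes.tsv > chr1_genes.tsv

# cut: extract the first tab-separated column (the gene names).
cut -f 1 chr1_genes.tsv > chr1_names.txt

# head and tail: the first two lines and the last line of the full table.
head -n 2 genes.tsv > first_two.tsv
tail -n 1 genes.tsv > last_one.tsv

# grep -c counts the matching lines instead of printing them.
chr1_count=$(grep -c 'chr1' genes.tsv)
echo "Genes on chr1: $chr1_count"
```

Note that ">" overwrites the target file while ">>" appends to it, which is why the table is built with one ">" followed by several ">>".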

Data to be used: class1 data
Lecture notes: pdf. First assignments: Lab1 and first_script

Lecture 2

- Perl Statements, Basic Syntax and Variables
- String variables and operators – how to manipulate string data

Class 2 notes and sample code: class2  In-class exercises: inclass2
No assignment today

Week 2: Basic Perl

Lecture 3

- Arrays and lists
- File input/output (I/O)

Class 3 notes and sample code: class3
Assignment: Lab2

Solution: Solution Lab 2

Lecture 4

- More on arrays and lists
- Intro to loops and conditional statements

Class 4 notes and sample code: class4  In-class exercises: inclass4  Slides: class4-slides (slides and solutions will be posted after class)
 

Week 3: Writing more complex scripts

Lecture 5

- More on loops and flow control. Common programming patterns
- Hashes

Class 5 notes and sample code: class5 In-class exercises: inclass5
Assignment: Lab3

Solution: Solution Lab 3

Lecture 6

- More on Hashes, more on string manipulation

- Writing more complex programs: Devising subroutines for common tasks

Class 6 notes and sample code: class6  In-class exercises: inclass6


Week 4: Parsing and advanced regular expressions

Lecture 7

- Regular expressions: REALLY manipulating strings

Please see this link for an introduction to regular expressions: http://perldoc.perl.org/perlretut.html

Class 7 sample code: Class7sampleCode   Inclass-solutions   LectureSlides
Assignment: Lab4
Solution: Solution 4

Lecture 8

- Continuing regular expressions

- Interfacing with external programs, recipes, reading directories

Class 8 sample code: Class8sampleCode   Inclass-solutions   LectureSlides


Week 5: Intro to the R programming language (statistical programming language)

Lecture 9

- Intro to R, variables and constructs in R
- Basic I/O

Class 9 notes and sample code: class9 In-class exercises: inclass LectureSlides9

Assignment: Lab5

Solution: Solution 5


Additional projects can be found here.

Lecture 10

- Basic visualization in R
- Statistical tests and analyses in R

Samples of fancy R plots here 

Class 10 notes and sample code: class10 In-class exercises: inclass10 LectureSlides10

Week 6: R programming (continued)

Lecture 11: Intro to Bioconductor in R

- Installing and using Bioconductor modules in R

- Basic Bioconductor usage, working with expression data

Note: Make sure you install Bioconductor (see HOWTO: Install Bioconductor) as well as the affy package, by running the following commands in R.

source("http://www.bioconductor.org/biocLite.R")
biocLite("affy")

Other packages may need to be installed in class this week over the internet, so ensure your computer is connected when in class.

Class 11 sample code and data solution code Slides

Assignment: Lab6

Lecture 12: Bioconductor in R (continued)

- Continuing with Bioconductor: working with sequence data

Class 12 sample code and data additionalSampleCode Slides  solution-code

Attachments:
sol4.tar (10k), Roland Arnold, Nov 15, 2011 7:08 PM
solution.tar (20k), Programming Class, Nov 24, 2011 2:50 PM
solution_lab2.tar (20k), Programming Class, Oct 31, 2011 10:30 AM
solution_lab3.tar.gz (125k), Programming Class, Nov 11, 2011 11:45 AM
