A practical class in programming for biologists
MolGen Class “A Practical Course in Programming for Biologists”
Philip M. Kim and Gary D. Bader
This course is designed to teach experimental biologists the basics of bioinformatics programming and give them hands-on experience. In today’s world, most graduate students in Molecular Genetics will encounter situations where they have to make use of computational tools and deal with large amounts of data. The main objective of this class is to give students the power of automation via bioinformatics programming. The class teaches by example and makes students comfortable with doing basic programming in Perl, adapting existing programs to their needs and interfacing with standard bioinformatics software such as BLAST or ClustalW. High-level views of BioPerl and R are covered as well.
The class is a standard MolGen module covering six weeks. Each week there will be a total of 2.5 hours of class, split into two 1.25-hour sessions. Roughly one homework assignment will be handed out each week. Completing the assignments is the best way to learn programming and is an integral component of the class, so a large portion of the grade will be based on them. We emphasize here that this will be a relatively time-consuming class, but the effort will be worth it.
Each 45-minute lecture is preceded by a 30-minute recitation section, during which the solution to the homework assignment will be discussed. Both the lecture and the recitation section are meant to be highly interactive, which is why the class is initially limited to 15 students. There will be in-class labs and exercises, so each student is expected to bring a laptop computer.
Users of Mac OS X or Linux computers should ensure that a terminal application (e.g., "Terminal" for OS X or "xterm" for Linux) is installed. Users of Windows computers are asked to install the Cygwin environment. All students should make sure that Perl is installed on their computers (typing "perl -h" in a terminal window prints Perl's usage message if it is installed) and should install a text editor application. Recommended choices are Komodo Edit or GNU Emacs.
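As a minimal sketch of the installation check described above, the following shell snippet tests whether a perl executable can be found and, if so, prints its version banner. The variable name perl_status is only an illustration and is not part of the course material; "perl -v" is used here as an alternative to "perl -h" because it reports the version.

```shell
# Check whether perl is on the PATH (perl_status is a hypothetical helper variable).
if command -v perl >/dev/null 2>&1; then
    perl_status="installed"
    # perl -v prints a version banner; show only its first two lines.
    perl -v | head -n 2
else
    perl_status="missing"
    echo "perl was not found; please install it before the first class."
fi
echo "perl is $perl_status"
```

If the last line reports "missing", Perl needs to be installed (on Windows, through the Cygwin installer) before the first session.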
We recommend the textbook "Learning Perl, 5th Edition". However, a number of very good (and free) online resources also exist, such as perldoc.perl.org and Beginning Perl. A number of handy reference tables exist as well, such as this little cheatsheet.
Every student is expected to complete a final project, which they will start after finishing the last class. Ideally, students choose their own project and relate it to their research. It should be a data analysis or programming project that requires writing a program of more than 100 lines of code and uses a large fraction of the material covered in class. Every student should submit a short (one paragraph) description of their proposed project to the instructors (email@example.com) for approval by Oct 17, 2016 at the latest. Students who do not submit a proposal will be assigned a project by the instructors. The final project will make up 40% of the grade. It will be due on Dec 1, 2016.
Date and Time:
September 12th - October 28th, 2016
Mondays, 2pm-3:15pm and Fridays, 2pm-3:15pm (No class on Sep 23 and Oct 10)
Donnelly Centre, Red Seminar Room
Instructor contact info:
Philip M. Kim, Rm 606, Donnelly Centre, pm.kim (at) utoronto
Gary D. Bader, Rm 602, Donnelly Centre, gary.bader (at) utoronto
TA contact info:
Minggao Liang (PGCRL) firstname.lastname@example.org (Office hours, Thu 11am-noon, PGCRL)
Hamed Heyday (Donnelly) email@example.com (Office hours, Mo 10am-11am, Donnelly)
General course email (email your homework assignments here): firstname.lastname@example.org
Week 1: Getting comfortable with basic tools
Introduction to programming. Why programming? Typical problems.
Getting comfortable with the UNIX shell (command line) and basic shell commands. Exercises using core utilities such as grep, cut, head and tail, and redirecting output to files.
Data to be used: class1 data
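The core utilities named above can be tried out on a small table before moving to the class data. The sketch below is only an illustration: the file genes.tsv, its gene names and its numbers are made up and are not the class1 data.

```shell
# Create a small, made-up tab-delimited table in a temporary directory.
workdir=$(mktemp -d)
printf 'gene\tchrom\tlength\n'  > "$workdir/genes.tsv"   # ">" creates or overwrites a file
printf 'geneA\tchrI\t1200\n'   >> "$workdir/genes.tsv"   # ">>" appends to a file
printf 'geneB\tchrI\t950\n'    >> "$workdir/genes.tsv"
printf 'geneC\tchrII\t2100\n'  >> "$workdir/genes.tsv"

# grep: keep lines containing the whole word chrI (-w skips chrII),
# and redirect the matches into a new file.
grep -w 'chrI' "$workdir/genes.tsv" > "$workdir/chrI.tsv"

# cut: extract the first tab-separated column (the gene names).
gene_names=$(cut -f1 "$workdir/chrI.tsv")

# head and tail: first line (the header) and last line of the table.
header=$(head -n 1 "$workdir/genes.tsv")
last_gene=$(tail -n 1 "$workdir/genes.tsv" | cut -f1)

# grep -c counts matching lines instead of printing them.
chrI_count=$(grep -c -w 'chrI' "$workdir/genes.tsv")

echo "Genes on chrI: $chrI_count"
echo "$gene_names"
echo "Last gene in table: $last_gene"

# Remove the temporary files.
rm -rf "$workdir"
```

The pipe in the tail line passes the output of one command directly to the next, which is how these small tools are usually combined.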
- Perl Statements, Basic Syntax and Variables
- String variables and operators – how to manipulate string data
No assignment today
Week 2: Basic Perl
- Arrays and lists
- File input/output (I/O)
- More on arrays and lists
- Intro to loops and conditional statements
Week 3: Writing more complex scripts
- More on loops and flow control. Common programming patterns
- Writing more complex programs: Devising subroutines for common tasks
Class 6 lecture slides: Module7
In-class sample code: Module7
In-class exercise solutions: Module7-sols
Week 4: Parsing and advanced regular expressions
- Regular expressions: REALLY manipulating strings
Please see this link for an introduction to regular expressions: http://perldoc.perl.org/perlretut.html
Class 7 lecture slides: Module8
In-class sample code: Module8
In-class exercise solutions: Module8-sols
- Continuing regular expressions
- Interfacing with external programs, recipes, reading directories
Class 8 lecture slides: Module9
In-class sample code: Module9
In-class exercise solutions: Module9-sols
Week 5: Intro to the R programming language (statistical programming language)
- Intro to R, variables and constructs in R
- Basic I/O
Class 9 lecture slides: Module10
Sample code: Module10
In-class exercise solutions: Module10-sols
- More R
Samples of fancy R plots here
Class 10 lecture slides: Module11
Sample code: Module11
In-class exercise solutions: Module11-sols
Week 6: The R programming language (continued)
Lecture 11: More R
Class 11 lecture slides: Module12
In-class exercise solutions: Module12-sols
- Installing and using Bioconductor modules in R
- Basic Bioconductor usage, working with expression data
Note: Make sure you install Bioconductor (HOWTO: Install Bioconductor)
And also the affy package, by running the following commands.
Other packages may need to be installed over the internet during class this week, so make sure your computer is connected while in class.
Lecture 12: Intro to Python and course recap
Class 12 lecture slides: Module14
Sample code link: Module14
In-class exercise solutions: Module14-sol
Note: Some of the course material was adapted (with permission from Prof. Chad Myers) from the course CSCI2003 at the University of Minnesota.