A practical class in programming for biologists

MolGen Class “A Practical Course in Programming for Biologists”

Philip M. Kim and Gary D. Bader

This course is designed to teach experimental biologists the basics of bioinformatics programming and to give them hands-on experience. Most graduate students in Molecular Genetics will encounter situations where they have to use computational tools and deal with large amounts of data. The main objective of this class is to give students the power of automation via bioinformatics programming. The class teaches by example and makes students comfortable with doing basic programming in Perl, adapting existing programs to their needs and interfacing with standard bioinformatics software such as BLAST or ClustalW. High-level overviews of BioPerl and R are covered as well.

The class is a standard MolGen module covering six weeks. Each week there will be a total of 2.5 hours of class, split into two 1.25-hour sessions, and about one homework assignment will be handed out. Completing the assignments is the best way to learn programming; they are an integral component of the class, and a large portion of the grade will be based on them. We emphasize that this will be a relatively time-consuming class, but the effort will be worth it.

Each 45-minute lecture is preceded by a 30-minute recitation section, during which the solution to the homework assignment will be discussed. Both the lecture and the recitation section are meant to be highly interactive, which is why the class is initially limited to 15 students. There will be in-class labs and exercises, so each student is expected to bring a laptop computer.

Users of Mac OS X or Linux computers should ensure that a terminal application (e.g., "Terminal" for OS X or "xterm" for Linux) is installed. Users of Windows computers are asked to install the Cygwin environment. All students should make sure that Perl is installed on their computers (typing "perl -h" in a terminal window prints a usage message if it is) and should install a text editor application. Recommended choices are Komodo Edit or GNU Emacs.
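As a rough sketch of the check described above, the following shell snippet tests whether perl can be found and prints its version banner. The variable name perl_status is only illustrative and is not part of the course material.

```shell
# Check whether a perl executable is on the PATH.
# perl_status is a hypothetical variable name used only for this example.
if command -v perl >/dev/null 2>&1; then
    perl_status="found"
    # Print the first lines of the version banner.
    perl -v | head -n 2
else
    perl_status="missing"
    echo "perl was not found; please install it before the first class"
fi
```

If the message about a missing perl appears, install Perl (on Windows, through the Cygwin installer) and run the check again.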

We recommend the textbook "Learning Perl, 5th Edition". However, a number of very good (and free) online resources also exist, such as perldoc.perl.org and Beginning Perl. There are also handy reference tables, such as this little cheatsheet.

Final Project:

Every student is expected to complete a final project, to be started after the last class. Ideally, students choose their own project, related to their research. It should be a data analysis or programming project that requires writing a program of more than 100 lines of code and uses a large fraction of the material covered in class. Every student should submit a short (one paragraph) description of their proposed project to the instructors (progclass@kimlab.org) by Oct 17, 2016 at the latest for approval. Students who do not submit a proposal will be assigned a project by the instructors. The final project will make up 40% of the grade and is due on Dec 1, 2016.

Date and Time:

September 12th - October 28th, 2016

Mondays, 2pm-3:15pm and Fridays, 2pm-3:15pm (No class on Sep 23 and Oct 10)

Donnelly Centre, Red Seminar Room

Instructor contact info:

Philip M. Kim, Rm 606, Donnelly Centre, pm.kim (at) utoronto

Gary D. Bader, Rm 602, Donnelly Centre, gary.bader (at) utoronto

TA contact info:

Minggao Liang (PGCRL) m.liang@mail.utoronto.ca (Office hours, Thu 11am-noon, PGCRL)

Hamed Heyday (Donnelly) hheydarii@gmail.com (Office hours, Mon 10am-11am, Donnelly)

General course email (email your homework assignments here): progclass@kimlab.org

Syllabus:

Week 1: Getting comfortable with basic tools

Lecture 1

Introduction to programming. Why programming? Typical problems.

Getting comfortable with the UNIX shell (command line) and basic shell commands. Exercises using core utilities such as grep, cut, head and tail, and redirecting output.
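To give a flavour of these utilities, here is a minimal sketch that builds a small made-up table and then filters it. The file names and gene records are hypothetical and are not the class1 data.

```shell
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)

# Hypothetical tab-delimited table: gene name, chromosome, expression value.
# ">" creates or overwrites a file, ">>" appends to it.
printf 'gene\tchrom\texpr\n' > "$workdir/genes.tsv"
printf 'geneA\tchr1\t12.5\n' >> "$workdir/genes.tsv"
printf 'geneB\tchr2\t3.1\n' >> "$workdir/genes.tsv"
printf 'geneC\tchr1\t7.8\n' >> "$workdir/genes.tsv"

# head: show the first two lines (the header plus the first record).
head -n 2 "$workdir/genes.tsv"

# tail: drop the header by starting the output at line 2.
tail -n +2 "$workdir/genes.tsv" > "$workdir/records.tsv"

# grep keeps the lines containing "chr1"; cut keeps only the first column.
# The pipe "|" feeds the output of grep into cut.
grep 'chr1' "$workdir/records.tsv" | cut -f 1 > "$workdir/chr1_genes.txt"

# Display the redirected result.
cat "$workdir/chr1_genes.txt"
```

Note that the pattern 'chr1' would also match names such as chr10, so a real data set may need a stricter pattern.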

Data to be used: class1 data

Lecture notes: PPT

First assignment: Lab1

Lecture 2

- Perl Statements, Basic Syntax and Variables

- String variables and operators – how to manipulate string data

Class 2 notes: Module1

Sample code: Module1

In-class exercise solutions: Module1_solutions

No assignment today

Week 2: Basic Perl

Lecture 3

- Arrays and lists

- File input/output (I/O)

Class 3 slides: Module2 Module3

Sample code: Module2 Module3

In-class exercise solutions: Module2 Module3

Assignment: Lab2

Example solutions: Lab2-sols

Lecture 4

- More on arrays and lists

- Intro to loops and conditional statements

Class 4 slides: Module4

Sample code: Module4-code

In-class exercises: Module4-sols

Week 3: Writing more complex scripts

Lecture 5

- More on loops and flow control. Common programming patterns

- Hashes

Class 5 lecture slides: Module5 Module6

In-class sample code: Module5 Module6

In-class exercises: Module5-sols Module6-sols

Assignment: Lab3

Lecture 6

- Writing more complex programs: Devising subroutines for common tasks

Class 6 lecture slides: Module7

In-class sample code: Module7

In-class exercise solutions: Module7-sols

Week 4: Parsing and advanced regular expressions

Lecture 7

- Regular expressions: REALLY manipulating strings

Please see this link for an introduction to regular expressions: http://perldoc.perl.org/perlretut.html

Class 7 lecture slides: Module8

In-class sample code: Module8

In-class exercise solutions: Module8-sols

Assignment: Lab4

Lecture 8

- Continuing regular expressions

- Interfacing with external programs, recipes, reading directories

Class 8 lecture slides: Module9

In-class sample code: Module9

In-class exercise solutions: Module9-sols

Week 5: Intro to the R programming language (statistical programming language)

Lecture 9

- Intro to R, variables and constructs in R

- Basic I/O

Class 9 lecture slides: Module10

Sample code: Module10

In-class exercise solutions: Module10-sols

Assignment: Lab5

Lecture 10

- More R

Samples of fancy R plots here

Class 10 lecture slides: Module11

Sample code: Module11

In-class exercise solutions: Module11-sols

Week 6: The R programming language (continued)

Lecture 11: More R

Class 11 lecture slides: Module12

Sample code link: Module12 Module13

In-class exercise solutions: Module12-sols

Assignment: Lab6

Bioconductor information

- Installing and using Bioconductor modules in R

- Basic Bioconductor usage, working with expression data

Note: Make sure you install Bioconductor (HOWTO: Install Bioconductor) and the affy package by running the following commands:

source("http://www.bioconductor.org/biocLite.R")

biocLite("affy")

Other packages may need to be installed over the internet during class this week, so make sure your computer is connected when in class.

Lecture 12: Intro to Python and course recap

Class 12 lecture slides: Module14

Sample code link: Module14

In-class exercise solutions: Module14-sol

Note: Some of the course material was adapted (with permission from Prof. Chad Myers) from course CSCI2003 at the University of Minnesota.