Graduate Courses Offered
02-601 Programming for Scientists
Provides a practical introduction to programming for students with little or no prior programming experience who are interested in science. Fundamental scientific algorithms will be introduced, and extensive programming assignments will be based on analytical tasks that might be faced by scientists, such as parsing, simulation, and optimization. Principles of good software engineering will also be stressed, *and students will have the opportunity to design their own programming project on a scientific topic of their choice*. The course will introduce students to the Go programming language, an industry-supported, modern programming language, the syntax of which will be covered in depth. Other assignments will be given in other programming languages such as Python and Java to highlight the commonalities and differences between languages. No prior programming experience is assumed, and no biology background is needed. Analytical skills and mathematical maturity are required. Course not open to CS majors.
This course gives master's students an opportunity to develop the professional skills necessary for a successful career in computational biology. This course will include assistance with resume writing, interview preparation, presentation skills, and job search techniques. This course will also include opportunities to network with computational biology professionals and academic researchers. This course will meet once per week. This course is pass/fail only. The grading scheme will be discussed on the first day of class.
How do we find potentially harmful mutations in your genome? How can we reconstruct the Tree of Life? How do we compare similar genes from different species? These are just three of the many central questions of modern biology that can only be answered using computational approaches. This 12-unit course will delve into some of the fundamental computational ideas used in biology and let students apply existing resources that are used in practice every day by thousands of biologists. The course offers an opportunity for students who possess an introductory programming background to become more experienced coders within a biological setting. As such, it presents a natural next course for students who have completed 02-601.
The objective of this course is to study algorithms for general computational problems, with a focus on the principles used to design those algorithms. Efficient data structures will be discussed to support these algorithmic concepts. Topics include: run time analysis, divide-and-conquer algorithms, dynamic programming algorithms, network flow algorithms, linear and integer programming, large-scale search algorithms and heuristics, efficient data storage and query, and NP-completeness. Although this course will have several programming assignments, it is not primarily a programming course. Instead, it will focus on the design and analysis of algorithms for general classes of problems. This course is not open to CS graduate students, who should consider taking 15-651 instead.
02-700 M.S. Research
This course is for M.S. students who wish to do supervised research for academic credit with a Computational Biology faculty member. Interested students should first contact the professor with whom they would like to work. If there is mutual interest, the professor will direct them to the Academic Programs Coordinator, who will enroll them in the course.
The course consists of weekly presentations by students and faculty on current topics in computational biology.
This course consists of weekly invited presentations on current computational biology research topics by leading scientists.
02-703 Special Topics in Bioinformatics and Computational Biology
This is a mini Special Topics course taught on an occasional basis to cover different topics in computational biology.
02-710 Computational Genomics
Dramatic advances in experimental technology and computational analysis are fundamentally transforming the basic nature and goal of biological research. The emergence of new frontiers in biology, such as evolutionary genomics and systems biology, is demanding new methodologies that can confront quantitative issues of substantial computational and mathematical sophistication. In this course we will discuss classical approaches and the latest methodological advances in the context of the following biological problems: 1) sequence analysis, focusing on gene finding and motif detection, 2) analysis of high-throughput molecular data, such as gene expression data, including normalization, clustering, pattern recognition and classification, 3) molecular and regulatory evolution, focusing on phylogenetic inference and regulatory network evolution, 4) population genetics, focusing on how genomes within a population evolve through recombination, mutation, and selection to create various structures in modern genomes, and 5) systems biology, concerning how to combine diverse data types to make mechanistic inferences about biological processes. From the computational side this course focuses on modern machine learning methodologies for computational problems in molecular biology and genetics, including probabilistic modeling, inference and learning algorithms, data integration, time series analysis, active learning, etc.
02-711 Computational Molecular Biology and Genomics
An advanced introduction to computational molecular biology, using an applied algorithms approach. The first part of the course will cover established algorithmic methods, including pairwise sequence alignment and dynamic programming, multiple sequence alignment, fast database search heuristics, hidden Markov models for molecular motifs, and phylogeny reconstruction. The second part of the course will explore emerging computational problems driven by the newest genomic research. Course work includes four to six problem sets, a midterm, and a final exam.
This course covers a variety of computational methods important for modeling and simulation of biological systems. It is intended for graduates and advanced undergraduates with either biological or computational backgrounds who are interested in developing computer models and simulations of biological systems. The course will emphasize practical algorithms and algorithm design methods drawn from various disciplines of computer science and applied mathematics that are useful in biological applications. The general topics covered will be models for optimization problems, simulation and sampling, and parameter tuning. Course work will include problem sets with significant programming components and independent or group final projects.
02-714 String Algorithms
Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in biology will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases. No knowledge of biology is assumed, and the topics covered will be of use in other fields involving large collections of strings. Programming proficiency is required.
Research in biology and medicine is undergoing a revolution due to the availability of high-throughput technology for probing various aspects of a cell at a genome-wide scale. Next-generation sequencing technology is allowing researchers to inexpensively generate a large volume of genome sequence data. In combination with various other high-throughput techniques for the epigenome, transcriptome, and proteome, we have unprecedented opportunities to answer fundamental questions in cell biology and understand disease processes with the goal of finding treatments in medicine. The challenge in this new genomic era is to develop computational methods for integrating different data types and extracting complex patterns accurately and efficiently from a large volume of data. This course will discuss computational issues arising from high-throughput techniques recently introduced in biology, and cover very recent developments in computational genomics and population genetics, including genome structural variant discovery, association mapping, epigenome analysis, cancer genomics, and transcriptome analysis. The course material will be drawn from very recent literature. Grading will be based on weekly write-ups critiquing the papers to be discussed in class, class participation, and a final project. It assumes a basic knowledge of machine learning and computational genomics.
02-716 Cross-Species Systems Modeling
Model organisms have long played an important role in basic science studies and in the pharmaceutical industry. These organisms, ranging from yeast to worms to flies, share many processes that are similar to those active in humans, which has made these and other animals the focus of many lab studies. Similarly, almost all drugs are initially tested on mice, making cross-species studies a key issue in drug development. However, many of the drugs that work well for mice fail in late-stage human trials. Similarly, many interactions between highly conserved proteins in one species are not conserved, even between very close species. In this class we will discuss recent studies that try to compare and contrast genomics and functional genomics data across species with the goal of identifying the conserved and divergent processes that are active in each of the species being studied. The class will be divided into three parts. The first will focus on sequence analysis and comparative genomics, covering issues related to whole genome sequence alignment, motif discovery using conservation data, and miRNA identification using sequence data from multiple species. The second will focus on comparisons of a single type of functional genomics data, including gene expression, protein interactions, and protein-DNA interactions. This part will rely on recent studies regarding the integration of expression data across species; combining, comparing, and aligning protein interaction networks in multiple species; and experimental studies that compare protein-DNA interactions across species and in hybrids. In the final part of the class we will discuss methods that attempt to combine multiple functional genomics datasets for a systems biology comparison of interactions across species. Students will be required to present one or two papers and to complete a class project in which they compare or contrast genomics data across species.
02-717 Algorithms in Nature
Computer systems and biological processes often rely on networks of interacting entities to reach joint decisions, coordinate and respond to inputs. There are many similarities in the goals and strategies of biological and computational systems which suggest that each can learn from the other. These include the distributed nature of the networks (in biology molecules, cells, or organisms often operate without central control), the ability to successfully handle failures and attacks on a subset of the nodes, modularity and the ability to reuse certain components or sub-networks in multiple applications and the use of stochasticity in biology and randomized algorithms in computer science.
These observations, some dating back to the 1960s, have inspired the development of several computational methods and more recently led to several bi-directional studies. These studies have demonstrated that thinking computationally about the settings, requirements, and goals of information processing in biological networks can both improve our understanding of the underlying biology and lead to the development of novel computational methods providing solutions to decades-old problems.
In this course we will start by discussing classic biologically motivated algorithms including neural networks (inspired by the brain), genetic algorithms (sequence evolution), non-negative matrix factorization (signal processing in the brain), and search optimization (ant colony formation). We will then continue to discuss more recent bi-directional studies that have relied on biological processes to solve routing and synchronization problems, discover Maximal Independent Sets (MIS), and design robust and fault tolerant networks. In the second part of the class students will read and present new research in this area. Students will also work in groups on a final project in which they develop and test a new biologically inspired algorithm.
See also the website below for examples of recent research in this area: www.algorithmsinnature.org
Prerequisite: 15-210. No prior biological knowledge is required.
02-718 Computational Medicine
Modern medical research increasingly relies on the analysis of large patient datasets to enhance our understanding of human diseases. This course will focus on the computational problems that arise from studies of human diseases and the translation of research to the bedside to improve human health. The topics to be covered include computational strategies for advancing personalized medicine, pharmacogenomics for predicting individual drug responses, metagenomics for learning the role of the microbiome in human health, mining electronic medical records to identify disease phenotypes, and case studies in complex human diseases such as cancer and asthma. We will discuss how machine learning methodologies such as regression, classification, clustering, semi-supervised learning, probabilistic modeling, and time-series modeling are being used to analyze a variety of datasets collected by clinicians. Class sessions will consist of lectures, discussions of papers from the literature, and guest presentations by clinicians and other domain experts. Grading will be based on presentations, assignments, participation, and a project.
This course will provide an introduction to genomics, epigenetics, and their application to problems in neuroscience. The rapid advances in genomic technology are in the process of revolutionizing how we conduct molecular biology research. These new techniques have given us an appreciation for the role that epigenetic modifications of the genome play in gene regulation, development, and inheritance. In this course, we will cover the biological basis of genomics and epigenetics, the basic computational tools to analyze genomic data, and the application of those tools to neuroscience. Through programming assignments and reading primary literature, the material will also serve to demonstrate important concepts in neuroscience, including the diversity of neural cell types, neural plasticity, the role that epigenetics plays in behavior, and how the brain is influenced by neurological and psychiatric disorders. Although the course focuses on neuroscience, the material is accessible and applicable to a wide range of topics in biology.
02-721 Algorithms for Computational Structural Biology
Some of the most interesting algorithmic challenges in Biology and Bioengineering arise from the modeling, simulation, and engineering of biological macromolecules at or near atomic resolution. This course covers a variety of algorithms used to study and engineer the structure, dynamics, and function of proteins, nucleic acids, and other molecules. It is intended for graduates and advanced undergraduates who are interested in topics such as protein folding, protein interactions, and computer-aided design of drugs and proteins. Students should have some experience with programming as well as introductory coursework in the design and analysis of algorithms. The course begins with a review of the necessary Biology, Chemistry, and Physics for those who haven’t seen these topics since high school. The topics covered will include algorithms for solving optimization, inference, simulation, and sampling problems that arise in the fields of structural and synthetic biology. Coursework will include 4 to 5 problem sets and an independent or group final project. Open to students with backgrounds in computer science or the life sciences, or by permission of the instructor.
02-722 Advanced Algorithms for Computational Structural Biology
This is a seminar-style course on the current literature in computational structural biology. Topics will include algorithms for designing drugs and proteins, as well as protein structure prediction and simulation. Students will be expected to read and discuss papers and complete a project of their own design. Open to students with backgrounds in computer science and structural biology, or by permission of the instructor.
02-730 Cell and Systems Modeling
This course will introduce students to the theory and practice of modeling biological systems from the molecular to the organism level with an emphasis on intracellular processes. Topics covered include kinetic and equilibrium descriptions of biological processes, systematic approaches to model building and parameter estimation, analysis of biochemical circuits modeled as differential equations, modeling the effects of noise using stochastic methods, modeling spatial effects, and modeling at higher levels of abstraction or scale using logical or agent-based approaches. A range of biological models and applications will be considered including gene regulatory networks, cell signaling, and cell cycle regulation. Weekly lab sessions will provide students hands-on experience with methods and models presented in class. Course requirements include regular class participation, bi-weekly homework assignments, a take-home exam, and a final project. Prerequisites: The course is designed for graduate and upper-level undergraduate students with a wide variety of backgrounds. The course is intended to be self-contained but students may need to do some additional work to gain fluency in core concepts. Students should have a basic knowledge of calculus, differential equations, and chemistry as well as some previous exposure to molecular biology and biochemistry. Experience with programming and numerical computation is useful but not mandatory. Laboratory exercises will use Matlab as the primary modeling and computational tool augmented by additional software as needed.
02-740 Bioimage Informatics
With the rapid advance of bioimaging techniques and fast accumulation of bioimage data, computational bioimage analysis and modeling are playing an increasingly important role in understanding complex biological systems. The goal of this course is to provide students with the ability to understand a broad set of practical and cutting-edge computational techniques to extract knowledge from bioimages. Such techniques include image filtering, image feature detection, image classification, image segmentation, object detection, object tracking, image retrieval, image mining, and image modeling using both traditional and deep learning methods. Upon successful completion of this course, the student will be able to: explain the importance and understand the principles and uses of both geometrical and machine learning-based bioimage analysis techniques; understand how these techniques can be combined for various applications; develop code to implement basic techniques; and solve specific bioimage analysis tasks using image-processing libraries. Coursework will include homework, two in-class examinations, and an independent project on a practical bioimaging problem. Students are expected to have some experience with programming in Python.
Biology is increasingly becoming a “big data” science, as biomedical research has been revolutionized by automated methods for generating large amounts of data on diverse biological processes. Integration of data from many types of experiments is required to construct detailed, predictive models of cell, tissue, or organism behaviors, and the complexity of the systems suggests the need for these models to be constructed automatically. This requires iterative cycles of acquisition, analysis, modeling, and experimental design, since it is not feasible to do all possible biological experiments. This course will cover a range of automated biological research methods (especially high-throughput, robotic methods for protein structure determination, gene sequencing, cell-based drug screening, and nanoassays), and a range of computational methods for automating the acquisition and interpretation of the data (especially active learning, proactive learning, compressed sensing, and model structure learning). It assumes a basic knowledge of machine learning. Class sessions will consist of a combination of lectures and discussions of important research papers. Grading will be based on class participation, homework, and a final project.
Computational biologists frequently focus on analyzing and modeling large amounts of biological data, often from high-throughput assays or diverse sources. It is therefore critical that students training in computational biology be familiar with the paradigms and methods of experimentation and measurement that lead to the production of these data. This one-semester laboratory course has been developed to give students a deep appreciation of the principles and challenges of biological experimentation. Students will explore a range of topics, including structural biology, genomics, proteomics, and bioimaging. Each broad topic is covered over a period of 3-4 weeks. Many lectures and labs are hosted by faculty who are experts in the field. Students are required to keep a detailed laboratory notebook, summarizing the goals of the experiment, critical steps, and analysis of the resulting data. With an emphasis on instrumentation and high-throughput data collection, this course is appropriate for students who have never taken a traditional undergraduate biology lab course, as well as those who have. Grading: Letter grade based on class participation, take-home exams, and a final project.
In order to rapidly generate reproducible experimental data, many modern biology labs leverage some form of laboratory automation to execute experiments. In the not so distant future, the use of laboratory automation will continue to increase in the biological lab to the point where many labs will be fully automated. Therefore, it is critical for automation scientists to be familiar with the principles, experimental paradigms, and techniques for automating biological experimentation with an eye toward the fully automated laboratory. In this laboratory course, students will learn about various automatable experimental methods, design of experiments, hardware for preparing samples and executing automated experiments, and software for controlling that hardware. These topics will be taught in lectures as well as through laboratory experience using multi-purpose laboratory robotics. During weekly laboratory time, students will complete and integrate parts of two larger projects. The first project will be focused on liquid handling, plate control, plate reading, and remote control of the automated system based on experimental data. The second project will be focused on the design, implementation, and analysis of a high content screening campaign using fluorescence microscopy, image analysis, and tissue culture methods. Grading will be based on lab and project completion and quality.
This course is for students participating in an internship or co-op.
02-900 Ph.D. Thesis Research
This course is for students enrolled in the Ph.D. program working on research.