Undergraduate Courses Offered
02-201 Programming for Scientists
Provides a practical introduction to programming for students with little or no prior programming experience who are interested in science. Fundamental scientific algorithms will be introduced, and extensive programming assignments will be based on analytical tasks that might be faced by scientists, such as parsing, simulation, and optimization. Principles of good software engineering will also be stressed. The course will introduce students to the Go programming language, an industry-supported, modern programming language, the syntax of which will be covered in depth. Other assignments will be given in other programming languages such as Python and Java to highlight the commonalities and differences between languages. No prior programming experience is assumed, and no biology background is needed. Analytical skills and mathematical maturity are required. Course not open to CS majors.
02-223 Personalized Medicine: Understanding Your Own Genome
Do you want to know how to discover the tendencies hidden in your genome? Since the first draft of a human genome sequence became available about a decade ago, the cost of genome sequencing has decreased dramatically. It is expected that personal genome sequencing will become a routine part of medical examinations for patients in clinics for prognostic and diagnostic purposes. Personal genome information will also play an increasing role in lifestyle choices, as people take into account their own genetic tendencies. Commercial services such as 23andMe have already taken first steps in this direction. Computational methods for mining large-scale genome data are being developed to unravel the genetic basis of diseases and to assist doctors in clinics.
This course will introduce students to the biological, computational, and ethical issues that concern the use of personal genome information in health maintenance, medical practice, biomedical research, and policymaking. The course will focus on practical issues, using individual genome sequences (such as that of Nobel prize winner James Watson) and other population-level genome data. Without requiring any background in biological or computational sciences, the course will begin with an overview of topics from genetics, molecular biology, statistics, and machine learning that are relevant to the modern personal genome era. The class will then cover scientific issues such as how to discover your genetic ancestry, how to learn from genomes about the migration and evolution of the human population, and how natural selection shaped our genomes. The class will then discuss medical aspects such as how to predict whether you will develop diseases such as diabetes based on your own genome, how to discover disease-causing genetic mutations, and how the genetic information can be used to recommend clinical treatments. It will close with consideration of the complex policy issues that our society will face as this personal genomics revolution unfolds. The grading will be based on weekly homework, a midterm, a final exam, and class participation.
There are no prerequisites for this course. This course counts as a CSD Science and Engineering requirement, and a Dietrich College Modeling/Other Gen Ed requirement.
This 12-unit class provides a general introduction to computational tools for biology. The course is divided into two modules, which may be taken individually as courses 02-251 and 02-252. Module 1 covers computational molecular biology/genomics. It examines important sources of biological data, how they are archived and made available to researchers, and what computational tools are available to use them effectively in research. In the process, it covers basic concepts in statistics, mathematics, and computer science needed to effectively use these resources and understand their results. Specific topics covered include sequence data, searching and alignment, structural data, genome sequencing, genome analysis, genetic variation, gene and protein expression, and biological networks and pathways. Module 2 covers computational cell biology, including biological modeling and image analysis. It includes homeworks requiring modification of scripts to perform computational analyses. The modeling component includes computer models of population dynamics, biochemical kinetics, cell pathways, neuron behavior, and stochastic simulations. The imaging component includes basics of machine vision, morphological image analysis, image classification and image-derived models. Lectures and examinations are joint with 03-250 but recitations are separate. Recitations for this course are intended primarily for computer science, statistics or engineering majors at the undergraduate or graduate level who have had significant prior experience with computer science or programming. Students may not take both 02-250/03-250 and either 02-251/03-251 or 02-252/03-252 for credit.
This 12-unit course provides an introduction to many of the great ideas that have formed the foundation for the recent transformation of life sciences into a fully-fledged computational discipline. Extracting biological understanding from both large and small data sets now requires the use and design of novel algorithms, developed in the field of computational biology. This gateway course is intended as a first exposure to computational biology for first-year undergraduates in the School of Computer Science, although it is open to other computationally minded students who are interested in exploring the field. Students will learn fundamental algorithmic and machine learning techniques that are used in modern biological investigations, including algorithms to process string, graph, and image data. They will use these techniques to answer questions such as “How do we reconstruct the sequence of a genome?”, “How do we infer evolutionary relationships among many species?”, and “How can we predict each gene’s biological role?” on biological data. Previous exposure to molecular biology is not required, as the instructors will provide introductory materials as needed. After completion of the course, students will be well equipped to tackle advanced computational challenges in biology.
This is an introductory laboratory-based course designed to teach basic biological laboratory skills used in exploring the quantitative nature of biological systems and the computational reasoning required for performing research in computational biology. Over the course of the semester, students will perform various experiments and computationally analyze the results of these experiments. Students will also use computation to design experiments based on the data they collect. During this course students will be using traditional, well-developed techniques as well as automated lab equipment to answer scientific questions: How should different sources of DNA in a specimen be identified? What changes do cells undergo during apoptosis? Understanding the results of these experiments will require students to think critically about the data they generate, the appropriate controls required to support their conclusions, and the biological context within which these results were obtained. During this course students will gain experience in many aspects of scientific research, including: designing and executing protocols for traditional and automated experiments, computational processing and analysis of collected results and communicating results to peers and colleagues.
Course Outline: one 3-hour lab and one 1-hour lecture per week. 9 units (12 units for CB majors). This course counts as a CSD Science and Engineering requirement as well as the lab requirement, and Dietrich College’s Modeling/Science Gen Ed requirement.
02-317 Algorithms in Nature
Computer systems and biological processes often rely on networks of interacting entities to reach joint decisions, coordinate, and respond to inputs. There are many similarities in the goals and strategies of biological and computational systems, which suggest that each can learn from the other. These include the distributed nature of the networks (in biology, molecules, cells, or organisms often operate without central control); the ability to successfully handle failures and attacks on a subset of the nodes; modularity and the ability to reuse certain components or sub-networks in multiple applications; and the use of stochasticity in biology and randomized algorithms in computer science. In this course we will start by discussing classic biologically motivated algorithms including neural networks (inspired by the brain), genetic algorithms (sequence evolution), non-negative matrix factorization (signal processing in the brain), and search optimization (ant colony formation). We will then continue to discuss more recent bi-directional studies that have relied on biological processes to solve routing and synchronization problems, discover Maximal Independent Sets (MIS), and design robust and fault-tolerant networks. In the second part of the class students will read and present new research in this area. Students will also work in groups on a final project in which they develop and test a new biologically inspired algorithm. See also: algorithmsinnature.org. Prerequisite: 15-210. No prior biological knowledge is required.
02-318 Intro to Computational Medicine
This course is an introduction to computational methods relevant to the diagnosis and treatment of human diseases. It is the microcourse version of 02-518, Computational Medicine. The course begins with an introduction to the field of Medicine, and an overview of the primary clinical tasks associated with Computational Medicine (phenotyping; biomarker discovery; predictive modeling). Next, we provide an introduction to several Machine Learning techniques, and how those techniques can be used to perform the clinical tasks. For the remainder of the course, students will be guided through the analysis of a clinical data set to gain experience with these techniques. No prior experience with Medicine, Machine Learning, or computer programming is required. Students will be graded based on quizzes and one homework.
02-319 Genomics and Epigenetics of the Brain
This course will provide an introduction to genomics, epigenetics, and their application to problems in neuroscience. The rapid advances in genomic technology are in the process of revolutionizing how we conduct molecular biology research. These new techniques have given us an appreciation for the role that epigenetic modifications of the genome play in gene regulation, development, and inheritance. In this course, we will cover the biological basis of genomics and epigenetics, the basic computational tools to analyze genomic data, and the application of those tools to neuroscience. Through programming assignments and reading primary literature, the material will also serve to demonstrate important concepts in neuroscience, including the diversity of neural cell types, neural plasticity, the role that epigenetics plays in behavior, and how the brain is influenced by neurological and psychiatric disorders. Although the course focuses on neuroscience, the material is accessible and applicable to a wide range of topics in biology.
02-331 Modeling Evolution
Some of the most serious public health problems we face today, from drug-resistant bacteria to cancer, arise from a fundamental property of living systems: their ability to evolve. Since Darwin’s theory of natural selection was first proposed, we have begun to understand how heritable differences in reproductive success drive the adaptation of living systems. This makes it intuitive and tempting to view evolution from an optimization perspective. However, genetic drift, phenotypic trade-offs, constraints, and changing environments are among the many factors that may limit the optimizing force of natural selection. This tug-of-war between selection and drift, between the forces that produce variation in a population and the forces suppressing this variation, makes evolutionary processes much more complex to model and understand than previously thought.
The aim of this class is to provide an introduction to the theoretical formalism necessary to understand how biological systems are shaped by the forces and constraints driving evolutionary dynamics. I will introduce population genetic theory as a lens for the understanding and interpretation of modern datasets, such as datasets of worldwide human genomic and epigenomic variation or tumor genomic heterogeneity. By the end of the course, you should have learned to build evolutionary models, as well as the basic differences between idealized models and the data you might encounter in real life. The class is group-project based and you will work together to explore open questions in evolution.
This course consists of weekly invited presentations on current computational biology research topics by leading scientists.
02-403 Special Topics in Bioinformatics and Computational Biology
This is a mini Special Topics course taught on an occasional basis to cover different topics in computational biology.
02-414 String Algorithms
Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in biology will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases. No knowledge of biology is assumed, and the topics covered will be of use in other fields involving large collections of strings. Programming proficiency is required.
02-421 Algorithms for Computational Structural Biology
This course will introduce students to algorithms used in the determination, simulation, and engineering of molecular structures. Topics covered include molecular dynamics simulations, protein structure prediction, and computer-aided design of drugs and proteins. Course requirements include regular homework assignments and a final project. Students should have some background in algorithms and programming, as well as in molecular biology and physics.
02-422 Advanced Algorithms for Computational Structural Biology
This is a seminar-style course on the current literature in computational structural biology. Topics will include algorithms for designing drugs and proteins, as well as protein structure prediction and simulation. Students will be expected to read and discuss papers and complete a project of their own design. Open to students with backgrounds in computer science and structural biology, or by permission of the instructor.
Proteomics and metabolomics are the large-scale study of proteins and metabolites, respectively. In contrast to genomes, proteomes and metabolomes vary with time and the specific stress or conditions an organism is under. Applications of proteomics and metabolomics include determination of protein and metabolite functions (including in immunology and neurobiology) and discovery of biomarkers for disease. These applications require advanced computational methods to analyze experimental measurements, create models from them, and integrate them with information from diverse sources. This course specifically covers computational mass spectrometry, structural proteomics, proteogenomics, metabolomics, genome mining, and metagenomics.
Automated scientific instruments are used widely in research and engineering. Robots dramatically increase the reproducibility of scientific experiments, and are often cheaper and faster than humans, but are most often used to execute brute-force sweeps over experimental conditions. The result is that many experiments are “wasted” on conditions where the effect could have been predicted. Thus, there is a need for computational techniques capable of selecting the most informative experiments.
This course will introduce students to techniques from Artificial Intelligence and Machine Learning for automatically selecting experiments to accelerate the pace of discovery and to reduce the overall cost of research. Real-world applications from Biology, Bioengineering, and Medicine will be studied. Grading will be based on homeworks and two exams. The course is intended to be self-contained, but students should have a basic knowledge of biology, programming, statistics, and machine learning.
This course is for undergraduate students who wish to do supervised research for academic credit with a Computational Biology faculty member. Interested students should first contact the Professor with whom they would like to work. If there is mutual interest, the student should email the professor a proposal with their research project, along with the number of units proposed. If the professor approves the request, they will contact the Academic Programs Coordinator to enroll you in the course.
02-510 Computational Genomics
Dramatic advances in experimental technology and computational analysis are fundamentally transforming the basic nature and goal of biological research. The emergence of new frontiers in biology, such as evolutionary genomics and systems biology, is demanding new methodologies that can confront quantitative issues of substantial computational and mathematical sophistication. In this course we will discuss classical approaches and the latest methodological advances in the context of the following biological problems: 1) sequence analysis, focusing on gene finding and motif detection; 2) analysis of high-throughput molecular data, such as gene expression data, including normalization, clustering, pattern recognition, and classification; 3) molecular and regulatory evolution, focusing on phylogenetic inference and regulatory network evolution; 4) population genetics, focusing on how genomes within a population evolve through recombination, mutation, and selection to create various structures in modern genomes; and 5) systems biology, concerning how to combine diverse data types to make mechanistic inferences about biological processes. On the computational side, this course focuses on modern machine learning methodologies for computational problems in molecular biology and genetics, including probabilistic modeling, inference and learning algorithms, data integration, time series analysis, and active learning.
This course counts as a CSD Applications elective.
02-511 Computational Molecular Biology and Genomics
An advanced introduction to computational molecular biology, using an applied algorithms approach. The first part of the course will cover established algorithmic methods, including pairwise sequence alignment and dynamic programming, multiple sequence alignment, fast database search heuristics, hidden Markov models for molecular motifs, and phylogeny reconstruction. The second part of the course will explore emerging computational problems driven by the newest genomic research. Course work includes four to six problem sets, a midterm, and a final exam.
This course covers a variety of computational methods important for modeling and simulation of biological systems. It is intended for graduates and advanced undergraduates with either biological or computational backgrounds who are interested in developing computer models and simulations of biological systems. The course will emphasize practical algorithms and algorithm design methods drawn from various disciplines of computer science and applied mathematics that are useful in biological applications. The general topics covered will be models for optimization problems, simulation and sampling, and parameter tuning. Coursework will include problem sets with significant programming components and independent or group final projects.
02-518 Computational Medicine
Modern medical research increasingly relies on the analysis of large patient datasets to enhance our understanding of human diseases. This course will focus on the computational problems that arise from studies of human diseases and the translation of research to the bedside to improve human health. The topics to be covered include computational strategies for advancing personalized medicine, pharmacogenomics for predicting individual drug responses, metagenomics for learning the role of the microbiome in human health, mining electronic medical records to identify disease phenotypes, and case studies in complex human diseases such as cancer and asthma. We will discuss how machine learning methodologies such as regression, classification, clustering, semi-supervised learning, probabilistic modeling, and time-series modeling are being used to analyze a variety of datasets collected by clinicians. Class sessions will consist of lectures, discussions of papers from the literature, and guest presentations by clinicians and other domain experts. Grading will be based on presentations, assignments, participation, and a project.
02-530 Cell and Systems Modeling
This course will introduce students to the theory and practice of modeling biological systems from the molecular to the organism level with an emphasis on intracellular processes. Topics covered include kinetic and equilibrium descriptions of biological processes, systematic approaches to model building and parameter estimation, analysis of biochemical circuits modeled as differential equations, modeling the effects of noise using stochastic methods, modeling spatial effects, and modeling at higher levels of abstraction or scale using logical or agent-based approaches. A range of biological models and applications will be considered including gene regulatory networks, cell signaling, and cell cycle regulation. Weekly lab sessions will provide students hands-on experience with methods and models presented in class. Course requirements include regular class participation, bi-weekly homework assignments, a take-home exam, and a final project. Prerequisites: The course is designed for graduate and upper-level undergraduate students with a wide variety of backgrounds. The course is intended to be self-contained but students may need to do some additional work to gain fluency in core concepts. Students should have a basic knowledge of calculus, differential equations, and chemistry as well as some previous exposure to molecular biology and biochemistry. Experience with programming and numerical computation is useful but not mandatory. Laboratory exercises will use Matlab as the primary modeling and computational tool augmented by additional software as needed.