Pre-College Program in Computational Biology
The Computational Biology Department offers a three-week Pre-College Program in Computational Biology. This program, which launched in July 2019, is the first and only computational biology educational program in the United States designed for high school students.
In the program, our students (most of whom will be rising high school seniors) learn both the computational and laboratory skills needed in modern biology. Traditionally, these skills have been taught as part of disjoint courses, but our pre-college program highlights the vital interplay between generating biological datasets in the lab and analyzing these datasets computationally. On a typical day, students spend half the day in a wet lab, and the other half of the day programming algorithms to analyze biological data, including the data that they generate!
Details on the curriculum are provided below and will be updated as the program evolves. If you’re interested in joining us, we would love to have you! We’re looking for students who love math and science, especially biology (although not the memorization kind). If you’ve done some coding before, that’s a plus, but by no means do we expect expert programmers. For more information, including how to apply, see the program homepage. For specific questions about the program, feel free to shoot us an email at email@example.com and we would love to start a conversation.
Application opens – November 2019 (for updates and to join the Carnegie Mellon Pre-College mailing list, visit https://www.cmu.edu/pre-college/admission/index.html)
Program dates: Monday, June 29 – Friday, July 17 (9 AM – 6 PM, Monday–Friday, for three weeks)
Celebration of Student Work: Saturday, July 18
Pre-College Program in Computational Biology Curriculum
2019 Program Overview
The following is a description of the inaugural Pre-College Program held in July 2019. We are continually looking for new ideas to make our program fresh, so this curriculum is always subject to change.
The 2019 Pre-College Program in Computational Biology began with computational and laboratory bootcamps that got students up to speed in programming and basic “wet lab” techniques. On the second day of the program, students undertook an exciting day-long adventure on Pittsburgh’s Three Rivers with our partner Rivers of Steel, not only to sample water but also to learn about ecology (and, of course, to take in the city’s beautiful bridges and architecture); see photo above.
Why are Pittsburgh’s Three Rivers an interesting biological environment? The Allegheny and Monongahela Rivers flow from somewhat rural landscapes into an urban environment with a history of industrial run-off, before merging into the Ohio River and continuing westward to its eventual confluence with the Mississippi. In even a small sample of river water lives an invisible ecosystem of microorganisms (bacteria and viruses). Only recently have researchers developed methods that can be used to start to understand, for each river, what these microbes are, what they do, and how they have evolved.
What is so interesting about bacteria? A landmark paper by Hug et al., published in 2016 in Nature Microbiology, provided the evolutionary tree below. In it, we see that of the three domains of life, the eukaryotes (i.e., everything you have ever seen that is alive, and some things that you haven’t) make up the smallest component of the tree, meaning that they have the least genetic diversity. By far the most genetic diversity, and the largest part of the tree, is found in bacteria. This makes sense! Bacteria have been around a lot longer than we have, and they replicate and mutate quickly, so they have been able to move into environments that we could never dream of living in (such as oil wells, deep-sea vents, and polluted rivers 🙂) as well as produce a host of interesting compounds. For example, many of the antibiotics used to stop infections were borrowed from bacteria that had evolved to use these compounds to kill their enemies.
But how is an evolutionary tree like this produced? We must sequence DNA from the same gene in many species. What is the lab method we can use to sequence this DNA from a biological sample (like river water)? And once we obtain the DNA, how do we train a computer to build this evolutionary tree?
These questions are just the beginning of the inquiries that we can make in this area of computational biology. The 2019 week-by-week syllabus is detailed below.
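To give a flavor of the tree-building question posed above, here is a minimal Python sketch. It is not the method used in the program or in the Hug et al. paper. It assumes that the same gene has already been sequenced and aligned in each species, measures how different each pair of sequences is, and then repeatedly joins the two closest groups (a simple approach known as UPGMA). The sample names and sequences are made up for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Return a tree as nested tuples by repeatedly merging the two closest clusters."""
    # Each cluster is a pair: (tree built so far, names of the species it contains).
    clusters = [(name, [name]) for name in sorted(seqs)]

    def average_distance(c1, c2):
        # Average distance over every pair of species drawn from the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(p_distance(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the indices of the two clusters that are closest on average.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical aligned gene fragments from four made-up species.
toy_sequences = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACGTTA",
    "D": "TCGAACGTTA",
}

tree = upgma(toy_sequences)
print(tree)
```

In this toy data, A and B differ at only one position, as do C and D, so the sketch groups each of those pairs before joining the two groups. Real analyses use far longer sequences and more careful models of how DNA changes over time.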
2019 Week-by-Week Curriculum
- Coding bootcamp: How will programming help us solve biological problems that cannot be solved in the lab alone?
- River sampling: How can we collect biological samples from the rivers while minimizing contamination and maximizing biological material yield? What other features of the rivers (e.g., ambient temperature/recent precipitation) are important?
- DNA Extraction: In a sample of various biological specimens (river water), how can we extract all of the DNA present (and eliminate everything else)?
- 16S sequencing: How can we use a conserved gene to help determine the relative abundances of different species of bacteria in our extracted river water DNA sample?
- 16S sequencing analysis: Given the sequence of a strand of DNA, how can we determine the species from which it came?
- Bacterial Isolation for whole genome sequencing: If we want to sequence the genome of a single bacterial cell, how can we isolate one cell from a river water sample?
- Predicting Replication Origins: Using sequencing data, how can we predict a bacterium’s replication origin?
- Testing Replication Origin Predictions: How can we modify the bacterial genome to help us test our prediction?
- Bioimaging: How can we identify bacterial colonies using microscopes to capture images?
- Whole Genome Sequencing: How can we read a short fragment of DNA excised from a bacterial genome? Why can sequencing machines only read short fragments of DNA and not entire genomes?
- Whole Genome Reconstruction: After producing many DNA fragments that we can read, can we reconstruct the full genome from thousands of relatively short sequencing reads?
- Mass Spectrometry: How can we determine what else is in the water samples that may be affecting microbial diversity?
- Bacteria Identification: How can we use computational techniques to understand and characterize images of bacterial colonies?
- Building Phylogenies: How can we determine evolutionary relationships between organisms? Specifically, given genes from a host of different species, how can we construct an evolutionary tree for these species to determine how they have evolved?
- Presenting Scientific Results: What are good strategies for conveying the results of scientific experiments? What are the fundamentals of giving a good scientific talk?
- Fluorescence Microscopy: How can we use fluorescence to image eukaryotes and prokaryotes?
- Fluorescence Microscopy Image Analysis: How can we analyze fluorescence images to help to classify eukaryotes and prokaryotes?
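As an illustration of the kind of computation behind the “Predicting Replication Origins” topic in the list above, the Python sketch below uses one widely taught heuristic, the GC skew. The page does not say which method the program itself used, so treat this as an assumption. The idea is to walk along the genome keeping a running count of G minus C. In many bacteria, this running count tends to reach its minimum near the replication origin. The function names and the toy sequence are made up for illustration.

```python
def gc_skew(genome):
    """Running (#G - #C) count; skew[i] covers the first i bases of the genome."""
    skew = [0]
    for base in genome:
        if base == "G":
            step = 1
        elif base == "C":
            step = -1
        else:
            step = 0
        skew.append(skew[-1] + step)
    return skew


def min_skew_positions(genome):
    """Positions where the skew is lowest: candidate locations for the origin."""
    skew = gc_skew(genome)
    lowest = min(skew)
    return [i for i, value in enumerate(skew) if value == lowest]


# Hypothetical toy sequence; a real bacterial genome has millions of bases.
toy_genome = "GATCCCCAGGTG"
print(gc_skew(toy_genome))
print(min_skew_positions(toy_genome))
```

For the toy sequence, the skew bottoms out after the run of C’s, at positions 7 and 8. A real bacterial genome is usually circular and the skew is noisy, so the minimum gives only a rough region to search, and a prediction like this still has to be tested in the lab.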
Students presented their scientific work to their parents/guardians and families.