James Taylor, Johns Hopkins University @ CMU
Ralph S. O’Connor Associate Professor of Biology
Associate Professor of Computer Science
Johns Hopkins University
Making large-scale genomic analysis accessible, transparent, and reproducible.
Abstract: It is now easy to obtain genomic sequence data at massive scales, whether produced by an individual researcher in their own lab or by the consortium projects that are now sequencing hundreds of thousands of individual genomes. However, analyzing these data remains difficult for typical researchers because of several challenges: the need to move large amounts of data, the need for substantial compute infrastructure, the need to provide security and privacy, and a lack of specialized computational training. Here I will discuss our efforts to address these challenges through two projects: Galaxy (galaxyproject.org) and the AnVIL (anvilproject.org).
Galaxy is a platform for making complex, data-intensive analysis available to as many researchers as possible and for facilitating interaction and collaboration among them. Galaxy’s goals are to make analysis 1) accessible – methods should be usable by researchers at any level of informatics expertise, 2) transparent – analyses should be easily shared with others in a form that captures all important details, and 3) reproducible – analyses should be easily reproduced exactly, even in a different compute environment. Galaxy has been used by tens of thousands of researchers and has enabled thousands of publications since its inception in 2005.
The AnVIL (The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space) is a new project to implement a scalable cloud environment for the genomics community that provides a secure and compliant environment for the analysis of human subjects data, along with a wealth of different tools and analysis environments, to democratize genomic data access and serve the needs of both users with limited computational expertise and expert data scientists.
About: James Taylor is the Ralph S. O’Connor Associate Professor of Biology and associate professor of computer science at Johns Hopkins University. Until 2014, he was an associate professor in the departments of biology and mathematics and computer science at Emory University. He is one of the original developers of the Galaxy platform for data analysis, and his group continues to work on extending the Galaxy platform. His group also works on understanding genomic and epigenomic regulation of gene transcription through integrated analysis of functional genomic data. James received a Ph.D. in computer science from Penn State University, where he was involved in several vertebrate genome projects and the ENCODE project.