Researchers in the Computational Biology Department have implemented many successful software packages used for biological data analysis and modeling. Links to software, organized by principal investigator, are found below.
- DECOnvolved Discriminative motif discovery (DECOD) – DECOD is a tool for finding discriminative DNA motifs, i.e. motifs that are over-represented in one set of sequences but are depleted from another.
DECOD uses a k-mer count table, so its running time is independent of the size of the input set. By deconvolving the k-mers, DECOD considers context information without using the sequences directly. DECOD is written in Java and can therefore run on multiple platforms. It has an easy-to-use GUI and can also be run in command-line mode for batch processing.
Most recent reference: P. Huggins*, S. Zhong*, I. Shiff*, R. Beckerman, O. Laptenko, C. Prives, M.H. Schulz, I. Simon, Z. Bar-Joseph. DECOD: Fast and Accurate Discriminative DNA Motif Finding. Bioinformatics, 27(17):2361-7, 2011.
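As a rough illustration of the k-mer count table idea mentioned for DECOD, the Python sketch below tabulates k-mers in two sequence sets and ranks them by the difference in relative frequency. This is not DECOD's algorithm (DECOD is a Java program and additionally deconvolves the k-mers); the function names `kmer_counts` and `count_differences`, the toy sequences, and the frequency-difference score are all assumptions made for this example. It only shows why later steps can work from the table alone: once the counts are built, scoring cost depends on the number of distinct k-mers rather than on the number of input sequences.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every length-k substring across a set of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def count_differences(positive, negative, k):
    """Relative k-mer frequency in the positive set minus that in the negative set.

    Hypothetical scoring for illustration only; not the score DECOD uses.
    """
    pos = kmer_counts(positive, k)
    neg = kmer_counts(negative, k)
    pos_total = sum(pos.values()) or 1  # guard against empty input
    neg_total = sum(neg.values()) or 1
    return {
        kmer: pos[kmer] / pos_total - neg[kmer] / neg_total
        for kmer in set(pos) | set(neg)
    }


# Toy data (made up): the positive set is enriched for ACGT.
positive = ["ACGTACGT", "TTACGTAA"]
negative = ["TTTTAAAA", "AATTTTAA"]

diffs = count_differences(positive, negative, k=4)
top = max(diffs, key=diffs.get)
print(top)  # ACGT
```

After the two count tables are built, the original sequences are never consulted again, which is the property the description above attributes to working from a k-mer count table.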
- Dynamic Regulatory Events Miner (DREM 2.0) – Method for integrating time series and static data to reconstruct dynamic regulatory networks.
The Dynamic Regulatory Events Miner (DREM) allows one to model, analyze, and visualize transcriptional gene regulation dynamics. DREM takes as input time series gene expression data and static or dynamic transcription factor-gene interaction data (e.g. ChIP-chip data), and produces as output a dynamic regulatory map. The dynamic regulatory map highlights major bifurcation events in the time series expression data and the transcription factors potentially responsible for them. See the manual and paper below for more details.
Most recent reference: M.H. Schulz, W.E. Devanny, A. Gitter, S. Zhong, J. Ernst, Z. Bar-Joseph. DREM 2.0: Improved reconstruction of dynamic regulatory networks from time-series expression data. BMC Systems Biology, 6:1, 2012.
- SEquencing Error CorrEction for Rna reads (SEECER) – A method for error correction of de novo RNA-Seq data that does not require a reference genome.
SEECER is a sequencing error correction algorithm for RNA-Seq data sets. It takes as input the raw read sequences produced by a next-generation sequencing platform, such as machines from Illumina or Roche. SEECER removes mismatch and indel errors from the raw reads and significantly improves downstream analysis of the data. Running SEECER can have a tremendous impact on assembly quality, especially when the RNA-Seq data is used to produce a de novo transcriptome assembly.
Most recent reference: H.S. Le*, M.H. Schulz*, B.M. McCauley, V.F. Hinman, Z. Bar-Joseph. Probabilistic error correction for RNA sequencing. Nucleic Acids Research, nar/gkt215, 2013.
- Short Time-series Expression Miner (STEM) – Cluster and visualize short time series gene expression data.
The Short Time-series Expression Miner (STEM) is a Java program for clustering, comparing, and visualizing short time series gene expression data from microarray experiments (~8 time points or fewer). STEM allows researchers to identify significant temporal expression profiles and the genes associated with these profiles, and to compare the behavior of these genes across multiple conditions. STEM is fully integrated with the Gene Ontology (GO) database, supporting GO category gene enrichment analyses for sets of genes having the same temporal expression pattern. STEM can also determine and visualize the behavior of genes belonging to a given GO category or user-defined gene set, identifying which temporal expression profiles are enriched for these genes. (Note: While STEM is designed primarily to analyze data from short time course experiments, it can be used to analyze data from any small set of experiments that can naturally be ordered sequentially, including dose response experiments.)
Most recent reference: J. Ernst, Z. Bar-Joseph. STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics, 7:191, 2006.
- GFLasso – Graph-guided fused lasso estimates a sparse multi-response regression model, while leveraging a weighted network structure over response variables to find covariates that are jointly relevant to multiple correlated responses.
- Tree-Guided Group Lasso – Tree-guided group lasso estimates a sparse multi-response regression model, while leveraging a hierarchical clustering tree structure over response variables.
- CORAL – An integrated suite of visualizations for comparing clusterings.
- GHOST – Global network alignment using multiscale spectral signatures.
- GIRAF – Computational identification of influenza reassortments via graph mining.
- JELLYFISH – Fast, Parallel k-mer Counting for DNA.
- STARFISH – Identifying rigid components with the pebble game and a body-bar-and-hinge reduction.
- TransTermHP – Finds rho-independent transcription terminators in bacterial genomes.
- CellOrganizer – Open source system for learning and using generative models of cells from images.
- OMERO.searcher – Open source add-on to OMERO that enables content-based image searching.
- PatternUnmixer – Open-source Matlab program for learning mixture models of subcellular patterns.
- SLIF – Structured Literature Image Finder – open source tools for extracting, analyzing and searching images from primary biomedical literature; additional information about SLIF can be found at http://murphylab.web.cmu.edu/services/SLIF/.
- GPU-BLAST: Speeding up the Basic Local Alignment Search Tool (BLAST) with GPUs.
- SAS-Pro: Sequential and non-sequential structure alignment of proteins.