Software from JHU Data Science Lab.
By Jeff Leek @jtleek
Find me on GitHub
- [papr] A Tinder-like web app for rating preprints from bioRxiv. Written by Lucy D’Agostino McGowan, Nicholas Strayer, and Jeff Leek.
- [Rail-RNA] Rail-RNA is software for jointly aligning multiple RNA-seq samples. It can be run on a local computer, on a local cluster, or on the AWS cloud.
- [ballgown] Statistical backend for analyzing population-level data from Cufflinks. Written by Alyssa Frazee, Geo Pertea, and Jeff Leek.
- [derfinder] Fast differential expression analysis of RNA-seq data at base-pair resolution. Written by Leo Collado Torres, Alyssa Frazee, Andrew Jaffe, Rafa Irizarry, and Jeff Leek.
- [regionReport] Software for creating interactive, reproducible HTML reports for derfinder. Written by Leo Collado Torres.
- [polyester] R package for simulating RNA-seq reads. Written by Alyssa Frazee and Andrew Jaffe.
- [sva] It has been shown that genome-wide expression may be affected by environmental, demographic, genetic and technical factors, creating what we call expression heterogeneity. Surrogate variable analysis (SVA) is designed to identify, estimate, and incorporate into an analysis the sources of expression heterogeneity that are not captured by variables included in the model. SVA has been shown to reduce dependence across genes, stabilize false discovery rate estimates, and improve reproducibility of analyses. Written by Jeff Leek, Evan Johnson, Hilary Parker, Andrew Jaffe and John Storey.
- [validate] A key component of performing any genomics experiment is validation of significant features (genes, proteins, etc.). This software can be used to assess the statistical evidence for validation of a particular analysis/technology on the basis of a random sample of significant results. Written by Jeff Leek.
- [tspreg] Top scoring pairs regression models, for building multiclass and survival top scoring pair classifiers. Written by Jeff Leek and Prasad Patil.
- [dks] The explosive growth of high-dimensional data has resulted in an equally explosive growth in methods for analyzing high-dimensional data. Almost all of these methods rely on p-values, corrected p-values, or false discovery rate estimates for ranking and significance calculation. However, there is no clear standard for determining whether the p-values from a new multiple testing procedure are correct. The double Kolmogorov-Smirnov package consists of a set of R functions for diagnosing whether a multiple testing procedure gives correct null p-values using simulated data. Written by Jeff Leek.
- [myrna] Myrna is a cloud computing tool for calculating differential gene expression in large RNA-seq datasets. Myrna uses Bowtie for short read alignment and R/Bioconductor for interval calculations, normalization, and statistical testing. These tools are combined in an automatic, parallel pipeline that runs in the cloud (Elastic MapReduce in this case), on a local Hadoop cluster, or on a single computer, exploiting multiple computers and CPUs wherever possible. Written by Ben Langmead, Kasper Hansen, and Jeff Leek.
- [edge] A comprehensive software package for the significance analysis of DNA microarray experiments – for both standard and time course experiments – based on our new optimal discovery procedure and time course methodology. Written by John Storey, Jeff Leek, and Andrew Bass.
- [tspair] A top scoring pair is a pair of genes whose relative ranks can be used to classify arrays according to a binary phenotype. A top scoring pair classifier has three advantages over standard classifiers: (1) the classifier is based on the relative ranks of genes and is more robust to normalization and preprocessing, (2) the classifier is based on a pair of genes and is likely to be more interpretable than a more complicated classifier, and (3) a classifier based on a small number of genes lends itself to diagnostic tests based on PCR that are both more rapid and cheaper than classifiers based on a large number of genes. Written by Jeff Leek.
- [Set] This package implements a decision-theoretic approach to gene-set analysis, considering a number of different loss functions introduced in the companion Boca et al. (2013) Biometrics paper. Written by Simina Boca, Hector Corrada Bravo, and Jeff Leek.
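
The top scoring pair idea described in the tspair entry can be illustrated with a minimal sketch. This is not the tspair package's code or API (tspair is an R package). The function names `top_scoring_pair` and `predict` and the data layout are hypothetical, chosen only for illustration. The sketch scores every gene pair by how differently the ordering "gene i below gene j" occurs in the two classes, and keeps the first pair with the highest score. A fuller implementation would presumably need a tie-breaking rule, which is omitted here.

```python
from itertools import combinations


def top_scoring_pair(expr, labels):
    """Find the gene pair whose relative order best separates two classes.

    expr   -- list of samples, each a list of expression values (one per gene)
    labels -- list of 0/1 class labels, one per sample
    Returns (score, i, j, less_means_one); ties keep the first pair found.
    """
    class0 = [s for s, y in zip(expr, labels) if y == 0]
    class1 = [s for s, y in zip(expr, labels) if y == 1]
    best = None
    for i, j in combinations(range(len(expr[0])), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p0 = sum(s[i] < s[j] for s in class0) / len(class0)
        p1 = sum(s[i] < s[j] for s in class1) / len(class1)
        score = abs(p0 - p1)
        if best is None or score > best[0]:
            # Record which ordering is more common in class 1.
            best = (score, i, j, p1 > p0)
    return best


def predict(sample, pair):
    """Classify one sample using only the ordering of the chosen gene pair."""
    _, i, j, less_means_one = pair
    return int((sample[i] < sample[j]) == less_means_one)
```

Because only the comparison `sample[i] < sample[j]` enters the decision, any preprocessing that preserves the within-sample ordering of the two genes leaves the prediction unchanged, which is the robustness property the tspair description points to.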