R is a language and environment for statistical computing and graphics. It provides a variety of statistical techniques, such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification and clustering. It also provides robust support for producing publication-quality data plots. A large number of third-party packages are available, greatly increasing its breadth and scope.
High-throughput sequencing followed by differential expression analysis is a widely used approach to detecting genomic biomarkers. A fundamental step in differential expression analysis is modeling the association between gene counts and covariates of interest. NBAMSeq is a flexible statistical model based on the generalized additive model that allows for information sharing across genes in variance estimation.
Filtering of lowly expressed features (e.g. genes) is a common step before performing statistical analysis, but an arbitrary threshold is generally chosen. SeqGate implements a method that rationalizes this step by analyzing the distribution of counts in replicate samples. The gate is the threshold above which sequenced features can be considered confidently quantified.
This package provides a variety of functions for producing simple weighted statistics, such as weighted Pearson's correlations, partial correlations, Chi-Squared statistics, histograms, and t-tests. It now also includes software for quickly recoding survey data and for plotting point estimates from interaction terms in regressions (and multiply imputed regressions). Note: the weighted partial correlation calculations have been withdrawn to address a bug.
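To make concrete what a weighted Pearson correlation computes, here is a minimal Python sketch. It is not the package's R API: the function name `weighted_pearson` is hypothetical, and it assumes non-negative weights that are used to form weighted means, a weighted covariance and weighted variances.

```python
import math


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of x and y using non-negative weights w (hypothetical helper)."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    # Weighted means of each variable.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted cross-product and sums of squares around those means.
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    # The normalizing constants cancel, so the raw weighted sums suffice.
    return cov / math.sqrt(vx * vy)
```

With equal weights this reduces to the ordinary Pearson correlation; giving an observation weight 0 removes its influence entirely, e.g. `weighted_pearson([1, 2, 3, 100], [1, 2, 3, -50], [1, 1, 1, 0])` is 1.0 because the outlier carries no weight.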
For tree ensembles such as random forests, regularized random forests and gradient boosted trees, this package provides functions for: extracting, measuring and pruning rules; selecting a compact rule set; summarizing rules into a learner; calculating frequent variable interactions; formatting rules in LaTeX code. Reference: Interpreting tree ensembles with inTrees (Houtao Deng, 2019, <doi:10.1007/s41060-018-0144-8>).
Messina is a collection of algorithms for constructing optimally robust single-gene classifiers, and for identifying differential expression in the presence of outliers or unknown sample subgroups. The methods have application in identifying lead features to develop into clinical tests (both diagnostic and prognostic), and in identifying differential expression when a fraction of samples show unusual patterns of expression.
This package provides methods for gene set enrichment analysis of high-throughput RNA-Seq data that integrate differential expression and splicing. It uses a negative binomial distribution to model read count data, which accounts for sequencing biases and biological variation. Using permutation tests, it can also assess the statistical significance of each gene's differential expression and of its splicing separately.
This package enables automated selection of group-specific signatures, especially for rare populations. It generates specific lists of signature genes using modified Term Frequency-Inverse Document Frequency (TF-IDF) methods. It can also be used as a new gene-set scoring method or as a data transformation method. Multiple visualization functions are implemented in this package.
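The package's modified TF-IDF methods are not spelled out here, so the following is only a sketch of the plain TF-IDF idea applied to genes, under stated assumptions: each cell group plays the role of a "document", expression is binarized into detected/not detected, "term frequency" is the fraction of cells in the group that detect a gene, and "inverse document frequency" down-weights genes detected in many cells overall. The function `tfidf_gene_scores` is hypothetical.

```python
import math


def tfidf_gene_scores(detected, in_group):
    """Score genes for one cell group with a plain (unmodified) TF-IDF formula.

    detected: dict mapping gene -> list of 0/1 flags, one per cell
    in_group: list of booleans marking the cells that belong to the group
              (assumed to contain at least one True)
    """
    n_cells = len(in_group)
    n_group = sum(in_group)
    scores = {}
    for gene, flags in detected.items():
        # Term frequency: share of the group's cells in which the gene is detected.
        tf = sum(f for f, g in zip(flags, in_group) if g) / n_group
        # Inverse document frequency: genes detected in fewer cells overall weigh more.
        n_detected = sum(flags)
        idf = math.log(n_cells / n_detected) if n_detected else 0.0
        scores[gene] = tf * idf
    return scores
```

A gene detected in every cell gets an IDF of 0 and therefore a score of 0, no matter how often it appears in the group, which is why this weighting favors genes that are specific to a (possibly rare) group rather than merely abundant.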
The GNU Privacy Guard is a complete implementation of the OpenPGP standard. It is used to encrypt and sign data and communications. It features powerful key management and the ability to access public key servers. It includes several libraries: libassuan (IPC between GnuPG components), libgpg-error (centralized GnuPG error values), and libksba (working with X.509 certificates and CMS data).
The affyPLM package extends and improves the functionality of the base affy package. To speed up runs, it includes routines that make heavy use of compiled code. The central focus is on implementing methods for fitting probe-level models and tools that use these models. PLM-based quality assessment tools are also provided.
This package provides the Molecular Signatures Database (MSigDB) gene sets typically used with the Gene Set Enrichment Analysis (GSEA) software in a standard R data frame with key-value pairs. Included are the original human gene symbols and Entrez IDs as well as the equivalents for various frequently studied model organisms such as mouse, rat, pig, fly, and yeast.
This package provides estimators for multinomial logit models in their conditional logit and baseline logit variants, with or without random effects, with or without overdispersion. Random effects models are estimated using the PQL technique (based on a Laplace approximation) or the MQL technique (based on a Solomon-Cox approximation). Estimates should be treated with caution if the group sizes are small.
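As an illustration of the baseline logit variant only, here is a minimal Python sketch of how category probabilities follow from the coefficients: the baseline category's linear predictor is fixed at 0 and the rest pass through a softmax. It covers fixed effects only (no random effects, PQL/MQL or overdispersion), and the function `baseline_logit_probs` and its `baseline` argument are hypothetical, not part of the package.

```python
import math


def baseline_logit_probs(x, coefs, baseline="ref"):
    """Category probabilities under a fixed-effects baseline-category multinomial logit.

    x: covariate vector (include a leading 1.0 for the intercept)
    coefs: dict mapping each non-baseline category to its coefficient vector
    baseline: name of the implicit baseline category, whose coefficients are all 0
    """
    # Linear predictor for each category; the baseline's is 0 by construction.
    eta = {baseline: 0.0}
    for cat, beta in coefs.items():
        eta[cat] = sum(b * xi for b, xi in zip(beta, x))
    # Softmax, shifted by the maximum for numerical stability.
    top = max(eta.values())
    expd = {cat: math.exp(v - top) for cat, v in eta.items()}
    total = sum(expd.values())
    return {cat: v / total for cat, v in expd.items()}
```

Each coefficient is then a log odds ratio relative to the baseline: an intercept of log(3) for category "b" makes "b" three times as likely as the baseline, giving probabilities 0.75 and 0.25.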
ggplot2 is an implementation of the grammar of graphics in R. It combines the advantages of both base and lattice graphics: conditioning and shared axes are handled automatically, and you can still build up a plot step by step from multiple data sources. It also implements a sophisticated multidimensional conditioning system and a consistent interface to map data to aesthetic attributes.
RadeonTop monitors resource consumption on supported AMD Radeon Graphics Processing Units (GPUs), either in real time as bar graphs on a terminal or saved to a file for further processing. It measures both the activity of the GPU as a whole, which is also accurate during OpenCL computations, as well as separate component statistics that are only meaningful under OpenGL graphics workloads.
This package provides a collection of algorithms and functions to aid statistical modeling. It includes growth curve comparisons, limiting dilution analysis (aka ELDA), mixed linear models, heteroscedastic regression, inverse-Gaussian probability calculations, Gauss quadrature and a secure convergence algorithm for nonlinear models. It also includes advanced generalized linear model functions that implement secure convergence, dispersion modeling and Tweedie power-law families.
This package generates ROC plots. Most ROC curve plots obscure the cutoff values and inhibit interpretation and comparison of multiple curves. This package attempts to address those shortcomings by providing plotting and interactive tools. Functions are provided to generate interactive ROC curve plots for web use as well as print versions. A Shiny application implementing the functions is also included.
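Labeling cutoffs on a ROC curve is possible only if each (FPR, TPR) point keeps track of the threshold that produced it. A minimal Python sketch of that bookkeeping follows; it is not the package's API, the function `roc_points` is hypothetical, and it assumes a sample is called positive when its score is greater than or equal to the cutoff.

```python
def roc_points(scores, labels):
    """Return (cutoff, FPR, TPR) triples, one per distinct score used as a cutoff.

    A sample is called positive when its score is >= the cutoff.
    labels: 1 for a true positive case, 0 for a true negative case
    (both classes are assumed to be present).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    # Walk from the strictest cutoff to the most lenient one.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 0)
        points.append((cutoff, fp / n_neg, tp / n_pos))
    return points
```

For scores `[0.9, 0.8, 0.4, 0.3]` with labels `[1, 1, 0, 0]`, the cutoff 0.8 already yields TPR 1.0 at FPR 0.0, which is exactly the kind of information that is lost when a curve is drawn without its cutoff values.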
Network Security Services (NSS) is a set of libraries designed to support cross-platform development of security-enabled client and server applications. Applications built with NSS can support SSL v2 and v3, TLS, PKCS #5, PKCS #7, PKCS #11, PKCS #12, S/MIME, X.509 v3 certificates, and other security standards.
This package tracks the Rapid Release channel, which updates frequently.
The goal of MineICA is to perform Independent Component Analysis (ICA) on multiple transcriptome datasets, integrating additional data (e.g. molecular, clinical and pathological). This integrative ICA helps the biological interpretation of the components by studying their association with variables (e.g. sample annotations) and gene sets, and enables the comparison of components from different datasets using a correlation-based graph.
This package lets you replace the standard x-axis in ggplots with a combination matrix to visualize complex set overlaps. UpSet has introduced a new way to visualize the overlap of sets as an alternative to Venn diagrams. This package provides a simple way to produce such plots using ggplot2. In addition it can convert any categorical axis into a combination matrix axis.
svaNUMT contains functions for detecting NUMT events from structural variant calls. It takes structural variant calls as GRanges in breakend notation and identifies NUMTs by nuclear-mitochondrial breakend junctions. The main function reports candidate NUMTs if a pair of valid insertion sites is found on the nuclear genome within a certain distance threshold. Candidate NUMTs are reported by event.
This package generates pathway scores from expression data for single samples after training on a reference cohort. The score is generated by taking the expression of a gene set (pathway) in a reference cohort and performing linear discriminant analysis to distinguish samples in which the pathway is augmented from those in which it is not. The resulting separating hyperplane is then used to score new samples.
IONiseR provides tools for the quality assessment of Oxford Nanopore MinION data. It extracts summary statistics from a set of fast5 files and can be used either before or after base calling. In addition to standard summaries of the read-types produced, it provides a number of plots for visualising metrics relative to experiment run time or spatially over the surface of a flowcell.
The bayNorm package is used for normalizing single-cell RNA-seq data. The main function is bayNorm, a wrapper function for gene-specific prior parameter estimation and normalization. The input is a matrix of scRNA-seq data with genes as rows and cells as columns. The output is either point estimates from the posterior (2D array) or samples from the posterior (3D array).