This package provides functions for fitting continuous-time Markov and hidden Markov multi-state models to longitudinal data. It was designed for processes observed at arbitrary times in continuous time (panel data), but some other observation schemes are also supported. Both the Markov transition rates and the hidden Markov output process can be modelled in terms of covariates, which may be constant or piecewise-constant in time.
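For panel data, observations at two times are linked through the transition probability matrix. For a time-homogeneous chain with transition intensity matrix Q (non-negative off-diagonal rates, rows summing to zero), the transition probabilities over an interval of length t are P(t) = exp(Qt). The pure-Python sketch below computes this matrix exponential by scaling and squaring a truncated Taylor series. The helper names and the two-state example rates are illustrative assumptions, not the package's own routines.

```python
import math

Matrix = list[list[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_expm(a: Matrix, terms: int = 30) -> Matrix:
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)  # infinity norm
    # Scale by 2**s so the Taylor series converges quickly, then square s times.
    s = max(0, math.ceil(math.log2(norm))) + 1 if norm > 0 else 0
    scaled = [[x / 2**s for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes scaled**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)
    return result


def transition_probs(q: Matrix, t: float) -> Matrix:
    """P(t) = exp(Q t) for a time-homogeneous continuous-time Markov chain."""
    return mat_expm([[x * t for x in row] for row in q])


# Hypothetical two-state model: 0 -> 1 at rate 0.3, 1 -> 0 at rate 0.1 (per year).
q = [[-0.3, 0.3], [0.1, -0.1]]
p = transition_probs(q, 2.0)  # state probabilities two years after each starting state
```

For two states with rates a (0 to 1) and b (1 to 0), the result can be checked against the closed form P00(t) = b/(a+b) + a/(a+b) exp(-(a+b)t). In practice a library matrix exponential would be the more robust choice.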
This package provides utility functions for manipulating, processing, and analyzing mass spectrometry-based single-cell proteomics data. It is an extension to the QFeatures package and relies on SingleCellExperiment to enable single-cell proteomics analyses. It lets the user process quantitative tables (as generated by MaxQuant, Proteome Discoverer, and other tools) into data tables ready for downstream analysis and data visualization.
The basic idea of latent semantic analysis (LSA) is that texts have a higher-order (latent semantic) structure which, however, is obscured by word usage (e.g., through the use of synonyms or polysemy). By using conceptual indices that are derived statistically via a truncated singular value decomposition (a two-mode factor analysis) of a given document-term matrix, this variability problem can be overcome.
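Below is a minimal NumPy sketch of the idea, not the package's code. It takes a truncated SVD of a small, invented document-term matrix and compares documents in the reduced space. The words "car" and "automobile" never occur together, but both co-occur with "engine". As a result, their documents point in the same direction in a two-dimensional latent space, even though their raw cosine similarity is only 0.5.

```python
import numpy as np


def lsa(dtm: np.ndarray, k: int):
    """Truncated SVD of a (documents x terms) matrix, keeping the top-k factors.

    Returns document coordinates, term coordinates and the k singular values.
    """
    u, s, vt = np.linalg.svd(dtm, full_matrices=False)
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]
    doc_coords = u_k * s_k       # each row: one document in latent space
    term_coords = vt_k.T * s_k   # each row: one term in latent space
    return doc_coords, term_coords, s_k


def cosine(x: np.ndarray, y: np.ndarray) -> float:
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


terms = ["car", "automobile", "engine", "banana", "fruit"]
dtm = np.array([
    [1, 0, 1, 0, 0],  # "car engine"
    [0, 1, 1, 0, 0],  # "automobile engine"
    [0, 0, 0, 1, 1],  # "banana fruit"
    [0, 0, 0, 1, 2],  # "fruit banana fruit"
], dtype=float)

docs, _, sv = lsa(dtm, k=2)
print(cosine(dtm[0], dtm[1]), cosine(docs[0], docs[1]))
```

Dropping the smaller singular values is what merges the two synonym contexts. With k equal to the full rank, the reduced coordinates reproduce the raw cosine similarities exactly.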
The TIN package implements a set of tools for transcriptome instability analysis based on exon expression profiles. Deviating exon usage is studied in the context of splicing factors to analyse to what degree transcriptome instability is correlated with splicing factor expression. In the transcriptome instability correlation analysis, the data are compared to both random permutations of alternative splicing scores and the expression of random gene sets.
This package contains functions for removing batch effects and other unwanted variation in high-throughput experiments. It also contains functions for identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (such as gene expression, RNA sequencing, methylation, or brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise.
This package includes tools for marginal maximum likelihood estimation and joint maximum likelihood estimation for unidimensional and multidimensional item response models. The package functionality covers the Rasch model, 2PL model, 3PL model, generalized partial credit model, multi-faceted Rasch model, nominal item response model, structured latent class model, mixture distribution IRT models, and located latent class models. Latent regression models and plausible value imputation are also supported.
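The dichotomous models in this list are nested. In the three-parameter logistic (3PL) model, the probability of a correct response is P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))). Fixing c = 0 gives the 2PL model, and additionally fixing a = 1 gives the Rasch model. The plain-Python sketch below evaluates this item response function and finds the maximum-likelihood ability for a single response pattern by grid search. It makes three assumptions: no scaling constant D, invented item parameters, and hypothetical function names. It illustrates the model only and is not the package's estimation routine, which estimates over all persons and items by marginal or joint maximum likelihood.

```python
import math


def irf(theta: float, a: float = 1.0, b: float = 0.0, c: float = 0.0) -> float:
    """3PL item response function: a = discrimination, b = difficulty, c = guessing.

    c = 0 gives the 2PL model; c = 0 and a = 1 give the Rasch model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern given (a, b, c) per item."""
    total = 0.0
    for x, (a, b, c) in zip(responses, items):
        p = irf(theta, a, b, c)
        total += math.log(p) if x else math.log(1.0 - p)
    return total


def theta_ml(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood ability estimate by a simple grid search over [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


rasch_items = [(1.0, -1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]  # hypothetical
theta_hat = theta_ml([1, 1, 0], rasch_items)
```

For the Rasch model, the likelihood equation states that the expected score at the estimate equals the observed number of correct responses (here 2), which makes a convenient sanity check.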
With this tool, a user should be able to quickly implement complex random effect models through simple C++ templates. The package combines CppAD (C++ automatic differentiation), Eigen (templated matrix-vector library), and CHOLMOD (sparse matrix routines available from R) to obtain an efficient implementation of the Laplace approximation with exact derivatives. Key features are automatic sparseness detection and parallelism through BLAS and parallel user templates.
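The Laplace approximation replaces an integral over random effects u with a Gaussian integral around the mode u_hat of the joint log-density g(u). In one dimension this gives: integral of exp(g(u)) du ≈ exp(g(u_hat)) * sqrt(2*pi / (-g''(u_hat))). The package obtains the required derivatives by automatic differentiation and works with sparse matrices. The sketch below is much simpler. It uses a single scalar random effect, hand-written derivatives, and a hypothetical Poisson model with a normal random effect on the log scale, and it compares the approximation with brute-force quadrature. The function names are illustrative only.

```python
import math


def log_joint(u: float, y: int, sigma: float) -> float:
    """log p(y | u) + log p(u) for y ~ Poisson(exp(u)) and u ~ Normal(0, sigma^2)."""
    return (y * u - math.exp(u) - math.lgamma(y + 1)
            - u * u / (2 * sigma**2) - 0.5 * math.log(2 * math.pi * sigma**2))


def laplace_loglik(y: int, sigma: float, iters: int = 50) -> float:
    """Laplace approximation to the log of the integral of exp(log_joint) over u."""
    u = 0.0
    for _ in range(iters):  # Newton's method for the mode u_hat
        grad = y - math.exp(u) - u / sigma**2
        hess = -math.exp(u) - 1 / sigma**2
        step = grad / hess
        u -= step
        if abs(step) < 1e-12:
            break
    hess = -math.exp(u) - 1 / sigma**2
    return log_joint(u, y, sigma) + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(-hess)


def quadrature_loglik(y: int, sigma: float, lo=-10.0, hi=10.0, n=20001) -> float:
    """Reference value: trapezoidal rule on a fine grid."""
    h = (hi - lo) / (n - 1)
    vals = [math.exp(log_joint(lo + i * h, y, sigma)) for i in range(n)]
    return math.log(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))


for y in (0, 3, 10):
    print(y, laplace_loglik(y, 1.0), quadrature_loglik(y, 1.0))
```

Even for small Poisson counts the approximation lands close to the quadrature value. Its error typically shrinks as the curvature -g''(u_hat) grows, that is, as each random effect is informed by more data.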
The mzR package provides a unified API to the common file formats and parsers available for mass spectrometry data. It comes with a wrapper for the ISB random access parser for mass spectrometry mzXML, mzData and mzML files. The package contains the original code written by the ISB, and a subset of the proteowizard library for mzML and mzIdentML. The netCDF reading code has previously been used in XCMS.
Call job::job(<code here>) to run R code as an RStudio job and keep your console free in the meantime. This allows for a productive workflow while testing (multiple) long-running chunks of code. It can also be used to organize results using the RStudio Jobs GUI or to test code in a clean environment. Two RStudio Addins can be used to run selected code as a job.
Pathway analysis based on p-values associated with genes from a gene expression analysis of interest. Utility functions enable the extraction of pathways from the Gene Ontology Biological Process (GOBP), Molecular Function (GOMF), and Cellular Component (GOCC), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Reactome databases. In addition to the methodology, helper functions are available to display the results as a table, as a bar plot of pathway significance, or as a Gene Ontology graph.
The fst package for R provides a fast, easy, and flexible way to serialize data frames. With access speeds of multiple GB/s, fst is specifically designed to unlock the potential of high-speed solid state disks. Data frames stored in the fst format have full random access in both columns and rows, and can be compressed with the LZ4 and ZSTD compressors.
This package provides drop-in replacements for the base system2() function with fine control and consistent behavior across platforms. It supports clean interruption, timeout, background tasks, and streaming STDIN / STDOUT / STDERR over binary or text connections. The package also provides functions for evaluating expressions inside a temporary fork. Such evaluations have no side effects on the main R process, and support reliable interrupts and timeouts. This provides the basis for a sandboxing mechanism.
Alternating least squares is often used to resolve components contributing to data with a bilinear structure; the basic technique may be extended to alternating constrained least squares. This package provides an implementation of multivariate curve resolution alternating least squares (MCR-ALS).
Commonly applied constraints include unimodality, non-negativity, and normalization of components. Several data matrices may be decomposed simultaneously by assuming that one of the two matrices in the bilinear decomposition is shared between datasets.
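The alternation can be sketched in a few lines of NumPy. This is a minimal sketch, not the package's implementation. The data matrix D is modelled as C S^T (concentration profiles times spectra), and each half-step is an unconstrained least-squares solve. Non-negativity is imposed by simply clipping negative values to zero, a crude stand-in for a proper non-negative least-squares solve. Each spectrum is then normalized to unit length, with C rescaled so that the product is unchanged. The synthetic Gaussian profiles and spectra, the perturbed starting guess, and the function name are invented for illustration.

```python
import numpy as np


def mcr_als(d, s_init, n_iter=500, tol=1e-12):
    """Minimal MCR-ALS sketch with non-negativity and spectral normalization.

    d      : (n_times, n_channels) data matrix, modelled as d ≈ c @ s.T
    s_init : (n_channels, k) initial guess for the pure-component spectra
    Returns (c, s, relative_residual).
    """
    s = np.array(s_init, dtype=float)
    prev = np.inf
    for _ in range(n_iter):
        # Update concentrations: solve s @ c.T ≈ d.T, then enforce c >= 0.
        c = np.clip(np.linalg.lstsq(s, d.T, rcond=None)[0].T, 0.0, None)
        # Update spectra: solve c @ s.T ≈ d, then enforce s >= 0.
        s = np.clip(np.linalg.lstsq(c, d, rcond=None)[0].T, 0.0, None)
        # Normalize each spectrum to unit length; rescale c so c @ s.T is unchanged.
        norms = np.linalg.norm(s, axis=0)
        norms[norms == 0] = 1.0
        s /= norms
        c *= norms
        resid = np.linalg.norm(d - c @ s.T) / np.linalg.norm(d)
        if abs(prev - resid) < tol:
            break
        prev = resid
    return c, s, resid


# Synthetic two-component data: overlapping elution profiles and spectra.
t = np.arange(50.0)
ch = np.arange(30.0)
c_true = np.column_stack([np.exp(-(t - 20) ** 2 / 50), np.exp(-(t - 30) ** 2 / 50)])
s_true = np.column_stack([np.exp(-(ch - 10) ** 2 / 18), np.exp(-(ch - 20) ** 2 / 18)])
d = c_true @ s_true.T
c, s, resid = mcr_als(d, s_true + 0.02)  # start from perturbed spectra
```

The bilinear model has rotational ambiguity, so a low residual alone does not guarantee that the recovered profiles match the true ones. Constraints such as non-negativity, unimodality, and normalization are meant to narrow that ambiguity.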
This package uses a Bayesian hierarchical model to detect enriched regions from ChIP-chip experiments. The common goal in analyzing ChIP-chip data is to detect DNA-protein interactions. The BAC package has mainly been tested with Affymetrix tiling array data. However, we expect it to work with other platforms (e.g., Agilent, Nimblegen, and cDNA arrays). Note that BAC does not deal with normalization, so you will have to normalize your data beforehand.
Roary is a high-speed, stand-alone pan-genome pipeline, which takes annotated assemblies in GFF3 format (produced by the Prokka program) and calculates the pan genome. Using a standard desktop PC, it can analyse datasets with thousands of samples without compromising the quality of the results. 128 samples can be analysed in under 1 hour using 1 GB of RAM and a single processor. Roary is not intended for metagenomics or for comparing extremely diverse sets of genomes.
This package is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers, including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, hence the name (gap).
Independent hypothesis weighting (IHW) is a multiple testing procedure that increases power compared to the method of Benjamini and Hochberg by assigning data-driven weights to each hypothesis. The input to IHW is a two-column table of p-values and covariates. The covariate can be any continuous-valued or categorical variable that is thought to be informative about the statistical properties of each hypothesis test, but it must be independent of the p-value under the null hypothesis.
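IHW learns its weights from the covariate. Once the weights are fixed, they enter a weighted Benjamini-Hochberg step. The plain-Python sketch below shows only that step. It assumes the non-negative weights are already given and average to 1, divides each p-value by its weight, and applies the usual BH step-up rule to the results. The weights, p-values, and function name are invented, and this is not the package's weight-learning procedure.

```python
import math


def weighted_bh(pvalues, weights, alpha=0.1):
    """Weighted Benjamini-Hochberg: returns sorted indices of rejected hypotheses.

    Weights must be non-negative and average to 1; a weight of 0 means the
    hypothesis is never rejected. Unit weights give the ordinary BH procedure.
    """
    m = len(pvalues)
    if len(weights) != m:
        raise ValueError("need one weight per p-value")
    if any(w < 0 for w in weights) or abs(sum(weights) - m) > 1e-9 * m:
        raise ValueError("weights must be non-negative and average to 1")
    weighted = [p / w if w > 0 else math.inf for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: weighted[i])
    # Step-up: find the largest rank k with weighted p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if weighted[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(weighted_bh(pvals, [1.0] * 8, alpha=0.05))                          # [0, 1]
print(weighted_bh(pvals, [1, 1, 2, 2, 0.5, 0.5, 0.5, 0.5], alpha=0.05))  # [0, 1, 2, 3]
```

Upweighting hypotheses 2 and 3, as an informative covariate might, lets both be rejected. Hypothesis 2 is rejected even though its weighted p-value alone misses its rank-3 threshold, because BH is a step-up procedure and hypothesis 3 passes at rank 4.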
This package provides tools for the visualization of missing and/or imputed values, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on the structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and to explore the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple, and multivariate plot methods.
R-dsb improves protein expression analysis in droplet-based single-cell studies. The package specifically addresses noise in raw protein UMI counts from methods like CITE-seq. It identifies and removes two main sources of noise: protein-specific noise from unbound antibodies and droplet/cell-specific noise. The package is applicable to data from various methods, including CITE-seq, REAP-seq, ASAP-seq, TEA-seq, and the Mission Bio platform. Check the vignette for tutorials on integrating dsb with Seurat and Bioconductor and on using dsb in Python.
This package provides functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data, in particular representation, manipulation and simulation of multistate data: the Lexis suite of functions, which includes interfaces to the mstate, etm and cmprsk packages. It also contains functions for Age-Period-Cohort and Lee-Carter modeling, a function for interval-censored data, some useful functions for tabulation and plotting, and a number of epidemiological data sets.
This package provides a general toolkit for downloading, managing, analyzing, and presenting data from the U.S. Census, including SF1 (Decennial short-form), SF3 (Decennial long-form), and the American Community Survey (ACS). Confidence intervals provided with ACS data are converted to standard errors to be bundled with estimates in complex acs objects. The package provides new methods to conduct standard operations on acs objects and present/plot data in statistically appropriate ways.
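ACS estimates are published with margins of error (MOE) at the 90% confidence level. A standard error is therefore recovered as SE = MOE / 1.645. Once estimates carry standard errors, a sum of estimates can be given an approximate standard error of sqrt(SE1^2 + SE2^2 + ...), which is the Census Bureau's approximation that ignores covariance between the estimates. The helper names below are hypothetical and the numbers are invented. The sketch covers the arithmetic only and does not reproduce the acs package's API.

```python
import math

Z_90 = 1.645  # critical value behind the ACS's 90% margins of error


def moe_to_se(moe: float, z: float = Z_90) -> float:
    """Convert a published margin of error to a standard error."""
    return moe / z


def sum_with_se(estimates, ses):
    """Sum estimates; approximate SE as the root sum of squares (ignores covariance)."""
    return sum(estimates), math.sqrt(sum(se * se for se in ses))


def confidence_interval(estimate, se, z=1.96):
    """Symmetric normal-theory interval; z = 1.96 gives 95%."""
    return estimate - z * se, estimate + z * se


# Two hypothetical tract-level counts with their published 90% MOEs.
estimates = [1200.0, 800.0]
ses = [moe_to_se(m) for m in (98.7, 131.6)]
total, total_se = sum_with_se(estimates, ses)
low, high = confidence_interval(total, total_se)
```

Multiplying back by 1.645 gives the margin of error of the sum on the same 90% scale as the published figures.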
The method implemented in this package performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge. This avoids several limitations of traditional methods, for example how many clusters there should be and how to choose a principled distance metric. This implementation accepts multinomial (i.e. discrete, with 2+ categories) or time-series data. This version also includes a randomised algorithm which is more efficient for larger data sets.
This package provides functions useful in the design and ANOVA of experiments. The content falls into the following groupings:
data,
factor manipulation functions,
design functions,
ANOVA functions,
matrix functions,
projector and canonical efficiency functions, and
miscellaneous functions.
There is a vignette called DesignNotes describing how to use the design functions for randomizing and assessing designs. The ANOVA functions facilitate the extraction of information when the Error function has been used in the call to aov.
The Well-Plate Maker (WPM) is a shiny application deployed as an R package. Functions for command-line/script use are also available. The WPM allows users to generate well plate maps to carry out their experiments while improving the handling of batch effects. In particular, it helps control the "plate effect" thanks to its ability to randomize samples over multiple well plates. The algorithm for placing the samples is inspired by the backtracking algorithm: the samples are placed at random while respecting specific spatial constraints.
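A randomized backtracking placement of this kind can be sketched in plain Python. Samples are assigned to wells in row-major order, candidates are tried in random order, and the search backtracks whenever a well has no admissible sample. The spatial constraint used here is an assumed example: no two horizontally or vertically adjacent wells hold samples from the same group. WPM's actual constraints and implementation may differ, and the function name is hypothetical.

```python
import random


def place_samples(groups, n_rows, n_cols, seed=None):
    """Randomized backtracking placement of samples on a well plate.

    groups: one group label per sample (sample i belongs to groups[i]).
    Wells are filled in row-major order. Assumed constraint: horizontally or
    vertically adjacent wells never hold samples from the same group.
    Returns a grid of sample indices (None for unused wells), or None if no
    valid layout exists.
    """
    n = len(groups)
    if n > n_rows * n_cols:
        raise ValueError("more samples than wells")
    rng = random.Random(seed)
    layout = [[None] * n_cols for _ in range(n_rows)]
    remaining = list(range(n))

    def fits(r, c, sample):
        # Only the left and upper neighbours are already filled in row-major order.
        left = layout[r][c - 1] if c > 0 else None
        up = layout[r - 1][c] if r > 0 else None
        return all(nb is None or groups[nb] != groups[sample] for nb in (left, up))

    def solve(pos):
        if pos == n:
            return True
        r, c = divmod(pos, n_cols)
        candidates = remaining[:]
        rng.shuffle(candidates)  # random choice among admissible samples
        tried = set()
        for sample in candidates:
            g = groups[sample]
            # Samples of the same group are interchangeable for the constraint,
            # so a group that already failed at this well is skipped.
            if g in tried or not fits(r, c, sample):
                continue
            tried.add(g)
            layout[r][c] = sample
            remaining.remove(sample)
            if solve(pos + 1):
                return True
            remaining.append(sample)  # undo and backtrack
            layout[r][c] = None
        return False

    return layout if solve(0) else None


groups = ["A"] * 12 + ["B"] * 12  # hypothetical: 24 samples from two conditions
plate = place_samples(groups, n_rows=4, n_cols=6, seed=42)
```

The constraint depends only on group labels, so at any given well it is enough to try one sample per group. This pruning reduces the branching factor to the number of groups.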