R is a language and environment for statistical computing and graphics. It provides a variety of statistical techniques, such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification and clustering. It also provides robust support for producing publication-quality data plots. A large amount of 3rd-party packages are available, greatly increasing its breadth and scope.
The MIDASim package is a microbiome data simulator for generating realistic microbiome datasets by adapting a user-provided template. It supports the controlled introduction of experimental signals-such as shifts in taxon relative abundances, prevalence, and sample library sizes-to create distinct synthetic populations under diverse simulation scenarios. For more details, see He et al. (2024) <doi:10.1186/s40168-024-01822-z>.
Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates.
An implementation of 14 parsimonious mixture models for model-based clustering or model-based classification. Gaussian, Student's t, generalized hyperbolic, variance-gamma or skew-t mixtures are available. All approaches work with missing data. Celeux and Govaert (1995) <doi:10.1016/0031-3203(94)00125-6>, Browne and McNicholas
(2014) <doi:10.1007/s11634-013-0139-1>, Browne and McNicholas
(2015) <doi:10.1002/cjs.11246>.
This package provides tools to help convert credit risk data at two timepoints into traditional credit state migration (aka, "transition") matrices. At a higher level, migrate is intended to help an analyst understand how risk moved in their credit portfolio over a time interval. References to this methodology include: 1. Schuermann, T. (2008) <doi:10.1002/9780470061596.risk0409>. 2. Perederiy, V. (2017) <doi:10.48550/arXiv.1708.00062>
.
This package provides a set of tools for testing networks. It includes functions for univariate and multivariate conditional uniform graph and quadratic assignment procedure testing, and network regression. The package is a complement to Multimodal Political Networks (2021, ISBN:9781108985000), and includes various datasets used in the book. Built on the manynet package, all functions operate with matrices, edge lists, and igraph', network', and tidygraph objects, and on one-mode and two-mode (bipartite) networks.
The Self-Organizing Maps with Built-in Missing Data Imputation. Missing values are imputed and regularly updated during the online Kohonen algorithm. Our method can be used for data visualisation, clustering or imputation of missing data. It is an extension of the online algorithm of the kohonen package. The method is described in the article "Self-Organizing Maps for Exploration of Partially Observed Data and Imputation of Missing Values" by S. Rejeb, C. Duveau, T. Rebafka (2022) <arXiv:2202.07963>
.
When a network is partially observed (here, NAs in the adjacency matrix rather than 1 or 0 due to missing information between node pairs), it is possible to account for the underlying process that generates those NAs. missSBM
', presented in Barbillon, Chiquet and Tabouy (2022) <doi:10.18637/jss.v101.i12>, adjusts the popular stochastic block model from network data sampled under various missing data conditions, as described in Tabouy, Barbillon and Chiquet (2019) <doi:10.1080/01621459.2018.1562934>.
This package creates and runs Bayesian mixing models to analyze biological tracer data (i.e. stable isotopes, fatty acids), which estimate the proportions of source (prey) contributions to a mixture (consumer). MixSIAR
is not one model, but a framework that allows a user to create a mixing model based on their data structure and research questions, via options for fixed/ random effects, source data types, priors, and error terms. MixSIAR
incorporates several years of advances since MixSIR
and SIAR'.
This package is deprecated. Please use redatamx instead. Provides an API to work with Redatam (see <https://redatam.org>) databases in both formats: RXDB (new format) and DICX (old format) and running Redatam programs written in SPC language. It's a wrapper around Redatam core and provides functions to open/close a database (redatam_open()/redatam_close()
), list entities and variables from the database (redatam_entities()
, redatam_variables()
) and execute a SPC program and gets the results as data frames (redatam_query()
, redatam_run()
).
This package provides tools to generate HTML interfaces for adaptive and non-adaptive tests using the shiny package (Chalmers (2016) <doi:10.18637/jss.v071.i05>). Suitable for applying unidimensional and multidimensional computerized adaptive tests (CAT) using item response theory methodology and for creating simple questionnaires forms to collect response data directly in R. Additionally, optimal test designs (e.g., "shadow testing") are supported for tests that contain a large number of item selection constraints. Finally, package contains tools useful for performing Monte Carlo simulations for studying test item banks.
Implementation of a framework for cluster analysis with selection of the final number of clusters and an optional variable selection procedure. The package is designed to integrate the results of multiple imputed datasets while accounting for the uncertainty that the imputations introduce in the final results. In addition, the package can also be used for a cluster analysis of the complete cases of a single dataset. The package also includes specific methods to summarize and plot the results. The methods are described in Basagana et al. (2013) <doi:10.1093/aje/kws289>.
Utilizing model-based clustering (unsupervised) for functional magnetic resonance imaging (fMRI
) data. The developed methods (Chen and Maitra (2023) <doi:10.1002/hbm.26425>) include 2D and 3D clustering analyses (for p-values with voxel locations) and segmentation analyses (for p-values alone) for fMRI
data where p-values indicate significant level of activation responding to stimulate of interesting. The analyses are mainly identifying active voxel/signal associated with normal brain behaviors. Analysis pipelines (R scripts) utilizing this package (see examples in inst/workflow/') is also implemented with high performance techniques.
This package provides a procedure for comparing multivariate samples associated with different groups. It uses principal component analysis to convert multivariate observations into a set of linearly uncorrelated statistical measures, which are then compared using a number of statistical methods. The procedure is independent of the distributional properties of samples and automatically selects features that best explain their differences, avoiding manual selection of specific points or summary statistics. It is appropriate for comparing samples of time series, images, spectrometric measures or similar multivariate observations. This package is described in Fachada et al. (2016) <doi:10.32614/RJ-2016-055>.
The midasml package implements estimation and prediction methods for high-dimensional mixed-frequency (MIDAS) time-series and panel data regression models. The regularized MIDAS models are estimated using orthogonal (e.g. Legendre) polynomials and sparse-group LASSO (sg-LASSO) estimator. For more information on the midasml approach see Babii, Ghysels, and Striaukas (2021, JBES forthcoming) <doi:10.1080/07350015.2021.1899933>. The package is equipped with the fast implementation of the sg-LASSO estimator by means of proximal block coordinate descent. High-dimensional mixed frequency time-series data can also be easily manipulated with functions provided in the package.
This package provides a facility to generate various classes of fractional designs for order-of-addition experiments namely fractional order-of-additions orthogonal arrays, see Voelkel, Joseph G. (2019). "The design of order-of-addition experiments." Journal of Quality Technology 51:3, 230-241, <doi:10.1080/00224065.2019.1569958>. Provides facility to construct component orthogonal arrays, see Jian-Feng Yang, Fasheng Sun and Hongquan Xu (2020). "A Component Position Model, Analysis and Design for Order-of-Addition Experiments." Technometrics, <doi:10.1080/00401706.2020.1764394>. Supports generation of fractional designs for order-of-addition mixture experiments. Analysis of data from order-of-addition mixture experiments is also supported.
Machine learning method specifically designed for pre-miRNA
prediction. It takes advantage of unlabeled sequences to improve the prediction rates even when there are just a few positive examples, when the negative examples are unreliable or are not good representatives of its class. Furthermore, the method can automatically search for negative examples if the user is unable to provide them. MiRNAss
can find a good boundary to divide the pre-miRNAs
from other groups of sequences; it automatically optimizes the threshold that defines the classes boundaries, and thus, it is robust to high class imbalance. Each step of the method is scalable and can handle large volumes of data.
Deconvolution of thermal decay curves allows you to quantify proportions of biomass components in plant litter. Thermal decay curves derived from thermogravimetric analysis (TGA) are imported, modified, and then modelled in a three- or four- part mixture model using the Fraser-Suzuki function. The output is estimates for weights of pseudo-components corresponding to hemicellulose, cellulose, and lignin. For more information see: Müller-Hagedorn, M. and Bockhorn, H. (2007) <doi:10.1016/j.jaap.2006.12.008>, à rfão, J. J. M. and Figueiredo, J. L. (2001) <doi:10.1016/S0040-6031(01)00634-7>, and Yang, H. and Yan, R. and Chen, H. and Zheng, C. and Lee, D. H. and Liang, D. T. (2006) <doi:10.1021/ef0580117>.
Logistic-normal Multinomial (LNM) models are common in problems with multivariate count data. This package gives a simple implementation with a 30 line Stan script. This lightweight implementation makes it an easy starting point for other projects, in particular for downstream tasks that require analysis of "compositional" data. It can be applied whenever a multinomial probability parameter is thought to depend linearly on inputs in a transformed, log ratio space. Additional utilities make it easy to inspect, create predictions, and draw samples using the fitted models. More about the LNM can be found in Xia et al. (2013) "A Logistic Normal Multinomial Regression Model for Microbiome Compositional Data Analysis" <doi:10.1111/biom.12079> and Sankaran and Holmes (2023) "Generative Models: An Interdisciplinary Perspective" <doi:10.1146/annurev-statistics-033121-110134>.
This package provides constrained joint maximum likelihood estimation algorithms for item factor analysis (IFA) based on multidimensional item response theory models. So far, we provide functions for exploratory and confirmatory IFA based on the multidimensional two parameter logistic (M2PL) model for binary response data. Comparing with traditional estimation methods for IFA, the methods implemented in this package scale better to data with large numbers of respondents, items, and latent factors. The computation is facilitated by multiprocessing OpenMP
API. For more information, please refer to: 1. Chen, Y., Li, X., & Zhang, S. (2018). Joint Maximum Likelihood Estimation for High-Dimensional Exploratory Item Factor Analysis. Psychometrika, 1-23. <doi:10.1007/s11336-018-9646-5>; 2. Chen, Y., Li, X., & Zhang, S. (2019). Structured Latent Factor Analysis for Large-scale Data: Identifiability, Estimability, and Their Implications. Journal of the American Statistical Association, <doi: 10.1080/01621459.2019.1635485>.
Weakly supervised (WS), multiple instance (MI) data lives in numerous interesting applications such as drug discovery, object detection, and tumor prediction on whole slide images. The mildsvm package provides an easy way to learn from this data by training Support Vector Machine (SVM)-based classifiers. It also contains helpful functions for building and printing multiple instance data frames. The core methods from mildsvm come from the following references: Kent and Yu (2022) <arXiv:2206.14704>
; Xiao, Liu, and Hao (2018) <doi:10.1109/TNNLS.2017.2766164>; Muandet et al. (2012) <https://proceedings.neurips.cc/paper/2012/file/9bf31c7ff062936a96d3c8bd1f8f2ff3-Paper.pdf>; Chu and Keerthi (2007) <doi:10.1162/neco.2007.19.3.792>; and Andrews et al. (2003) <https://papers.nips.cc/paper/2232-support-vector-machines-for-multiple-instance-learning.pdf>. Many functions use the Gurobi optimization back-end to improve the optimization problem speed; the gurobi R package and associated software can be downloaded from <https://www.gurobi.com> after obtaining a license.
This package provides a collection of R functions for analyzing finite mixture models.
This package provides a collection of functions for computations and visualizations of microbial pan-genomes.
Randomization schedules are generated in the schemes with k (k>=2) treatment groups and any allocation ratios by minimization algorithms.