Toolset that enriches mlr with a diverse set of preprocessing operators. Composable Preprocessing Operators ("CPO"s) are first-class R objects that can be applied to data.frames and mlr "Task"s to modify data, can be attached to mlr "Learner"s to add preprocessing to machine learning algorithms, and can be composed to form preprocessing pipelines.
This package implements order selection for Vector Autoregressive (VAR) models using the Mean Square Information Criterion (MIC). Unlike standard methods such as AIC and BIC, MIC is likelihood-free. This method consistently estimates VAR order and has robust performance under model misspecification. For more details, see Hellstern and Shojaie (2025) <doi:10.48550/arXiv.2511.19761>.
Implements surrogate-assisted feature extraction (SAFE) and common machine learning approaches to train and validate phenotyping models. Background and details about the methods can be found in Zhang et al. (2019) <doi:10.1038/s41596-019-0227-6>, Yu et al. (2017) <doi:10.1093/jamia/ocw135>, and Liao et al. (2015) <doi:10.1136/bmj.h1885>.
Quantile regression (QR) for Nonlinear Mixed-Effects Models via the asymmetric Laplace distribution (ALD). It uses the Stochastic Approximation of the EM (SAEM) algorithm to derive exact maximum likelihood estimates and full inference results for the fixed effects and variance components. It also provides prediction and graphical summaries for assessing the algorithm convergence and fitting results.
This package implements moving-blocks bootstrap and extended tapered-blocks bootstrap, as well as smooth versions of each, for quantile regression in time series. This package accompanies the paper: Gregory, K. B., Lahiri, S. N., & Nordman, D. J. (2018). A smooth block bootstrap for quantile regression with time series. The Annals of Statistics, 46(3), 1138-1166.
This package provides tools for the simulation of data in the context of small area estimation. Combine all steps of your simulation, from data generation through drawing samples to model fitting, in one object. This enables easy modification and combination of different scenarios. You can store your results in a folder or start the simulation in parallel.
Greedy optimal subset selection for transformation models (Hothorn et al., 2018, <doi:10.1111/sjos.12291>) based on the abess algorithm (Zhu et al., 2020, <doi:10.1073/pnas.2014241117>). Applicable to models from packages tram and cotram. Application to shift-scale transformation models is described in Siegfried et al. (2024, <doi:10.1080/00031305.2023.2203177>).
Topological data analysis studies the structure and shape of data using topological features. We provide a variety of algorithms to learn with persistent homology of the data based on functional summaries for clustering, hypothesis testing, visualization, and others. We refer to Wasserman (2018) <doi:10.1146/annurev-statistics-031017-100045> for a statistical perspective on the topic.
Implements a novel variable selection approach for classification problems that takes into account the correlations that may exist between the predictors of the design matrix in a high-dimensional logistic model. The approach consists of rewriting the initial high-dimensional logistic model to remove the correlation between the predictors and then applying the generalized Lasso criterion.
An R package designed for QC, analysis, and exploration of single cell RNA-seq data. It easily enables widely-used analytical techniques, including the identification of highly variable genes, dimensionality reduction (PCA, ICA, t-SNE), standard unsupervised clustering algorithms (density clustering, hierarchical clustering, k-means), and the discovery of differentially expressed genes and markers.
Analysis of DNA mixtures involving relatives by computation of likelihood ratios that account for dropout and drop-in, mutations, silent alleles and population substructure. This is useful in kinship cases, like non-invasive prenatal paternity testing, where deductions about individuals' relationships rely on DNA mixtures, and in criminal cases where the contributors to a mixed DNA stain may be related. Relationships are represented by pedigrees and can include kinship between more than two individuals. The main function is relMix() and its graphical user interface relMixGUI(). The implementation and method are described in Dorum et al. (2017) <doi:10.1007/s00414-016-1526-x>, Hernandis et al. (2019) <doi:10.1016/j.fsigss.2019.09.085> and Kaur et al. (2016) <doi:10.1007/s00414-015-1276-1>.
Helps users quickly visualize risk-of-bias assessments performed as part of a systematic review. It allows users to create weighted bar-plots of the distribution of risk-of-bias judgments within each bias domain, in addition to traffic-light plots of the specific domain-level judgments for each study. The resulting figures are of publication quality and are formatted according to the risk-of-bias assessment tool used to perform the assessments. Currently, the supported tools are ROB2.0 (for randomized controlled trials; Sterne et al. (2019) <doi:10.1136/bmj.l4898>), ROBINS-I (for non-randomised studies of interventions; Sterne et al. (2016) <doi:10.1136/bmj.i4919>), and QUADAS-2 (for diagnostic accuracy studies; Whiting et al. (2011) <doi:10.7326/0003-4819-155-8-201110180-00009>).
This package provides utilities to help set and record the setting of the seed and the uniform and normal generators used when a random experiment is run. The utilities can be used in other functions that do random experiments to simplify recording and/or setting all the necessary information for reproducibility. See the vignette and reference manual for examples.
This package provides the exponential integrals E_1(x), E_2(x), E_n(x) and Ei(x), and the incomplete gamma function G(a, x) defined for negative values of its first argument. The package also gives easy access to the underlying C routines through an API; see the package vignette for details.
Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. The package provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of ggplot2 and tidy data.
The TopDom method identifies topological domains in genomes from Hi-C sequence data. The authors published an implementation of their method as an R script. This package originates from those original TopDom R scripts and provides help pages adapted from the original TopDom PDF documentation. It also provides a small number of bug fixes to the original code.
This package is a model building aid for nonlinear mixed-effects (population) model analysis using NONMEM, facilitating data set checkout, exploration and visualization, model diagnostics, candidate covariate identification and model comparison. The methods are described in Keizer et al. (2013) <doi:10.1038/psp.2013.24>, and Jonsson et al. (1999) <doi:10.1016/s0169-2607(98)00067-4>.
MEIGOR provides a comprehensive environment for performing global optimization tasks in bioinformatics and systems biology. It leverages advanced metaheuristic algorithms to efficiently search the solution space and is specifically tailored to handle the complexity and high-dimensionality of biological datasets. This package supports various optimization routines and is integrated with Bioconductor's infrastructure for a seamless analysis workflow.
An R package for multiple-group comparison to detect tissue/cell-specific marker genes among subtypes. It provides functions to compute OVESEG-test statistics, derive component weights in the mixture null distribution model and estimate p-values from weighted aggregation of permutations. The resulting posterior probabilities of component null hypotheses can also portray all kinds of upregulation patterns among subtypes.
Implementation of the technique of Lleonart et al. (2000) <doi:10.1006/jtbi.2000.2043> to scale body measurements that exhibit allometric growth. This procedure is a theoretical generalization of the technique used by Thorpe (1975) <doi:10.1111/j.1095-8312.1975.tb00732.x> and Thorpe (1976) <doi:10.1111/j.1469-185X.1976.tb01063.x>.
Typically, models in R exist in memory and can be saved via regular R serialization. However, some models store information in locations that cannot be saved using R serialization alone. The goal of bundle is to provide a common interface to capture this information, situate it within a portable object, and restore it for use in new settings.
Bayesian analysis of luminescence data and C-14 age estimates. Bayesian models are based on the following publications: Combes, B. & Philippe, A. (2017) <doi:10.1016/j.quageo.2017.02.003> and Combes et al. (2015) <doi:10.1016/j.quageo.2015.04.001>. This includes, amongst others, data import, export, and the application of age models and the palaeodose model.
Convert text into synthesized speech and get a list of supported voices for a region. Microsoft's Cognitive Services Text to Speech REST API <https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech?tabs=streaming> supports neural text to speech voices, which support specific languages and dialects that are identified by locale.
The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.