This package provides an interface to the FORCIS database (Chaabane et al. (2024) <doi:10.5281/zenodo.7390791>) on global foraminifera distribution. This package allows to download and to handle FORCIS data. It is part of the FRB-CESAB working group FORCIS. <https://www.fondationbiodiversite.fr/en/the-frb-in-action/programs-and-projects/le-cesab/forcis/>.
This package implements a new multiple imputation method that draws imputations from a latent joint multivariate normal model which underpins generally structured data. This model is constructed using a sequence of flexible conditional linear models that enables the resulting procedure to be efficiently implemented on high dimensional datasets in practice. See Robbins (2021) <arXiv:2008.02243>.
This package provides tools for sparse regression modelling with grouped predictors using the group subset selection penalty. Uses coordinate descent and local search algorithms to rapidly deliver near optimal estimates. The group subset penalty can be combined with a group lasso or ridge penalty for added shrinkage. Linear and logistic regression are supported, as are overlapping groups.
This package implements the Hierarchical Incremental GRAdient Descent (HiGrad) algorithm, a first-order algorithm for finding the minimizer of a function in online learning just like stochastic gradient descent (SGD). In addition, this method attaches a confidence interval to assess the uncertainty of its predictions. See Su and Zhu (2018) <arXiv:1802.04876> for details.
Estimating the mean and variance of a covariate for the complier, never-taker and always-taker subpopulation in the context of instrumental variable estimation. This package implements the method described in Marbach and Hangartner (2020) <doi:10.1017/pan.2019.48> and Hangartner, Marbach, Henckel, Maathuis, Kelz and Keele (2021) <doi:10.48550/arXiv.2103.06328>.
The "Manual on Low-flow Estimation and Prediction" (Gustard & Demuth (2009, ISBN:978-92-63-11029-9)), published by the World Meteorological Organisation, gives a comprehensive summary on how to analyse stream flow data focusing on low-flows. This packages provides functions to compute the described statistics and produces plots similar to the ones in the manual.
Toolset that enriches mlr with a diverse set of preprocessing operators. Composable Preprocessing Operators ("CPO"s) are first-class R objects that can be applied to data.frames and mlr "Task"s to modify data, can be attached to mlr "Learner"s to add preprocessing to machine learning algorithms, and can be composed to form preprocessing pipelines.
This package implements order selection for Vector Autoregressive (VAR) models using the Mean Square Information Criterion (MIC). Unlike standard methods such as AIC and BIC, MIC is likelihood-free. This method consistently estimates VAR order and has robust performance under model misspecification. For more details, see Hellstern and Shojaie (2025) <doi:10.48550/arXiv.2511.19761>.
Implement surrogate-assisted feature extraction (SAFE) and common machine learning approaches to train and validate phenotyping models. Background and details about the methods can be found at Zhang et al. (2019) <doi:10.1038/s41596-019-0227-6>, Yu et al. (2017) <doi:10.1093/jamia/ocw135>, and Liao et al. (2015) <doi:10.1136/bmj.h1885>.
Quantile regression (QR) for Nonlinear Mixed-Effects Models via the asymmetric Laplace distribution (ALD). It uses the Stochastic Approximation of the EM (SAEM) algorithm for deriving exact maximum likelihood estimates and full inference result is for the fixed-effects and variance components. It also provides prediction and graphical summaries for assessing the algorithm convergence and fitting results.
This package implements moving-blocks bootstrap and extended tapered-blocks bootstrap, as well as smooth versions of each, for quantile regression in time series. This package accompanies the paper: Gregory, K. B., Lahiri, S. N., & Nordman, D. J. (2018). A smooth block bootstrap for quantile regression with time series. The Annals of Statistics, 46(3), 1138-1166.
This package provides tools for the simulation of data in the context of small area estimation. Combine all steps of your simulation - from data generation over drawing samples to model fitting - in one object. This enables easy modification and combination of different scenarios. You can store your results in a folder or start the simulation in parallel.
This package provides a comprehensive resource for data on Taylor Swift songs. Data is included for all officially released studio albums, extended plays (EPs), and individual singles are included. Data comes from Genius (lyrics) and Spotify (song characteristics). Additional functions are included for easily creating data visualizations with color palettes inspired by Taylor Swift's album covers.
Greedy optimal subset selection for transformation models (Hothorn et al., 2018, <doi:10.1111/sjos.12291> ) based on the abess algorithm (Zhu et al., 2020, <doi:10.1073/pnas.2014241117> ). Applicable to models from packages tram and cotram'. Application to shift-scale transformation models are described in Siegfried et al. (2024, <doi:10.1080/00031305.2023.2203177>).
Topological data analysis studies structure and shape of the data using topological features. We provide a variety of algorithms to learn with persistent homology of the data based on functional summaries for clustering, hypothesis testing, visualization, and others. We refer to Wasserman (2018) <doi:10.1146/annurev-statistics-031017-100045> for a statistical perspective on the topic.
It proposes a novel variable selection approach in classification problem that takes into account the correlations that may exist between the predictors of the design matrix in a high-dimensional logistic model. Our approach consists in rewriting the initial high-dimensional logistic model to remove the correlation between the predictors and in applying the generalized Lasso criterion.
This package is an R package designed for QC, analysis, and exploration of single cell RNA-seq data. It easily enables widely-used analytical techniques, including the identification of highly variable genes, dimensionality reduction; PCA, ICA, t-SNE, standard unsupervised clustering algorithms; density clustering, hierarchical clustering, k-means, and the discovery of differentially expressed genes and markers.
Analysis of DNA mixtures involving relatives by computation of likelihood ratios that account for dropout and drop-in, mutations, silent alleles and population substructure. This is useful in kinship cases, like non-invasive prenatal paternity testing, where deductions about individuals relationships rely on DNA mixtures, and in criminal cases where the contributors to a mixed DNA stain may be related. Relationships are represented by pedigrees and can include kinship between more than two individuals. The main function is relMix() and its graphical user interface relMixGUI(). The implementation and method is described in Dorum et al. (2017) <doi:10.1007/s00414-016-1526-x>, Hernandis et al. (2019) <doi:10.1016/j.fsigss.2019.09.085> and Kaur et al. (2016) <doi:10.1007/s00414-015-1276-1>.
Helps users in quickly visualizing risk-of-bias assessments performed as part of a systematic review. It allows users to create weighted bar-plots of the distribution of risk-of-bias judgments within each bias domain, in addition to traffic-light plots of the specific domain-level judgments for each study. The resulting figures are of publication quality and are formatted according the risk-of-bias assessment tool use to perform the assessments. Currently, the supported tools are ROB2.0 (for randomized controlled trials; Sterne et al (2019) <doi:10.1136/bmj.l4898>), ROBINS-I (for non-randomised studies of interventions; Sterne et al (2016) <doi:10.1136/bmj.i4919>), and QUADAS-2 (for diagnostic accuracy studies; Whiting et al (2011) <doi:10.7326/0003-4819-155-8-201110180-00009>).
This package provides utilities to help set and record the setting of the seed and the uniform and normal generators used when a random experiment is run. The utilities can be used in other functions that do random experiments to simplify recording and/or setting all the necessary information for reproducibility. See the vignette and reference manual for examples.
This package provides the exponential integrals E_1(x), E_2(x), E_n(x) and Ei(x), and the incomplete gamma function G(a, x) defined for negative values of its first argument. The package also gives easy access to the underlying C routines through an API; see the package vignette for details.
This method identifies topological domains in genomes from Hi-C sequence data. The authors published an implementation of their method as an R script. This package originates from those original TopDom R scripts and provides help pages adopted from the original TopDom PDF documentation. It also provides a small number of bug fixes to the original code.
Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. The package provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of ggplot2 and tidy data.
This package is a model building aid for nonlinear mixed-effects (population) model analysis using NONMEM, facilitating data set checkout, exploration and visualization, model diagnostics, candidate covariate identification and model comparison. The methods are described in Keizer et al. (2013) <doi:10.1038/psp.2013.24>, and Jonsson et al. (1999) <doi:10.1016/s0169-2607(98)00067-4>.