This package implements proper and so-called Maximum Likelihood Multiple Imputation as described by von Hippel and Bartlett (2021) <doi:10.1214/20-STS793>. A number of different imputation methods are available through the 'norm', 'cat' and 'mix' packages. Inferences can be performed either using Rubin's rules (for proper imputation) or a modified version for maximum likelihood imputation. For maximum likelihood imputation, a likelihood score-based approach grounded in theory by Wang and Robins (1998) <doi:10.1093/biomet/85.4.935> is also available.
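As a minimal sketch of the standard Rubin's rules for proper imputation (not the modified rules for maximum likelihood imputation, nor the Wang and Robins score-based approach), the Python function below pools m point estimates and their squared standard errors. The function name `rubins_rules` is hypothetical and unrelated to the package's R interface.

```python
import math
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = mean(estimates)            # pooled point estimate
    w = mean(variances)                # within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b            # total variance
    # Rubin's degrees of freedom for the reference t distribution
    r = (1 + 1 / m) * b / w            # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2 if b > 0 else math.inf
    return q_bar, t, df
```

For example, `rubins_rules([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])` gives a pooled estimate of 1.0 and a total variance of about 0.0933 (the within-imputation variance 0.04 plus 4/3 times the between-imputation variance 0.04).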
We consider network structure detection for variables Y while accommodating auxiliary variables X, which may be subject to measurement error. Three functions address different structures with different methods: NP_Graph() handles nonlinear relationships between the responses and the covariates; Joint_Gaussian() corrects for measurement error in linear regression models via Gaussian maximum likelihood; and Cond_Gaussian() does the same for linear regression models via a conditional likelihood function.
Implementation of prediction and inference procedures for Synthetic Control methods using least squares, lasso, ridge, or simplex-type constraints. Uncertainty is quantified with prediction intervals as developed in Cattaneo, Feng, and Titiunik (2021) <doi:10.1080/01621459.2021.1979561> for a single treated unit and in Cattaneo, Feng, Palomba, and Titiunik (2025) <doi:10.1162/rest_a_01588> for multiple treated units and staggered adoption. More details about the software implementation can be found in Cattaneo, Feng, Palomba, and Titiunik (2025) <doi:10.18637/jss.v113.i01>.
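The simplex-type constraint can be illustrated with a short Python sketch that fits synthetic-control weights for a single treated unit by minimizing the pre-treatment squared error over non-negative weights that sum to one. It uses plain projected gradient descent with a sort-based Euclidean projection onto the simplex; this solver, the function names and the fixed iteration count are assumptions for illustration, not the package's implementation, and the sketch does not compute the prediction intervals that are the package's main contribution.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:          # condition holds for a prefix; keep the last threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def simplex_weights(y1, Y0, steps=5000):
    """Minimise ||y1 - Y0 w||^2 over the simplex by projected gradient descent.

    y1: T pre-treatment outcomes of the treated unit.
    Y0: T rows, each holding the J control units' outcomes for that period.
    """
    T, J = len(Y0), len(Y0[0])
    # 0.5 / ||Y0||_F^2 <= 1 / (2 * lambda_max(Y0' Y0)), a safe step size
    lr = 0.5 / sum(v * v for row in Y0 for v in row)
    w = [1.0 / J] * J                       # start from equal weights
    for _ in range(steps):
        resid = [sum(Y0[t][j] * w[j] for j in range(J)) - y1[t] for t in range(T)]
        grad = [2.0 * sum(Y0[t][j] * resid[t] for t in range(T)) for j in range(J)]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(J)])
    return w
```

For example, if the treated unit's pre-treatment path equals 0.3 times the first control plus 0.7 times the second, and the control columns are linearly independent, the weights converge to approximately (0.3, 0.7, 0).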
Flexible implementation of the Standardized Ranking Performance Index (sRPI) for model selection based on multiple evaluation criteria. The package combines multiple statistical measures into a single index to provide an objective and robust ranking of models across calibration, validation, and combined scenarios. It supports evaluation of statistical, machine learning, and other predictive models using user-defined performance criteria. For more details see Aschonitis et al. (2019) <doi:10.1016/j.envsoft.2019.01.005> and Singh et al. (2023) <doi:10.1016/j.ecoinf.2022.101933>.
Access and manipulate spatial tracking data, with straightforward coercion from and to other formats. Filter for speed and create time spent maps from tracking data. There are coercion methods to convert between trip and ltraj from 'adehabitatLT', and between trip and psp and ppp from 'spatstat'. Trip objects can be created from raw or grouped data frames, and from types in the 'sp', 'sf', 'amt', 'trackeR', 'mousetrap', and other packages. See Sumner, MD (2011) <https://figshare.utas.edu.au/articles/thesis/The_tag_location_problem/23209538>.
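A naive speed filter can be sketched in Python: it computes great-circle speeds between consecutive fixes and flags any fix reached faster than a maximum speed. The function names, the (time in hours, longitude, latitude) tuple layout and the spherical Earth radius of 6371 km are assumptions; this simple threshold rule is not the package's speed filter, and because it only looks backwards it also flags the good fix that follows an outlier.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def flag_fast_fixes(fixes, max_kmh):
    """fixes: list of (time_hours, lon, lat), sorted by time.

    Returns one boolean per fix: False if it was reached from the previous
    fix at more than max_kmh. The first fix is always kept.
    """
    keep = [True]
    for (t0, x0, y0), (t1, x1, y1) in zip(fixes, fixes[1:]):
        speed = haversine_km(x0, y0, x1, y1) / (t1 - t0)
        keep.append(speed <= max_kmh)
    return keep
```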
This package provides functions for the Bayesian analysis of some simple commonly-used models, without using Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling. The rust package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution, using the generalized ratio-of-uniforms method. See Wakefield, Gelfand and Smith (1991) <DOI:10.1007/BF01889987> for details. At the moment three conjugate hierarchical models are available: beta-binomial, gamma-Poisson and a 1-way analysis of variance (ANOVA).
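The basic (non-generalized) ratio-of-uniforms idea can be sketched in Python: draw (u, v) uniformly from a bounding box and accept x = v/u whenever u <= sqrt(f(x)), which yields draws with density proportional to f. Here the bounding box is found by a grid search with a small safety margin, an assumption made for illustration; the rust package uses its own generalized method and box construction, and the Beta(2, 3) target in the usage note is only a toy example, not one of the package's hierarchical models.

```python
import math
import random


def rou_sampler(log_f, lower, upper, n, seed=0, grid=10_000):
    """Basic ratio-of-uniforms sampler for an unnormalised density on (lower, upper).

    (u, v) is drawn uniformly from [0, a] x [b_minus, b_plus] and x = v / u
    is accepted when u <= sqrt(f(x)), i.e. 2 * log(u) <= log f(x).
    """
    xs = [lower + (upper - lower) * (i + 0.5) / grid for i in range(grid)]
    roots = [math.exp(0.5 * log_f(x)) for x in xs]
    a = max(roots)                                            # sup sqrt(f(x))
    b_minus = min(0.0, min(x * r for x, r in zip(xs, roots)))  # inf x sqrt(f(x))
    b_plus = max(0.0, max(x * r for x, r in zip(xs, roots)))   # sup x sqrt(f(x))
    # inflate slightly because the grid search only approximates the extrema
    a *= 1.01
    b_minus *= 1.01
    b_plus *= 1.01
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.uniform(0.0, a)
        v = rng.uniform(b_minus, b_plus)
        if u == 0.0:
            continue
        x = v / u
        if lower < x < upper and 2.0 * math.log(u) <= log_f(x):
            out.append(x)
    return out
```

For instance, `rou_sampler(lambda x: math.log(x) + 2 * math.log(1 - x), 0.0, 1.0, 20000)` targets a Beta(2, 3) density, so the sample mean should be close to 0.4.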
Access chemical, hazard, bioactivity, and exposure data from the Computational Toxicology and Exposure ('CTX') APIs <https://www.epa.gov/comptox-tools/computational-toxicology-and-exposure-apis>. ctxR was developed to streamline the process of accessing the information available through the CTX APIs without requiring prior knowledge of how to use APIs. Most data is also available on the CompTox Chemical Dashboard ('CCD') <https://comptox.epa.gov/dashboard/> and other resources found at the EPA Computational Toxicology and Exposure Online Resources <https://www.epa.gov/comptox-tools>.
Compares distributions with one another in terms of their fit to each sample in a dataset that contains multiple samples, as described in Joo, Aguinis, and Bradley (in press). Users can examine the fit of seven distributions per sample: pure power law, lognormal, exponential, power law with an exponential cutoff, normal, Poisson, and Weibull. Automation features allow the user to compare all distributions for all samples with a single command, which produces a separate row of results for each sample until the entire dataset has been analyzed.
This package provides a dynamic programming algorithm for the fast segmentation of univariate signals into piecewise constant profiles. The fpop package is a wrapper to a C++ implementation of the fpop (Functional Pruning Optimal Partitioning) algorithm described in Maidstone et al. 2017 <doi:10.1007/s11222-016-9636-3>. The problem of detecting changepoints in a univariate sequence is formulated in terms of minimizing the mean squared error over segmentations. The fpop algorithm exactly minimizes the mean squared error for a penalty linear in the number of changepoints.
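The exact penalised least-squares problem can be sketched in Python with the quadratic-time Optimal Partitioning recursion below. fpop solves the same optimization but prunes candidate changepoints functionally, which makes it much faster; this sketch does no pruning, and the function name is hypothetical. Prefix sums give each segment's squared error around its mean in constant time, and each changepoint costs `penalty`.

```python
from itertools import accumulate


def optimal_partitioning(y, penalty):
    """Exact penalised least-squares segmentation by Optimal Partitioning, O(n^2).

    Returns the (exclusive) end index of every segment; the last is len(y).
    """
    n = len(y)
    s1 = [0.0] + list(accumulate(y))                  # prefix sums
    s2 = [0.0] + list(accumulate(v * v for v in y))   # prefix sums of squares

    def seg_cost(i, j):
        # squared error of y[i:j] around its own mean
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    # f[t] is the best penalised cost of y[:t]; f[0] = -penalty so that the
    # first segment is free and only changepoints are penalised
    f = [-penalty] + [0.0] * n
    last = [0] * (n + 1)              # start of the final segment in that optimum
    for t in range(1, n + 1):
        f[t], last[t] = min((f[s] + seg_cost(s, t) + penalty, s) for s in range(t))
    ends, t = [], n                   # backtrack the segment end points
    while t > 0:
        ends.append(t)
        t = last[t]
    return sorted(ends)
```

With `y = [0, 0, 0, 5, 5, 5]` and `penalty = 1.0`, the function returns `[3, 6]`: one changepoint after the third value. Raising the penalty to 100 returns `[6]`, a single segment.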
This package provides a convenient interface to the OpenAI ChatGPT API <https://openai.com/api>. gptr allows you to interact with 'ChatGPT', a powerful language model, for various natural language processing tasks. It simplifies common steps such as sending questions and retrieving answers, so that researchers and data analysts can use ChatGPT from R without handling the underlying API details themselves.
This package provides functions for specifying and fitting marginal models for contingency tables: those proposed by Bergsma and Rudas (2002) <doi:10.1214/aos/1015362188>, here called hierarchical multinomial marginal models (hmmm), and their extensions presented by Bartolucci, Colombi and Forcina (2007) <https://www.jstor.org/stable/24307737>; and the multinomial Poisson homogeneous (mph) models and homogeneous linear predictor (hlp) models for contingency tables proposed by Lang (2004) <doi:10.1214/aos/1079120140> and Lang (2005) <doi:10.1198/016214504000001042>. Inequality constraints on the parameters are allowed and can be tested.
Fast and accurate inference of gene-environment associations (GEA) in genome-wide studies (Caye et al., 2019, <doi:10.1093/molbev/msz008>). We developed a least-squares approach for estimating confounders and effect sizes that provides a single framework for several categories of genomic data, not restricted to genotypes. The new algorithm is several times faster than existing GEA approaches, including our previous version of the LFMM program in the LEA package (Frichot and Francois, 2015, <doi:10.1111/2041-210X.12382>).
Analysis of multivariate functional spatial data, including spectral multivariate functional principal component analysis and related statistical procedures. See Si-Ahmed, Idris, et al. (2025) "Principal component analysis of multivariate spatial functional data." Big Data Research 39, 100504; Kuenzer, T., Hörmann, S., & Kokoszka, P. (2021) "Principal component analysis of spatially indexed functions." Journal of the American Statistical Association, 116(535), 1444-1456; and Happ, C., & Greven, S. (2018) "Multivariate functional principal component analysis for data observed on different (dimensional) domains." Journal of the American Statistical Association, 113(522), 649-659.
This package implements the MRPC algorithm (the PC algorithm with the principle of Mendelian randomization) to infer causal graphs. It also contains functions to simulate data under a given topology, to visualize a graph in different ways, and to compare graphs and quantify their differences. See Badsha and Fu (2019) <doi:10.3389/fgene.2019.00460>, Badsha, Martin and Fu (2021) <doi:10.3389/fgene.2021.651812>, and Kvamme and Badsha, et al. (2025) <doi:10.1093/genetics/iyaf064>.
Fits various self-controlled case series (SCCS) models used to investigate associations between an adverse event and time-varying exposures such as vaccines, other drugs, or non-drug exposures. Detailed information on the self-controlled case series method and its extensions, with more examples, can be found in Farrington, P., Whitaker, H., and Ghebremichael Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A Modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press (ISBN: 978-1-4987-8159-6), and at <https://sccs-studies.info/index.html>.
Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment across the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.
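To illustrate the change of coordinates from a gene-by-sample matrix to a gene-set-by-sample matrix, the Python sketch below uses a simpler combined z-score: each gene is standardized across samples, and the z-scores of a set's genes are summed and divided by the square root of the set size. This is a deliberately simplified stand-in, not the GSVA algorithm, which uses kernel estimates of each gene's cumulative distribution and a Kolmogorov-Smirnov-like random-walk statistic; the function and argument names are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore_gene_sets(expr, gene_sets):
    """Map a gene-by-sample matrix to a gene-set-by-sample matrix of combined z-scores.

    expr: dict gene -> list of expression values, one per sample.
    gene_sets: dict set name -> list of gene names (genes absent from expr are ignored).
    """
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # standardise each gene across samples; constant genes get zeros
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        members = [g for g in genes if g in z]
        if not members:
            continue                      # skip sets with no measured genes
        scores[name] = [
            sum(z[g][j] for g in members) / math.sqrt(len(members))
            for j in range(n_samples)
        ]
    return scores
```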
This package facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R, a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete, data may be written to a VCF file. The data may also be converted into other popular R objects. This package provides a link between VCF data and familiar R software.
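The kind of matrix such a parser produces can be shown with a Python sketch that reads VCF text and collects the GT (genotype) subfield into a variant-by-sample table. It assumes a well-formed, tab-delimited VCF with a FORMAT column; the function name is hypothetical and the sketch is unrelated to the package's own R functions.

```python
def genotype_matrix(vcf_text):
    """Extract sample names and a variant-by-sample table of GT strings from VCF text."""
    samples, rows = [], {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue                                  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]                      # sample columns follow FORMAT
            continue
        fmt = fields[8].split(":")                    # e.g. ["GT", "DP"]
        # use the ID column, or CHROM_POS when the ID is missing (".")
        variant_id = fields[2] if fields[2] != "." else f"{fields[0]}_{fields[1]}"
        gt_index = fmt.index("GT") if "GT" in fmt else None
        rows[variant_id] = [
            s.split(":")[gt_index] if gt_index is not None else None
            for s in fields[9:]
        ]
    return samples, rows
```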
Computes approximated adjusted fractional Bayes factors for equality, inequality, and about equality constrained hypotheses. For a tutorial on this method, see Hoijtink, Mulder, van Lissa, & Gu (2019) <doi:10.1037/met0000201>. For applications in structural equation modeling, see Van Lissa, Gu, Mulder, Rosseel, Van Zundert, & Hoijtink (2021) <doi:10.1080/10705511.2020.1745644>. For the statistical underpinnings, see Gu, Mulder, and Hoijtink (2018) <doi:10.1111/bmsp.12110>; Hoijtink, Gu, & Mulder (2019) <doi:10.1111/bmsp.12145>; Hoijtink, Gu, Mulder, & Rosseel (2019) <doi:10.31234/osf.io/q6h5w>.
This package implements the regression approach of Zuber and Strimmer (2011) "High-dimensional regression and variable selection using CAR scores" SAGMB 10: 34, <DOI:10.2202/1544-6115.1730>. CAR scores measure the correlation between the response and the Mahalanobis-decorrelated predictors. The squared CAR score is a natural measure of variable importance and provides a canonical ordering of variables. This package provides functions for estimating CAR scores, for variable selection using CAR scores, and for estimating corresponding regression coefficients. Both shrinkage and empirical estimators are available.
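Because CAR scores are omega = P^(-1/2) r_xy, where P is the correlation matrix among the predictors and r_xy the vector of their marginal correlations with the response, the empirical estimator takes a few lines of NumPy. This sketch assumes NumPy is available, uses the symmetric inverse square root obtained from an eigendecomposition, and omits the shrinkage estimators the package also offers; the function name is hypothetical.

```python
import numpy as np


def car_scores(X, y):
    """Empirical CAR scores omega = P^(-1/2) r_xy for an n x p predictor matrix X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise predictors
    ys = (y - y.mean()) / y.std()               # standardise response
    n = len(y)
    P = Xs.T @ Xs / n                           # predictor correlation matrix
    r = Xs.T @ ys / n                           # marginal correlations with y
    vals, vecs = np.linalg.eigh(P)
    P_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric P^(-1/2)
    return P_inv_sqrt @ r
```

A useful check is that the squared CAR scores sum to the R^2 of the ordinary least-squares fit with an intercept, since omega' omega = r_xy' P^(-1) r_xy.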
This package provides several EM-type algorithms for estimating the parameters of the well-known Heckman selection model: ECM (Expectation/Conditional Maximization), ECM(NR) (ECM with the Newton-Raphson method adapted to it) and ECME (Expectation/Conditional Maximization Either). Since the algorithms are based on the EM algorithm, they also share EM's main advantages, namely stability and ease of implementation. Further details and explanations of the algorithms can be found in Zhao et al. (2020) <doi:10.1016/j.csda.2020.106930>.
An implementation of the generalized graded unfolding model (GGUM) in R; see Roberts, Donoghue, and Laughlin (2000) <doi:10.1177/01466216000241001>. It can simulate data sets based on the GGUM, fit the GGUM and the GUM, and retrieve item and person parameter estimates. Several plotting functions are available (item and test information functions; item and test characteristic curves; item category response curves). Additionally, some functions facilitate communication between R and 'GGUM2004'. Finally, a model-fit checking utility, MODFIT(), is also available.
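For reference, the GGUM response function of Roberts, Donoghue, and Laughlin (2000) can be written out in Python as below, with M = 2C + 1 for C + 1 observable categories and tau_0 = 0. This is an illustrative transcription of the model for one item and one person, not the package's estimation code; the function name and argument order are assumptions.

```python
import math


def ggum_probs(theta, alpha, delta, taus):
    """GGUM category probabilities [P(Z = 0), ..., P(Z = C)] for one item.

    theta: person location; alpha: discrimination; delta: item location;
    taus: thresholds tau_1..tau_C (tau_0 = 0 is added internally).
    """
    C = len(taus)
    M = 2 * C + 1
    cum_tau = [0.0]
    for t in taus:
        cum_tau.append(cum_tau[-1] + t)      # cum_tau[z] = sum_{k=0}^{z} tau_k
    num = []
    for z in range(C + 1):
        # each observable category combines an "agree from below" and an
        # "agree from above" subjective response
        a = alpha * (z * (theta - delta) - cum_tau[z])
        b = alpha * ((M - z) * (theta - delta) - cum_tau[z])
        num.append(math.exp(a) + math.exp(b))
    total = sum(num)
    return [v / total for v in num]
```

Because the model is symmetric about delta, the probabilities at theta = delta + d and theta = delta - d coincide.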
This package implements Cumulative Sum (CUSUM) control charts specifically designed for monitoring processes following a Gamma distribution. Provides functions to estimate distribution parameters, simulate control limits, and apply cautious learning schemes for adaptive thresholding. It supports upward and downward monitoring with guaranteed performance evaluated via Monte Carlo simulations. It is useful for quality control applications in industries where data follows a Gamma distribution. Methods are based on Madrid-Alvarez et al. (2024) <doi:10.1002/qre.3464> and Madrid-Alvarez et al. (2024) <doi:10.1080/08982112.2024.2440368>.
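A log-likelihood-ratio CUSUM for an upward shift in the scale of a Gamma distribution with known shape can be sketched in Python as below. The in-control and shifted scales and the control limit h are hypothetical inputs; the package instead estimates parameters, simulates control limits and applies cautious learning, none of which this sketch does. Downward monitoring follows by choosing scale1 < scale0, which makes small observations push the statistic up.

```python
import math


def gamma_cusum(xs, shape, scale0, scale1, h):
    """Log-likelihood-ratio CUSUM for a Gamma scale shift from scale0 to scale1.

    Returns the CUSUM path and the index of the first signal (or None).
    """
    c, path, signal = 0.0, [], None
    for t, x in enumerate(xs):
        # log f(x; shape, scale1) - log f(x; shape, scale0)
        llr = shape * math.log(scale0 / scale1) + x * (1.0 / scale0 - 1.0 / scale1)
        c = max(0.0, c + llr)              # reset at zero, accumulate evidence
        path.append(c)
        if signal is None and c > h:
            signal = t
    return path, signal
```

For example, with shape 2, in-control scale 0.5, shifted scale 1.0 and h = 3, five observations of 1.0 leave the statistic at 0, and the second of five subsequent observations of 4.0 triggers a signal (index 6).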
This package provides a lightweight, dependency-free, and simplified implementation of the Pseudo-Expectation Gauss-Seidel (PEGS) algorithm. It fits the multivariate ridge regression model for genomic prediction described in Xavier and Habier (2022) <doi:10.1186/s12711-022-00730-w> and Xavier et al. (2025) <doi:10.1093/genetics/iyae179>, providing heritability estimates, genetic correlations, breeding values, and regression coefficient estimates for prediction. This package provides an alternative to the bWGR package by Xavier et al. (2019) <doi:10.1093/bioinformatics/btz794> that uses LAPACK for its algebraic operations.
Calculate Predictive Moran's Eigenvector Maps (pMEM) for spatially-explicit prediction of environmental variables, as defined by Guénard and Legendre (2024) <doi:10.1111/2041-210X.14413>. pMEM extends classical MEM by enabling interpolation and prediction at unsampled locations using spatial weighting functions parameterized by range (and optionally shape). The package implements multiple pMEM types (e.g., exponential, Gaussian, linear) and features a modular architecture that allows programmers to define custom weighting functions. Designed for ecologists, geographers, and spatial analysts working with spatially-structured data.