Fit a full or subsampling bagging survival tree on a mixture of populations (susceptible and nonsusceptible) using either a pseudo R2 criterion or an adjusted Logrank criterion. The predictor is evaluated using the Out Of Bag Integrated Brier Score (IBS), and several importance scores are computed for variable selection. The threshold values for variable selection are computed using a nonparametric permutation test. See Cyprien Mbogning and Philippe Broet (2016) <doi:10.1186/s12859-016-1090-x> for an overview of the methods implemented in this package.
This package implements an algorithm for fitting a generative model with an intractable likelihood using only box constraints on the parameters. The implemented algorithm consists of two phases. The first phase (global search) aims to identify the region containing the best solution, while the second phase (local search) refines this solution using a trust-region version of the Fisher scoring method to solve a quasi-likelihood equation. See Guido Masarotto (2025) <doi:10.48550/arXiv.2511.08180> for the details of the algorithm and supporting results.
This package implements proper and so-called Maximum Likelihood Multiple Imputation as described by von Hippel and Bartlett (2021) <doi:10.1214/20-STS793>. A number of different imputation methods are available, by utilising the 'norm', 'cat' and 'mix' packages. Inferences can be performed either using Rubin's rules (for proper imputation), or a modified version for maximum likelihood imputation. For maximum likelihood imputations, a likelihood score based approach built on theory by Wang and Robins (1998) <doi:10.1093/biomet/85.4.935> is also available.
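As an illustration of the pooling step for proper imputation, the following is a minimal Python sketch of Rubin's rules for a scalar parameter. The function name `pool_rubin` is hypothetical and is not part of the package; the modified rules for maximum likelihood imputation are not shown.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    # Rubin's degrees of freedom; infinite when all imputations agree exactly
    if between == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return q_bar, total, df
```

Calling `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` returns the pooled estimate, its total variance and the degrees of freedom for a t-based interval.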
This package contains model-based treatment of missing data for regression models with missing values in covariates or the dependent variable using maximum likelihood or Bayesian estimation (Ibrahim et al., 2005, <doi:10.1198/016214504000001844>; Luedtke, Robitzsch, & West, 2020a, 2020b, <doi:10.1080/00273171.2019.1640104>, <doi:10.1037/met0000233>). The regression model can be nonlinear (e.g., interaction effects, quadratic effects or B-spline functions). Multilevel models with missing data in predictors are available for Bayesian estimation. Substantive-model compatible multiple imputation can also be conducted.
We consider network structure detection for variables Y while accommodating auxiliary variables X, which are possibly subject to measurement error. Three functions address different structures by different methods: NP_Graph() handles a nonlinear relationship between the responses and the covariates, Joint_Gaussian() performs the correction in linear regression models via Gaussian maximum likelihood, and Cond_Gaussian() handles linear regression models via the conditional likelihood function.
Flexible implementation of the Standardized Ranking Performance Index (sRPI) for model selection based on multiple evaluation criteria. The package combines multiple statistical measures into a single index to provide an objective and robust ranking of models across calibration, validation, and combined scenarios. It supports evaluation of statistical, machine learning, and other predictive models using user-defined performance criteria. For more details see Aschonitis et al. (2019) <doi:10.1016/j.envsoft.2019.01.005> and Singh et al. (2023) <doi:10.1016/j.ecoinf.2022.101933>.
Implementation of prediction and inference procedures for Synthetic Control methods using least squares, lasso, ridge, or simplex-type constraints. Uncertainty is quantified with prediction intervals as developed in Cattaneo, Feng, and Titiunik (2021) <doi:10.1080/01621459.2021.1979561> for a single treated unit and in Cattaneo, Feng, Palomba, and Titiunik (2025) <doi:10.1162/rest_a_01588> for multiple treated units and staggered adoption. More details about the software implementation can be found in Cattaneo, Feng, Palomba, and Titiunik (2025) <doi:10.18637/jss.v113.i01>.
Access and manipulate spatial tracking data, with straightforward coercion from and to other formats. Filter for speed and create time spent maps from tracking data. There are coercion methods to convert between trip and ltraj from 'adehabitatLT', and between trip and psp and ppp from 'spatstat'. Trip objects can be created from raw or grouped data frames, and from types in the 'sp', 'sf', 'amt', 'trackeR', 'mousetrap', and other packages. See Sumner, MD (2011) <https://figshare.utas.edu.au/articles/thesis/The_tag_location_problem/23209538>.
This package provides functions for the Bayesian analysis of some simple commonly-used models, without using Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling. The rust package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution, using the generalized ratio-of-uniforms method. See Wakefield, Gelfand and Smith (1991) <DOI:10.1007/BF01889987> for details. At the moment three conjugate hierarchical models are available: beta-binomial, gamma-Poisson and a 1-way analysis of variance (ANOVA).
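To illustrate the sampling idea, here is a minimal Python sketch of the basic ratio-of-uniforms method for an unnormalised standard normal density. It is the simplest special case, not the generalized method used by the rust package, and the function name `rou_standard_normal` is hypothetical. The bounding box is derived analytically for this density only.

```python
import math
import random


def rou_standard_normal(n, seed=0):
    """Draw n values from N(0, 1) with the basic ratio-of-uniforms method."""
    rng = random.Random(seed)

    def log_f(x):
        return -0.5 * x * x  # log of the unnormalised target density

    u_max = 1.0                      # sup of sqrt(f(x)), attained at x = 0
    v_max = math.sqrt(2.0 / math.e)  # sup of |x| * sqrt(f(x)), at |x| = sqrt(2)
    draws = []
    while len(draws) < n:
        u = rng.uniform(0.0, u_max)
        v = rng.uniform(-v_max, v_max)
        if u == 0.0:
            continue  # avoid division by zero and log(0)
        x = v / u
        # accept when u <= sqrt(f(v / u)), i.e. 2 * log(u) <= log f(x)
        if 2.0 * math.log(u) <= log_f(x):
            draws.append(x)
    return draws
```

Accepted ratios `v / u` are independent draws from the target, so no burn-in or convergence diagnostics are needed, which is the practical appeal over MCMC for simple models.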
Access chemical, hazard, bioactivity, and exposure data from the Computational Toxicology and Exposure ('CTX') APIs <https://www.epa.gov/comptox-tools/computational-toxicology-and-exposure-apis>. ctxR was developed to streamline the process of accessing the information available through the CTX APIs without requiring prior knowledge of how to use APIs. Most data is also available on the CompTox Chemical Dashboard ('CCD') <https://comptox.epa.gov/dashboard/> and other resources found at the EPA Computational Toxicology and Exposure Online Resources <https://www.epa.gov/comptox-tools>.
Compares distributions with one another in terms of their fit to each sample in a dataset that contains multiple samples, as described in Joo, Aguinis, and Bradley (in press). Users can examine the fit of seven distributions per sample: pure power law, lognormal, exponential, power law with an exponential cutoff, normal, Poisson, and Weibull. Automation features allow the user to compare all distributions for all samples with a single command line, which creates a separate row containing results for each sample until the entire dataset has been analyzed.
This package provides a dynamic programming algorithm for the fast segmentation of univariate signals into piecewise constant profiles. The fpop package is a wrapper to a C++ implementation of the fpop (Functional Pruning Optimal Partitioning) algorithm described in Maidstone et al. (2017) <doi:10.1007/s11222-016-9636-3>. The problem of detecting changepoints in a univariate sequence is formulated in terms of minimising the mean squared error over segmentations. The fpop algorithm exactly minimizes the mean squared error for a penalty linear in the number of changepoints.
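The penalised criterion can be illustrated with plain optimal partitioning, a quadratic-time dynamic programme that minimises the same objective as fpop but without functional pruning. The Python sketch below is for illustration only: the function name `optimal_partitioning` is hypothetical, and the cost is the unnormalised sum of squared errors.

```python
def optimal_partitioning(signal, penalty):
    """Changepoints minimising squared error + penalty * (number of changepoints).

    A changepoint is reported as the 0-based index where a new segment starts.
    """
    n = len(signal)
    # prefix sums give the squared-error cost of any segment in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, y in enumerate(signal):
        s1[i + 1] = s1[i] + y
        s2[i + 1] = s2[i] + y * y

    def seg_cost(a, b):
        # squared error of signal[a:b] around its own mean
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)  # best[t]: optimal penalised cost of signal[:t]
    best[0] = -penalty      # offsets the penalty added for the first segment
    last = [0] * (n + 1)    # last[t]: start of the final segment in that optimum
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[a] + seg_cost(a, t) + penalty, a) for a in range(t)
        )

    # backtrack through the stored segment starts
    changepoints = []
    t = n
    while last[t] > 0:
        changepoints.append(last[t])
        t = last[t]
    return sorted(changepoints)
```

Larger values of `penalty` yield fewer changepoints; functional pruning reaches the same minimiser while discarding candidate segment starts that can never be optimal.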
This package provides a convenient interface with the OpenAI ChatGPT API <https://openai.com/api>. gptr allows you to interact with 'ChatGPT', a powerful language model, for various natural language processing tasks. The gptr R package makes talking to ChatGPT in R super easy. It helps researchers and data folks by simplifying the complicated stuff, like asking questions and getting answers. With 'gptr', you can use ChatGPT in R without any hassle, making it simpler for everyone to do cool things with language!
This package provides functions for specifying and fitting marginal models for contingency tables proposed by Bergsma and Rudas (2002) <doi:10.1214/aos/1015362188>, here called hierarchical multinomial marginal models (hmmm), and their extensions presented by Bartolucci, Colombi and Forcina (2007) <https://www.jstor.org/stable/24307737>; and multinomial Poisson homogeneous (mph) models and homogeneous linear predictor (hlp) models for contingency tables proposed by Lang (2004) <doi:10.1214/aos/1079120140> and Lang (2005) <doi:10.1198/016214504000001042>. Inequality constraints on the parameters are allowed and can be tested.
Fast and accurate inference of gene-environment associations (GEA) in genome-wide studies (Caye et al., 2019, <doi:10.1093/molbev/msz008>). We developed a least-squares estimation approach for confounder and effect size estimation that provides a unique framework for several categories of genomic data, not restricted to genotypes. The new algorithm is several times faster than existing GEA approaches and than our previous version of the LFMM program present in the LEA package (Frichot and Francois, 2015, <doi:10.1111/2041-210X.12382>).
This package implements the MRPC (PC with the principle of Mendelian randomization) algorithm to infer causal graphs. It also contains functions to simulate data under a certain topology, to visualize a graph in different ways, and to compare graphs and quantify the differences. See Badsha and Fu (2019) <doi:10.3389/fgene.2019.00460>; Badsha, Martin and Fu (2021) <doi:10.3389/fgene.2021.651812>; and Kvamme and Badsha, et al. (2025) <doi:10.1093/genetics/iyaf064>.
Analysis of multivariate functional spatial data, including spectral multivariate functional principal component analysis and related statistical procedures. See Si-Ahmed, Idris, et al. (2025), "Principal component analysis of multivariate spatial functional data," Big Data Research 39, 100504; Kuenzer, T., Hörmann, S., & Kokoszka, P. (2021), "Principal component analysis of spatially indexed functions," Journal of the American Statistical Association, 116(535), 1444-1456; and Happ, C., & Greven, S. (2018), "Multivariate functional principal component analysis for data observed on different (dimensional) domains," Journal of the American Statistical Association, 113(522), 649-659.
Fits various self-controlled case series models used to investigate associations between time-varying exposures, such as vaccines, other drugs or non-drug exposures, and an adverse event. Detailed information on the self-controlled case series method and its extensions, with more examples, can be found in Farrington, P., Whitaker, H., and Ghebremichael Weldeselassie, Y. (2018, ISBN: 978-1-4987-8159-6), Self-controlled Case Series Studies: A Modelling Guide with R, Boca Raton: Chapman & Hall/CRC Press, and at <https://sccs-studies.info/index.html>.
Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene-by-sample matrix to a gene-set-by-sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.
This package facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R, a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete, the data may be written to a VCF file or converted into other popular R objects. This package provides a link between VCF data and familiar R software.
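To show what extracting matrices from VCF data involves, here is a minimal Python sketch that splits VCF text into meta lines, the eight fixed columns, and a genotype matrix. It is not the package's R interface; `parse_vcf` and `EXAMPLE_VCF` are hypothetical names, and only the GT field is extracted.

```python
def parse_vcf(text):
    """Split VCF text into meta lines, sample names, fixed columns and genotypes."""
    meta, samples, fixed, genotypes = [], [], [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("##"):
            meta.append(line)                 # meta-information lines
        elif line.startswith("#CHROM"):
            samples = line.split("\t")[9:]    # sample names follow FORMAT
        else:
            fields = line.split("\t")
            fixed.append(fields[:8])          # CHROM .. INFO
            keys = fields[8].split(":") if len(fields) > 8 else []
            gt_index = keys.index("GT") if "GT" in keys else None
            row = []
            for cell in fields[9:]:
                parts = cell.split(":")
                if gt_index is not None and gt_index < len(parts):
                    row.append(parts[gt_index])
                else:
                    row.append(None)          # genotype not recorded
            genotypes.append(row)
    return meta, samples, fixed, genotypes


# Tiny made-up example: two variants, two samples
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "S1", "S2"]),
    "\t".join(["1", "100", ".", "A", "G", "50", "PASS", ".", "GT:DP",
               "0/1:12", "1/1:7"]),
    "\t".join(["1", "200", "rs1", "C", "T", ".", ".", ".", "DP:GT",
               "5:0/0", "9:0/1"]),
])
```

Each row of the genotype matrix corresponds to a variant and each column to a sample, which is the layout typically needed for quality control.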
Computes approximated adjusted fractional Bayes factors for equality, inequality, and about equality constrained hypotheses. For a tutorial on this method, see Hoijtink, Mulder, van Lissa, & Gu (2019) <doi:10.1037/met0000201>. For applications in structural equation modeling, see Van Lissa, Gu, Mulder, Rosseel, Van Zundert, & Hoijtink (2021) <doi:10.1080/10705511.2020.1745644>. For the statistical underpinnings, see Gu, Mulder, & Hoijtink (2018) <doi:10.1111/bmsp.12110>; Hoijtink, Gu, & Mulder (2019) <doi:10.1111/bmsp.12145>; Hoijtink, Gu, Mulder, & Rosseel (2019) <doi:10.31234/osf.io/q6h5w>.
This package implements the regression approach of Zuber and Strimmer (2011) "High-dimensional regression and variable selection using CAR scores" SAGMB 10: 34, <DOI:10.2202/1544-6115.1730>. CAR scores measure the correlation between the response and the Mahalanobis-decorrelated predictors. The squared CAR score is a natural measure of variable importance and provides a canonical ordering of variables. This package provides functions for estimating CAR scores, for variable selection using CAR scores, and for estimating corresponding regression coefficients. Both shrinkage and empirical estimators are available.
Some EM-type algorithms to estimate parameters for the well-known Heckman selection model are provided in the package. The algorithms are as follows: ECM (Expectation/Conditional Maximization), ECM(NR) (the Newton-Raphson method adapted to the ECM) and ECME (Expectation/Conditional Maximization Either). Since the algorithms are based on the EM algorithm, they also have EM's main advantages, namely stability and ease of implementation. Further details and explanations of the algorithms can be found in Zhao et al. (2020) <doi:10.1016/j.csda.2020.106930>.
An implementation of the generalized graded unfolding model (GGUM) in R; see Roberts, Donoghue, and Laughlin (2000) <doi:10.1177/01466216000241001>. It allows users to simulate data sets based on the GGUM. It fits the GGUM and the GUM, and it retrieves item and person parameter estimates. Several plotting functions are available (item and test information functions; item and test characteristic curves; item category response curves). Additionally, there are some functions that facilitate the communication between R and 'GGUM2004'. Finally, a model-fit checking utility, MODFIT(), is also available.