Solves penalized least squares problems for big tall data using the orthogonalizing EM algorithm of Xiong et al. (2016) <doi:10.1080/00401706.2015.1054436>. The main fitting function is oem() and the functions cv.oem() and xval.oem() are for cross validation, the latter being an accelerated cross validation function for linear models. The big.oem() function allows for out-of-memory fitting. The underlying methods and code interface are described in Huling and Chien (2022) <doi:10.18637/jss.v104.i06>.
Smart Adaptive Recommendations (SAR) is the name of a fast, scalable, adaptive algorithm for personalized recommendations based on user transactions and item descriptions. It produces easily explainable/interpretable recommendations and handles "cold item" and "semi-cold user" scenarios. This package provides two implementations of SAR: a standalone implementation, and an interface to a web service in Microsoft's Azure cloud: <https://github.com/Microsoft/Product-Recommendations/blob/master/doc/sar.md>. The former allows fast and easy experimentation, and the latter provides robust scalability and extra features for production use.
Estimation for longitudinal data following outcome dependent sampling using the sequential offsetted regression technique. Includes support for binary, count, and continuous data. The first regression is a logistic regression, which uses a known ratio (the probability of being sampled given that the subject/observation was referred divided by the probability of being sampled given that the subject/observation was not referred) as an offset to estimate the probability of being referred given outcome and covariates. The second regression uses this estimated probability to calculate the mean population response given covariates.
Analyze count time series with excess zeros. Two types of statistical models are supported: Markov regression and state-space models. They are also known as observation-driven and parameter-driven models respectively in the time series literature. The functions used for Markov regression or observation-driven models can also be used to fit ordinary regression models with independent data under the zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB) assumption. The package also contains miscellaneous functions to compute density, distribution, quantile, and generate random numbers from ZIP and ZINB distributions.
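The zero-inflated Poisson distribution mentioned above can be written out directly. The following is a minimal sketch, not the package's code: the function names `zip_pmf`, `zip_cdf`, and `zip_mean` are hypothetical, and it assumes the common parameterization in which an extra zero occurs with probability `pi` and the count otherwise follows a Poisson distribution with rate `lam`.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson (assumed parameterization:
    extra zero with probability pi, otherwise Poisson with rate lam)."""
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    extra_zero = pi if k == 0 else 0.0
    return extra_zero + (1.0 - pi) * poisson

def zip_cdf(k, lam, pi):
    """P(Y <= k), obtained by summing the probability mass function."""
    return sum(zip_pmf(j, lam, pi) for j in range(0, k + 1))

def zip_mean(lam, pi):
    """E[Y] = (1 - pi) * lam, since the inflation component contributes only zeros."""
    return (1.0 - pi) * lam
```

With `lam = 2.0` and `pi = 0.3`, the probability of a zero is `0.3 + 0.7 * exp(-2)`, which is larger than the plain Poisson value `exp(-2)`; that excess is what the zero-inflated models are designed to capture.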
SNM is a modeling strategy especially designed for normalizing high-throughput genomic data. The underlying premise of our approach is that your data is a function of what we refer to as study-specific variables. These variables are either biological variables that represent the target of the statistical analysis, or adjustment variables that represent factors arising from the experimental or biological setting the data is drawn from. The SNM approach aims to simultaneously model all study-specific variables in order to more accurately characterize the biological or clinical variables of interest.
This package implements the Beta Kernel Process (BKP) for nonparametric modeling of spatially varying binomial probabilities, together with its extension, the Dirichlet Kernel Process (DKP), for categorical or multinomial data. The package provides functions for model fitting, predictive inference with uncertainty quantification, posterior simulation, and visualization in one- and two-dimensional input spaces. Multiple kernel functions (Gaussian, Matern 5/2, and Matern 3/2) are supported, with hyperparameters optimized through multi-start gradient-based search. For more details, see Zhao, Qing, and Xu (2025) <doi:10.48550/arXiv.2508.10447>.
This package provides functions to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. The package includes code for estimating radial input, output, directional and additive measures, plotting graphical representations of the scores and the production frontiers by means of trees, and determining rankings of importance of input variables in the analysis. Additionally, an adaptation of Random Forest by a set of individual Efficiency Analysis Trees for estimating technical efficiency is also included. More details in: <doi:10.1016/j.eswa.2020.113783>.
Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals are provided for various low-dimensional causal/structural parameters that appear in high-dimensional approximately sparse models. The package includes functions for fitting heteroscedasticity-robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty. See Chernozhukov, Hansen, and Spindler (2016) <arXiv:1603.01700>.
Transforms away factors with many levels prior to doing an OLS. Useful for estimating linear models with multiple group fixed effects, and for estimating linear models which use factors with many levels as pure control variables. See Gaure (2013) <doi:10.1016/j.csda.2013.03.024>. Includes support for instrumental variables, conditional F statistics for weak instruments, robust and multi-way clustered standard errors, as well as limited mobility bias correction (Gaure 2014 <doi:10.1002/sta4.68>). Since version 3.0, it provides dedicated functions to estimate Poisson models.
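The idea of transforming factors away before an OLS can be illustrated with repeated group demeaning. This is a minimal sketch under stated assumptions, not the package's implementation: the function names are hypothetical, there is a single regressor, and the factors are swept out by alternating group-mean subtraction until the values stop changing. By the Frisch-Waugh-Lovell theorem, the slope fitted on the transformed data matches the slope from an OLS that includes a dummy variable for every factor level.

```python
def demean_by(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def sweep_out(values, factors, tol=1e-12, max_iter=10_000):
    """Alternate group demeaning over all factors until a full pass changes nothing."""
    current = list(values)
    for _ in range(max_iter):
        updated = current
        for groups in factors:
            updated = demean_by(updated, groups)
        change = max(abs(a - b) for a, b in zip(updated, current))
        current = updated
        if change < tol:
            break
    return current

def slope_with_fixed_effects(x, y, factors):
    """OLS slope of y on x after sweeping the factors out of both variables."""
    x_t = sweep_out(x, factors)
    y_t = sweep_out(y, factors)
    return sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)

# Illustrative (made-up) unbalanced panel: y = 2 * x + firm effect + year effect.
firm = [0, 0, 0, 1, 1, 1, 2, 2]
year = [0, 1, 2, 0, 1, 2, 0, 1]
x = [1.0, 2.5, 0.3, 4.1, 3.3, 0.9, 2.2, 5.0]
firm_effect = [1.0, -2.0, 0.5]
year_effect = [0.0, 3.0, -1.0]
y = [2.0 * xi + firm_effect[f] + year_effect[t] for xi, f, t in zip(x, firm, year)]

slope = slope_with_fixed_effects(x, y, [firm, year])  # close to 2.0
```

The appeal of this approach is that no dummy-variable matrix is ever built, so the cost per pass grows with the number of observations rather than with the number of factor levels.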
The Moving Epidemic Method, created by T Vega and JE Lozano (2012, 2015) <doi:10.1111/j.1750-2659.2012.00422.x>, <doi:10.1111/irv.12330>, allows the weekly assessment of the epidemic and intensity status to help in routine respiratory infections surveillance in health systems. Allows the comparison of different epidemic indicators, timing and shape with past epidemics and across different regions or countries with different surveillance systems. Also, it gives a measure of the performance of the method in terms of sensitivity and specificity of the alert week.
Likelihood-based estimation of conditional transformation models via the most likely transformation approach described in Hothorn et al. (2018) <DOI:10.1111/sjos.12291> and Hothorn (2020) <DOI:10.18637/jss.v092.i01>. Shift-scale (Siegfried et al, 2023, <DOI:10.1080/00031305.2023.2203177>) and multivariate (Klein et al, 2022, <DOI:10.1111/sjos.12501>) transformation models are part of this package. A package vignette is available from <DOI:10.32614/CRAN.package.mlt.docreg> and more convenient user interfaces to many models from <DOI:10.32614/CRAN.package.tram>.
This package provides a computationally efficient way of fitting weighted linear fixed effects estimators for causal inference with various weighting schemes. Weighted linear fixed effects estimators can be used to estimate the average treatment effects under different identification strategies. This includes stratified randomized experiments, matching and stratification for observational studies, first differencing, and difference-in-differences. The package implements methods described in Imai and Kim (2017) "When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?", available at <https://imai.fas.harvard.edu/research/FEmatch.html>.
Calculates a number of valuation adjustments including CVA, DVA, FBA, FCA, MVA and KVA. A two-way margin agreement has been implemented. For the KVA calculation four regulatory frameworks are supported: CEM, (simplified) SA-CCR, OEM and IMM. The probability of default is implied through the credit spreads curve. The package supports an exposure calculation based on SA-CCR which includes several trade types and a simulated path which is currently available only for Interest Rate Swaps. The latest regulatory capital charge methodologies have been implemented, including BA-CVA & SA-CVA.
Set of functions to calculate Benthic Biotic Indices (BBI) from composition data, obtained either from morphotaxonomic inventories or from sequencing data. The calculations are based on reference ecological weights publicly available for a set of commonly used marine biotic indices, such as AMBI (A Marine Biotic Index, Borja et al., 2000) <doi:10.1016/S0025-326X(00)00061-8>, NSI (Norwegian Sensitivity Index) and ISI (Indicator Species Index) (Rygg 2013, <ISBN:978-82-577-6210-0>). It provides the ecological quality status of the samples based on each BBI as well as the normalized Ecological Quality Ratio.
Estimate different types of cluster robust standard errors (CR0, CR1, CR2) with degrees of freedom adjustments. Standard errors are computed based on Liang and Zeger (1986) <doi:10.1093/biomet/73.1.13> and Bell and McCaffrey <https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2002002/article/9058-eng.pdf?st=NxMjN1YZ>. Functions used in Huang and Li <doi:10.3758/s13428-021-01627-0>, Huang, Wiedermann, and Zhang <doi:10.1080/00273171.2022.2077290>, and Huang, Zhang, and Li (forthcoming: Journal of Research on Educational Effectiveness).
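The difference between CR0 and CR1 can be shown in the simplest possible setting. The sketch below is an illustration only and does not reproduce the package's functions: `cluster_robust_se` is a hypothetical name, the model is a single-regressor regression through the origin, and CR1 is taken to be CR0 scaled by G / (G - 1), with G the number of clusters. Other CR1 conventions add a further N-based factor, and CR2 (which rescales residuals within each cluster) is not covered here.

```python
import math

def cluster_robust_se(x, y, cluster):
    """Slope of y on x (no intercept) with CR0 and CR1 standard errors.

    Assumes at least two clusters. CR1 here is CR0 times G / (G - 1);
    this scaling is one common convention, stated as an assumption.
    """
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx

    # Sum the score x_i * e_i within each cluster before squaring.
    scores = {}
    for xi, yi, g in zip(x, y, cluster):
        scores[g] = scores.get(g, 0.0) + xi * (yi - beta * xi)

    n_clusters = len(scores)
    var_cr0 = sum(s * s for s in scores.values()) / sxx ** 2
    var_cr1 = var_cr0 * n_clusters / (n_clusters - 1)
    return beta, math.sqrt(var_cr0), math.sqrt(var_cr1)
```

When every observation forms its own cluster, the CR0 value reduces to the usual heteroscedasticity-robust (HC0) standard error, which is a convenient sanity check.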
Create stunning network experiences powered by the G6 graph visualisation engine JavaScript library <https://g6.antv.antgroup.com/en>. In shiny mode, modify your graph directly from the server function to dynamically interact with nodes and edges. Select your favorite layout among 20 choices. 15 behaviors are available such as interactive edge creation, collapse-expand and brush select. 17 plugins are designed to improve the user experience, such as a mini-map, toolbars and grid lines. Customise the look and feel of your graph with comprehensive options for nodes, edges and more.
An implementation of the Ordered Forest estimator as developed in Lechner & Okasa (2019) <arXiv:1907.02436>. The Ordered Forest flexibly estimates the conditional probabilities of models with ordered categorical outcomes (so-called ordered choice models). In addition to common machine learning algorithms, the orf package provides functions for estimating marginal effects as well as statistical inference thereof, and thus provides output similar to that of standard econometric models for ordered choice. The core forest algorithm relies on the fast C++ forest implementation from the ranger package (Wright & Ziegler, 2017) <arXiv:1508.04409>.
Numerical derivatives through finite-difference approximations can be calculated using the pnd package with parallel capabilities and optimal step-size selection to improve accuracy. These functions facilitate efficient computation of derivatives, gradients, Jacobians, and Hessians, allowing for more evaluations to reduce the mathematical and machine errors. Designed for compatibility with the numDeriv package, which has not received updates in several years, it introduces advanced features such as computing derivatives of arbitrary order, improving the accuracy of Hessian approximations by avoiding repeated differencing, and parallelising slow functions on Windows, Mac, and Linux.
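A finite-difference derivative with a rule-of-thumb step size can be sketched in a few lines. This is a plain illustration and not the pnd interface: `central_diff` and `gradient` are hypothetical names, and the default step (the cube root of machine epsilon, scaled by the magnitude of the evaluation point) is a textbook heuristic standing in for the package's step-size selection.

```python
import math
import sys

EPS = sys.float_info.epsilon

def central_diff(f, x, h=None):
    """Second-order central difference (f(x + h) - f(x - h)) / (2 h).

    The default step balances truncation error (order h^2) against
    rounding error (order EPS / h); it is a rule of thumb, not an optimum.
    """
    if h is None:
        h = EPS ** (1.0 / 3.0) * max(abs(x), 1.0)
    return (f(x + h) - f(x - h)) / (2.0 * h)

def gradient(f, point, h=None):
    """Gradient of f at `point` (a list of floats), one coordinate at a time."""
    grad = []
    for i in range(len(point)):
        def along_axis(t, i=i):
            shifted = list(point)
            shifted[i] = t
            return f(shifted)
        grad.append(central_diff(along_axis, point[i], h))
    return grad
```

For example, `central_diff(math.sin, 1.0)` approximates `math.cos(1.0)`, and `gradient(lambda p: p[0] ** 2 * p[1] + p[1] ** 3, [1.0, 2.0])` approximates `[4.0, 13.0]`. Each coordinate needs two independent function evaluations, which is why slow functions benefit from running those evaluations in parallel.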
Plant ecologists often need to collect "traits" data about plant species which are often scattered among various databases: TR8 contains a set of tools which take care of automatically retrieving some of those functional traits data for plant species from publicly available databases (The Ecological Flora of the British Isles, LEDA traitbase, Ellenberg values for Italian Flora, Mycorrhizal intensity databases, BROT, PLANTS, Jepson Flora Project). The TR8 name, inspired by "car plates" jokes, was chosen since it both recalls the main object of the package and is extremely short to type.
This package provides tools for Topological Data Analysis. The package focuses on statistical analysis of persistent homology and density clustering. For that, this package provides an R interface for the efficient algorithms of the C++ libraries GUDHI <https://project.inria.fr/gudhi/software/>, Dionysus <https://www.mrzv.org/software/dionysus/>, and PHAT <https://bitbucket.org/phat-code/phat/>. This package also implements methods from Fasy et al. (2014) <doi:10.1214/14-AOS1252> and Chazal et al. (2015) <doi:10.20382/jocg.v6i2a8> for analyzing the statistical significance of persistent homology features.
This package provides an implementation of efficient approximate leave-one-out (LOO) cross-validation for Bayesian models fit using Markov chain Monte Carlo, as described in <doi:10.1007/s11222-016-9696-4>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions.
Parametric time warping aligns patterns. It aims to put corresponding features at the same locations. The algorithm searches for an optimal polynomial describing the warping. It is possible to align one sample to a reference, several samples to the same reference, or several samples to several references. One can choose between calculating individual warpings, or one global warping for a set of samples and one reference. Two optimization criteria are implemented: RMS error and WCC. Both warping of peak profiles and of peak lists are supported.
This package provides methods for learning causal relationships among a set of foreground variables X based on signals from a (potentially much larger) set of background variables Z, which are known non-descendants of X. The confounder blanket learner (CBL) uses sparse regression techniques to simultaneously perform many conditional independence tests, with complementary pairs stability selection to guarantee finite sample error control. CBL is sound and complete with respect to a so-called "lazy oracle", and works with both linear and nonlinear systems. For details, see Watson & Silva (2022) <arXiv:2205.05715>.
This package contains different algorithms and construction methods for optimal Latin hypercube designs (LHDs) with flexible sizes. Our package is comprehensive since it is capable of generating maximin distance LHDs, maximum projection LHDs, and orthogonal and nearly orthogonal LHDs. Detailed comparisons and a summary of all the algorithms and construction methods in this package can be found in Hongzhi Wang, Qian Xiao and Abhyuday Mandal (2021) <doi:10.48550/arXiv.2010.09154>. This package is particularly useful in the area of Design and Analysis of Experiments (DAE), and more specifically in the design of computer experiments.
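What a Latin hypercube design is, and what "maximin distance" measures, can be shown with a naive random search. The sketch below is only an illustration of the definitions, with hypothetical function names; it does not implement any of the package's optimization algorithms, which are far more efficient than drawing designs at random.

```python
import itertools
import math
import random

def random_lhd(n, k, rng):
    """n-run, k-factor Latin hypercube: each column is a permutation of 1..n."""
    columns = []
    for _ in range(k):
        levels = list(range(1, n + 1))
        rng.shuffle(levels)
        columns.append(levels)
    return [tuple(col[i] for col in columns) for i in range(n)]

def min_distance(design):
    """Smallest Euclidean distance between any two runs (the maximin criterion)."""
    return min(math.dist(a, b) for a, b in itertools.combinations(design, 2))

def best_of_random_lhds(n, k, tries, seed=0):
    """Keep the random LHD with the largest minimum pairwise distance."""
    rng = random.Random(seed)
    best = random_lhd(n, k, rng)
    for _ in range(tries - 1):
        candidate = random_lhd(n, k, rng)
        if min_distance(candidate) > min_distance(best):
            best = candidate
    return best
```

Because every column is a permutation, any two runs differ in every coordinate, so the minimum distance of an LHD with k factors is never below the square root of k; a maximin search tries to push it well above that floor.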