Extends the Changes-in-Changes model a la Athey and Imbens (2006) <doi:10.1111/j.1468-0262.2006.00668.x> to multiple cohorts and time periods, which generalizes difference-in-differences estimation techniques to the entire distribution. Computes quantile treatment effects for every possible two-by-two combination in ecic(). Standard errors are then obtained by aggregating all bootstrap runs in summary_ecic(). Results can be plotted with plot_ecic(), either aggregated over all cohort-group combinations or in an event-study style for individual periods or individual quantiles.
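A minimal sketch of the intended workflow, assuming a long panel data set; the argument names used here (yvar, gvar, tvar, ivar, dat, nReps) and the placeholder object panel_data are illustrative assumptions rather than the documented interface:

library(ecic)

# panel_data: placeholder long panel with outcome, first-treatment cohort, period, and unit id
fit <- ecic(yvar = y, gvar = first_treat, tvar = year, ivar = id,
            dat = panel_data, nReps = 20)   # all 2x2 cohort-period QTEs plus bootstrap runs
est <- summary_ecic(fit)                    # aggregate bootstrap runs into standard errors
plot_ecic(est)                              # aggregated or event-study style plots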
The functions provided in the FADA (Factor Adjusted Discriminant Analysis) package aim to perform supervised classification of high-dimensional and correlated profiles. The procedure combines a decorrelation step, based on a factor modeling of the dependence among covariates, with a classification method. The available methods are a Lasso-regularized logistic model (see Friedman et al. (2010)), sparse linear discriminant analysis (see Clemmensen et al. (2011)), and shrinkage linear and diagonal discriminant analysis (see Ahdesmaki et al. (2010)). Other classification methods can also be applied to the decorrelated data provided by the package FADA.
Distance metrics for mixed-type data consisting of continuous, nominal, and ordinal variables. This methodology uses additive and product kernels to calculate similarity functions and metrics, and selects variables relevant to the underlying distance through bandwidth selection via maximum similarity cross-validation. These methods can be used in any distance-based algorithm, such as distance-based clustering. For further details, we refer the reader to Ghashti and Thompson (2024) <doi:10.1007/s00357-024-09493-z> for dkps() methodology, and Ghashti (2024) <doi:10.14288/1.0443975> for dkss() methodology.
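A usage sketch on toy mixed-type data; the calls below assume that dkps() and dkss() accept a data frame and return a pairwise distance object, which is an assumption for illustration rather than the documented interface:

# Toy data: one continuous, one nominal, one ordinal variable
df <- data.frame(
  x1 = rnorm(50),
  x2 = factor(sample(c("a", "b", "c"), 50, replace = TRUE)),
  x3 = ordered(sample(1:4, 50, replace = TRUE))
)

# After attaching the package providing dkps() and dkss():
d_ps <- dkps(df)   # distances following the Ghashti & Thompson (2024) methodology
d_ss <- dkss(df)   # distances following the Ghashti (2024) methodology
# The result can then feed any distance-based algorithm, e.g. hierarchical clustering.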
This package provides a variety of functions for the best known and most innovative approaches to nonparametric boundary estimation. The selected methods are concerned with empirical, smoothed, unrestricted as well as constrained fits under both separate and multiple shape constraints. They cover robust approaches to outliers as well as data envelopment techniques based on piecewise polynomials, splines, local linear fitting, extreme values and kernel smoothing. The package also seamlessly allows for Monte Carlo comparisons among these different estimation methods. Its use is illustrated via a number of empirical applications and simulated examples.
The Open Bodem Index (OBI) is a method to evaluate the quality of soils of agricultural fields in The Netherlands and the sustainability of the current agricultural practices. The OBI score is based on four main criteria: chemical, physical, biological and management, which together comprise more than 21 indicators. Given the results of a soil analysis and management information, the OBIC package can be used to calculate the scores, indicators and derivatives used by the OBI. More information about the Open Bodem Index can be found at <https://openbodemindex.nl/>.
The queueing model of visual search models the accuracy and response time data in a visual search experiment using queueing models with a finite customer population and a stopping criterion of completing service for a finite number of customers. It implements the conceptualization of a hybrid model proposed by Moore and Wolfe (2001), in which visual stimuli enter processing one after the other and are then identified in parallel. This package provides functions that simulate the specified queueing process and calculate the Wasserstein distance between the empirical response times and the model prediction.
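For orientation, a base-R illustration (not the package's own function) of the distance being computed: for two samples of equal size, the one-dimensional Wasserstein-1 distance reduces to the mean absolute difference between their sorted values, i.e. matched empirical quantiles.

set.seed(1)
rt_observed  <- rexp(500, rate = 1 / 650)   # toy empirical response times (ms)
rt_simulated <- rexp(500, rate = 1 / 700)   # toy response times from a simulated queueing process
w1 <- mean(abs(sort(rt_observed) - sort(rt_simulated)))
w1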
Four methods for mediation analysis with missing data: listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum likelihood (TS-ML) algorithm. For MI and TS-ML, auxiliary variables can be included. Bootstrap confidence intervals for mediation effects are obtained. A robust method is also implemented for TS-ML. Since version 1.4, bmem adds the capability to conduct power analysis for mediation models. Details about the methods can be found in Zhang and Wang (2013) <doi:10.1007/s11336-012-9301-5> and Zhang (2014) <doi:10.3758/s13428-013-0424-0>.
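As background, the quantity for which bootstrap confidence intervals are typically built is the indirect effect a*b. A minimal base-R illustration on complete, simulated data (not the package's own interface):

set.seed(2)
n <- 200
x <- rnorm(n); m <- 0.5 * x + rnorm(n); y <- 0.4 * m + 0.2 * x + rnorm(n)
dat <- data.frame(x, m, y)

ab_hat <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]       # path a: x -> m
  b <- coef(lm(y ~ m + x, data = d))["m"]   # path b: m -> y, controlling for x
  unname(a * b)                             # indirect (mediation) effect
}
boot_ab <- replicate(1000, ab_hat(dat[sample(n, replace = TRUE), ]))
quantile(boot_ab, c(0.025, 0.975))          # percentile bootstrap CI for a*b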
Fit latent space network cluster models using an expectation-maximization algorithm. Enables flexible modeling of unweighted or weighted network data (with or without noise edges), supporting both directed and undirected networks (with or without degree and strength heterogeneity). Designed to handle large networks efficiently, it allows users to explore network structure through latent space representations, identify clusters (i.e., community detection) within network data, and simulate networks with varying clustering, connectivity patterns, and noise edges. Methodology for the implementation is described in Arakkal and Sewell (2025) <doi:10.1016/j.csda.2025.108228>.
Calculation of predictive Moran's eigenvector maps (pMEM), as defined by Guénard and Legendre (in press), "Spatially-explicit predictions using spatial eigenvector maps", Methods in Ecology and Evolution <doi:10.5281/zenodo.13356457>. This method enables scientists to predict the values of spatially-structured environmental variables. Multiple types of pMEM are defined, each implemented on the basis of a spatial weighting function taking a range parameter and, in some cases, also a shape parameter. The code's modular nature enables programmers to implement new pMEM by defining new spatial weighting functions.
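A conceptual base-R sketch of the classical Moran's eigenvector map construction underlying pMEM (not the package's own implementation): eigenvectors of a double-centred spatial weighting matrix built from a weighting function with a range parameter.

set.seed(3)
xy  <- cbind(runif(40), runif(40))                 # site coordinates
d   <- as.matrix(dist(xy))                         # pairwise distances
rng <- 0.5                                         # range parameter of the weighting function
W   <- exp(-d / rng); diag(W) <- 0                 # an exponential spatial weighting function
C   <- diag(nrow(W)) - matrix(1 / nrow(W), nrow(W), nrow(W))
E   <- eigen(C %*% W %*% C, symmetric = TRUE)      # double-centred weighting matrix
mem <- E$vectors[, E$values > 1e-8, drop = FALSE]  # eigenvectors used as spatial predictors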
Implements a GAM-based (generalized additive model) spatial surplus production model (spatial SPM), aimed at modeling the northern shrimp population in Atlantic Canada but potentially applicable to any stock in any location. The package is opinionated in its implementation of SPMs, as it internally chooses to use penalized spatial GAMs with time lags. However, it also aims to provide options for the user to customize their model. The methods are described in Pedersen et al. (2022, <https://www.dfo-mpo.gc.ca/csas-sccs/Publications/ResDocs-DocRech/2022/2022_062-eng.html>).
We provide functionality to implement penalized PCA with an option to smooth the objective function using Nesterov smoothing. Two functions are available to compute a user-specified number of eigenvectors. The function unsmoothed_penalized_EV() computes a penalized PCA without smoothing and has three parameters (the input matrix, the Lasso penalty, and the number of desired eigenvectors). The function smoothed_penalized_EV() computes a smoothed penalized PCA using the same parameters and additionally requires the specification of a smoothing parameter. Both functions return a matrix having the desired eigenvectors as columns.
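A short sketch of the two calls on a random input matrix; the function names and parameters follow the description above, but the argument names (lambda, k, mu) are assumptions for illustration:

# After attaching the package providing these functions
set.seed(4)
X <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)             # input data matrix

V1 <- unsmoothed_penalized_EV(X, lambda = 0.1, k = 3)            # Lasso penalty, 3 eigenvectors
V2 <- smoothed_penalized_EV(X, lambda = 0.1, k = 3, mu = 0.01)   # adds a Nesterov smoothing parameter
dim(V1)                                                          # desired eigenvectors as columns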
Monte Carlo sampling algorithms for semiparametric Bayesian regression analysis. These models feature a nonparametric (unknown) transformation of the data paired with widely-used regression models including linear regression, spline regression, quantile regression, and Gaussian processes. The transformation enables broader applicability of these key models, including for real-valued, positive, and compactly-supported data with challenging distributional features. The samplers prioritize computational scalability and, for most cases, Monte Carlo (not MCMC) sampling for greater efficiency. Details of the methods and algorithms are provided in Kowal and Wu (2024) <doi:10.1080/01621459.2024.2395586>.
This package applies the Free Evocation of Words Technique to build social representations and related analyses. The technique consists of collecting the words evoked by a subject when exposed to an inducer term, with the aim of understanding the relationships between the evoked words and the inducer term. The technique belongs to the theory of social representations and, from the information transmitted by an individual, seeks to create a profile that defines a social group.
Feature selection aims to identify and remove redundant, irrelevant and noisy variables from high-dimensional datasets. Selecting informative features improves the overall performance of subsequent classification and regression analyses. Several methods have been proposed for feature selection: most of them rely on univariate statistics, correlation, entropy measures or backward/forward regressions. Here, we propose an efficient, robust and fast method that adopts stochastic optimization approaches for high-dimensional data. GARS is an innovative implementation of a genetic algorithm that selects robust features in high-dimensional and challenging datasets.
Protein Group Code Algorithm (PGCA) is a computationally inexpensive algorithm for merging protein summaries from multiple experimental runs of quantitative proteomics data. The algorithm connects two or more groups with overlapping accession numbers. In some cases, pairwise groups are mutually exclusive but may still be connected by another group (or set of groups) with overlapping accession numbers. Thus, groups created by PGCA from multiple experimental runs (i.e., global groups) are called "connected" groups. These identified global protein groups enable the analysis of quantitative data available for protein groups instead of unique protein identifiers.
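A base-R sketch of the connectivity idea (not the package's own code): groups that share any accession number are merged into one global group, possibly via intermediate groups.

groups <- list(run1_g1 = c("P001", "P002"),
               run1_g2 = c("P003"),
               run2_g1 = c("P002", "P003"),   # links run1_g1 and run1_g2
               run2_g2 = c("P009"))

connected <- list()
for (g in groups) {
  hit <- which(vapply(connected, function(cc) any(g %in% cc), logical(1)))
  if (length(hit) == 0) {
    connected[[length(connected) + 1]] <- g                       # start a new global group
  } else {
    merged    <- unique(c(unlist(connected[hit]), g))             # merge all overlapping groups
    connected <- c(connected[-hit], list(merged))
  }
}
connected   # two global groups: {P001, P002, P003} and {P009}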
This package provides methods for estimating the area under the concentration versus time curve (AUC) and its standard error in the presence of Below the Limit of Quantification (BLOQ) observations. Two approaches are implemented: direct estimation using censored maximum likelihood, and a two-step approach that first imputes BLOQ values using various methods and then computes the AUC using the imputed data. Technical details are described in Barnett et al. (2020), "Methods for Non-Compartmental Pharmacokinetic Analysis With Observations Below the Limit of Quantification," Statistics in Biopharmaceutical Research. <doi:10.1080/19466315.2019.1701546>.
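For reference, once BLOQ observations have been handled (by censored maximum likelihood or by imputation), the AUC itself is the standard non-compartmental linear trapezoidal sum, illustrated here in base R on a toy concentration-time profile (not the package's own function):

time <- c(0, 0.5, 1, 2, 4, 8, 12)                  # sampling times (h)
conc <- c(0, 4.2, 6.8, 5.1, 2.4, 0.9, 0.3)         # concentrations after BLOQ handling
auc  <- sum(diff(time) * (head(conc, -1) + tail(conc, -1)) / 2)
auc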
Fits a constrained regression model for an ordinal response with ordinal predictors and, possibly, other types of predictors; see Espinosa and Hennig (2019) <doi:10.1007/s11222-018-9842-2>. The parameter estimates associated with an ordinal predictor are constrained to be monotonic. If a monotonicity direction (isotonic or antitonic) is not specified for an ordinal predictor by the user, then one of the available methods will either establish it or drop the monotonicity assumption. Two monotonicity tests are also available to test the null hypothesis of monotonicity over a set of parameters associated with an ordinal predictor.
This package provides a comprehensive suite of genome-wide association study (GWAS) methods specifically designed for biobank-scale data. The package offers computationally efficient and robust association tests for time-to-event traits (e.g., Bi et al., 2020 <doi:10.1016/j.ajhg.2020.06.003>), ordinal categorical traits (e.g., Bi et al., 2021 <doi:10.1016/j.ajhg.2021.03.019>), and longitudinal traits (Xu et al., 2025 <doi:10.1038/s41467-025-56669-1>). Additionally, it includes functions for simulating genotype and phenotype data to support research and method development.
Train a Gaussian stochastic process model of an unknown function, possibly observed with error, via maximum likelihood or maximum a posteriori (MAP) estimation, run model diagnostics, and make predictions, following Sacks, J., Welch, W.J., Mitchell, T.J., and Wynn, H.P. (1989) "Design and Analysis of Computer Experiments", Statistical Science, <doi:10.1214/ss/1177012413>. Perform sensitivity analysis and visualize low-order effects, following Schonlau, M. and Welch, W.J. (2006), "Screening the Input Variables to a Computer Model Via Analysis of Variance and Visualization", <doi:10.1007/0-387-28014-6_14>.
This package provides a streamlined tool for eplet analysis of donor and recipient HLA (human leukocyte antigen) mismatch. Messy, low-resolution HLA typing data is cleaned and imputed to high resolution using the NMDP (National Marrow Donor Program) haplotype reference database <https://haplostats.org/haplostats>. High-resolution data is analyzed for overall or single-antigen eplet mismatch using a reference table (currently supporting HLAMatchMaker <http://www.epitopes.net> versions 2 and 3). Data can enter or exit the workflow at different points depending on the user's aims and initial data quality.
The oblique decision tree (ODT) uses linear combinations of predictors as partitioning variables in a decision tree. Oblique Decision Random Forest (ODRF) is an ensemble of multiple ODTs generated by feature bagging. Oblique Decision Boosting Tree (ODBT) applies feature bagging during the training of ODT-based boosting trees to form an ensemble of boosting trees. All three methods can be used for classification and regression, and ODT and ODRF serve as supplements to the classical CART of Breiman (1984) <doi:10.1201/9781315139470> and Random Forest of Breiman (2001) <doi:10.1023/A:1010933404324>, respectively.
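A usage sketch on a built-in data set, assuming a formula interface for the ODT() and ODRF() fitting functions (the exact arguments of these calls and of the predict method are assumptions for illustration):

library(ODRF)

tree   <- ODT(Species ~ ., data = iris)    # a single oblique decision tree
forest <- ODRF(Species ~ ., data = iris)   # ensemble of ODTs via feature bagging
pred   <- predict(forest, iris[, -5])      # predicted classes for new predictor data
mean(pred == iris$Species)                 # in-sample accuracy, for illustration only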
Set of tools to fit a semi-parametric regression model suitable for the analysis of data sets in which the response variable is continuous, strictly positive, asymmetric and, possibly, censored. Under this setup, both the median and the skewness of the response variable distribution are explicitly modeled using semi-parametric functions, whose non-parametric components may be approximated by natural cubic splines or P-splines. Supported distributions for the model error include log-normal, log-Student-t, log-power-exponential, log-hyperbolic, log-contaminated-normal, log-slash, Birnbaum-Saunders and Birnbaum-Saunders-t distributions.
Elaboration of vehicular emissions inventories, consisting of four stages: pre-processing activity data, preparing emission factors, estimating the emissions, and post-processing the emissions into maps and databases. More details in Ibarra-Espinosa et al. (2018) <doi:10.5194/gmd-11-2209-2018>. Before using VEIN you need to know the vehicular composition of your study area, in other words, the combination of vehicle type, size and fuel of the fleet. Then, it is recommended to start with the project, which downloads a template that creates a structure of directories and scripts.
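A toy base-R illustration of the estimation stage (not VEIN's own functions): emissions are activity data (vehicles and kilometres driven) multiplied by emission factors.

fleet <- data.frame(type    = c("PC_gasoline", "PC_diesel", "HGV_diesel"),
                    veh     = c(50000, 20000, 3000),     # vehicles in the fleet
                    km_year = c(12000, 15000, 60000),    # annual mileage per vehicle
                    ef_g_km = c(0.35, 0.45, 2.10))       # illustrative CO emission factor, g/km
fleet$E_t_year <- with(fleet, veh * km_year * ef_g_km) / 1e6   # tonnes of CO per year
fleet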
Estimates the standard and weighted Elo (WElo, Angelini et al., 2022 <doi:10.1016/j.ejor.2021.04.011>) rates. The current version provides Elo and WElo rates for tennis, according to different systems of weights (games or sets) and scale factors (constant, proportional to the number of matches, with more weight on Grand Slam matches or on matches played on a specific surface). Moreover, the package gives the possibility of estimating (bootstrap) standard errors for the rates. Finally, the package includes betting functions that automatically select the matches on which to place a bet.
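For reference, the standard Elo update after a single match, as a base-R sketch (not the package's own interface); the WElo variant of Angelini et al. (2022) additionally weights the update, e.g. by the fraction of games or sets won.

elo_update <- function(r_a, r_b, outcome_a, k = 32) {
  p_a <- 1 / (1 + 10^((r_b - r_a) / 400))   # expected probability that player A wins
  r_a + k * (outcome_a - p_a)               # updated rating for player A
}
elo_update(r_a = 1500, r_b = 1600, outcome_a = 1)   # A beats the favourite: rating rises by about k * 0.64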