The production of certified reference materials (CRMs) requires various statistical tests depending on the task and recorded data to ensure that reported values of CRMs are appropriate. Often these tests are performed according to the procedures described in ISO GUIDE 35:2017'. The eCerto package contains a Shiny app which provides functionality to load, process, report and backup data recorded during CRM production and facilitates following the recommended procedures. It is described in Lisec et al (2023) <doi:10.1007/s00216-023-05099-3> and can also be accessed online <https://apps.bam.de/shn00/eCerto/> without package installation.
Cellular responses to perturbations are highly heterogeneous and depend largely on the initial state of cells. Connecting post-perturbation cells via cellular trajectories to untreated cells (e.g. by leveraging metabolic labeling information) enables exploitation of intercellular heterogeneity as a combined knock-down and overexpression screen to identify pathway modulators, termed Heterogeneity-seq (see Berg et al <doi:10.1101/2024.10.28.620481>). This package contains functions to generate cellular trajectories based on scSLAM-seq (single-cell, thiol-(SH)-linked alkylation of RNA for metabolic labelling sequencing) time courses, functions to identify pathway modulators and to visualize the results.
Combine probabilistic forecasts using CRPS learning algorithms proposed in Berrisch, Ziel (2021) <doi:10.48550/arXiv.2102.00968> <doi:10.1016/j.jeconom.2021.11.008>. The package implements multiple online learning algorithms like Bernstein online aggregation; see Wintenberger (2014) <doi:10.48550/arXiv.1404.1356>. Quantile regression is also implemented for comparison purposes. Model parameters can be tuned automatically with respect to the loss of the forecast combination. Methods like predict(), update(), plot() and print() are available for convenience. This package utilizes the optim C++ library for numeric optimization <https://github.com/kthohr/optim>.
This package provides a collection of functions useful in learning and practicing Item Response Theory (IRT), which can be combined into larger programs. It provides basic CTT analysis, a simple common interface to the estimation of item parameters in IRT models for binary responses with three different programs (ICL, BILOG-MG, and ltm), ability estimation (MLE, BME, EAP, WLE, plausible values), item and person fit statistics, scaling methods (MM, MS, Stocking-Lord, and the complete Hebaera method), and a rich array of parametric and non-parametric (kernel) plots. It estimates and plots Haberman's interaction model when all items are dichotomously scored.
The FBED and mmpc variable selection algorithms have been implemented using the distance correlation. The references include: Tsamardinos I., Aliferis C. F. and Statnikov A. (2003). "Time and sample efficient discovery of Markovblankets and direct causal relations". In Proceedings of the ninth ACM SIGKDD international Conference. <doi:10.1145/956750.956838>. Borboudakis G. and Tsamardinos I. (2019). "Forward-backward selection with early dropping". Journal of Machine Learning Research, 20(8): 1--39. <doi:10.48550/arXiv.1705.10770>. Huo X. and Szekely G.J. (2016). "Fast computing for distance covariance". Technometrics, 58(4): 435--447. <doi:10.1080/00401706.2015.1054435>.
Distributed Online Mean Tests is a powerful tool designed to efficiently process and analyze distributed datasets. It enables users to perform mean tests in an online, distributed manner, making it highly suitable for large-scale data analysis. By leveraging advanced computational techniques, Domean ensures robust and scalable solutions for statistical analysis, particularly in scenarios where data is dispersed across multiple nodes or sources. This package is ideal for researchers and practitioners working with high-dimensional data, providing a flexible and efficient framework for mean testing. The philosophy of Domean is described in Guo G.(2025) <doi:10.1016/j.physa.2024.130308>.
This package contains two functions that are intended to make tuning supervised learning methods easy. The eztune function uses a genetic algorithm or Hooke-Jeeves optimizer to find the best set of tuning parameters. The user can choose the optimizer, the learning method, and if optimization will be based on accuracy obtained through validation error, cross validation, or resubstitution. The function eztune.cv will compute a cross validated error rate. The purpose of eztune_cv is to provide a cross validated accuracy or MSE when resubstitution or validation data are used for optimization because error measures from both approaches can be misleading.
Implementation of the Stochastic Expectation Maximisation (StEM) approach to Record Linkage described in the paper by K. Robach, S. L. van der Pas, M. A. van de Wiel and M. H. Hof (2024, <doi:10.1093/jrsssc/qlaf016>); see citation("FlexRL") for details. This is a record linkage method, for finding the common set of records among 2 data sources based on Partially Identifying Variables (PIVs) available in both sources. It includes modelling of dynamic Partially Identifying Variables (e.g. postal code) that may evolve over time and registration errors (missing values and mistakes in the registration). Low memory footprint.
The kernelized version of principal component analysis (KPCA) has proven to be a valid nonlinear alternative for tackling the nonlinearity of biological sample spaces. However, it poses new challenges in terms of the interpretability of the original variables. kpcaIG aims to provide a tool to select the most relevant variables based on the kernel PCA representation of the data as in Briscik et al. (2023) <doi:10.1186/s12859-023-05404-y>. It also includes functions for 2D and 3D visualization of the original variables (as arrows) into the kernel principal components axes, highlighting the contribution of the most important ones.
Performing impulse-response function (IRF) analysis of relevant variables of agent-based simulation models, in particular for models described in LSD format. Based on the data produced by the simulation model, it performs both linear and state-dependent IRF analysis, providing the tools required by the Counterfactual Monte Carlo (CMC) methodology (Amendola and Pereira (2024) <doi:10.2139/ssrn.4740360>), including state identification and sensitivity. CMC proposes retrieving the causal effect of shocks by exploiting the opportunity to directly observe the counterfactual in a fully controlled experimental setup. LSD (Laboratory for Simulation Development) is free software available at <https://www.labsimdev.org/>).
This package provides a framework that boosts the imputation of missForest by Stekhoven, D.J. and Bühlmann, P. (2012) <doi:10.1093/bioinformatics/btr597> by harnessing parallel processing and through the fast Gradient Boosted Decision Trees (GBDT) implementation LightGBM by Ke, Guolin et al.(2017) <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision>. misspi has the following main advantages: 1. Allows embrassingly parallel imputation on large scale data. 2. Accepts a variety of machine learning models as methods with friendly user portal. 3. Supports multiple initializations methods. 4. Supports early stopping that prohibits unnecessary iterations.
This package contains functions that help to determine event boundaries in event segmentation experiments by bootstrapping a critical segmentation magnitude under the null hypothesis that all key presses were randomly distributed across the experiment. Segmentation magnitude is defined as the sum of Gaussians centered at the times of the segmentation key presses performed by the participants. Within a participant, the maximum of the overlaid Gaussians is used to prevent an excessive influence of a single participant on the overall outcome (e.g. if a participant is pressing the key multiple times in succession). Further functions are included, such as plotting the results.
Clustering and classification inference for high dimension low sample size (HDLSS) data with U-statistics. The package contains implementations of nonparametric statistical tests for sample homogeneity, group separation, clustering, and classification of multivariate data. The methods have high statistical power and are tailored for data in which the dimension L is much larger than sample size n. See Gabriela B. Cybis, Marcio Valk and SÃ lvia RC Lopes (2018) <doi:10.1080/00949655.2017.1374387>, Marcio Valk and Gabriela B. Cybis (2020) <doi:10.1080/10618600.2020.1796398>, Debora Z. Bello, Marcio Valk and Gabriela B. Cybis (2021) <arXiv:2106.09115>.
This is an add-on to the cna package <https://CRAN.R-project.org/package=cna> comprising various functions for optimizing consistency and coverage scores of models of configurational comparative methods as Coincidence Analysis (CNA) and Qualitative Comparative Analysis (QCA). The function conCovOpt() calculates con-cov optima, selectMax() selects con-cov maxima among the con-cov optima, DNFbuild() can be used to build models actually reaching those optima, and findOutcomes() identifies those factor values in analyzed data that can be modeled as outcomes. For a theoretical introduction to these functions see Baumgartner and Ambuehl (2021) <doi:10.1177/0049124121995554>.
With this package you can run ConMET locally in R. ConMET is an R-shiny application that facilitates performing and evaluating confirmatory factor analyses (CFAs) and is useful for running and reporting typical measurement models in applied psychology and management journals. ConMET automatically creates, compares and summarizes CFA models. Most common fit indices (E.g., CFI and SRMR) are put in an overview table. ConMET also allows to test for common method variance. The application is particularly useful for teaching and instruction of measurement issues in survey research. The application uses the lavaan package (Rosseel, 2012) to run CFAs.
The D-score summarizes a child's performance on developmental milestones into a single number. Its key feature is its generic nature. The method does not depend on a specific measurement instrument. The statistical method underlying the D-score is described in van Buuren et al. (2025) <doi:10.1177/01650254241294033>. This package implements model keys to convert milestone scores to D-scores; maps instrument-specific item names to a generic 9-position naming convention; computes D-scores and their precision from a child's milestone scores; and converts D-scores to Development-for-Age Z-scores (DAZ) using age-conditional reference standards.
Define a spatial Area of Interest (AOI) around a constructed dam using hydrology data. Dams have environmental and social impacts, both positive and negative. Current analyses of dams have no consistent way to specify at what spatial extent we should evaluate these impacts. damAOI implements methods to adjust reservoir polygons to match satellite-observed surface water areas, plot upstream and downstream rivers using elevation data and accumulated river flow, and draw buffers clipped by river basins around reservoirs and relevant rivers. This helps to consistently determine the areas which could be impacted by dam construction, facilitating comparative analysis and informed infrastructure investments.
The Jalaali calendar, also known as the Persian or Solar Hijri calendar, is the official calendar of Iran and Afghanistan. It starts on Nowruz, the spring equinox, and follows an astronomical system for determining leap years. Each year consists of 365 or 366 days, divided into 12 months. This package provides functions for converting dates between the Jalaali and Gregorian calendars. The conversion calculations are based on the work of Kazimierz M. Borkowski (1996) (<doi:10.1007/BF00055188>), who used an analytical model of Earth's motion to compute equinoxes from AD 550 to 3800 and determine leap years based on Tehran time.
This package provides computational tools for nonlinear longitudinal models, in particular the intrinsically nonlinear models, in four scenarios: (1) univariate longitudinal processes with growth factors, with or without covariates including time-invariant covariates (TICs) and time-varying covariates (TVCs); (2) multivariate longitudinal processes that facilitate the assessment of correlation or causation between multiple longitudinal variables; (3) multiple-group models for scenarios (1) and (2) to evaluate differences among manifested groups, and (4) longitudinal mixture models for scenarios (1) and (2), with an assumption that trajectories are from multiple latent classes. The methods implemented are introduced in Jin Liu (2023) <arXiv:2302.03237v2>.
Implementation of Probabilistic Regression Trees (PRTree), providing functions for model fitting and prediction, with specific adaptations to handle missing values. The main computations are implemented in Fortran for high efficiency. The package is based on the PRTree methodology described in Alkhoury et al. (2020), "Smooth and Consistent Probabilistic Regression Trees" <https://proceedings.neurips.cc/paper_files/paper/2020/file/8289889263db4a40463e3f358bb7c7a1-Paper.pdf>. Details on the treatment of missing data and implementation aspects are presented in Prass, T.S.; Neimaier, A.S.; Pumi, G. (2025), "Handling Missing Data in Probabilistic Regression Trees: Methods and Implementation in R" <doi:10.48550/arXiv.2510.03634>.
Functions, classes and methods for time series modelling with ARIMA and related models. The aim of the package is to provide consistent interface for the user. For example, a single function autocorrelations() computes various kinds of theoretical and sample autocorrelations. This is work in progress, see the documentation and vignettes for the current functionality. Function sarima() fits extended multiplicative seasonal ARIMA models with trends, exogenous variables and arbitrary roots on the unit circle, which can be fixed or estimated (for the algebraic basis for this see <doi:10.48550/arXiv.2208.05055>, a paper on the methodology is being prepared).
Regression context for the Partial Least Squares framework for Extreme values. Estimations of the Shrinkage for Extreme Partial Least-Squares (SEPaLS) estimators, an adaptation of the original Partial Least Squares (PLS) method tailored to the extreme-value framework. The SEPaLS project is a joint work by Stephane Girard, Hadrien Lorenzo and Julyan Arbel. R code to replicate the results of the paper is available at <https://github.com/hlorenzo/SEPaLS_simus>. Extremes within PLS was already studied by one of the authors, see M Bousebeta, G Enjolras, S Girard (2023) <doi:10.1016/j.jmva.2022.105101>.
Slack <https://slack.com/> provides a service for teams to collaborate by sharing messages, images, links, files and more. Functions are provided that make it possible to interact with the Slack platform API'. When you need to share information or data from R, rather than resort to copy/ paste in e-mails or other services like Skype <https://www.skype.com/en/>, you can use this package to send well-formatted output from multiple R objects and expressions to all teammates at the same time with little effort. You can also send images from the current graphics device, R objects, and upload files.
An easy-to-use and efficient tool to estimate infectious diseases parameters using serological data. Implemented models include SIR models (basic_sir_model(), static_sir_model(), mseir_model(), sir_subpops_model()), parametric models (polynomial_model(), fp_model()), nonparametric models (lp_model()), semiparametric models (penalized_splines_model()), hierarchical models (hierarchical_bayesian_model()). The package is based on the book "Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective" (Hens, Niel & Shkedy, Ziv & Aerts, Marc & Faes, Christel & Damme, Pierre & Beutels, Philippe., 2013) <doi:10.1007/978-1-4614-4072-7>.