The FBED and MMPC variable selection algorithms have been implemented using the distance correlation. The references include: Tsamardinos I., Aliferis C. F. and Statnikov A. (2003). "Time and sample efficient discovery of Markov blankets and direct causal relations". In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. <doi:10.1145/956750.956838>. Borboudakis G. and Tsamardinos I. (2019). "Forward-backward selection with early dropping". Journal of Machine Learning Research, 20(8): 1--39. <doi:10.48550/arXiv.1705.10770>. Huo X. and Szekely G. J. (2016). "Fast computing for distance covariance". Technometrics, 58(4): 435--447. <doi:10.1080/00401706.2015.1054435>.
Domean (Distributed Online Mean Tests) efficiently processes and analyzes distributed datasets. It enables users to perform mean tests in an online, distributed manner, making it suitable for large-scale analysis in scenarios where data is dispersed across multiple nodes or sources. The package targets researchers and practitioners working with high-dimensional data, providing a flexible and efficient framework for mean testing. The methodology behind Domean is described in Guo G. (2025) <doi:10.1016/j.physa.2024.130308>.
This package contains two functions that are intended to make tuning supervised learning methods easy. The eztune function uses a genetic algorithm or Hooke-Jeeves optimizer to find the best set of tuning parameters. The user can choose the optimizer, the learning method, and whether optimization is based on accuracy obtained through a validation set, cross validation, or resubstitution. The function eztune_cv computes a cross-validated error rate; its purpose is to provide a cross-validated accuracy or MSE when resubstitution or validation data are used for optimization, because error measures from both of those approaches can be misleading.
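As a hedged illustration of the intended workflow (the function names come from the description above; the argument names are assumptions to be checked against the package manual):

    # Sketch: tune an SVM with the genetic algorithm, then obtain a
    # cross-validated accuracy for the selected parameters.
    # The method/optimizer argument names are assumed, not verified.
    library(EZtune)
    fit <- eztune(x, y, method = "svm", optimizer = "ga")
    cv  <- eztune_cv(x, y, fit)   # cross-validated accuracy or MSE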
The kernelized version of principal component analysis (KPCA) has proven to be a valid alternative for tackling the nonlinearity of biological sample spaces. However, it poses new challenges for the interpretability of the original variables. kpcaIG aims to provide a tool to select the most relevant variables based on the kernel PCA representation of the data, as in Briscik et al. (2023) <doi:10.1186/s12859-023-05404-y>. It also includes functions for 2D and 3D visualization of the original variables (as arrows) in the kernel principal component axes, highlighting the contribution of the most important ones.
Performs impulse-response function (IRF) analysis of relevant variables of agent-based simulation models, in particular models described in LSD format. Based on the data produced by the simulation model, it performs both linear and state-dependent IRF analysis, providing the tools required by the Counterfactual Monte Carlo (CMC) methodology (Amendola and Pereira (2024) <doi:10.2139/ssrn.4740360>), including state identification and sensitivity analysis. CMC proposes retrieving the causal effect of shocks by exploiting the opportunity to directly observe the counterfactual in a fully controlled experimental setup. LSD (Laboratory for Simulation Development) is free software available at <https://www.labsimdev.org/>.
This package provides a framework that boosts the imputation of missForest by Stekhoven, D. J. and Bühlmann, P. (2012) <doi:10.1093/bioinformatics/btr597> by harnessing parallel processing and the fast Gradient Boosted Decision Trees (GBDT) implementation LightGBM by Ke, Guolin et al. (2017) <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision>. misspi has the following main advantages: 1. Allows embarrassingly parallel imputation on large-scale data. 2. Accepts a variety of machine learning models as methods via a friendly user interface. 3. Supports multiple initialization methods. 4. Supports early stopping, which avoids unnecessary iterations.
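A hedged sketch of a typical call, assuming misspi() is the package's main entry point (the function and result slot names here are assumptions based on the package name, not its documented API):

    # Impute a numeric matrix x.miss containing NAs via LightGBM-based
    # iterations; the result slot name is assumed for illustration.
    library(misspi)
    res <- misspi(x.miss)
    x.completed <- res$x.imputed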
This package contains functions that help to determine event boundaries in event segmentation experiments by bootstrapping a critical segmentation magnitude under the null hypothesis that all key presses were randomly distributed across the experiment. Segmentation magnitude is defined as the sum of Gaussians centered at the times of the segmentation key presses performed by the participants. Within a participant, the maximum of the overlaid Gaussians is used, to prevent an excessive influence of a single participant on the overall outcome (e.g., a participant pressing the key multiple times in succession). Further functions are included, e.g., for plotting the results.
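The definition can be illustrated directly in base R: per participant, overlay Gaussians at the key-press times and take their pointwise maximum, then sum across participants (a generic illustration of the quantity, not this package's API):

    # Segmentation magnitude on a 0-60 s experiment, evaluated on a grid.
    times <- seq(0, 60, by = 0.1)
    presses <- list(p1 = c(10, 10.4, 31), p2 = c(9.5, 30.8))  # key-press times
    per_participant <- sapply(presses, function(kp) {
      g <- sapply(kp, function(mu) dnorm(times, mean = mu, sd = 1))
      apply(g, 1, max)  # max keeps one participant's bursts from dominating
    })
    magnitude <- rowSums(per_participant)  # sum of per-participant curves
    plot(times, magnitude, type = "l")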
Clustering and classification inference for high dimension low sample size (HDLSS) data with U-statistics. The package contains implementations of nonparametric statistical tests for sample homogeneity, group separation, clustering, and classification of multivariate data. The methods have high statistical power and are tailored for data in which the dimension L is much larger than the sample size n. See Gabriela B. Cybis, Marcio Valk and Sílvia RC Lopes (2018) <doi:10.1080/00949655.2017.1374387>, Marcio Valk and Gabriela B. Cybis (2020) <doi:10.1080/10618600.2020.1796398>, and Debora Z. Bello, Marcio Valk and Gabriela B. Cybis (2021) <arXiv:2106.09115>.
This package implements a new RNA-Seq analysis method and integrates two modules: a basic model for pairwise comparison and a linear model for complex designs. RNA-Seq quantifies gene expression with read counts, which usually consist of conditions (or treatments) and several replicates per condition. This software infers differential expression directly from the count difference between conditions. It assumes that the total count difference between conditions follows a negative binomial distribution. In addition, ABSSeq moderates the fold-changes in two steps, by expression level and by gene-specific dispersion, which might facilitate gene ranking by fold-change and visualization.
Interpretation of time series data is affected by model choices. Different models can give different or even contradicting estimates of patterns, trends, and mechanisms for the same data, a limitation alleviated by the Bayesian estimator of abrupt change, seasonality, and trend (BEAST) of this package. BEAST seeks to improve time series decomposition by forgoing the "single-best-model" concept and embracing all competing models in the inference via a Bayesian model averaging scheme. It is a flexible tool to uncover abrupt changes (i.e., change-points, breakpoints, structural breaks, or join-points), cyclic variations (e.g., seasonality), and nonlinear trends in time-series observations. BEAST not only tells when changes occur but also quantifies how likely the detected changes are to be true, and it detects not just piecewise linear trends but also arbitrary nonlinear trends. BEAST is applicable to real-valued time series data of all kinds, be it in remote sensing, economics, climate sciences, ecology, or hydrology. Example applications include identifying regime shifts in ecological data, mapping forest disturbance and land degradation from satellite imagery, detecting market trends in economic data, pinpointing anomalies and extreme events in climate data, and unraveling system dynamics in biological data. Details on BEAST are reported in Zhao et al. (2019) <doi:10.1016/j.rse.2019.04.034>.
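For instance, a seasonal series can be decomposed in a single call (a minimal sketch using the package's main beast() function with defaults, assuming it accepts a ts object such as R's built-in co2):

    # Decompose monthly CO2 readings into trend, seasonality, and
    # change-points, then plot posterior change-point probabilities.
    library(Rbeast)
    out <- beast(co2)
    plot(out)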
With this package you can run ConMET locally in R. ConMET is an R-shiny application that facilitates performing and evaluating confirmatory factor analyses (CFAs) and is useful for running and reporting typical measurement models in applied psychology and management journals. ConMET automatically creates, compares, and summarizes CFA models. The most common fit indices (e.g., CFI and SRMR) are put in an overview table. ConMET also allows testing for common method variance. The application is particularly useful for teaching and instruction of measurement issues in survey research. The application uses the lavaan package (Rosseel, 2012) to run CFAs.
This is an add-on to the cna package <https://CRAN.R-project.org/package=cna> comprising various functions for optimizing consistency and coverage scores of models of configurational comparative methods such as Coincidence Analysis (CNA) and Qualitative Comparative Analysis (QCA). The function conCovOpt() calculates con-cov optima, selectMax() selects con-cov maxima among the con-cov optima, DNFbuild() can be used to build models actually reaching those optima, and findOutcomes() identifies those factor values in analyzed data that can be modeled as outcomes. For a theoretical introduction to these functions see Baumgartner and Ambuehl (2021) <doi:10.1177/0049124121995554>.
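A sketch of the intended pipeline chaining these four functions (the dataset d and the outcome argument are illustrative; exact signatures should be checked in the package documentation):

    # Illustrative pipeline on a configurational dataset d with outcome "A".
    library(cnaOpt)
    opt  <- conCovOpt(d, outcome = "A")  # compute con-cov optima
    best <- selectMax(opt)               # con-cov maxima among the optima
    DNFbuild(best)                       # models actually reaching the optima
    findOutcomes(d)                      # factor values modelable as outcomes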
This package performs emulation of dynamic simulators using Gaussian processes via a one-step-ahead approach. The package implements a flexible framework for approximating time-dependent outputs from computationally expensive dynamic systems. It is specifically designed for nonlinear dynamic systems where full simulations may be costly. The underlying Gaussian process model accounts for temporal dependency through the one-step-ahead formulation, allowing for accurate emulation of complex dynamics. Hyperparameters are estimated via maximum likelihood. See Heo (2025) <doi:10.48550/arXiv.2503.20250> for the exact method, and Mohammadi, Challenor, and Goodfellow (2019) <doi:10.1016/j.csda.2019.05.006> for methodological details.
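The one-step-ahead idea can be sketched generically: fit a GP regression from the state at time t to the state at t+1, then iterate the posterior mean to emulate a trajectory. The toy below uses fixed hyperparameters and base R only; it is not this package's interface, which estimates hyperparameters by maximum likelihood:

    # Toy one-step-ahead GP emulator with a squared-exponential kernel.
    sqexp <- function(a, b, l = 200, s2 = 1)
      s2 * exp(-outer(a, b, "-")^2 / (2 * l^2))
    y  <- as.numeric(datasets::Nile)      # any univariate series
    xt <- y[-length(y)]; yt <- y[-1]      # training pairs (y_t, y_{t+1})
    K  <- sqexp(xt, xt) + diag(1e-4, length(xt))  # jitter for stability
    w  <- solve(K, yt)                    # weights for the posterior mean
    step <- function(x) drop(sqexp(x, xt) %*% w)
    pred <- numeric(10); cur <- y[length(y)]
    for (i in seq_along(pred)) { cur <- step(cur); pred[i] <- cur }
    pred                                  # emulated 10-step trajectory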
Define a spatial Area of Interest (AOI) around a constructed dam using hydrology data. Dams have environmental and social impacts, both positive and negative. Current analyses of dams have no consistent way to specify at what spatial extent we should evaluate these impacts. damAOI
implements methods to adjust reservoir polygons to match satellite-observed surface water areas, plot upstream and downstream rivers using elevation data and accumulated river flow, and draw buffers clipped by river basins around reservoirs and relevant rivers. This helps to consistently determine the areas which could be impacted by dam construction, facilitating comparative analysis and informed infrastructure investments.
The Jalaali calendar, also known as the Persian or Solar Hijri calendar, is the official calendar of Iran and Afghanistan. It starts on Nowruz, the spring equinox, and follows an astronomical system for determining leap years. Each year consists of 365 or 366 days, divided into 12 months. This package provides functions for converting dates between the Jalaali and Gregorian calendars. The conversion calculations are based on the work of Kazimierz M. Borkowski (1996) (<doi:10.1007/BF00055188>), who used an analytical model of Earth's motion to compute equinoxes from AD 550 to 3800 and determine leap years based on Tehran time.
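A hedged usage sketch (the conversion function names below are hypothetical illustrations of the stated purpose; consult the package index for the actual exported names):

    # Hypothetical function names for illustration only.
    greg2jal(2024, 3, 20)  # Gregorian date around Nowruz -> Jalaali
    jal2greg(1403, 1, 1)   # Jalaali new year -> Gregorian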
This package provides computational tools for nonlinear longitudinal models, in particular intrinsically nonlinear models, in four scenarios: (1) univariate longitudinal processes with growth factors, with or without covariates, including time-invariant covariates (TICs) and time-varying covariates (TVCs); (2) multivariate longitudinal processes that facilitate the assessment of correlation or causation between multiple longitudinal variables; (3) multiple-group models for scenarios (1) and (2) to evaluate differences among manifested groups; and (4) longitudinal mixture models for scenarios (1) and (2), with an assumption that trajectories are from multiple latent classes. The methods implemented are introduced in Jin Liu (2023) <arXiv:2302.03237v2>.
This package implements a class of univariate and multivariate spatio-network generalised linear mixed models for areal unit and network data, with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation. The response variable can be binomial, Gaussian, or Poisson. Spatial autocorrelation is modelled by a set of random effects that are assigned a conditional autoregressive (CAR) prior distribution following the Leroux model (Leroux et al. (2000) <doi:10.1007/978-1-4612-1284-3_4>). Network structures are modelled by a set of random effects that reflect a multiple membership structure (Browne et al. (2001) <doi:10.1177/1471082X0100100202>).
Slack <https://slack.com/> provides a service for teams to collaborate by sharing messages, images, links, files, and more. Functions are provided that make it possible to interact with the Slack platform API. When you need to share information or data from R, rather than resort to copy/paste in e-mails or other services like Skype <https://www.skype.com/en/>, you can use this package to send well-formatted output from multiple R objects and expressions to all teammates at the same time with little effort. You can also send images from the current graphics device, R objects, and upload files.
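For example (a minimal sketch; slackr_setup() and slackr() are package entry points, but the credential arguments shown are assumptions and vary by package version):

    # Post formatted R output and the current plot to a channel.
    library(slackr)
    slackr_setup(channel = "#r-results", token = "xoxb-...")  # assumed args
    slackr(summary(lm(mpg ~ wt, data = mtcars)))
    slackr_dev("#r-results")  # send the current graphics device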
For surface energy models and estimation of solar positions and components with varying topography, time, and locations. The functions calculate solar top-of-atmosphere, open, diffuse, and direct components, atmospheric transmittance and diffuse factors, day length, sunrise and sunset, solar azimuth, zenith, altitude, incidence, and hour angles, earth declination angle, equation of time, and solar constant. Details about the methods and equations are explained in Seyednasrollah B., Kumar M. and Link T. E. (2013). "On the role of vegetation density on net snow cover radiation at the forest floor". Journal of Geophysical Research: Atmospheres, 118(15): 8359--8374. <doi:10.1002/jgrd.50575>.
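As a flavor of the quantities involved, the earth declination angle for day-of-year n is commonly approximated by Cooper's equation, delta = 23.45 * sin(2*pi*(284 + n)/365) degrees (a generic textbook formula, not necessarily the exact equation this package implements):

    # Cooper's approximation of solar declination (in degrees).
    declination <- function(n) 23.45 * sin(2 * pi * (284 + n) / 365)
    declination(c(80, 172, 355))  # ~0 at equinox, ~ +/-23.4 at solstices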
Regression tools for the Partial Least Squares framework for extreme values. Provides estimation of Shrinkage for Extreme Partial Least-Squares (SEPaLS) estimators, an adaptation of the original Partial Least Squares (PLS) method tailored to the extreme-value framework. The SEPaLS project is a joint work by Stephane Girard, Hadrien Lorenzo, and Julyan Arbel. R code to replicate the results of the paper is available at <https://github.com/hlorenzo/SEPaLS_simus>. Extremes within PLS were already studied by one of the authors; see M. Bousebata, G. Enjolras and S. Girard (2023) <doi:10.1016/j.jmva.2022.105101>.
Functions, classes, and methods for time series modelling with ARIMA and related models. The aim of the package is to provide a consistent interface for the user. For example, a single function autocorrelations() computes various kinds of theoretical and sample autocorrelations. This is work in progress; see the documentation and vignettes for the current functionality. The function sarima() fits extended multiplicative seasonal ARIMA models with trends, exogenous variables, and arbitrary roots on the unit circle, which can be fixed or estimated (for the algebraic basis see <doi:10.48550/arXiv.2208.05055>; a paper on the methodology is being prepared).
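A brief sketch of the interface (minimal and hedged; the sarima() formula syntax for model specification is documented in the package vignettes and not reproduced here):

    # Sample autocorrelations of a classic monthly series.
    library(sarima)
    autocorrelations(AirPassengers, maxlag = 12)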
An easy-to-use and efficient tool to estimate infectious disease parameters using serological data. Implemented models include SIR models (basic_sir_model(), static_sir_model(), mseir_model(), sir_subpops_model()), parametric models (polynomial_model(), fp_model()), nonparametric models (lp_model()), semiparametric models (penalized_splines_model()), and hierarchical models (hierarchical_bayesian_model()). The package is based on the book "Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective" (Hens, N., Shkedy, Z., Aerts, M., Faes, C., Van Damme, P. and Beutels, P., 2013) <doi:10.1007/978-1-4614-4072-7>.
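A hedged sketch fitting one of the parametric models (the data columns and argument names are illustrative assumptions; serosv's documentation describes the expected inputs):

    # Illustrative only: fit a Muench-type polynomial catalytic model to
    # age-specific seropositive/total counts (argument names assumed).
    library(serosv)
    fit <- polynomial_model(age, pos = seropositive, tot = total, type = "Muench")
    plot(fit)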
Allows for mapping proportions and indicators defined on the unit interval. It implements Beta-based small area methods comprising the classical Beta regression models, the Flexible Beta model, and Zero and/or One Inflated extensions (Janicki 2020 <doi:10.1080/03610926.2019.1570266>). Such methods, developed within a Bayesian framework through Stan <https://mc-stan.org/>, come equipped with a set of diagnostics and complementary tools, as well as visualization and export functions. A Shiny application with a user-friendly interface can be launched to further simplify the process. For further details, refer to De Nicolò and Gardini (2024) <doi:10.18637/jss.v108.i01>.
This package provides a collection of functions useful in learning and practicing Item Response Theory (IRT), which can be combined into larger programs. It provides basic CTT analysis, a simple common interface to the estimation of item parameters in IRT models for binary responses with three different programs (ICL, BILOG-MG, and ltm), ability estimation (MLE, BME, EAP, WLE, plausible values), item and person fit statistics, scaling methods (MM, MS, Stocking-Lord, and the complete Haebara method), and a rich array of parametric and non-parametric (kernel) plots. It estimates and plots Haberman's interaction model when all items are dichotomously scored.
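For example (a minimal sketch; est() and eap() are the package's estimation functions, though the exact arguments should be checked against the manual):

    # Estimate 2PL item parameters with the ltm engine, then EAP abilities
    # on a normal quadrature (Scored is an example dataset in the package).
    library(irtoys)
    ip <- est(Scored, model = "2PL", engine = "ltm")
    th <- eap(Scored, ip$est, qu = normal.qu())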