Structure mining from XGBoost and LightGBM models. Key functionalities of this package cover: visualisation of tree-based ensemble models, identification of interactions, measurement of variable importance, measurement of interaction importance, and explanation of single predictions with break-down plots (based on the xgboostExplainer and iBreakDown packages). To download LightGBM use the following link: <https://github.com/Microsoft/LightGBM>. EIX is a part of the DrWhy.AI universe.
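As an illustration, a minimal sketch of how such an analysis might look on an XGBoost model; the EIX calls (importance(), interactions()) and their arguments are assumptions based on the description above, not a verified interface:

    # Sketch only: EIX function names and signatures are assumed, check the package docs.
    library(xgboost)
    library(EIX)

    data(agaricus.train, package = "xgboost")
    model <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
                     nrounds = 25, objective = "binary:logistic", verbose = 0)

    imp   <- importance(model, agaricus.train$data, option = "both")          # variable + interaction importance (assumed)
    inter <- interactions(model, agaricus.train$data, option = "interactions") # interaction identification (assumed)
    plot(imp)                                                                  # plot method assumed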
R interface for H2O, the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
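A minimal usage sketch of the R interface; the dataset and parameter values are arbitrary:

    library(h2o)
    h2o.init()                                   # start or connect to a local H2O cluster

    iris_hf <- as.h2o(iris)                      # copy an R data frame into H2O
    splits  <- h2o.splitFrame(iris_hf, ratios = 0.8, seed = 1)

    gbm <- h2o.gbm(x = 1:4, y = "Species",
                   training_frame = splits[[1]], ntrees = 50)
    h2o.performance(gbm, newdata = splits[[2]])  # evaluate on the holdout split

    # Fully automatic model search (H2O AutoML)
    aml <- h2o.automl(x = 1:4, y = "Species",
                      training_frame = splits[[1]], max_models = 5, seed = 1)
    aml@leaderboard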
These datasets and functions accompany Wolfe and Schneider (2017) - Intuitive Introductory Statistics (ISBN: 978-3-319-56070-0) <doi:10.1007/978-3-319-56072-4>. They are used in the examples throughout the text and in the end-of-chapter exercises. The datasets are meant to cover a broad range of topics in order to appeal to the diverse set of interests and backgrounds typically present in an introductory Statistics class.
Used for general multiple mediation analysis. The analysis method is described in Yu and Li (2022) (ISBN: 9780367365479) "Statistical Methods for Mediation, Confounding and Moderation Analysis Using R and SAS", published by Chapman and Hall/CRC, and in Yu et al. (2017) <DOI:10.1016/j.sste.2017.02.001> "Exploring racial disparity in obesity: a mediation analysis considering geo-coded environmental factors", published in Spatial and Spatio-temporal Epidemiology, 21, 13-23.
Calibrate and apply multivariate bias correction algorithms for climate model simulations of multiple climate variables. Three methods described by Cannon (2016) <doi:10.1175/JCLI-D-15-0679.1> and Cannon (2018) <doi:10.1007/s00382-017-3580-6> are implemented: (i) MBC Pearson correlation (MBCp), (ii) MBC rank correlation (MBCr), and (iii) MBC N-dimensional PDF transform (MBCn). The Rank Resampling for Distributions and Dependences (R2D2) method is also implemented.
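A sketch under assumptions: the argument names (o.c, m.c, m.p) follow the convention of observed-calibration, model-calibration and model-projection matrices, and the returned element name is assumed; verify both against the package documentation.

    library(MBC)

    set.seed(1)
    obs_cal  <- matrix(rnorm(300), ncol = 3)           # observed values, calibration period
    mod_cal  <- matrix(rnorm(300, sd = 1.2), ncol = 3) # model output, calibration period
    mod_proj <- matrix(rnorm(300, sd = 1.2), ncol = 3) # model output, projection period

    fit <- MBCp(o.c = obs_cal, m.c = mod_cal, m.p = mod_proj)  # Pearson-correlation variant (assumed args)
    corrected_proj <- fit$mhat.p                               # bias-corrected projections (assumed element name)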
This package provides functionality for structural equation modeling for the social relations model (Kenny & La Voie, 1984; <doi:10.1016/S0065-2601(08)60144-6>; Warner, Kenny, & Soto, 1979, <doi:10.1037/0022-3514.37.10.1742>). Maximum likelihood estimation (Gill & Swartz, 2001, <doi:10.2307/3316080>; Nestler, 2018, <doi:10.3102/1076998617741106>) and least squares estimation are supported (Bond & Malloy, 2018, <doi:10.1016/B978-0-12-811967-9.00014-X>).
The TWN-list (Taxa Waterbeheer Nederland) is the Dutch standard for naming taxa in Dutch water management. This package makes it easier to use the TWN-list for ecological analyses. It consists of two parts. First, it makes the TWN-list itself available in R. Second, it provides a few functions that make it easy to perform some basic and often recurring tasks for checking and consulting taxonomic data from the TWN-list.
This package provides a collection of interactive shiny applications for performing comprehensive analyses in the field of tree breeding and genetics. The package is designed to assist users in visualizing and interpreting experimental data through a user-friendly interface. Each application is launched via a simple function, and users can upload data in Excel format for analysis. For more information, refer to Singh, R.K. and Chaudhary, B.D. (1977, ISBN:9788176633079).
An implementation of the additive heredity model for the mixture-of-mixtures experiments of Shen et al. (2019) in Technometrics <doi:10.1080/00401706.2019.1630010>. The additive heredity model considers an additive structure to inherently connect the major components with the minor components. The additive heredity model has a meaningful interpretation for the estimated model because of the hierarchical and heredity principles applied and the nonnegative garrote technique used for variable selection.
This package provides functions to fit temporal lag models to dynamic networks. The models are built on top of the exponential random graph model (ERGM) framework. There are functions for simulating or forecasting networks for future time points. Abhirup Mallik & Zack W. Almquist (2019) Stable Multiple Time Step Simulation/Prediction From Lagged Dynamic Network Regression Models, Journal of Computational and Graphical Statistics, 28:4, 967-979, <DOI: 10.1080/10618600.2019.1594834>.
Collection of datasets prepared by Profs. A.P. Gore, S.A. Paranjape, and M.B. Kulkarni of the Department of Statistics, Poona University, India. With their permission, the first letters of their names form the name of this package; the package has been built and made available for the benefit of R users. This collection requires a rich class of models and can be a very useful building block for a beginner.
This package provides interpretable high-dimensional mean comparison methods (HMC). For example, users can apply these methods to assess the difference in gene expression between two treatment groups. It is not a gene-by-gene comparison. Instead, the methods focus on the interplay between features and identify those that are predictive of the group label. The tests are valid frequentist procedures and yield sparse estimates indicating which features contribute to the group differences.
An interface to the algorithms of Interpretable AI <https://www.interpretable.ai> from the R programming language. Interpretable AI provides various modules, including Optimal Trees for classification, regression, prescription and survival analysis, Optimal Imputation for missing data imputation and outlier detection, and Optimal Feature Selection for exact sparse regression. The iai package is an open-source project. The Interpretable AI software modules are proprietary products, but free academic and evaluation licenses are available.
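A sketch only: running it requires a licensed Interpretable AI installation, and the function names (optimal_tree_classifier(), fit(), predict()) are assumptions based on the package's snake_case interface; check the package documentation before use.

    library(iai)

    X <- iris[, 1:4]
    y <- iris$Species

    lnr <- iai::optimal_tree_classifier(max_depth = 2)  # Optimal Trees learner (assumed constructor)
    iai::fit(lnr, X, y)                                 # train on the full data (assumed)
    head(iai::predict(lnr, X))                          # class predictions (assumed)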
An adaptation of Kernelized Stein Discrepancy, this package provides a goodness-of-fit test of whether a given i.i.d. sample is drawn from a given distribution. It works for any distribution once its score function (the derivative of log-density) can be provided. This method is based on "A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation" by Liu, Lee, and Jordan, available at <arXiv:1602.03253>.
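To illustrate the score-function input the test needs: for the standard normal N(0, 1), the score is d/dx log p(x) = -x. The KSD() call and its argument name below are assumptions about the package interface, not a verified signature.

    library(KSD)

    set.seed(1)
    x <- matrix(rnorm(500), ncol = 1)          # sample to be tested

    score_std_normal <- function(x) -x         # score function of N(0, 1)

    result <- KSD(x, score_function = score_std_normal)  # assumed signature; see package docs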
Pattern Sequence Based Forecasting (PSF) takes univariate time series data as input and assists in forecasting its future values. This algorithm forecasts the behaviour of a time series based on the similarity of pattern sequences. Initially, clustering is performed and the samples from the database are labeled. The labels associated with the samples are then used for forecasting the future behaviour of the time series. Further technical details and references regarding PSF are discussed in the vignette.
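A minimal sketch, assuming the package's main entry point is psf() with an accompanying predict() method; the dataset and cycle length are arbitrary choices, and the exact signature should be checked in the vignette mentioned above.

    library(PSF)

    model <- psf(nottem, cycle = 12)       # monthly temperature series, yearly cycle (assumed arguments)
    fc    <- predict(model, n.ahead = 12)  # forecast the next 12 months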
This package provides drop-in replacements for the base system2() function with fine control and consistent behavior across platforms. It supports clean interruption, timeout, background tasks, and streaming STDIN / STDOUT / STDERR over binary or text connections. The package also provides functions for evaluating expressions inside a temporary fork. Such evaluations have no side effects on the main R process, and support reliable interrupts and timeouts. This provides the basis for a sandboxing mechanism.
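A usage sketch of the exec_* replacements described above; the command and timeout value are arbitrary examples.

    library(sys)

    # Run a command, streaming its output to the console, with a timeout in seconds
    status <- exec_wait("ping", c("-c", "3", "localhost"), timeout = 10)

    # Capture output as raw vectors instead of streaming it
    res <- exec_internal("whoami")
    cat(rawToChar(res$stdout))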
Derived from the work of Kruschke (2015, <ISBN:9780124058880>), the present package aims to provide a framework for conducting Bayesian analysis using Markov chain Monte Carlo (MCMC) sampling utilizing the Just Another Gibbs Sampler ('JAGS', Plummer, 2003, <https://mcmc-jags.sourceforge.io>). The initial version includes several modules for conducting Bayesian equivalents of chi-squared tests, analysis of variance (ANOVA), multiple (hierarchical) regression, softmax regression, and for fitting data (e.g., structural equation modeling).
This package provides a new methodology for linear regression with both curve response and curve regressors, which is described in Cho, Goude, Brossat and Yao (2013) <doi:10.1080/01621459.2012.722900> and (2015) <doi:10.1007/978-3-319-18732-7_3>. The key idea behind this methodology is dimension reduction based on a singular value decomposition in a Hilbert space, which reduces the curve regression problem to several scalar linear regression problems.
This package implements various decision support tools related to Econometrics & Technometrics. Subroutines include correlation reliability test, Mahalanobis distance measure for outlier detection, combinatorial search (all possible subset regression), non-parametric efficiency analysis measures: DDF (directional distance function), DEA (data envelopment analysis), HDF (hyperbolic distance function), SBM (slack-based measure), and SF (shortage function), benchmarking, Malmquist productivity analysis, risk analysis, technology adoption model, new product target setting, network DEA, dynamic DEA, intertemporal budgeting, etc.
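A sketch only: the dm.dea() call, its arguments, and the returned element name are assumed from the dm.* naming scheme for the efficiency measures listed above, and should be verified against the package documentation.

    library(DJL)

    x <- matrix(c(4, 7, 8, 4, 2, 10), ncol = 2)  # two inputs for three DMUs
    y <- matrix(c(1, 1, 1), ncol = 1)            # single output

    res <- dm.dea(xdata = x, ydata = y, rts = "crs", orientation = "i")  # assumed signature
    res$eff                                      # efficiency scores (assumed element name)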
Efficiently estimate shape parameters of periodic time series imagery with which a statistical seasonal trend analysis (STA) is subsequently performed. STA output can be exported in conventional raster formats. Methods to visualize STA output are also implemented as well as the calculation of additional basic statistics. STA is based on (R. Eastman, F. Sangermano, B. Ghimire, H. Zhu, H. Chen, N. Neeti, Y. Cai, E. Machado and S. Crema, 2009) <doi:10.1080/01431160902755338>.
This package provides a new measure of similarity between a pair of mass spectrometry (MS) experiments, called truncated rank correlation (TRC). To provide a robust metric of similarity in noisy high-dimensional data, TRC uses truncated top ranks (or top m-ranks) for calculating correlation, giving a robust measure of test-retest reliability in mass spectrometry data. For more details see Lim et al. (2019) <doi:10.1515/sagmb-2018-0056>.
Implementation of target-controlled infusion algorithms for compartmental pharmacokinetic and pharmacokinetic-pharmacodynamic models. Jacobs (1990) <doi:10.1109/10.43622>; Marsh et al. (1991) <doi:10.1093/bja/67.1.41>; Shafer and Gregg (1993) <doi:10.1007/BF01070999>; Schnider et al. (1998) <doi:10.1097/00000542-199805000-00006>; Abuhelwa, Foster, and Upton (2015) <doi:10.1016/j.vascn.2015.03.004>; Eleveld et al. (2018) <doi:10.1016/j.bja.2018.01.018>.
Variance function estimation for models proposed by W. Sadler in his variance function program ('VFP', www.aacb.asn.au/AACB/Resources/Variance-Function-Program). Here, the idea is to fit multiple variance functions to a data set and then assess which function best reflects the relationship Var ~ Mean. For in-vitro diagnostic ('IVD') assays, modeling this relationship is of great importance when individual test results are used for defining follow-up treatment of patients.
This package provides methods to calculate the expected value of information from a decision-analytic model. This includes the expected value of perfect information (EVPI), partial perfect information (EVPPI) and sample information (EVSI), and the expected net benefit of sampling (ENBS). A range of alternative computational methods are provided under the same user interface. See Heath et al. (2024) <doi:10.1201/9781003156109>, Jackson et al. (2022) <doi:10.1146/annurev-statistics-040120-010730>.
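A sketch under assumptions: evpi() and evppi() taking a matrix of simulated net benefits (rows = samples, columns = decision options) and a data frame of sampled inputs mirror the interface described above, but the argument names are assumptions; the toy decision model is purely illustrative.

    library(voi)

    set.seed(1)
    inputs <- data.frame(p_effect = rnorm(1000, 0.7, 0.1),
                         cost     = rnorm(1000, 500, 50))
    nb <- cbind(option_a = 10000 * inputs$p_effect - inputs$cost,  # net benefit per sample
                option_b = 10000 * 0.6 - 400)                      # comparator (constant)

    evpi(nb)                                                 # expected value of perfect information (assumed call)
    evppi(outputs = nb, inputs = inputs, pars = "p_effect")  # partial EVPI for one parameter (assumed args)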