Perform robust inference based on applying Fast and Robust Bootstrap on robust estimators (Van Aelst and Willems (2013) <doi:10.18637/jss.v053.i03>). This method constitutes an alternative to ordinary bootstrap or asymptotic inference procedures when using robust estimators such as S-, MM- or GS-estimators. The available methods are multivariate regression, principal component analysis and one-sample and two-sample Hotelling tests. It provides both the robust point estimates and uncertainty measures based on the fast and robust bootstrap.
This package contains the framework for estimation, sampling, and hypothesis testing for two special distributions (Exponentiated Exponential-Pareto and Exponentiated Inverse Gamma-Pareto) within the family of Generalized Exponentiated Composite distributions. The detailed explanation and the applications of these two distributions were introduced in Bowen Liu, Malwane M.A. Ananda (2022) <doi:10.1080/03610926.2022.2050399>, Bowen Liu, Malwane M.A. Ananda (2022) <doi:10.3390/math10111895>, and Bowen Liu, Malwane M.A. Ananda (2022) <doi:10.3390/app13010645>.
Reproducible work requires a record of where every statistic originated. When writing reports, some data is too big to load in the same environment and some statistics take a while to compute. This package offers a way to keep notes on statistics, simple functions, and small objects. Notepads can be locked to avoid accidental updates. Notepads keep track of who added the notes and when the notes were added. A simple text representation is used to allow for clear version histories.
This package provides tools for cleaning, processing, and preparing microbiome sequencing data (e.g., 16S rRNA) for downstream analysis. Supports CSV, TXT, and Excel file formats. The main function, ezclean(), automates microbiome data transformation, including format validation, transposition, numeric conversion, and metadata integration. It also handles taxonomic levels efficiently, resolves duplicated taxa entries, and outputs a well-structured, analysis-ready dataset. The companion function ezstat() runs statistical tests and summarizes results, while ezviz() produces publication-ready visualizations.
This package provides functions to select samples using PPS (probability proportional to size) sampling. The package also includes a function for stratified simple random sampling, a function to compute joint inclusion probabilities for Sampford's method of PPS sampling, and a few utility functions. The user's guide pps-ug.pdf is included in the .../pps/doc directory. The methods are described in standard survey sampling theory books such as Cochran's "Sampling Techniques"; see the user's guide for references.
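As an illustration of the PPS idea described above, here is a minimal Python sketch of systematic PPS sampling. It is not the package's code: the function name pps_systematic and its arguments are hypothetical, and systematic selection is only one of several PPS schemes (Sampford's method, mentioned above, is a different one).

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng=None):
    """Pick n unit indices with probability proportional to size.

    Hypothetical helper for illustration. Uses systematic selection:
    lay the units end to end on a line of length sum(sizes), then take
    n equally spaced points starting from a random offset.
    """
    if n <= 0 or not sizes or any(s <= 0 for s in sizes):
        raise ValueError("need n > 0 and strictly positive sizes")
    rng = rng or random.Random()
    step = sum(sizes) / n
    start = rng.random() * step          # random offset in [0, step)
    cum = list(accumulate(sizes))        # right edge of each unit
    picks, idx = [], 0
    for k in range(n):
        point = start + k * step
        # advance to the unit whose interval contains the point
        while idx < len(cum) - 1 and cum[idx] <= point:
            idx += 1
        picks.append(idx)
    return picks


sample = pps_systematic([10, 20, 30, 40], n=2, rng=random.Random(42))
```

A unit whose size exceeds the sampling step can be selected more than once; in practice such units are often taken with certainty and removed before sampling the rest.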
Evaluation of control charts by means of the zero-state, steady-state ARL (Average Run Length) and RL quantiles. Setting up control charts for given in-control ARL. The control charts under consideration are one- and two-sided EWMA, CUSUM, and Shiryaev-Roberts schemes for monitoring the mean or variance of normally distributed independent data. ARL calculation of the same set of schemes under drift (in the mean) is added. Finally, all ARL measures for the multivariate EWMA (MEWMA) are provided.
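To make the ARL notion above concrete, the following Python sketch estimates the zero-state ARL of an upper one-sided CUSUM chart by Monte Carlo simulation. This is an assumption-laden illustration, not the package's method: the function names and the defaults k = 0.5 and h = 4 are chosen here for the example, and simulation is used purely for simplicity.

```python
import random


def cusum_run_length(xs, k=0.5, h=4.0):
    """Return the 1-based index of the first alarm of an upper CUSUM.

    The statistic is S_t = max(0, S_{t-1} + x_t - k), started at 0;
    an alarm is raised when S_t > h. Returns None if no alarm occurs.
    """
    s = 0.0
    for i, x in enumerate(xs, start=1):
        s = max(0.0, s + x - k)
        if s > h:
            return i
    return None


def simulate_arl(mu=0.0, k=0.5, h=4.0, reps=200, seed=1):
    """Monte Carlo estimate of the ARL for N(mu, 1) observations."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        s, n = 0.0, 0
        while True:
            n += 1
            s = max(0.0, s + rng.gauss(mu, 1.0) - k)
            if s > h:
                break
        total += n
    return total / reps


in_control = simulate_arl(mu=0.0)   # long runs expected
shifted = simulate_arl(mu=1.0)      # mean shift shortens the runs
```

The in-control estimate should be much larger than the estimate under a shift in the mean, which is the trade-off a chart designer tunes through k and h.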
This package provides a collection of tools and functions to fit a variety of stochastic blockmodels (SBM). It currently supports Simple, Bipartite, Multipartite and Multiplex SBM (undirected or directed, with Bernoulli, Poisson or Gaussian emission laws on the edges, and possibly covariates for Simple and Bipartite SBM). See Léger (2016) <doi:10.48550/arXiv.1602.07587>, Barbillon et al. (2020) <doi:10.1111/rssa.12193> and Bar-Hen et al. (2020) <doi:10.48550/arXiv.1807.10138>.
Node centrality measures for temporal networks. Available measures are temporal degree centrality, temporal closeness centrality and temporal betweenness centrality defined by Kim and Anderson (2012) <doi:10.1103/PhysRevE.85.026107>. Applying the REN algorithm by Hanke and Foraita (2017) <doi:10.1186/s12859-017-1677-x> when calculating the centrality measures keeps the computational running time linear in the number of graph snapshots. Further, all methods can run in parallel up to the number of nodes in the network.
HGC (short for Hierarchical Graph-based Clustering) is an R package for conducting hierarchical clustering on large-scale single-cell RNA-seq (scRNA-seq) data. The key idea is to construct a dendrogram of cells on their shared nearest neighbor (SNN) graph. HGC provides functions for building graphs and for conducting hierarchical clustering on the graph. Users with an older R version can visit https://github.com/XuegongLab/HGC/tree/HGC4oldRVersion to get an HGC package built for R 3.6.
This package contains a robust set of tools designed for constructing deep neural networks, which are highly adaptable with user-defined loss functions and probability models. It includes several practical applications, such as the deepAFT model, which utilizes a deep neural network approach to enhance the accelerated failure time (AFT) model for survival data. Another example is the deepGLM model, which applies a deep neural network to the generalized linear model (GLM), accommodating data types with continuous, categorical and Poisson distributions.
In the context of data quality assessment, this package provides a number of functions for evaluating data quality across various dimensions, including completeness, plausibility, concordance, conformance, currency, timeliness, and correctness. It has been developed based on two well-known frameworks for data quality assessment: Michael G. Kahn (2016) <doi:10.13063/2327-9214.1244> and Nicole G. Weiskopf (2017) <doi:10.5334/egems.218>. Using this package, users can evaluate the quality of their datasets, provided that corresponding metadata are available.
Runs a Shiny app on the local machine for basic statistical and graphical analyses. The point-and-click interface of the Shiny app produces the same analysis outputs (e.g., plots and tables) more quickly than typing the required code in R, especially for users without much experience or expertise with coding. Examples of possible analyses include tabulating descriptive statistics for a variable, creating histograms by experimental groups, and creating a scatter plot and calculating the correlation between two variables.
This package provides a framework to build and evaluate diagnosis or prognosis models using stacking, voting, and bagging ensemble techniques with various base learners. The package also includes tools for visualization and interpretation of models. The development version of the package is available on GitHub at <https://github.com/xiaojie0519/E2E>. The methods are based on the foundational work of Breiman (1996) <doi:10.1007/BF00058655> on bagging and Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1> on stacking.
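Of the ensemble techniques named above, voting is the simplest to sketch. The Python snippet below shows plain majority (hard) voting over the label predictions of several base learners. It is a generic illustration under assumed names (majority_vote is hypothetical) and does not reproduce the package's interface.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label predictions by majority vote.

    predictions: list of equal-length label lists, one per base learner.
    Ties go to the label encountered first among the tied labels.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same samples")
    # zip(*predictions) yields, for each sample, the labels from all models
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


combined = majority_vote([
    [1, 0, 1],   # model A
    [1, 1, 0],   # model B
    [0, 1, 1],   # model C
])
```

Bagging would apply the same vote to models trained on bootstrap resamples, while stacking replaces the vote with a second-level model trained on the base learners' outputs.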
Access to the datasets and many of the functions used in "Statistics Using R: An Integrative Approach". These datasets include a subset of the National Education Longitudinal Study, the Framingham Heart Study, as well as several simulated datasets used in the examples throughout the textbook. The functions included in the package reproduce some of the functionality of Stata that is not directly available in R. The package also contains a tutorial on basic data frame management, including how to handle missing data.
R-dsb improves protein expression analysis in droplet-based single-cell studies. The package specifically addresses noise in raw protein UMI counts from methods like CITE-seq. It identifies and removes two main sources of noise—protein-specific noise from unbound antibodies and droplet/cell-specific noise. The package is applicable to various methods, including CITE-seq, REAP-seq, ASAP-seq, TEA-seq, and Mission Bioplatform data. Check the vignette for tutorials on integrating dsb with Seurat and Bioconductor, and using dsb in Python.
This package provides functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data, in particular representation, manipulation and simulation of multistate data - the Lexis suite of functions, which includes interfaces to the mstate, etm and cmprsk packages. It also contains functions for Age-Period-Cohort and Lee-Carter modeling and a function for interval censored data and some useful functions for tabulation and plotting, as well as a number of epidemiological data sets.
APL is a package developed for computation of Association Plots (AP), a method for visualization and analysis of single cell transcriptomics data. The main focus of APL is the identification of genes characteristic of individual clusters of cells from input data. The package performs correspondence analysis (CA) and identifies cluster-specific genes using Association Plots. Additionally, APL computes cluster-specificity scores for all genes, which makes it possible to rank the genes by their specificity for a selected cell cluster of interest.
This package provides a complete toolkit for connecting R environments with Large Language Models (LLMs). Provides utilities for describing R objects, package documentation, and workspace state in plain text formats optimized for LLM consumption. Supports multiple workflows: interactive copy-paste to external chat interfaces, programmatic tool registration with 'ellmer' chat clients, batteries-included chat applications via 'shinychat', and exposure to external coding agents through the Model Context Protocol. Project configuration files enable stable, repeatable conversations with project-specific context and preferred LLM settings.
The Bayesian Federated Inference ('BFI') method combines inference results obtained from local data sets in the separate centers. In this version of the package, the BFI methodology is programmed for linear, logistic and survival regression models. For GLMs, see Jonker, Pazira and Coolen (2024) <doi:10.1002/sim.10072>; for survival models, see Pazira, Massa, Weijers, Coolen and Jonker (2025) <doi:10.48550/arXiv.2404.17464>; and for heterogeneous populations, see Jonker, Pazira and Coolen (2025) <doi:10.1017/rsm.2025.6>.
This package provides functions to compute the values of different modifications of the Rand and Wallace indices. The indices are used to measure the stability or similarity of two partitions obtained on two different sets of units with a non-empty intercept. Depending on the selected index, splitting and merging of clusters can have a different effect on the value of the indices. The indices are proposed in Cugmas and Ferligoj (2018) <http://ibmi.mf.uni-lj.si/mz/2018/no-1/Cugmas2018.pdf>.
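For orientation, the classical (unmodified) Rand and Wallace indices that the modifications above build on can be sketched in a few lines of Python. This is a textbook pair-counting sketch on a common set of units; it does not implement the modified indices of Cugmas and Ferligoj, and the function names are chosen here for illustration.

```python
from itertools import combinations


def rand_index(a, b):
    """Classical Rand index of two partitions given as label lists.

    Counts the share of unit pairs on which the partitions agree:
    both put the pair in one cluster, or both separate it.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two label lists of equal length >= 2")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def wallace_index(a, b):
    """Classical Wallace index W(a -> b).

    Share of pairs clustered together in a that are also together in b.
    Unlike the Rand index, it is not symmetric in a and b.
    """
    if len(a) != len(b):
        raise ValueError("label lists must have equal length")
    together_a = [(i, j) for i, j in combinations(range(len(a)), 2)
                  if a[i] == a[j]]
    if not together_a:
        raise ValueError("partition a has no co-clustered pairs")
    return sum(b[i] == b[j] for i, j in together_a) / len(together_a)


same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # labels differ, partition identical
merged = wallace_index([0, 0, 1, 1], [0, 0, 0, 0]) # merging keeps all pairs together
```

The asymmetry of the Wallace index is what lets such indices treat merging and splitting of clusters differently.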
Suite of tropical geometric tools for use in machine learning applications. These methods are summarized in the following references: Yoshida et al. (2022) <doi:10.2140/astat.2023.14.37>, Barnhill et al. (2023) <doi:10.48550/arXiv.2303.02539>, Barnhill and Yoshida (2023) <doi:10.3390/math11153433>, Aliatimis et al. (2023) <doi:10.1007/s11538-024-01327-8>, Yoshida et al. (2022) <doi:10.1109/TCBB.2024.3420815>, and Yoshida et al. (2019) <doi:10.1007/s11538-018-0493-4>.
This package contains functions for calculating the Federal Highway Administration (FHWA) Transportation Performance Management (TPM) performance measures. Currently, the package provides methods for the System Reliability and Freight (PM3) performance measures calculated from travel time data provided by The National Performance Management Research Data Set (NPMRDS), including Level of Travel Time Reliability (LOTTR), Truck Travel Time Reliability (TTTR), and Peak Hour Excessive Delay (PHED) metric scores for calculating statewide reliability performance measures. Implements <https://www.fhwa.dot.gov/tpm/guidance/pm3_hpms.pdf>.
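The reliability metrics named above are ratios of travel-time percentiles: LOTTR divides the 80th percentile travel time by the 50th, and TTTR divides the 95th by the 50th. The Python sketch below illustrates this for a single segment and time period. The function names, the sample data, and the nearest-rank percentile rule are assumptions made for the example; the package and the FHWA guidance may prescribe a different percentile method and additional steps (time-period binning, segment weighting).

```python
def percentile_nearest_rank(values, p):
    """p-th percentile (integer p in 1..100) by the nearest-rank rule."""
    if not values or not 1 <= p <= 100:
        raise ValueError("need non-empty values and 1 <= p <= 100")
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100   # integer ceil(p * n / 100)
    return ordered[rank - 1]


def lottr(travel_times):
    """Level of Travel Time Reliability: 80th / 50th percentile."""
    return round(percentile_nearest_rank(travel_times, 80)
                 / percentile_nearest_rank(travel_times, 50), 2)


def tttr(travel_times):
    """Truck Travel Time Reliability: 95th / 50th percentile."""
    return round(percentile_nearest_rank(travel_times, 95)
                 / percentile_nearest_rank(travel_times, 50), 2)


# Hypothetical travel times (seconds) for one segment and time period
times = [60, 60, 60, 60, 60, 70, 80, 90, 100, 120]
segment_lottr = lottr(times)
segment_tttr = tttr(times)
```

Under the federal rule a segment counts as reliable when its LOTTR is below 1.50 in every evaluated time period, so the ratio is computed per period before any statewide aggregation.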
This package provides a model for the growth of self-limiting populations using three-, four-, or five-parameter functions, which have wide applications in a variety of fields. In dynamical modeling, the dependent variable could be the population size at time x, where x is the independent variable. In the analysis of quantitative polymerase chain reaction (qPCR), the dependent variable would be the fluorescence intensity and the independent variable the cycle number. In that case, the package calculates the TWW cycle threshold.
Computes the exact observation weights for the Kalman filter and smoother, based on the method described in Koopman and Harvey (2003) <www.sciencedirect.com/science/article/pii/S0165188902000611>. The package supports in-depth exploration of state-space models, enabling researchers and practitioners to extract meaningful insights from time series data. This functionality is especially valuable in dynamic factor models, where the computed weights can be used to decompose the contributions of individual variables to the latent factors. See the README file for examples.
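To show what observation weights mean in the simplest case, the Python sketch below works out the filter weights for a scalar local level model (random walk plus noise). It is a hand-derived special case, not the general Koopman and Harvey algorithm and not the package's interface; the function names and parameter values are assumptions made for the example.

```python
def local_level_filter(ys, var_eps, var_eta, a0, p0):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.

    Returns the filtered estimate of the level after the last observation.
    a0 and p0 are the prior mean and variance of the initial level.
    """
    a, p = a0, p0
    for y in ys:
        k = p / (p + var_eps)        # Kalman gain
        a = a + k * (y - a)          # update with the new observation
        p = p * (1 - k) + var_eta    # updated variance, then predict
    return a


def local_level_filter_weights(n, var_eps, var_eta, p0):
    """Weights w_j such that the filtered level equals
    prior_weight * a0 + sum_j w_j * y_j after n observations.
    """
    weights, prior_weight, p = [], 1.0, p0
    for _ in range(n):
        k = p / (p + var_eps)
        # each update shrinks all earlier contributions by (1 - k)
        weights = [w * (1 - k) for w in weights]
        prior_weight *= (1 - k)
        weights.append(k)            # the new observation enters with weight k
        p = p * (1 - k) + var_eta
    return prior_weight, weights


ys = [1.0, 2.0, 0.5, 3.0]
w0, w = local_level_filter_weights(len(ys), var_eps=1.0, var_eta=0.5, p0=10.0)
estimate = local_level_filter(ys, var_eps=1.0, var_eta=0.5, a0=0.0, p0=10.0)
reconstructed = sum(wi * yi for wi, yi in zip(w, ys))
```

Because the gains depend only on the variances and not on the data, the filtered estimate is a fixed linear combination of the observations; the weights decay for older observations and, together with the prior weight, sum to one.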