Reproducible work requires a record of where every statistic originated. When writing reports, some data is too big to load in the same environment and some statistics take a while to compute. This package offers a way to keep notes on statistics, simple functions, and small objects. Notepads can be locked to avoid accidental updates. Notepads keep track of who added the notes and when the notes were added. A simple text representation is used to allow for clear version histories.
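The notepad idea can be sketched in a few lines of Python. The `Notepad` class and its `add`, `lock` and `to_text` methods are hypothetical illustrations, not this package's API. Each note records its author and a UTC timestamp, and locking blocks further updates. The text form writes one tab-separated line per note, sorted by name, so that version-control diffs stay readable.

```python
from datetime import datetime, timezone


class Notepad:
    """Hypothetical notepad: named notes with author, timestamp, and a lock."""

    def __init__(self):
        self.notes = {}      # name -> (value, author, timestamp)
        self.locked = False

    def add(self, name, value, author):
        if self.locked:
            raise PermissionError("notepad is locked")
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.notes[name] = (value, author, stamp)

    def lock(self):
        self.locked = True

    def to_text(self):
        # One line per note, sorted by name, so diffs in version control stay small.
        lines = []
        for name in sorted(self.notes):
            value, author, stamp = self.notes[name]
            lines.append(f"{name}\t{value!r}\t{author}\t{stamp}")
        return "\n".join(lines)
```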
This package provides tools for cleaning, processing, and preparing microbiome sequencing data (e.g., 16S rRNA) for downstream analysis. It supports CSV, TXT, and Excel file formats. The main function, ezclean(), automates microbiome data transformation, including format validation, transposition, numeric conversion, and metadata integration. It also handles taxonomic levels efficiently, resolves duplicated taxa entries, and outputs a well-structured, analysis-ready dataset. The companion function ezstat() runs statistical tests and summarizes results, while ezviz() produces publication-ready visualizations.
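A minimal Python sketch of three of the cleaning steps (transposition, numeric conversion, and merging of duplicated taxa) follows. The function `tidy_counts` is hypothetical and is not the implementation of ezclean(). Treating blank cells as zero and merging duplicates by summing their counts are assumptions made for the illustration.

```python
def tidy_counts(rows, sample_names):
    """Hypothetical cleaning step.

    rows: list of (taxon, [raw count strings, one per sample]), i.e. taxa as rows.
    Returns {sample: {taxon: count}}, i.e. the table transposed to samples-by-taxa.
    """
    table = {s: {} for s in sample_names}
    for taxon, values in rows:
        for sample, raw in zip(sample_names, values):
            # Numeric conversion; blank cells are assumed to mean zero.
            count = float(raw) if raw.strip() else 0.0
            # Duplicated taxa are merged by summing their counts.
            table[sample][taxon] = table[sample].get(taxon, 0.0) + count
    return table
```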
This package provides functions to select samples using PPS (probability proportional to size) sampling. The package also includes a function for stratified simple random sampling, a function to compute joint inclusion probabilities for Sampford's method of PPS sampling, and a few utility functions. The user's guide pps-ug.pdf is included in the .../pps/doc directory. The methods are described in standard survey sampling theory books such as Cochran's "Sampling Techniques"; see the user's guide for references.
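For intuition, the sketch below implements systematic PPS sampling, one common PPS scheme. The function names are hypothetical. The random start is passed in explicitly so that the result is reproducible. The package's own functions, and Sampford's method in particular, are not reproduced here. When every size is below the sampling interval Σx/n, the inclusion probability of unit i is n·x_i / Σx.

```python
def systematic_pps(sizes, n, start):
    """Systematic PPS sample of n units (hypothetical helper).

    sizes: positive size measures; start: random start in [0, 1).
    Returns selected unit indices; a unit can appear more than once
    if its size exceeds the sampling interval.
    """
    total = sum(sizes)
    interval = total / n
    points = [(start + i) * interval for i in range(n)]
    selected, cumulative, unit = [], 0.0, 0
    for p in points:
        # Advance until the cumulative size first exceeds the selection point.
        while cumulative + sizes[unit] <= p:
            cumulative += sizes[unit]
            unit += 1
        selected.append(unit)
    return selected


def inclusion_probabilities(sizes, n):
    """n * x_i / sum(x), valid when every size is below the sampling interval."""
    total = sum(sizes)
    return [n * s / total for s in sizes]
```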
Evaluation of control charts by means of the zero-state and steady-state ARL (Average Run Length) and RL quantiles, and setup of control charts for a given in-control ARL. The control charts under consideration are one- and two-sided EWMA, CUSUM, and Shiryaev-Roberts schemes for monitoring the mean or variance of normally distributed independent data. ARL calculations for the same set of schemes under drift (in the mean) are also included. Finally, all ARL measures for the multivariate EWMA (MEWMA) are provided.
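The ARL of a one-sided CUSUM chart can be approximated by a Markov chain, in the spirit of Brook and Evans (1972). The interval [0, h] is discretized into states, and the ARL vector solves (I - Q)L = 1, where Q holds the transition probabilities between non-signalling states. The Python sketch below assumes unit-variance normal observations and a reflected upper CUSUM. It is not necessarily the numerical method this package uses, and its accuracy depends on the number of states. The function names are hypothetical.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cusum_arl(k, h, mu=0.0, states=100):
    """Zero-state ARL of S_t = max(0, S_{t-1} + X_t - k), X_t ~ N(mu, 1),
    signalling when S_t >= h, via a Brook-Evans-type Markov chain."""
    w = 2.0 * h / (2 * states - 1)  # state i represents S close to i * w
    system = []
    for i in range(states):
        row = []
        for j in range(states):
            upper = norm_cdf(k + (j - i + 0.5) * w - mu)
            # State 0 also absorbs every move that would go below w / 2.
            lower = 0.0 if j == 0 else norm_cdf(k + (j - i - 0.5) * w - mu)
            q_ij = upper - lower
            row.append((1.0 if i == j else 0.0) - q_ij)
        system.append(row)
    # ARL vector L solves (I - Q) L = 1; the chart starts at S_0 = 0 (state 0).
    return solve_linear(system, [1.0] * states)[0]
```

With k = 0.5 and h = 4, the in-control ARL comes out in the low-to-mid 300s. A one-standard-deviation shift in the mean shortens it sharply.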
This package provides a collection of tools and functions to fit a variety of stochastic blockmodels (SBM). It currently supports Simple, Bipartite, Multipartite and Multiplex SBM (undirected or directed, with Bernoulli, Poisson or Gaussian emission laws on the edges, and possibly covariates for Simple and Bipartite SBM). See Léger (2016) <doi:10.48550/arXiv.1602.07587>, Barbillon et al. (2020) <doi:10.1111/rssa.12193> and Bar-Hen et al. (2020) <doi:10.48550/arXiv.1807.10138>.
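The Python sketch below makes the model concrete by simulating an undirected Bernoulli SBM. Each node has a block label, and an edge between nodes i and j appears independently with a probability that depends only on their two blocks. Fitting the model means estimating the block memberships and connection probabilities, typically with variational methods. That harder inverse problem is what the package handles, and it is not shown. The function name is hypothetical.

```python
import random


def simulate_bernoulli_sbm(blocks, probs, seed=None):
    """Undirected Bernoulli SBM without self-loops (hypothetical helper).

    blocks: block label of each node (0, 1, ...);
    probs[k][l]: edge probability between a node in block k and one in block l.
    Returns a symmetric 0/1 adjacency matrix as a list of lists.
    """
    rng = random.Random(seed)
    n = len(blocks)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[blocks[i]][blocks[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj
```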
Node centrality measures for temporal networks. Available measures are temporal degree centrality, temporal closeness centrality and temporal betweenness centrality, as defined by Kim and Anderson (2012) <doi:10.1103/PhysRevE.85.026107>. Applying the REN algorithm by Hanke and Foraita (2017) <doi:10.1186/s12859-017-1677-x> when calculating the centrality measures keeps the computational running time linear in the number of graph snapshots. Further, all methods can run in parallel, with up to as many parallel workers as there are nodes in the network.
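Temporal closeness and betweenness are built on time-respecting paths through the sequence of snapshots. The sketch below computes, for one source node, the earliest snapshot after which each other node can be reached. It assumes, as in a time-ordered graph, that at most one edge is traversed per snapshot. It is a simplified illustration, not the REN algorithm or the exact definitions of Kim and Anderson (2012). The function name is hypothetical.

```python
def earliest_arrival(snapshots, n_nodes, source):
    """Number of snapshots needed to reach each node from `source` (hypothetical).

    snapshots: list of sets of undirected edges (u, v), one set per time step.
    At most one edge is traversed per snapshot. Returns None for unreached nodes.
    """
    arrival = [None] * n_nodes
    arrival[source] = 0
    reached = {source}
    for t, edges in enumerate(snapshots):
        newly = set()
        for u, v in edges:
            # Only nodes reached before this snapshot can spread along its edges.
            if u in reached and v not in reached:
                newly.add(v)
            if v in reached and u not in reached:
                newly.add(u)
        for node in newly:
            arrival[node] = t + 1
        reached |= newly
    return arrival
```

Suppose edges (0, 1) and (1, 2) are both present, but only in a single snapshot. Node 2 then stays unreachable from node 0. If the two edges instead appear in consecutive snapshots, node 2 is reached at step 2.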
Runs a Shiny app on the local machine for basic statistical and graphical analyses. Its point-and-click interface produces the same analysis outputs (e.g., plots and tables) more quickly than typing the required code in R, especially for users with little coding experience. Examples of possible analyses include tabulating descriptive statistics for a variable, creating histograms by experimental group, and creating a scatter plot and calculating the correlation between two variables.
Access to the datasets and many of the functions used in "Statistics Using R: An Integrative Approach". These datasets include a subset of the National Education Longitudinal Study, the Framingham Heart Study, and several simulated datasets used in the examples throughout the textbook. The functions included in the package reproduce some of the functionality of 'Stata' that is not directly available in 'R'. The package also contains a tutorial on basic data frame management, including how to handle missing data.
The Bayesian Federated Inference ('BFI') method combines inference results obtained from local data sets held in separate centers. In this version of the package, the BFI methodology is implemented for linear, logistic and survival regression models. For GLMs, see Jonker, Pazira and Coolen (2024) <doi:10.1002/sim.10072>; for survival models, see Pazira, Massa, Weijers, Coolen and Jonker (2025) <doi:10.48550/arXiv.2404.17464>; and for heterogeneous populations, see Jonker, Pazira and Coolen (2025) <doi:10.1017/rsm.2025.6>.
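The idea of combining local results can be illustrated in the simplest scalar case. Suppose each center summarizes its inference by an estimate and a variance (a Gaussian approximation), and priors are ignored. Combining the local approximations then amounts to a precision-weighted average. The Python sketch below shows only this simplification. The actual BFI estimators also account for prior distributions and work with full curvature matrices. The function name is hypothetical.

```python
def combine_estimates(estimates, variances):
    """Precision-weighted combination of local estimates of one scalar parameter.

    Returns (combined_estimate, combined_variance). Priors are ignored.
    """
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    combined = sum(p * e for p, e in zip(precisions, estimates)) / total
    return combined, 1.0 / total
```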
This package provides functions to compute the values of different modifications of the Rand and Wallace indices. The indices are used to measure the stability or similarity of two partitions obtained on two different sets of units with a non-empty intersection. Depending on the selected index, splitting and merging of clusters can have different effects on the value of the indices. The indices were proposed in Cugmas and Ferligoj (2018) <http://ibmi.mf.uni-lj.si/mz/2018/no-1/Cugmas2018.pdf>.
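As a baseline, the classical Rand and Wallace indices can be computed from pair counts over the units that the two partitions share. The Python sketch below does exactly that. It does not implement the modified indices of Cugmas and Ferligoj (2018), which are designed for partitions of different unit sets. The function names are hypothetical.

```python
from itertools import combinations


def pair_counts(p1, p2):
    """p1, p2: dicts unit -> cluster label; only units present in both are used.

    a: pairs together in both, b: together only in p1,
    c: together only in p2, d: apart in both.
    """
    common = sorted(set(p1) & set(p2))
    a = b = c = d = 0
    for u, v in combinations(common, 2):
        same1 = p1[u] == p1[v]
        same2 = p2[u] == p2[v]
        if same1 and same2:
            a += 1
        elif same1:
            b += 1
        elif same2:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(p1, p2):
    a, b, c, d = pair_counts(p1, p2)
    return (a + d) / (a + b + c + d)


def wallace_index(p1, p2):
    """Share of pairs clustered together in p1 that are also together in p2."""
    a, b, _, _ = pair_counts(p1, p2)
    return a / (a + b)
```

The Wallace index is asymmetric. Swapping the arguments conditions on the pairs that are clustered together in the other partition.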
This package provides a model for the growth of self-limiting populations using three-, four-, or five-parameter functions, which have wide applications in a variety of fields. In dynamical modeling, the dependent variable could be the population size at time x, where x is the independent variable. In the analysis of quantitative polymerase chain reaction (qPCR), the dependent variable would be the fluorescence intensity and the independent variable the cycle number; the package then calculates the TWW cycle threshold.
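The three-parameter logistic curve y = K / (1 + exp(-r(x - x0))) is one common self-limiting growth model. Here K is the upper asymptote, r the growth rate, and x0 the midpoint, where y = K/2. The Python sketch below evaluates the curve and inverts it to find the x at which it crosses a fixed threshold, which is the idea behind a fixed-threshold qPCR cycle threshold. This is not the package's parameterization and does not compute the TWW cycle threshold. Four- and five-parameter variants add further parameters, such as a lower asymptote or an asymmetry term, that are not shown. The function names are hypothetical.

```python
import math


def logistic3(x, asymptote, rate, midpoint):
    """Three-parameter logistic growth curve."""
    return asymptote / (1.0 + math.exp(-rate * (x - midpoint)))


def crossing_point(threshold, asymptote, rate, midpoint):
    """x at which logistic3 reaches `threshold` (requires 0 < threshold < asymptote).

    Solves threshold = K / (1 + exp(-r (x - x0))) for x.
    """
    return midpoint - math.log(asymptote / threshold - 1.0) / rate
```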
Suite of tropical geometric tools for use in machine learning applications. These methods may be summarized in the following references: Yoshida et al. (2022) <doi:10.2140/astat.2023.14.37>, Barnhill et al. (2023) <doi:10.48550/arXiv.2303.02539>, Barnhill and Yoshida (2023) <doi:10.3390/math11153433>, Aliatimis et al. (2023) <doi:10.1007/s11538-024-01327-8>, Yoshida et al. (2022) <doi:10.1109/TCBB.2024.3420815>, and Yoshida et al. (2019) <doi:10.1007/s11538-018-0493-4>.
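Many of the tools in these references work in the tropical projective torus, where points that differ by a constant in every coordinate are identified. A standard distance there is the tropical metric d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i). The short Python sketch below implements that formula only. The function name is hypothetical.

```python
def tropical_distance(u, v):
    """Tropical (projective) metric: max_i(u_i - v_i) - min_i(u_i - v_i).

    Adding the same constant to every coordinate of u or v leaves it unchanged.
    """
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)
```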
This package contains functions for calculating the Federal Highway Administration (FHWA) Transportation Performance Management (TPM) performance measures. Currently, the package provides methods for the System Reliability and Freight (PM3) performance measures calculated from travel time data provided by The National Performance Management Research Data Set (NPMRDS), including Level of Travel Time Reliability (LOTTR), Truck Travel Time Reliability (TTTR), and Peak Hour Excessive Delay (PHED) metric scores for calculating statewide reliability performance measures. It implements the calculations described in <https://www.fhwa.dot.gov/tpm/guidance/pm3_hpms.pdf>.
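In the FHWA PM3 measures, LOTTR is the ratio of the 80th percentile to the 50th percentile travel time on a segment. TTTR is the ratio of the 95th to the 50th percentile truck travel time. The Python sketch below computes both ratios for a single segment and time period. The linear-interpolation percentile is an assumption. The time-period definitions, data filtering and thresholds in the guidance are not reproduced. The function names are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 1] (an assumed method)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def lottr(travel_times):
    """80th / 50th percentile travel time for one segment and time period."""
    return percentile(travel_times, 0.80) / percentile(travel_times, 0.50)


def tttr(truck_travel_times):
    """95th / 50th percentile truck travel time for one segment and time period."""
    return percentile(truck_travel_times, 0.95) / percentile(truck_travel_times, 0.50)
```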
Computes the exact observation weights for the Kalman filter and smoother, based on the method described in Koopman and Harvey (2003) <www.sciencedirect.com/science/article/pii/S0165188902000611>. The weights show how much each observation contributes to the filtered and smoothed state estimates, which supports detailed exploration of state-space models fitted to time series data. This functionality is especially valuable in dynamic factor models, where the computed weights can be used to decompose the contributions of individual variables to the latent factors. See the README file for examples.
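Observation weights are easiest to see in the local level model, y_t = μ_t + ε_t with μ_t = μ_{t-1} + η_t. Each Kalman update is linear, a_{t|t} = (1 - K_t) a_{t-1|t-1} + K_t y_t. The filtered level is therefore an explicit weighted sum of the prior mean and past observations. The Python sketch below tracks those weights alongside the filter. It covers only filtering in this one model, not the general filter-and-smoother weights of Koopman and Harvey (2003). The function name is hypothetical.

```python
def local_level_filter_weights(y, sigma_eps2, sigma_eta2, a0, p0):
    """Filtered estimates of a local level model and their observation weights.

    Returns a list of (estimate, weights) per time step, where weights[0] is the
    weight on the prior mean a0 and weights[j] (j >= 1) is the weight on y[j - 1].
    """
    a, p = a0, p0
    weights = [1.0]  # before any data, the level estimate is a0 itself
    out = []
    for obs in y:
        p_pred = p + sigma_eta2             # predict: random-walk level
        k = p_pred / (p_pred + sigma_eps2)  # Kalman gain
        a = a + k * (obs - a)               # update: (1 - k) * a + k * obs
        p = (1.0 - k) * p_pred
        # The update is linear, so old weights shrink by (1 - k) and y_t gets k.
        weights = [(1.0 - k) * w for w in weights] + [k]
        out.append((a, weights))
    return out
```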
R-dsb improves protein expression analysis in droplet-based single-cell studies. The package specifically addresses noise in raw protein UMI counts from methods such as CITE-seq. It identifies and removes two main sources of noise: protein-specific noise from unbound antibodies, and droplet/cell-specific noise. The package is applicable to data from various methods, including CITE-seq, REAP-seq, ASAP-seq, TEA-seq, and the Mission Bio platform. See the vignette for tutorials on integrating dsb with Seurat and Bioconductor, and on using dsb in Python.
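The first, protein-specific step can be sketched as background standardization. Each protein's log-transformed counts in cell-containing droplets are centered and scaled by the mean and standard deviation of the same protein in empty droplets. The Python sketch below assumes a pseudocount of 10 and a simple per-protein mean and sample standard deviation. It omits the second, cell-specific step. The function name is hypothetical and not part of the dsb API.

```python
import math
from statistics import mean, stdev


def ambient_normalize(cell_counts, empty_counts, pseudocount=10.0):
    """Background-normalize one protein (hypothetical helper).

    cell_counts: raw UMI counts of the protein in cell-containing droplets.
    empty_counts: raw UMI counts of the same protein in empty droplets.
    Returns (log(count + pseudocount) - background mean) / background SD.
    """
    log_empty = [math.log(c + pseudocount) for c in empty_counts]
    mu, sd = mean(log_empty), stdev(log_empty)
    return [(math.log(c + pseudocount) - mu) / sd for c in cell_counts]
```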
This package provides functions for demographic and epidemiological analysis in the Lexis diagram, i.e., register and cohort follow-up data, in particular the representation, manipulation and simulation of multistate data (the Lexis suite of functions, which includes interfaces to the mstate, etm and cmprsk packages). It also contains functions for Age-Period-Cohort and Lee-Carter modeling, a function for interval-censored data, some useful functions for tabulation and plotting, and a number of epidemiological data sets.
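A basic operation behind Lexis-type follow-up analysis is splitting each person's follow-up along a time scale, such as age, at fixed break points. Person-time and events can then be tabulated by band. The Python sketch below splits one interval and keeps the event indicator only on the final piece. It is a conceptual illustration, not the package's API. The function name is hypothetical.

```python
def split_follow_up(entry, stop, event, breaks):
    """Split the follow-up interval [entry, stop) at the given break points.

    Returns (start, end, event) pieces; only the last piece carries the event,
    and the piece lengths sum to stop - entry.
    """
    cuts = sorted(b for b in breaks if entry < b < stop)
    points = [entry] + cuts + [stop]
    pieces = []
    for i in range(len(points) - 1):
        last = i == len(points) - 2
        pieces.append((points[i], points[i + 1], event if last else 0))
    return pieces
```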
This package implements the testing framework described in Elder et al. (2022) <arXiv:2203.01897> for testing the multivariate point null hypothesis. After the user selects a parameter of interest and defines the assumed data-generating mechanism, this information should be encoded in functions for the parameter estimator and its corresponding influence curve. Some combinations of parameter and data-generating mechanism are already coded in this package and are explained in detail in the article.
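The sketch below illustrates how a parameter estimator and its influence curve can be supplied as a pair of functions. It uses the mean vector of two-dimensional data, whose influence curve at observation x_i is x_i minus the mean. The test statistic is a generic Wald-type statistic, n·d^T S^{-1} d, where S is the empirical covariance of the influence curve, compared with a chi-square distribution with 2 degrees of freedom. This is an illustrative choice and not necessarily the test of Elder et al. (2022). All function names are hypothetical.

```python
import math


def mean_estimator(data):
    """Estimator of the parameter: the 2-dimensional mean vector."""
    n = len(data)
    return [sum(x[0] for x in data) / n, sum(x[1] for x in data) / n]


def mean_influence_curve(data, estimate):
    """Influence curve of the mean, evaluated at each observation: x_i - mean."""
    return [[x[0] - estimate[0], x[1] - estimate[1]] for x in data]


def wald_test_2d(data, null, estimator, influence_curve):
    """Wald-type test of H0: parameter == null using the empirical IC covariance."""
    n = len(data)
    est = estimator(data)
    ic = influence_curve(data, est)
    # Empirical covariance of the IC: (1/n) * sum of IC_i IC_i^T
    s11 = sum(r[0] * r[0] for r in ic) / n
    s12 = sum(r[0] * r[1] for r in ic) / n
    s22 = sum(r[1] * r[1] for r in ic) / n
    det = s11 * s22 - s12 * s12
    d0, d1 = est[0] - null[0], est[1] - null[1]
    # n * d^T S^{-1} d, using the explicit inverse of the 2x2 matrix S
    stat = n * (s22 * d0 * d0 - 2 * s12 * d0 * d1 + s11 * d1 * d1) / det
    p_value = math.exp(-stat / 2)  # chi-square survival function with 2 df
    return stat, p_value
```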
This package provides a method to fit the hidden compact representation model and to identify the causal direction on discrete data. We implement an effective solution for recovering this hidden compact representation under the likelihood framework. See "Causal Discovery from Discrete Data using Hidden Compact Representation" by Ruichu Cai, Jie Qiao, Kun Zhang, Zhenjie Zhang and Zhifeng Hao (NIPS 2018) <https://nips.cc/Conferences/2018/Schedule?showEvent=11274> for a description of some of our methods.
Variable selection techniques are essential tools for model selection and estimation in high-dimensional statistical models. Through this publicly available package, we provide a unified environment to carry out variable selection using iterative sure independence screening (SIS) (Fan and Lv (2008) <doi:10.1111/j.1467-9868.2008.00674.x>) and all of its variants in generalized linear models (Fan and Song (2009) <doi:10.1214/10-AOS798>) and the Cox proportional hazards model (Fan, Feng and Wu (2010) <doi:10.1214/10-IMSCOLL606>).
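Sure independence screening starts by ranking predictors by the magnitude of their marginal association with the response and keeping only the top d. The Python sketch below shows this first screening step for a linear model, using absolute Pearson correlation. The iterative variants, and the marginal-likelihood versions for generalized linear and Cox models, are not shown. The function names are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no marginal association
    return sxy / math.sqrt(sxx * syy)


def sure_screen(columns, y, d):
    """Indices of the d columns with the largest |marginal correlation| with y."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:d]
```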
The Uniform Error Index is a weighted average of different error measures. It combines the output of different error functions and gives more robust and stable error values. This package computes the Uniform Error Index from ten different loss functions, such as Error Square, Square of Square Error, Quasi Likelihood Error, LogR-Square, Absolute Error and Absolute Square Error. The weights are determined using the Principal Component Analysis (PCA) based algorithm of Yeasin and Paul (2024) <doi:10.1007/s11227-023-05542-3>.
This package provides functions to assist in the processing and exploration of data from environmental monitoring programs. The package name stands for "water quality" and reflects the original focus on time series data for physical and chemical properties of water, as well as the biota. It is intended for programs that sample approximately monthly, quarterly or annually at discrete stations, a feature of many legacy data sets. Most of the functions should also be useful for analyzing time series of similar frequency, regardless of the subject matter.
This package provides fast and easy access to German census grid data from the 2011 and 2022 censuses <https://www.zensus2022.de/>, including a wide range of socio-economic indicators at multiple spatial resolutions (100m, 1km, 10km). It enables efficient download, processing, and analysis of large census datasets covering population, households, families, dwellings, and buildings. Harmonized data structures allow direct comparison with the 2011 census, supporting temporal and spatial analyses. It also facilitates conversion of the data into common formats for spatial analysis and mapping ('terra', 'sf', 'ggplot2').
This package provides a general toolkit for downloading, managing, analyzing, and presenting data from the U.S. Census, including SF1 (Decennial short-form), SF3 (Decennial long-form), and the American Community Survey (ACS). Confidence intervals provided with ACS data are converted to standard errors to be bundled with estimates in complex acs objects. The package provides new methods to conduct standard operations on acs objects and to present and plot data in statistically appropriate ways.
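ACS margins of error are published at the 90% confidence level, so a standard error can be recovered by dividing the margin of error by 1.645. The Python sketch below shows that conversion. It also shows the usual root-sum-of-squares approximation for the standard error of a sum of estimates, which ignores covariance between them. It illustrates the idea only, not the acs package's implementation. The function names are hypothetical.

```python
import math

Z_90 = 1.645  # critical value for the 90% level used by ACS margins of error


def moe_to_se(moe, z=Z_90):
    """Convert a margin of error to a standard error."""
    return moe / z


def se_of_sum(ses):
    """Approximate SE of a sum of estimates (root sum of squares, no covariance)."""
    return math.sqrt(sum(se * se for se in ses))
```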
This package provides a new method for the interpretable characterization of heterogeneous treatment effects in terms of decision rules. It explores heterogeneity patterns extensively with an ensemble-of-trees approach that enforces high stability in the discovery. It relies on a two-stage pseudo-outcome regression and is supported by theoretical convergence guarantees. See Bargagli-Stoffi, F. J., Cadei, R., Lee, K., & Dominici, F. (2023) Causal rule ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects. arXiv preprint <doi:10.48550/arXiv.2009.09036>.
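Two-stage pseudo-outcome regression first builds, for each unit, a pseudo-outcome whose conditional mean equals the treatment effect. It then models that pseudo-outcome as a function of covariates. The Python sketch below computes one common choice, the inverse-propensity-weighted pseudo-outcome Y(T - e)/(e(1 - e)), given known propensity scores e. Under unconfoundedness, its conditional mean is the conditional average treatment effect. This is not necessarily the estimator used in the package, and the rule-ensemble second stage is not shown. The function name is hypothetical.

```python
def ipw_pseudo_outcomes(y, treated, propensity):
    """Inverse-propensity-weighted pseudo-outcomes (hypothetical helper).

    y: outcomes; treated: 0/1 treatment indicators; propensity: P(T = 1 | X)
    for each unit, strictly between 0 and 1. Treated units get y / e and
    control units get -y / (1 - e).
    """
    return [
        yi * (ti - ei) / (ei * (1.0 - ei))
        for yi, ti, ei in zip(y, treated, propensity)
    ]
```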