This is a collection of various kinds of data with broad uses for teaching. My students, and academics like me who teach the same topics I teach, should find this useful if their teaching workflow is also built around the R programming language. The applications are multiple but mostly cluster on topics of statistical methodology, international relations, and political economy.
Fits semiparametric linear and multilevel models with nonparametric Bayesian additive regression tree (BART; Chipman, George, and McCulloch (2010) <doi:10.1214/09-AOAS285>) components and parametric components sampled with Stan (Stan Development Team (2021) <https://mc-stan.org/>). Multilevel models can be expressed using lme4 syntax (Bates, Maechler, Bolker, and Walker (2015) <doi:10.18637/jss.v067.i01>).
Work with containers over the Docker API. Rather than using system calls to interact with a Docker client, using the API directly means that we can receive richer information from Docker. The interface in the package is automatically generated using the OpenAPI (a.k.a. 'swagger') specification, and all return values are checked in order to make them type stable.
These are miscellaneous functions that I find useful for my research and teaching. The contents include themes for plots, functions for simulating quantities of interest from regression models, functions for simulating various forms of fake data for instructional/research purposes, and many more. All told, the functions provided here are broadly useful for data organization, data presentation, data recoding, and data simulation.
Import data from the STATcube REST API or from the open data portal of Statistics Austria. This package includes a client for API requests as well as parsing utilities for data which originates from STATcube. Documentation about STATcubeR is provided by several vignettes included in the package as well as on the public pkgdown page at <https://statistikat.github.io/STATcubeR/>.
This package implements the Staggered Synthetic Control (SSC) method for estimating treatment effects in panel data with staggered adoption, as proposed by Cao, Lu, and Wu (2020) <doi:10.48550/arXiv.1912.06320>. Constructs synthetic control weights via constrained quadratic programming, estimates heterogeneous treatment effects and event-time average treatment effects on the treated (ATT), and provides placebo-in-time confidence intervals and p-values.
This package provides functions related to multivariate measures of independence and ICA: estimate independent components by minimizing distance covariance; conduct a test of mutual independence based on distance covariance; estimate independent components via infomax (a popular method that generally performs worse than mdcovica, ProDenICA, and/or fastICA but is useful for comparisons); order independent components by skewness; match independent components from multiple estimates; and other functions useful in ICA.
Computes confidence intervals for a population variance using the chi-squared distribution, without requiring raw data. Wikipedia (2025) <https://en.wikipedia.org/wiki/Chi-squared_distribution>. All-in-One Chi Distribution CI provides functions to calculate confidence intervals for the population variance based on the chi-squared distribution, using only a sample variance and sample size. It offers a single all-in-one method for quick calculation of these chi-squared-based intervals.
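For orientation, the standard textbook interval for a variance can be written as follows. This is the usual formula under the assumption of a normally distributed population, not a transcription of the package's code. The symbols n (sample size), s^2 (sample variance), and alpha (one minus the confidence level) are notation introduced here.

```latex
% Two-sided (1 - \alpha) confidence interval for the population variance \sigma^2,
% where \chi^2_{p,\,n-1} denotes the p-quantile of the chi-squared
% distribution with n - 1 degrees of freedom.
\[
  \left[
    \frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\;n-1}},
    \;
    \frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\;n-1}}
  \right]
\]
% Worked example: n = 10, s^2 = 4, \alpha = 0.05.
% \chi^2_{0.975,\,9} \approx 19.023 and \chi^2_{0.025,\,9} \approx 2.700, so
\[
  \left[ \frac{36}{19.023},\; \frac{36}{2.700} \right]
  \approx [\,1.89,\; 13.33\,].
\]
```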
The steepness package computes steepness as a property of dominance hierarchies. Steepness is defined as the absolute slope of the straight line fitted to the normalized David's scores. The normalized David's scores can be obtained on the basis of dyadic dominance indices corrected for chance or by means of proportions of wins. Given an observed sociomatrix, the package computes the hierarchy's steepness and estimates its statistical significance by means of a randomization test.
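The definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not the package's implementation: the function names are hypothetical, it uses proportions of wins rather than the chance-corrected dyadic index, it omits the randomization test, and it assumes the usual David's score formula DS = w + w2 - l - l2 with normalization (DS + N(N-1)/2) / N.

```python
def normalized_david_scores(wins):
    """wins[i][j] = number of times individual i won against j (hypothetical input layout)."""
    n = len(wins)
    # Dyadic proportion of wins; left at 0 when a pair never interacted.
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i != j and total > 0:
                p[i][j] = wins[i][j] / total
    w = [sum(p[i]) for i in range(n)]                        # summed win proportions
    l = [sum(p[j][i] for j in range(n)) for i in range(n)]   # summed loss proportions
    w2 = [sum(p[i][j] * w[j] for j in range(n)) for i in range(n)]
    l2 = [sum(p[j][i] * l[j] for j in range(n)) for i in range(n)]
    ds = [w[i] + w2[i] - l[i] - l2[i] for i in range(n)]
    # Normalization maps the scores onto the range 0 .. n - 1.
    return [(d + n * (n - 1) / 2) / n for d in ds]


def steepness(wins):
    """Absolute OLS slope of normalized David's scores regressed on rank."""
    scores = sorted(normalized_david_scores(wins), reverse=True)
    ranks = list(range(1, len(scores) + 1))  # rank 1 = highest score; ties keep sort order
    r_mean = sum(ranks) / len(ranks)
    s_mean = sum(scores) / len(scores)
    sxy = sum((r - r_mean) * (s - s_mean) for r, s in zip(ranks, scores))
    sxx = sum((r - r_mean) ** 2 for r in ranks)
    return abs(sxy / sxx)


# A strictly linear hierarchy (A always beats B and C, B always beats C).
linear = [[0, 5, 5], [0, 0, 5], [0, 0, 0]]
print(steepness(linear))
```

Under these assumptions a strictly linear hierarchy gives the maximum steepness, and a matrix in which every pair splits its encounters evenly gives a steepness of zero.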
It estimates the parameters of spatio-temporal models with censored or missing data using the SAEM algorithm (Delyon et al., 1999). This algorithm is a stochastic approximation of the widely used EM algorithm and is particularly valuable for models in which the E-step lacks a closed-form expression. It also provides a function to compute the observed information matrix using the method developed by Louis (1982). To assess the performance of the fitted model, case-deletion diagnostics are provided.
This package contains statistical methods to analyze graphs, such as graph parameter estimation, model selection based on the Graph Information Criterion, statistical tests to discriminate two or more populations of graphs, correlation between graphs, and clustering of graphs. References: Takahashi et al. (2012) <doi:10.1371/journal.pone.0049949>, Fujita et al. (2017) <doi:10.3389/fnins.2017.00066>, Fujita et al. (2017) <doi:10.1016/j.csda.2016.11.016>, Fujita et al. (2019) <doi:10.1093/comnet/cnz028>.
This package implements S-type estimators, novel robust estimators for general linear regression models that address challenges such as outlier contamination and leverage points. It provides an alternative to classical methods and includes diagnostic tools for assessing model fit and performance. The methodology is based on the study "Comparison of the Robust Methods in the General Linear Regression Model" by Sazak and Mutlu (2023). This package is designed for statisticians and applied researchers seeking advanced tools for robust regression analysis.
For making Trellis-type conditioning plots without strip labels. This is useful for displaying the structure of results from factorial designs and other studies when many conditioning variables would clutter the display with layers of redundant strip labels. Settings of the variables are encoded by layout and spacing in the trellis array and decoded by a separate legend. The functionality is implemented by a single S3 generic strucplot() function that is a wrapper for the Lattice package's xyplot() function. This allows access to all Lattice graphics capabilities in the usual way.
Detrending multivariate time-series to approximate stationarity when dealing with intensive longitudinal data, prior to Vector Autoregressive (VAR) or multilevel-VAR estimation. Classical VAR assumes weak stationarity (constant first two moments), and deterministic trends inflate spurious autocorrelation, biasing Granger-causality and impulse-response analyses. All functions operate on raw panel data and write detrended columns back to the data set, but differ in the level at which the trend is estimated. See, for instance, Wang & Maxwell (2015) <doi:10.1037/met0000030>; Burger et al. (2022) <doi:10.4324/9781003111238-13>; Epskamp et al. (2018) <doi:10.1177/2167702617744325>.
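As a rough illustration of trend removal at the individual level, the sketch below fits a straight line to each person's series and keeps the residuals. It is a minimal sketch under stated assumptions, not the package's API: the function names and the dictionary layout are hypothetical, the trend is assumed to be linear in the observation index, and each series is assumed to have at least two equally spaced observations.

```python
def detrend_linear(values):
    """Return residuals from an OLS line fitted against the observation index 0, 1, 2, ..."""
    n = len(values)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(values) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values)) / sxx
    intercept = y_mean - slope * t_mean
    # Residuals have mean zero, so the person-specific level is removed as well.
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, values)]


def detrend_by_person(panel):
    """panel maps a person identifier to that person's series (hypothetical layout)."""
    return {person: detrend_linear(series) for person, series in panel.items()}


panel = {"p1": [1.0, 3.0, 5.0, 7.0], "p2": [2.0, 1.0, 4.0, 3.0]}
print(detrend_by_person(panel))
```

Estimating the trend separately per person, as here, removes individual-specific drift. Pooling all persons into one fit would instead remove only the trend shared by the sample.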
This package provides functionalities for performing stability analysis of genotype by environment interaction (GEI) to identify superior and stable genotypes across diverse environments. It implements Eberhart and Russell's ANOVA method (1966) (<doi:10.2135/cropsci1966.0011183X000600010011x>), Finlay and Wilkinson's Joint Linear Regression method (1963) (<doi:10.1071/AR9630742>), Wricke's Ecovalence (1962, 1964), Shukla's stability variance parameter (1972) (<doi:10.1038/hdy.1972.87>), Kang's simultaneous selection for high yield and stability (1991) (<doi:10.2134/agronj1991.00021962008300010037x>), Additive Main Effects and Multiplicative Interaction (AMMI) method and Genotype plus Genotypes by Environment (GGE) Interaction methods.
The cartogram heatmaps generated by the included methods are an alternative to choropleth maps for the United States and are based on work by the Washington Post graphics department in their report on "The states most threatened by trade" (<http://www.washingtonpost.com/wp-srv/special/business/states-most-threatened-by-trade/>). "State bins" preserve as much of the geographic placement of the states as possible but have the look and feel of a traditional heatmap. Functions are provided that allow for use of a binned, discrete scale, a continuous scale or manually specified colors depending on what is needed for the underlying data.
Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART) (Chipman, George, McCulloch (2010) <doi:10.1214/09-AOAS285>) for supervised learning and Bayesian Causal Forests (BCF) (Hahn, Murray, Carvalho (2020) <doi:10.1214/19-BA1195>) for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers. Includes the grow-from-root algorithm for accelerated forest sampling (He and Hahn (2021) <doi:10.1080/01621459.2021.1942012>), a log-linear leaf model for forest-based heteroskedasticity (Murray (2020) <doi:10.1080/01621459.2020.1813587>), and the cloglog BART model of Alam and Linero (2025) <doi:10.48550/arXiv.2502.00606> for ordinal outcomes.
This package implements confidence interval and sample size methods that are especially useful in psychological research. The methods can be applied in 1-group, 2-group, paired-samples, and multiple-group designs and to a variety of parameters including means, medians, proportions, slopes, standardized mean differences, standardized linear contrasts of means, plus several measures of correlation and association. Confidence interval and sample size functions are given for single parameters as well as differences, ratios, and linear contrasts of parameters. The sample size functions can be used to approximate the sample size needed to estimate a parameter or function of parameters with desired confidence interval precision or to perform a variety of hypothesis tests (directional two-sided, equivalence, superiority, noninferiority) with desired power. For details see: Statistical Methods for Psychologists, Volumes 1–4, <https://dgbonett.sites.ucsc.edu/>.
This package provides string parsing functionalities for generating plotnames, filenames and paths.
This package provides utilities to create or suppress start-up messages.
Allows the creation and manipulation of C++ std::vector objects in R.
Support for reading and writing files in StatDataML, an XML-based data exchange format.
This package provides an extendable, performant and multithreaded alt-string implementation backed by C++ vectors and strings.
This package provides density, probability and quantile functions, and random number generation for (skew) stable distributions, using the parametrizations of Nolan.