Collision risk models for avian fauna (seabirds and migratory birds) at offshore wind farms. The base deterministic model is derived from Band (2012) <https://tethys.pnnl.gov/publications/using-collision-risk-model-assess-bird-collision-risks-offshore-wind-farms>. This was further expanded on by Masden (2015) <doi:10.7489/1659-1>, and the code used here is heavily derived from this work with input from Dr A. Cook at the British Trust for Ornithology. These collision risk models are useful for marine ornithologists who are working in the offshore wind industry, particularly in UK waters. However, many of the species included in the stochastic collision risk models can also be found in the North Atlantic in the United States and Canada, so the models could be applied there as well.
Practitioners of Bayesian statistics often use Markov chain Monte Carlo (MCMC) samplers to sample from a posterior distribution. This package determines whether the MCMC sample is large enough to yield reliable estimates of the target distribution. In particular, it calculates a Gelman-Rubin convergence diagnostic using stable and consistent estimators of Monte Carlo variance. Additionally, it uses the connection between an MCMC sample's effective sample size and the Gelman-Rubin diagnostic to produce a threshold for terminating MCMC simulation. Finally, it informs the user whether enough samples have been collected and (if necessary) estimates the number of samples needed for a desired level of accuracy. The theory underlying these methods can be found in "Revisiting the Gelman-Rubin Diagnostic" by Vats and Knudson (2018) <arXiv:1812.09384>.
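For orientation, the following is a minimal sketch of the classic (textbook) Gelman-Rubin potential scale reduction factor for a scalar quantity. It assumes equal-length chains and uses plain sample variances, so it is not the package's implementation, which relies on different variance estimators; the function name `gelman_rubin` is hypothetical.

```python
import math
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length scalar chains.

    Illustrative only: uses ordinary sample variances rather than the
    stable Monte Carlo variance estimators described above.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")

    # W: average of the within-chain sample variances
    w = mean([variance(c) for c in chains])
    # B / n: sample variance of the chain means
    b_over_n = variance([mean(c) for c in chains])

    # Pooled estimate of the target variance
    var_plus = (n - 1) / n * w + b_over_n
    return math.sqrt(var_plus / w)


# Chains stuck in different regions give a value far above 1.
print(gelman_rubin([[0, 1, 0, 1], [10, 11, 10, 11]]))
```

Values close to 1 suggest the chains agree with one another; the cutoff used to stop sampling is a separate choice that this sketch does not address.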
Facilitates probabilistic record linkage between infectious disease surveillance datasets (notifiable disease registers, outbreak line-lists), vaccination registries, and hospitalization records using methods based on Fellegi and Sunter (1969) <doi:10.1080/01621459.1969.10501049> and Sayers et al. (2016) <doi:10.1093/ije/dyv322>. The package provides core functions for data preparation, linkage, and analysis: clean_the_nest() standardizes variable names and formats across heterogeneous datasets; murmuration() performs machine learning-based record linkage using blocking variables and similarity metrics; molting() deidentifies datasets for secure sharing; homing() re-identifies previously deidentified datasets; plumage() identifies and categorizes comorbidities; and preening() creates analysis-ready variables including age categories and temporal groupings. Designed for epidemiological research linking acute and post-acute disease outcomes to vaccination status and healthcare utilization. Supports multiple linkage scenarios including case-to-vaccination, case-to-hospitalization, and event-based vaccination status determination (e.g., outbreak attendees, flight passengers, exposure site visitors).
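As a rough illustration of the Fellegi and Sunter (1969) idea behind probabilistic linkage, the sketch below scores a candidate record pair by summing per-field log-likelihood-ratio weights. The function name `match_weight`, the field names, and the m- and u-probabilities are all hypothetical and are not taken from the package.

```python
import math


def match_weight(agreements, m_probs, u_probs):
    """Sum Fellegi-Sunter agreement/disagreement weights over fields.

    agreements: dict of field -> bool (do the two records agree on this field?)
    m_probs:    dict of field -> P(agree | records are a true match)
    u_probs:    dict of field -> P(agree | records are not a match)
    """
    total = 0.0
    for field, agrees in agreements.items():
        m = m_probs[field]
        u = u_probs[field]
        if agrees:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


# Hypothetical probabilities for two comparison fields.
m = {"surname": 0.95, "birth_date": 0.90}
u = {"surname": 0.01, "birth_date": 0.05}
print(match_weight({"surname": True, "birth_date": True}, m, u))
print(match_weight({"surname": False, "birth_date": False}, m, u))
```

Pairs with a high total weight are treated as likely links and pairs with a low total weight as likely non-links; where to place the thresholds is left open here.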
This package provides multiple sources of stopwords, for use in text analysis and natural language processing.
This package provides a consistently well-behaved method of interpolation based on piecewise rational functions using Stineman's algorithm.
Interfaces the stepcount Python module <https://github.com/OxWearables/stepcount> to estimate step counts and other activities from accelerometry data.
Efficiently estimates treatment effects in settings with randomized staggered rollouts, using tools proposed by Roth and Sant'Anna (2023) <doi:10.48550/arXiv.2102.01291>.
This package performs multiple testing corrections that take the specific structure of hypotheses into account, as described in Sankaran & Holmes (2014) <doi:10.18637/jss.v059.i13>.
Quantify stratigraphic disorder using the metrics defined by Burgess (2016) <doi:10.2110/jsr.2016.10>. Contains a range of utility tools to construct and manipulate stratigraphic columns.
Tree-structured modelling of categorical predictors (Tutz and Berger (2018), <doi:10.1007/s11634-017-0298-6>) or measurement units (Berger and Tutz (2018), <doi:10.1080/10618600.2017.1371030>).
Model selection based on combined penalties. This package implements a stepwise forward variable selection algorithm based on a penalized likelihood criterion that combines the L0 norm with the L2 or L1 norm.
Non-parametric test, originally proposed by Stute (1997) <https://www.jstor.org/stable/2242560>, that the expectation of a dependent variable Y given an independent variable D is linear in D.
This package can automatically extract statistical null-hypothesis significance testing (NHST) results from articles and recompute the p-values based on the reported test statistic and degrees of freedom to detect possible inconsistencies.
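The recompute-and-compare step can be sketched for the simplest case, a two-sided z-test, which needs no degrees of freedom. This is an assumption-laden illustration rather than the package's procedure: the function names are hypothetical, and the check ignores rounding of the reported test statistic itself.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_consistent(reported_p, z, decimals):
    """True if the recomputed p-value rounds to the reported one.

    `decimals` is the number of decimal places used in the report.
    """
    recomputed = two_sided_p_from_z(z)
    return round(recomputed, decimals) == round(reported_p, decimals)


# A report of "z = 1.96, p = .05" is consistent; "z = 1.96, p = .01" is not.
print(is_consistent(0.05, 1.96, 2))
print(is_consistent(0.01, 1.96, 2))
```

Tests such as t, F, or chi-square would need the corresponding distribution functions and the reported degrees of freedom, which this sketch leaves out.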
This package produces LaTeX code, HTML/CSS code and ASCII text for well-formatted tables that hold regression analysis results from several models side-by-side, as well as summary statistics.
Interface for data stream clustering algorithms implemented in the MOA (Massive Online Analysis) framework (Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer (2010). MOA: Massive Online Analysis, Journal of Machine Learning Research 11: 1601-1604).
Datasets for the textbook Stat2: Modeling with Regression and ANOVA (second edition). The package also includes data for the first edition, Stat2: Building Models for a World of Data, and a few functions for plotting diagnostics.
This package performs simulation and inference of diffusion processes on the circle. Stochastic correlation models based on circular diffusion models are provided. For details see Majumdar, S. and Laha, A.K. (2024) "Diffusion on the circle and a stochastic correlation model" <doi:10.48550/arXiv.2412.06343>.
This package provides fitting functions and other tools for decision confidence and metacognition researchers, including meta-d'/d', often considered to be the gold standard to measure metacognitive efficiency, and information-theoretic measures of metacognition. It also allows fitting and comparing several static models of decision making and confidence.
Integration of two data sources that refer to the same target population and share a number of variables. Some functions can also be used to impute missing values in data sets through hot deck imputation methods. Methods to perform statistical matching when dealing with data from complex sample surveys are available too.
This package provides indices and tools for directed acyclic graphs (DAGs), particularly DAG representations of intermittent streams. A detailed introduction to the package can be found in the publication: "Non-perennial stream networks as directed acyclic graphs: The R-package streamDAG" (Aho et al., 2023) <doi:10.1016/j.envsoft.2023.105775>, and in the introductory package vignette.
This is a collection of various kinds of data with broad uses for teaching. My students, and academics like me who teach the same topics I teach, should find this useful if their teaching workflow is also built around the R programming language. The applications are multiple but mostly cluster on topics of statistical methodology, international relations, and political economy.
Fits semiparametric linear and multilevel models with non-parametric Bayesian additive regression tree (BART; Chipman, George, and McCulloch (2010) <doi:10.1214/09-AOAS285>) components and Stan (Stan Development Team (2021) <https://mc-stan.org/>) sampled parametric ones. Multilevel models can be expressed using lme4 syntax (Bates, Maechler, Bolker, and Walker (2015) <doi:10.18637/jss.v067.i01>).
Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART; Chipman, George, and McCulloch (2010) <doi:10.1214/09-AOAS285>) for supervised learning and Bayesian Causal Forests (BCF; Hahn, Murray, and Carvalho (2020) <doi:10.1214/19-BA1195>) for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers.
Work with containers over the Docker API. Rather than using system calls to interact with a Docker client, this package uses the API directly, which means that it can receive richer information from Docker. The interface in the package is automatically generated using the OpenAPI (a.k.a. Swagger) specification, and all return values are checked in order to make them type stable.