The fossil record is a joint expression of ecological, taphonomic, evolutionary, and stratigraphic processes (Holland and Patzkowsky, 2012, ISBN:978-0226649382). This package allows users to simulate biological processes in the time domain (e.g., trait evolution, fossil abundance, phylogenetic trees) and to examine how their expression in the rock record (stratigraphic domain) is influenced by age-depth models, ecological niche models, and taphonomic effects. Functions simulating common processes used in modeling trait evolution, biostratigraphy, or event-type data such as first and last occurrences are provided and can be used standalone or as part of a pipeline. The package comes with example data sets and tutorials in several vignettes, which can be used as templates for setting up one's own simulations.
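To illustrate how an age-depth model moves events from the time domain into the stratigraphic domain, here is a minimal Python sketch. It assumes a piecewise-linear age-depth model given as tie points (strictly increasing times, non-decreasing heights) and treats any interval with no net accumulation as a hiatus in which events are not preserved. The function name `time_to_height` and this preservation rule are hypothetical simplifications, not the package's API.

```python
import bisect


def time_to_height(event_times, adm_times, adm_heights):
    """Map event times to stratigraphic heights with a piecewise-linear age-depth model.

    adm_times must be strictly increasing and adm_heights non-decreasing.
    Events outside the model or inside a flat segment (hiatus) map to None.
    """
    heights = []
    for t in event_times:
        if t < adm_times[0] or t > adm_times[-1]:
            heights.append(None)  # outside the span covered by the model
            continue
        i = bisect.bisect_right(adm_times, t) - 1
        if i == len(adm_times) - 1:
            heights.append(adm_heights[-1])  # t coincides with the last tie point
            continue
        t0, t1 = adm_times[i], adm_times[i + 1]
        h0, h1 = adm_heights[i], adm_heights[i + 1]
        if h1 == h0:
            heights.append(None)  # no net accumulation: event falls in a hiatus
            continue
        # Linear interpolation within the segment
        heights.append(h0 + (h1 - h0) * (t - t0) / (t1 - t0))
    return heights
```

For tie points at times 0, 1, 2, 3 with heights 0, 2, 2, 5, an event at time 1.5 lands in the hiatus between times 1 and 2 and is dropped, while an event at time 2.5 is placed at height 3.5.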
This package provides a toolkit for common statistical analyses including descriptive statistics, Student's t-tests (one-sample, independent, and paired), one-way and two-way Analysis of Variance (ANOVA), chi-square tests, correlation analysis, and simple linear regression. Each function automatically interprets results in plain English, reporting effect sizes (Cohen's d, eta-squared, Cramer's V, R-squared), confidence intervals, and p-value interpretations. Post-hoc Tukey Honestly Significant Difference (HSD) tests are automatically applied following significant ANOVA results. A master function automatically detects the appropriate test based on the structure of the input data. Methods are based on Cohen, J. (1988) <doi:10.4324/9780203771587>, Tukey, J. W. (1949) <doi:10.2307/3001913>, and Shapiro and Wilk (1965) <doi:10.2307/2333709>.
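The effect sizes mentioned above have standard definitions. As an illustration, the following Python sketch computes Cohen's d for two independent groups using the pooled standard deviation and attaches a plain-English label based on Cohen's (1988) conventional thresholds of 0.2, 0.5, and 0.8. The function names are hypothetical and do not reflect the package's own interface.

```python
import math
import statistics


def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    # statistics.variance uses the n - 1 (sample) denominator
    v1, v2 = statistics.variance(x), statistics.variance(y)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd


def describe_effect(d):
    """Plain-English label using Cohen's conventional thresholds."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"
```

For the groups [1, 2, 3, 4, 5] and [3, 4, 5, 6, 7], both sample variances are 2.5, so d = -2 / sqrt(2.5), which is about -1.26 and is labeled "large".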
Collision Risk Models for avian fauna (seabirds and migratory birds) at offshore wind farms. The base deterministic model is derived from Band (2012) <https://tethys.pnnl.gov/publications/using-collision-risk-model-assess-bird-collision-risks-offshore-wind-farms>. This was further expanded on by Masden (2015) <doi:10.7489/1659-1>, and the code used here is heavily derived from that work, with input from Dr A. Cook at the British Trust for Ornithology. These collision risk models are useful for marine ornithologists working in the offshore wind industry, particularly in UK waters. However, many of the species included in the stochastic collision risk models are also found in the North Atlantic off the United States and Canada, so the models could be applied there as well.
Practitioners of Bayesian statistics often use Markov chain Monte Carlo (MCMC) samplers to sample from a posterior distribution. This package determines whether the MCMC sample is large enough to yield reliable estimates of the target distribution. In particular, it calculates a Gelman-Rubin convergence diagnostic using stable and consistent estimators of Monte Carlo variance. Additionally, it uses the connection between an MCMC sample's effective sample size and the Gelman-Rubin diagnostic to produce a threshold for terminating MCMC simulation. Finally, it informs the user whether enough samples have been collected and, if necessary, estimates the number of samples needed for a desired level of accuracy. The theory underlying these methods can be found in "Revisiting the Gelman-Rubin Diagnostic" by Vats and Knudson (2018) <arXiv:1812.09384>.
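For orientation, the following Python sketch computes the classic Gelman-Rubin potential scale reduction factor for a single parameter from several equal-length chains. Note that this is the original between-chain/within-chain construction, not the stable, batch-means-based variance estimators described by Vats and Knudson, and the function name is hypothetical.

```python
import math
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor (R-hat) for one parameter.

    chains: list of m >= 2 equal-length lists of draws.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [statistics.mean(c) for c in chains]
    grand_mean = statistics.mean(chain_means)
    # Between-chain variance B (scaled by the chain length n)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    # Within-chain variance W: average of the per-chain sample variances
    w = statistics.mean(statistics.variance(c) for c in chains)
    # Pooled estimate of the target variance
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)
```

Values close to 1 suggest that the chains agree. Chains stuck in different regions, such as [0, 1, 0, 1] and [10, 11, 10, 11], give a value far above 1. The improvement proposed by Vats and Knudson lies in how the variance terms are estimated, which this sketch does not attempt.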
Facilitates probabilistic record linkage between infectious disease surveillance datasets (notifiable disease registers, outbreak line-lists), vaccination registries, and hospitalization records using methods based on Fellegi and Sunter (1969) <doi:10.1080/01621459.1969.10501049> and Sayers et al. (2016) <doi:10.1093/ije/dyv322>. The package provides core functions for data preparation, linkage, and analysis: clean_the_nest() standardizes variable names and formats across heterogeneous datasets; murmuration() performs machine learning-based record linkage using blocking variables and similarity metrics; molting() deidentifies datasets for secure sharing; homing() re-identifies previously deidentified datasets; plumage() identifies and categorizes comorbidities; and preening() creates analysis-ready variables including age categories and temporal groupings. Designed for epidemiological research linking acute and post-acute disease outcomes to vaccination status and healthcare utilization. Supports multiple linkage scenarios including case-to-vaccination, case-to-hospitalization, and event-based vaccination status determination (e.g., outbreak attendees, flight passengers, exposure site visitors).
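The Fellegi and Sunter framework cited above scores each candidate record pair by summing per-field log-likelihood-ratio weights. The following Python sketch shows that classical scoring step with hypothetical function names and user-supplied m/u probabilities. It does not describe how murmuration() itself works, which the description characterizes as machine-learning-based.

```python
import math


def field_weights(m, u):
    """Fellegi-Sunter agreement and disagreement weights (log2 scale) for one field.

    m: P(field agrees | records truly match)
    u: P(field agrees | records do not match)
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree


def match_score(agreements, params):
    """Sum field weights; agreements maps field -> bool, params maps field -> (m, u)."""
    total = 0.0
    for field, agrees in agreements.items():
        agree_w, disagree_w = field_weights(*params[field])
        total += agree_w if agrees else disagree_w
    return total


def classify(score, lower, upper):
    """Two-threshold decision rule: link, non-link, or clerical review."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "possible link (clerical review)"
```

Pairs whose summed weight lies between the two thresholds are usually sent for manual review. The m/u values and thresholds are chosen or estimated by the analyst, and the values used in any example here are made up.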
This package provides multiple sources of stopwords, for use in text analysis and natural language processing.
This package provides a consistently well-behaved method of interpolation based on piecewise rational functions using Stineman's algorithm.
Interfaces the stepcount Python module <https://github.com/OxWearables/stepcount> to estimate step counts and other activities from accelerometry data.
Efficiently estimates treatment effects in settings with randomized staggered rollouts, using tools proposed by Roth and Sant'Anna (2023) <doi:10.48550/arXiv.2102.01291>.
This package performs multiple testing corrections that take the specific structure of the hypotheses into account, as described in Sankaran & Holmes (2014) <doi:10.18637/jss.v059.i13>.
Calculates enrolment, graduation, dropout, and programme-switch indicators from the Dutch higher education registration data (1CHO) supplied by DUO. Includes an interactive Shiny dashboard for exploring results.
Quantify stratigraphic disorder using the metrics defined by Burgess (2016) <doi:10.2110/jsr.2016.10>. Contains a range of utility tools to construct and manipulate stratigraphic columns.
Tree-structured modelling of categorical predictors (Tutz and Berger (2018), <doi:10.1007/s11634-017-0298-6>) or measurement units (Berger and Tutz (2018), <doi:10.1080/10618600.2017.1371030>).
Model Selection Based on Combined Penalties. This package implements a stepwise forward variable selection algorithm based on a penalized likelihood criterion that combines the L0 norm with the L1 or L2 norm.
A non-parametric test, originally proposed by Stute (1997) <https://www.jstor.org/stable/2242560>, of whether the expectation of a dependent variable Y given an independent variable D is linear in D.
This package can automatically extract the results of null-hypothesis significance testing (NHST) from articles and recompute the p-values from the reported test statistics and degrees of freedom to detect possible inconsistencies.
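The consistency check can be illustrated for the simplest case. The following Python sketch handles only two-sided z tests, since these need nothing beyond `math.erfc`. The regular expression is a hypothetical, simplified pattern for APA-style strings such as "z = 2.10, p = .036", and the rounding rule is an assumption. Neither reflects the package's actual parser, which also works from degrees of freedom for other test types.

```python
import math
import re

# Hypothetical, simplified pattern for reports like "z = 2.10, p = .036"
Z_REPORT = re.compile(r"z\s*=\s*(-?\d+(?:\.\d+)?),\s*p\s*([<=>])\s*(\d*\.\d+)")


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal statistic: P(|Z| > |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_reports(text):
    """Return (reported p, recomputed p, consistent?) for every z-test report in text."""
    results = []
    for z_str, comparator, p_str in Z_REPORT.findall(text):
        recomputed = two_sided_p_from_z(float(z_str))
        reported = float(p_str)
        if comparator == "=":
            # Consistent if the recomputed p rounds to the reported value
            decimals = len(p_str.split(".")[1])
            consistent = round(recomputed, decimals) == reported
        elif comparator == "<":
            consistent = recomputed < reported
        else:
            consistent = recomputed > reported
        results.append((reported, recomputed, consistent))
    return results
```

For "z = 2.10, p = .036" the recomputed two-sided p is about 0.0357, which rounds to the reported .036. "z = 2.10, p = .01" would be flagged as inconsistent. This sketch ignores rounding in the reported test statistic itself.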
This package produces LaTeX code, HTML/CSS code and ASCII text for well-formatted tables that hold regression analysis results from several models side-by-side, as well as summary statistics.
Interface for data stream clustering algorithms implemented in the MOA (Massive Online Analysis) framework (Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer (2010). MOA: Massive Online Analysis, Journal of Machine Learning Research 11: 1601-1604).
Datasets for the textbook Stat2: Modeling with Regression and ANOVA (second edition). The package also includes data for the first edition, Stat2: Building Models for a World of Data, and a few functions for plotting diagnostics.
This package performs simulation and inference for diffusion processes on the circle. Stochastic correlation models based on circular diffusion models are provided. For details, see Majumdar, S. and Laha, A.K. (2024) "Diffusion on the circle and a stochastic correlation model" <doi:10.48550/arXiv.2412.06343>.
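To give a concrete picture of a diffusion on the circle, the following Python sketch simulates one common example with an Euler-Maruyama scheme and wraps each step back onto [0, 2π]. The example is a Langevin-type diffusion whose drift pulls toward a mean direction μ and whose stationary law is von Mises(μ, κ). This specific model, the parameter names, and the function name are assumptions made for illustration and are not necessarily the model of Majumdar and Laha.

```python
import math
import random


def simulate_vm_diffusion(theta0, mu, kappa, sigma, dt, n_steps, seed=None):
    """Euler-Maruyama path of dTheta = -(sigma^2 * kappa / 2) * sin(Theta - mu) dt + sigma dW.

    Angles are wrapped onto the circle after every step; the stationary
    distribution of the exact process is von Mises(mu, kappa).
    """
    rng = random.Random(seed)
    two_pi = 2 * math.pi
    theta = theta0 % two_pi
    path = [theta]
    drift_scale = 0.5 * sigma ** 2 * kappa
    for _ in range(n_steps):
        drift = -drift_scale * math.sin(theta - mu)  # pulls toward mu along the circle
        step = drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        theta = (theta + step) % two_pi  # wrap back onto the circle
        path.append(theta)
    return path
```

Because sin is 2π-periodic, the drift is unaffected by wrapping. A smaller `dt` reduces the discretization error of the Euler-Maruyama scheme.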
This package provides fitting functions and other tools for researchers studying decision confidence and metacognition, including meta-d'/d', often considered the gold standard for measuring metacognitive efficiency, and information-theoretic measures of metacognition. It also allows users to fit and compare several static models of decision making and confidence.
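Estimating meta-d' requires fitting a signal detection model to confidence ratings, which is beyond a short example. The following Python sketch computes only the type-1 d', which is the denominator of the meta-d'/d' ratio. It uses one common log-linear correction, adding 0.5 to each cell, to avoid infinite z-scores. The function name and the choice of correction are assumptions, not the package's defaults.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Type-1 sensitivity d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total)
    keeps rates away from 0 and 1 so the z-scores stay finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

With equal hit and false-alarm rates, d' is 0. With 84 hits, 16 misses, 16 false alarms, and 84 correct rejections, d' is roughly 1.96.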
Integrates two data sources that refer to the same target population and share a number of variables. Some functions can also be used to impute missing values in data sets through hot deck imputation methods. Methods for performing statistical matching with data from complex sample surveys are also available.
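The basic idea behind distance hot deck matching can be sketched briefly. Each record in the recipient file receives the value of a variable observed only in the donor file, copied from the donor that is closest on the shared matching variables. This Python sketch uses unstandardized Euclidean distance and a hypothetical function name. In practice, variables are usually scaled first, and the package offers further options, such as constrained or survey-weighted matching, that are not shown here.

```python
import math


def nn_hot_deck(recipients, donors, match_vars, target):
    """Nearest-neighbour distance hot deck.

    recipients, donors: lists of dicts. For each recipient, copy `target`
    from the donor with the smallest Euclidean distance on `match_vars`.
    """
    filled = []
    for rec in recipients:
        rec_point = [rec[v] for v in match_vars]
        nearest = min(donors, key=lambda d: math.dist(rec_point, [d[v] for v in match_vars]))
        filled.append({**rec, target: nearest[target]})  # recipient record plus imputed value
    return filled
```

A recipient aged 25 would take the income of a donor aged 20 rather than one aged 60, because the younger donor is the nearest on the matching variable.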
This package provides a 'future' backend that enables seamless execution of parallel R workloads on Amazon Web Services ('AWS', <https://aws.amazon.com>), including 'EC2' and 'Fargate'. staRburst handles environment synchronization, data transfer, quota management, and worker orchestration automatically, allowing users to scale from local execution to 100+ cloud workers with a one-line code change.
This package provides exact analytical algorithms for computing optimum sample allocations in stratified sampling. Supports classical Neyman-Tschuprow allocation, minimum-cost allocation under a variance constraint, and multi-domain allocation with controlled precision. Handles lower and upper bounds, cost constraints, and multiple domains. Includes helper functions for variance computation, allocation summaries, rounding, and example datasets for testing and benchmarking.
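Classical Neyman allocation sets each stratum's sample size proportional to N_h S_h, where N_h is the stratum population size and S_h is the stratum standard deviation. The following Python sketch shows this allocation together with one standard way of honouring the upper bounds n_h ≤ N_h: a recursive procedure in which strata that would be over-allocated are fully enumerated (take-all) and the remaining sample is reallocated. The function names are hypothetical. The package's exact algorithms, lower bounds, cost constraints, and rounding are not reproduced here.

```python
def neyman_allocation(n, pop_sizes, std_devs):
    """Classical Neyman allocation: n_h = n * N_h * S_h / sum_k(N_k * S_k).

    Returns non-integer allocations; no bounds or rounding are applied.
    """
    products = [N * S for N, S in zip(pop_sizes, std_devs)]
    total = sum(products)
    return [n * p / total for p in products]


def neyman_with_upper_bounds(n, pop_sizes, std_devs):
    """Recursive Neyman allocation with upper bounds n_h <= N_h (assumes every S_h > 0).

    Strata whose Neyman allocation would exceed N_h are taken in full, and the
    remaining sample is reallocated among the other strata until no bound is violated.
    """
    if n > sum(pop_sizes):
        raise ValueError("sample size exceeds population size")
    alloc = [0.0] * len(pop_sizes)
    free = list(range(len(pop_sizes)))
    remaining = n
    while free:
        total = sum(pop_sizes[h] * std_devs[h] for h in free)
        over = [h for h in free
                if remaining * pop_sizes[h] * std_devs[h] / total > pop_sizes[h]]
        if not over:
            for h in free:
                alloc[h] = remaining * pop_sizes[h] * std_devs[h] / total
            break
        for h in over:
            alloc[h] = pop_sizes[h]  # take-all stratum
            remaining -= pop_sizes[h]
            free.remove(h)
    return alloc
```

For example, with N = (10, 1000), S = (100, 1), and n = 50, plain Neyman allocation would assign 25 units to a stratum of only 10. The recursive version takes that stratum in full and assigns the remaining 40 units to the other stratum.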