An implementation of an algorithm for outlier detection that can handle a) data with mixed categorical and continuous variables, b) many columns of data, c) many rows of data, d) outliers that mask other outliers, and e) both unidimensional and multidimensional datasets. Unlike ad hoc methods found in many machine learning papers, HDoutliers is based on a distributional model that uses probabilities to determine outliers.
This package provides tools to streamline Bayesian analyses in JAGS using the jagsUI package. Included are functions for extracting output in a simpler format, functions for streamlining assessment of convergence, and functions for producing summary plots of output. Also included is a function that provides a simple template for running JAGS from R. Referenced materials can be found at <DOI:10.1214/ss/1177011136>.
Using this package, one can determine the minimum sample size required so that the mean squared error of the sample mean, taken as an estimator of the population mean of a distribution, becomes less than some pre-determined epsilon. In other words, it helps the user find the smallest sample size that attains a pre-fixed precision level for the difference between the sample mean and the population mean.
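As an illustration of the idea only (not the package's interface), the following minimal Python sketch assumes independent, identically distributed observations with a known variance. Under that assumption the mean squared error of the sample mean is variance / n, so the smallest n with variance / n < epsilon can be computed directly. The function name min_sample_size is hypothetical.

```python
import math


def min_sample_size(variance: float, epsilon: float) -> int:
    """Smallest n such that variance / n < epsilon.

    Assumes i.i.d. observations with known variance, so that the mean
    squared error of the sample mean equals variance / n.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # variance / n < epsilon  <=>  n > variance / epsilon (strict inequality),
    # so take the next integer strictly above the ratio.
    return math.floor(variance / epsilon) + 1


n = min_sample_size(variance=4.0, epsilon=0.5)
print(n)
```

With a variance of 4 and epsilon of 0.5, n = 8 gives a mean squared error of exactly 0.5, which does not satisfy the strict inequality, so the sketch returns the next integer.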
This package provides a class for multi-companion matrices with methods for arithmetic and factorization. A method for generation of multi-companion matrices with prespecified spectral properties is provided, as well as some utilities for periodically correlated and multivariate time series models. See Boshnakov (2002) <doi:10.1016/S0024-3795(01)00475-X> and Boshnakov & Iqelan (2009) <doi:10.1111/j.1467-9892.2009.00617.x>.
This package implements methodologies for modelling interval data by Normal and Skew-Normal distributions, considering appropriate parameterizations of the variance-covariance matrix that take into account the intrinsic nature of interval data and lead to four different possible configuration structures. The Skew-Normal parameters can be estimated by maximum likelihood, while Normal parameters may be estimated by maximum likelihood or robust trimmed maximum likelihood methods.
Sequential outlier identification for Gaussian mixture models using the distribution of Mahalanobis distances. The optimal number of outliers is chosen based on the dissimilarity between the theoretical and observed distributions of the scaled squared sample Mahalanobis distances. Also includes an extension for Gaussian linear cluster-weighted models using the distribution of studentized residuals. Doherty, McNicholas, and White (2025) <doi:10.48550/arXiv.2505.11668>.
This package provides a customizable timer widget for shiny applications. Key features include countdown and count-up modes, multiple display formats (including simple seconds, minutes-seconds, hours-minutes-seconds, and minutes-seconds-centiseconds), and the ability to pause, resume, and reset the timer. The shinytimer widget can be particularly useful for creating interactive and time-sensitive applications, tracking session times, setting time limits for tasks or quizzes, and more.
We provide a tidy grammar of population genetics, facilitating the manipulation and analysis of data on biallelic single nucleotide polymorphisms (SNPs). tidypopgen scales to very large genetic datasets by storing genotypes on disk, and performing operations on them in chunks, without ever loading all data in memory. The full functionalities of the package are described in Carter et al. (2025) <doi:10.1111/2041-210x.70204>.
The R language includes a set of defined types, but the language itself is "absurdly dynamic" (Turcotte & Vitek (2019) <doi:10.1145/3340670.3342426>), and lacks any way to specify which types are expected by any expression. The typetracer package enables code to be traced to extract detailed information on the properties of parameters passed to R functions. typetracer can trace individual functions or entire packages.
The LSTM (Long Short-Term Memory) model is a Recurrent Neural Network (RNN) based architecture that is widely used for time series forecasting. Customizable configurations for the model are allowed, improving the capabilities and usability of this model compared to other packages. This package is based on keras and tensorflow modules and the algorithm of Paul and Garai (2021) <doi:10.1007/s00500-021-06087-4>.
Despite there being a section in RFC 7231 <https://tools.ietf.org/html/rfc7231#section-5.5.3> defining a suggested structure for User-Agent headers, this data is notoriously difficult to parse consistently. Tools are provided that will take in user agent strings and return structured R objects. This is a V8-backed package based on the ua-parser project <https://github.com/ua-parser>.
PaleoClim <http://www.paleoclim.org> (Brown et al. 2019, <doi:10.1038/sdata.2018.254>) is a set of free, high resolution paleoclimate surfaces covering the whole globe. It includes data on surface temperature, precipitation and the standard bioclimatic variables commonly used in ecological modelling, derived from the HadCM3 general circulation model and downscaled to a spatial resolution of up to 2.5 minutes. Simulations are available for key time periods from the Late Holocene to mid-Pliocene. Data on current and Last Glacial Maximum climate is derived from CHELSA (Karger et al. 2017, <doi:10.1038/sdata.2017.122>) and reprocessed by PaleoClim to match their format; it is available at up to 30 seconds resolution. This package provides a simple interface for downloading PaleoClim data in R, with support for caching and filtering retrieved data by period, resolution, and geographic extent.
This package provides tools for the manipulation, statistical analysis, and visualization of taxonomic profiling data, facilitating phyloseq exploration and analysis. In addition to targeted case-control studies, microbiome facilitates scalable exploration of population cohorts. It supports the independent phyloseq data format and expands the available toolkit in order to facilitate the standardization of analyses and the development of best practices.
The package provides ready-to-use epigenomes (obtained from TWGBS) and transcriptomes (RNA-seq) from various tissues, as obtained in the study by Delacher and Imbusch (2017, PMID: 28783152). Regulatory T cells (Treg cells) perform two distinct functions: they maintain self-tolerance, and they support organ homeostasis by differentiating into specialized tissue Treg cells. The underlying dataset characterises the epigenetic and transcriptomic modifications for specialized tissue Treg cells.
This package provides a shiny application to assess statistical assumptions and guide users toward appropriate tests. The app is designed for researchers with minimal statistical training and provides diagnostics, plots, and test recommendations for a wide range of analyses. Many statistical assumptions are implemented using the packages rstatix (Kassambara, 2019) <doi:10.32614/CRAN.package.rstatix> and performance (Lüdecke et al., 2021) <doi:10.21105/joss.03139>.
This package contains functions for testing for significant differences between multiple coefficients of variation. Includes Feltz and Miller's (1996) <DOI:10.1002/(SICI)1097-0258(19960330)15:6%3C647::AID-SIM184%3E3.0.CO;2-P> asymptotic test and Krishnamoorthy and Lee's (2014) <DOI:10.1007/s00180-013-0445-2> modified signed-likelihood ratio test. See the vignette for more, including full details of citations.
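For orientation, the quantity these tests compare is the coefficient of variation, the sample standard deviation divided by the sample mean. The Python sketch below only computes that quantity per group; it does not implement either of the cited tests, and the function name and example data are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean.

    Requires at least two values and a non-zero mean.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / m


# Hypothetical example: two groups on very different scales
# but with the same relative spread.
groups = {
    "a": [10.0, 12.0, 14.0],
    "b": [100.0, 120.0, 140.0],
}
cvs = {name: coefficient_of_variation(vals) for name, vals in groups.items()}
print(cvs)
```

Both groups have the same coefficient of variation even though their means and standard deviations differ by a factor of ten, which is why the statistic is useful for comparing variability across scales.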
This package provides functions to download, process, and visualize German geospatial data across administrative levels, including states, districts, and municipalities. Supports interactive tables and customized maps using built-in or external datasets. Official shapefiles are accessed from the German Federal Agency for Cartography and Geodesy (BKG) <https://gdz.bkg.bund.de/>, licensed under dl-de/by-2-0 <https://www.govdata.de/dl-de/by-2-0>.
This package provides four addons for analyzing trends and unit roots in financial time series: (i) functions for the density and probability of the augmented Dickey-Fuller Test, (ii) functions for the density and probability of MacKinnon's unit root test statistics, (iii) reimplementations for the ADF and MacKinnon Test, and (iv) an urca Unit Root Test Interface for Pfaff's unit root test suite.
This package contains a set of tools for constructing and coercing into and from the "mdate" class. This date class implements ISO 8601-2:2019(E) and allows regular dates to be annotated to express unspecified date components, approximate or uncertain date components, date ranges, and sets of dates. This is useful for describing and analysing temporal information, whether historical or recent, where date precision may vary.
This package provides functions and datasets to support Smilde, Næs and Liland (2021, ISBN: 978-1-119-60096-1) "Multiblock Data Fusion in Statistics and Machine Learning - Applications in the Natural and Life Sciences". This implements and imports a large collection of methods for multiblock data analysis with common interfaces, result- and plotting functions, several real data sets and six vignettes covering a range of different applications.
Matching with string distance has never been easier! messy.cats contains various functions that employ string distance tools in order to make data management easier for users working with categorical data. Categorical data, especially user-inputted categorical data, which is often plagued by typos, can be difficult to work with. messy.cats aims to provide functions that make cleaning categorical data simple and easy.
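To make the idea of string-distance matching concrete, here is a minimal Python sketch. It is not the messy.cats API: the function names, the choice of Levenshtein edit distance, and the max_distance threshold are all assumptions made for illustration. Each messy value is mapped to the closest clean category, or to None when nothing is close enough.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def match_categories(messy, clean, max_distance=2):
    """Map each messy value to its nearest clean category (case-insensitive).

    Values farther than max_distance from every clean category map to None.
    """
    result = {}
    for value in messy:
        key = value.strip().lower()
        best = min(clean, key=lambda c: levenshtein(key, c.lower()))
        close_enough = levenshtein(key, best.lower()) <= max_distance
        result[value] = best if close_enough else None
    return result


matches = match_categories(["aple", "Bananna", "xyzzy"], ["apple", "banana", "cherry"])
print(matches)
```

The threshold keeps unrelated strings from being forced onto a category; a suitable value depends on how long the category labels are.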
An implementation of the ternary plot for interpreting regression coefficients of trinomial regression models, as proposed in Santi, Dickson and Espa (2019) <doi:10.1080/00031305.2018.1442368>. Ternary plots can be drawn using either the ggtern package (based on ggplot2) or the Ternary package (based on standard graphics). The package and its features are illustrated in Santi, Dickson, Espa and Giuliani (2022) <doi:10.18637/jss.v103.c01>.
Scale alignment is a new procedure for rescaling dimensions of between-items multidimensional Rasch family models so that dimension scores can be compared directly (Feuerstahler & Wilson, 2019; under review) <doi:10.1111/jedm.12209>. This package includes functions for implementing delta-dimensional alignment (DDA) and logistic regression alignment (LRA) for dichotomous or polytomous data. This package also includes a wrapper for models fit using the TAM package.
Transform complex statistical output into straightforward, understandable, and context-aware natural language descriptions using Large Language Models (LLMs), making complex analyses more accessible to individuals with varying statistical expertise. It relies on the ellmer package to interface with LLM providers including OpenAI <https://openai.com/>, Google AI Studio <https://aistudio.google.com/>, and Anthropic <https://www.anthropic.com/> (API keys are required and managed via ellmer).