This package provides tools for constructing a matched design with multiple comparison groups. Further specifications, such as refined covariate balance restrictions and exact matching on covariates, can be imposed. Matches are approximately optimal in the sense that the cost of the solution is at most twice the optimal cost; see Crama and Spieksma (1992) <doi:10.1016/0377-2217(92)90078-N> and Karmakar, Small and Rosenbaum (2019) <doi:10.1080/10618600.2019.1584900>.
Developer-oriented utility functions designed to be used as the building blocks of R packages that work with ArcGIS Location Services. It provides functionality for authorization, Esri JSON construction and parsing, as well as other utilities pertaining to geometry and Esri type conversions. To support ArcGIS Pro users, authorization can be done via arcgisbinding. Installation instructions for arcgisbinding can be found at <https://developers.arcgis.com/r-bridge/installation/>.
Generate ground truth cases for object localization algorithms. Cycle through a list of images, select points around which to generate bounding boxes and assign classifiers. Output the coordinates, and images annotated with boxes and labels. For an example study that uses bounding boxes for image localization and classification see Ibrahim, Badr, Abdallah, and Eissa (2012) "Bounding Box Object Localization Based on Image Superpixelization" <doi:10.1016/j.procs.2012.09.119>.
It fits linear regression models for censored spatial data. It provides different estimation methods, such as the SAEM (Stochastic Approximation of Expectation Maximization) algorithm and a seminaive method that uses Kriging prediction to estimate the response at censored locations and predict new values at unknown locations. It also offers graphical tools for assessing the fitted model. More details can be found in Ordonez et al. (2018) <doi:10.1016/j.spasta.2017.12.001>.
This package provides tools for fitting Bayesian Distributed Lag Models (DLMs) to longitudinal response data that is a count or binary. Count data is fit using negative binomial regression and binary data is fit using quantile regression. The contribution of the lags is fit via B-splines. In addition, the package infers the predictor inclusion uncertainty. Multinomial models are not supported. Based on Dempsey and Wyse (2025) <doi:10.48550/arXiv.2403.03646>.
Identity by Descent (IBD) distributions in pedigrees. A Hidden Markov Model is used to compute identity coefficients, simulate IBD segments and to derive the distribution of total IBD sharing and segment count across chromosomes. The methods are applied in Kruijver (2025) <doi:10.3390/genes16050492>. The probability that the total IBD sharing is zero can be computed using the method of Donnelly (1983) <doi:10.1016/0040-5809(83)90004-7>.
This package provides a tool for optimizing scales of effect when modeling ecological processes in space. Specifically, the scale parameter of a distance-weighted kernel distribution is identified for all environmental layers included in the model. Includes functions to assist in model selection, model evaluation, efficient transformation of raster surfaces using fast Fourier transformation, and projecting models. For more details see Peterman (2025) <doi:10.21203/rs.3.rs-7246115/v1>.
Routines for PLS-based genomic analyses, implementing PLS methods for classification with microarray data and prediction of transcription factor activities from combined ChIP-chip analysis. The >=1.2-1 versions include two new classification methods for microarray data: GSIM and Ridge PLS. The >=1.3 versions include a new classification method combining variable selection and compression in a logistic regression context, logit-SPLS, and an adaptive version of the sparse PLS.
This package provides tools for analysing the agreement of two or more rankings of the same items. Examples are importance rankings of predictor variables and risk predictions of subjects. Benchmarks for agreement are computed based on random permutation and bootstrap. See Ekstrøm CT, Gerds TA, Jensen AK (2018). "Sequential rank agreement methods for comparison of ranked lists." _Biostatistics_, *20*(4), 582-598 <doi:10.1093/biostatistics/kxy017> for more information.
This package provides utilities for computing measures to assess model quality, which are not directly provided by R's base or stats packages. These include e.g. measures like r-squared, intraclass correlation coefficient, root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models.
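Two of the measures named above, root mean squared error and r-squared, can be sketched in a few lines. This is only an illustration of the definitions, not the package's code: the function names `rmse` and `r_squared` and the sample data are hypothetical, and the r-squared shown is the plain "1 minus residual over total sum of squares" form, which may differ from the variants a package computes for mixed or Bayesian models.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; r-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical data: four observations and the fitted values of some model.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```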
The bundle provides four packages:
rubikcube provides commands for typesetting Rubik cubes and their transformations; rubiktwocube provides commands for typesetting Rubik twocubes and their transformations; rubikrotation can process a sequence of Rubik rotation moves, with the help of a Perl package executed via \write18 (shell escape) commands; rubikpatterns is a collection of well known patterns and their associated rotation sequences.
This package provides tools for Bayesian basket trial design and analysis using a novel three-component local power prior framework with global borrowing control, pairwise similarity assessment and a borrowing threshold. Supports simulation-based evaluation of operating characteristics and comparison with other methods. Applicable to both equal and unequal sample size settings in early-phase oncology trials. For more details see Zhou et al. (2023) <doi:10.48550/arXiv.2312.15352>.
Sampling from the Cholesky factorization of a Wishart random variable, sampling from the inverse Wishart distribution, sampling from the Cholesky factorization of an inverse Wishart random variable, sampling from the pseudo Wishart distribution, sampling from the generalized inverse Wishart distribution, computing densities for the Wishart and inverse Wishart distributions, and computing the multivariate gamma and digamma functions. Provides a header file so the C functions can be called directly from other programs.
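One standard way to draw the Cholesky factor of a Wishart random variable is the Bartlett decomposition. The sketch below assumes an identity scale matrix to stay short, and it is not taken from the package: the function names are hypothetical and the package's C routines may use a different construction.

```python
import math
import random


def sample_wishart_cholesky_identity(df, p, rng):
    """Lower-triangular Cholesky factor A of W ~ Wishart(df, I_p), via Bartlett.

    Diagonal entry i (0-indexed) is the square root of a chi-square draw with
    df - i degrees of freedom; entries below the diagonal are standard normal.
    """
    if df <= p - 1:
        raise ValueError("df must exceed p - 1 for a full-rank Wishart draw")
    a = [[0.0] * p for _ in range(p)]
    for i in range(p):
        # chi-square with k degrees of freedom equals Gamma(shape=k/2, scale=2)
        a[i][i] = math.sqrt(rng.gammavariate((df - i) / 2.0, 2.0))
        for j in range(i):
            a[i][j] = rng.gauss(0.0, 1.0)
    return a


def lower_times_transpose(a):
    """Return W = A A^T for a square matrix A given as a list of rows."""
    p = len(a)
    return [[sum(a[i][k] * a[j][k] for k in range(p)) for j in range(p)]
            for i in range(p)]


rng = random.Random(42)  # seeded only so the illustration is repeatable
factor = sample_wishart_cholesky_identity(df=5, p=3, rng=rng)
wishart_draw = lower_times_transpose(factor)
```

For a general scale matrix with Cholesky factor L, the same construction gives the factor L A, so the identity case is the only random part.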
Built by Hodges lab members for current and future Hodges lab members. Other individuals are welcome to use it as well. Provides useful functions that the lab uses every day to analyze various genomic datasets. Critically, only general use functions are provided; functions specific to a given technique are reserved for a separate package. As the lab grows, we expect to continue adding functions to the package to build on previous lab members' code.
Enhances mlexperiments <https://CRAN.R-project.org/package=mlexperiments> with additional machine learning ('ML') learners for survival analysis. The package provides R6-based survival learners for the following algorithms: glmnet <https://CRAN.R-project.org/package=glmnet>, ranger <https://CRAN.R-project.org/package=ranger>, xgboost <https://CRAN.R-project.org/package=xgboost>, and rpart <https://CRAN.R-project.org/package=rpart>. These can be used directly with the mlexperiments R package.
This package provides a collection of NASCAR race, driver, owner and manufacturer data across the three major NASCAR divisions: NASCAR Cup Series, NASCAR Xfinity Series, and NASCAR Craftsman Truck Series. The curated data begins with the 1949 season and extends through the end of the 2024 season. Explore race, season, or career performance for drivers, teams, and manufacturers throughout NASCAR's history. Data was sourced with permission from DriverAverages.com.
Given a dataset, the user is invited to utilize the Empirical Cumulative Distribution Function (ECDF) to guess interactively the mean and the mean deviation. Thereafter, using the quadratic curve the user can guess the Root Mean Squared Deviation (RMSD) and visualize the standard deviation (SD). For details, see Sarkar and Rashid (2019) <doi:10.3126/njs.v3i0.25574>, Have You Seen the Standard Deviation?, Nepalese Journal of Statistics, Vol. 3, 1-10.
Converts text into speech using various text-to-speech (TTS) engines and provides a unified interface for accessing their functionality. With this package, users can easily generate audio files of spoken words, phrases, or sentences from plain text data. The package supports multiple TTS engines, including Google's Cloud Text-to-Speech API, Amazon Polly, Microsoft's Cognitive Services Text to Speech REST API, and a free TTS engine called Coqui TTS.
It performs the smoothing approach provided by penalized least squares for univariate and bivariate time series, as proposed by Guerrero (2007) and Guerrero et al. (2017). This allows estimating the time series trend by controlling the amount of resulting (joint) smoothness. Guerrero, V.M. (2007) <DOI:10.1016/j.spl.2007.03.006>. Guerrero, V.M., Islas-Camargo, A. and Ramirez-Ramirez, L.L. (2017) <DOI:10.1080/03610926.2015.1133826>.
Leveraging (large) language models for automatic topic labeling. The main function converts a list of top terms into a label for each topic. Hence, it is complementary to any topic modeling package that produces a list of top terms for each topic. While human judgement is indispensable for topic validation (i.e., inspecting top terms and most representative documents), automatic topic labeling can be a valuable tool for researchers in various scenarios.
This package estimates chronological and gestational DNA methylation (DNAm) age as well as biological age using different methylation clocks. Chronological DNAm age (in years): Horvath's clock, Hannum's clock, BNN, Horvath's skin+blood clock, PedBE clock and Wu's clock. Gestational DNAm age: Knight's clock, Bohlin's clock, Mayne's clock and Lee's clocks. Biological DNAm clocks: Levine's clock and Telomere Length's clock.
An automated pipeline for the detection, integration and reporting of predefined features across a large number of mass spectrometry data files. It enables the real time annotation of multiple compounds in a single file, or the parallel annotation of multiple compounds in multiple files. A graphical user interface as well as command line functions will assist in assessing the quality of annotation and update fitting parameters until a satisfactory result is obtained.
This package provides methods to compute simultaneous prediction and confidence bands for dense time series data. The implementation builds on the functional bootstrap approach proposed by Lenhoff et al. (1999) <doi:10.1016/S0966-6362(98)00043-5> and extended by Koska et al. (2023) <doi:10.1016/j.jbiomech.2023.111506> to support both independent and clustered (hierarchical) data. Includes a simple API (see band()) and an Rcpp backend for performance.
Pairwise Hamming distances are computed between the rows of a binary (0/1) matrix using highly optimized C code. The function takes an integer matrix where each row represents a binary feature vector and returns a symmetric integer matrix of pairwise distances. Internally, rows are bit-packed into 64-bit words for fast XOR-based comparisons, with hardware-accelerated popcount operations to count differences. OpenMP parallelization ensures efficient performance for large matrices.
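The bit-packing idea described above can be illustrated with a small sketch. It is not the package's C implementation: the function names are hypothetical, each row is packed into a single arbitrary-width Python integer rather than into 64-bit words, and there is no parallelization. The core step is the same, though: XOR two packed rows and count the set bits.

```python
def pack_row(row):
    """Pack a sequence of 0/1 values into one integer, one bit per entry."""
    bits = 0
    for value in row:
        if value not in (0, 1):
            raise ValueError("matrix entries must be 0 or 1")
        bits = (bits << 1) | value
    return bits


def hamming_matrix(rows):
    """Return the symmetric matrix of pairwise Hamming distances between rows."""
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("all rows must have the same length")
    packed = [pack_row(r) for r in rows]
    n = len(packed)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # XOR leaves a 1 bit wherever the two rows differ; count those bits.
            d = bin(packed[i] ^ packed[j]).count("1")
            dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical 3 x 4 binary matrix; rows 0 and 2 are identical.
example = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
distances = hamming_matrix(example)
print(distances)  # [[0, 2, 0], [2, 0, 2], [0, 2, 0]]
```

Only the upper triangle is computed and then mirrored, which is why the result is symmetric with a zero diagonal.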