The ability to tune models is important. finetune enhances the tune package by providing more specialized methods for finding reasonable values of model tuning parameters. Two racing methods described by Kuhn (2014) <doi:10.48550/arXiv.1405.6974> are included. An iterative search method using generalized simulated annealing (Bohachevsky, Johnson and Stein, 1986) <doi:10.1080/00401706.1986.10488128> is also included.
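For example, a minimal sketch of both search methods, assuming the tidymodels packages and the ranger engine are installed (the mtcars formula and tuning setup are purely illustrative):

library(tidymodels)
library(finetune)

folds <- vfold_cv(mtcars, v = 5)

spec <- rand_forest(mtry = tune(), min_n = tune()) |>
  set_engine("ranger") |>
  set_mode("regression")

wf <- workflow() |>
  add_model(spec) |>
  add_formula(mpg ~ .)

race_res <- tune_race_anova(wf, resamples = folds, grid = 10)  # racing: ANOVA filtering drops poor candidates early
sa_res <- tune_sim_anneal(wf, resamples = folds, iter = 15)    # iterative search via generalized simulated annealing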
Anonymized data from surveys conducted by Forwards <https://forwards.github.io/>, the R Foundation task force on women and other under-represented groups. Currently, the package contains a single data set of responses to a survey of attendees at useR! 2016 <https://www.r-project.org/useR-2016/>, the R user conference held at Stanford University, Stanford, California, USA, June 27-30, 2016.
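For example, a minimal sketch of loading the survey responses, assuming the data set is exposed under the name useR2016 (as in the forwards package on CRAN):

data(useR2016, package = "forwards")  # load the anonymized survey responses
str(useR2016)                         # inspect the available variables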
Ease the transition between R vectors and markdown text. With gluedown and rmarkdown, users can create character vectors in R, glue those strings together with markdown syntax, and print the formatted vectors directly to the document. This package primarily uses GitHub Flavored Markdown (GFM), an offshoot of the unambiguous CommonMark specification by John MacFarlane (2019) <https://spec.commonmark.org/>.
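For example, a brief sketch using a few of the package's md_*() helpers (the input strings are illustrative):

library(gluedown)
md_bullet(c("apples", "bananas", "cherries"))  # a GFM bullet list, one element per item
md_bold("breaking news")                       # wrap a string in bold markup
md_quote("Four score and seven years ago")     # format a string as a block quote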
Utilizes methods of the PyMongo Python library to initialize, insert and query GeoJSON data (see <https://github.com/mongodb/mongo-python-driver> for more information on PyMongo). Furthermore, it allows the user to validate GeoJSON objects and to use the console for MongoDB (bulk) commands. The reticulate package provides the R interface to Python modules, classes and functions.
This package provides functions and data that support a course emphasizing statistical issues of inference and generalizability. The functions are designed to make it straightforward to illustrate the use of cross-validation, the training/test approach, simulation, and model-based estimates of accuracy. Methods considered are Generalized Additive Modeling, Linear and Quadratic Discriminant Analysis, Tree-based methods, and Random Forests.
This package provides a pipeline to annotate chromatography peaks from the IDSL.IPA workflow <doi:10.1021/acs.jproteome.2c00120> with molecular formulas of a prioritized chemical space using an isotopic profile matching approach. The IDSL.UFA workflow only requires mass spectrometry level 1 (MS1) data for formula annotation. The IDSL.UFA method is described in <doi:10.1021/acs.analchem.2c00563>.
Data from the United States Centers for Medicare and Medicaid Services (CMS) are included in this package. There are ICD-9 and ICD-10 diagnostic and procedure codes, and lists of the chapter and sub-chapter headings and the ranges of ICD codes they encompass. There are also two sample datasets. These data are used by the icd package for finding comorbidities.
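For example, a minimal sketch of browsing the bundled data, assuming the chapter table and one of the sample datasets are exposed under these names (as in the icd.data package on CRAN):

library(icd.data)
icd9_chapters[1:2]  # chapter headings with the ICD-9 code ranges they cover
head(vermont_dx)    # one of the sample datasets of discharge diagnoses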
This package provides code management functions, NLP tools, a Monty Hall simulator, and an implementation of my own variable reduction technique called Feed Reduction. The Feed Reduction technique is not yet published; it is a tool that builds a series of binary neural networks to reduce data into N dimensions, where N is the number of possible values of the response variable.
This package provides a set of tools designed to enhance transparency and understanding of date-time manipulation functions from the lubridate package. It provides detailed feedback about the operations performed by lubridate functions, allowing users to better comprehend and debug their code. These insights serve as both a learning tool for newcomers and a debugging aid for programmers working with date-time data.
Semi-parametric approach for sparse canonical correlation analysis which can handle mixed data types: continuous, binary and truncated continuous. Bridge functions are provided to connect Kendall's tau to latent correlation under the Gaussian copula model. The methods are described in Yoon, Carroll and Gaynanova (2020) <doi:10.1093/biomet/asaa007> and Yoon, Mueller and Gaynanova (2021) <doi:10.1080/10618600.2021.1882468>.
An interface to build machine learning models for classification and regression problems. mikropml implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
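For example, a minimal sketch of the pipeline's single entry point, assuming the otu_mini_bin example data bundled with the package:

library(mikropml)
result <- run_ml(otu_mini_bin, method = "glmnet",
                 outcome_colname = "dx", seed = 2019)  # preprocess, tune, cross-validate, and test
result$performance                                     # held-out performance metrics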
This package provides a comprehensive implementation of Petersen-type estimators and their many variants for two-sample capture-recapture studies. A conditional likelihood approach is used that allows for tag loss; non-reporting of tags; reward tags; categorical, geographical and temporal stratification; partial stratification; reverse capture-recapture; and continuous variables in modeling the probability of capture. Many examples from fisheries management are presented.
Makes it easy to push data to Power BI using R and the Power BI REST APIs (see <https://docs.microsoft.com/en-us/rest/api/power-bi/>). A set of functions for turning data frames into Power BI datasets and refreshing these datasets is provided. Administrative tasks such as monitoring refresh statuses and pulling metadata about workspaces and users are also supported.
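As background, a sketch of the underlying REST call for pushing rows (not this package's own helpers), assuming an OAuth access token and an existing push dataset; the bracketed IDs are placeholders:

library(httr)
library(jsonlite)

rows <- data.frame(city = c("Oslo", "Bergen"), sales = c(120, 80))

resp <- POST(
  url = paste0("https://api.powerbi.com/v1.0/myorg/datasets/<dataset-id>",
               "/tables/<table-name>/rows"),
  add_headers(Authorization = paste("Bearer", "<access-token>")),
  content_type_json(),
  body = toJSON(list(rows = rows))  # rows are sent as a JSON array under "rows"
)
status_code(resp)  # 200 indicates the rows were accepted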
Fast computation of multivariate analyses of small (tens to hundreds of markers) to big (thousands to hundreds of thousands of markers) genotype data. Runs Principal Component Analysis allowing for centering, z-score standardization and scaling for genetic drift, projection of ancient samples onto modern genetic space, and multivariate tests for differences in group location (Permutation-Based Multivariate Analysis of Variance) and dispersion (Permutation-Based Multivariate Analysis of Dispersion).
This package implements the Seinhorst model to analyze the relationship between initial nematode densities and plant growth response using nonlinear least squares estimation. The package provides tools for model fitting, prediction, and visualization, facilitating the study of plant-nematode interactions. Model parameters can be estimated or set to predefined values based on Seinhorst (1986) <doi:10.1007/978-1-4613-2251-1_11>.
Bayesian Tensor Factorization for decomposition of tensor data sets using the trilinear CANDECOMP/PARAFAC (CP) factorization, with automatic component selection. The complete data analysis pipeline is provided, including functions and recommendations for data normalization and model definition, as well as missing value prediction and model visualization. The method performs factorization for three-way tensor datasets and the inference is implemented with Gibbs sampling.
This package implements an algorithm for generating maps, known as tile maps, in which each region is represented by a single tile of the same shape and size. The algorithm was first proposed in "Generating Tile Maps" by Graham McNeill and Scott Hale (2017) <doi:10.1111/cgf.13200>. Functions allow users to generate, plot, and compare square or hexagon tile maps.
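For example, a minimal sketch of generating a tile map, assuming an sf object named regions with one polygon per region (the object name is illustrative):

library(sf)
library(tilemaps)
regions$tile <- generate_map(regions$geometry, square = TRUE)  # one same-sized square tile per region
plot(regions$tile)                                             # compare the tile layout with the original map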
This package implements a robust multivariate control-chart methodology for batch-based industrial processes with multiple correlated variables using the Dual STATIS (Structuration des Tableaux A Trois Indices de la Statistique) framework. A robust compromise covariance matrix is constructed from Phase I batches with the Minimum Covariance Determinant (MCD) estimator, and a Hotelling-type T² statistic is applied for anomaly detection in Phase II. The package includes functions to simulate clean and contaminated batches, to compute both robust and classical Hotelling T² control charts, to visualize results via robust biplots, and to launch an interactive shiny dashboard. An internal dataset (pharma_data) is provided for reproducibility. See Lavit, Escoufier, Sabatier and Traissac (1994) <doi:10.1016/0167-9473(94)90134-1> for the original STATIS methodology, and Rousseeuw and Van Driessen (1999) <doi:10.1080/00401706.1999.10485670> for the MCD estimator.
This package provides an R port of the library Clipper. It performs polygon clipping operations (intersection, union, set minus, symmetric difference) for polygonal regions of arbitrary complexity, including holes. It computes offset polygons (spatial buffer zones, morphological dilations, Minkowski dilations) for polygonal regions and polygonal lines. It computes the Minkowski Sum of general polygons. There is a function for removing self-intersections from polygon data.
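For example, a minimal sketch, assuming this describes the polyclip package, in which polygons are lists of x/y coordinate lists:

library(polyclip)
A <- list(list(x = c(0, 4, 4, 0), y = c(0, 0, 4, 4)))  # a 4 x 4 square
B <- list(list(x = c(2, 6, 6, 2), y = c(2, 2, 6, 6)))  # an overlapping square
polyclip(A, B, op = "intersection")                    # clip A against B
polyoffset(A, delta = 1)                               # buffer A outward by one unit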
This package provides tools for variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large sets of potentially highly correlated variables). The main applications are in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).
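For example, a minimal sketch of backwards elimination, assuming this describes the varSelRF package (the simulated class data are purely illustrative):

library(varSelRF)
set.seed(1)
x <- matrix(rnorm(60 * 20), ncol = 20)
colnames(x) <- paste0("gene", 1:20)
cl <- factor(rep(c("A", "B"), each = 30))
sel <- varSelRF(x, cl, ntree = 500, vars.drop.frac = 0.2)  # drop 20% of variables per iteration
sel$selected.vars                                          # the retained, non-redundant variables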
Helps enable adaptive management by codifying knowledge in the form of models generated from numerous analyses and data sets. Facilitates this process by storing all models and data sets in a single object that can be updated and saved, thus tracking changes in knowledge through time. A shiny application called AM Model Manager (modelMgr()) enables the use of these functions via a GUI.
This package provides easy access to the AviList Global Avian Checklist, the first unified global bird taxonomy that harmonizes previous differences between the International Ornithological Committee ('IOC'), Clements, and BirdLife checklists. This package contains the complete AviList dataset as R data objects ready for ornithological research and analysis. For more details see AviList Core Team (2025) <doi:10.2173/avilist.v2025>.
Calculate ActiGraph counts from the X, Y, and Z axes of a triaxial accelerometer. This work was inspired by Neishabouri et al., who published the article "Quantification of Acceleration as Activity Counts in ActiGraph Wearables" on February 24, 2022. The article is available at <https://pubmed.ncbi.nlm.nih.gov/35831446>, and a Python implementation of this code is available at <https://github.com/actigraph/agcounts>.
This package provides a number of functions to access the National Renewable Energy Laboratory (NREL) Alternative Fuel Locator API <https://developer.nrel.gov/docs/transportation/alt-fuel-stations-v1/>. The Alternative Fuel Locator shows the location of alternative fuel stations in the United States and Canada. This package also includes the data from the US Department of Energy alternative fuel database as a data set.