This package provides a collection of Irucka Embry's miscellaneous USGS data sets (USGS parameter codes with fixed values, USGS global time zone codes, and US Air Force Global Engineering Weather Data). Irucka created these data sets while working as a Cherokee Nation Technology Solutions (CNTS) United States Geological Survey (USGS) contractor and/or USGS employee.
This package provides a collection of functions for sensitivity analysis of model outputs (factor screening, global sensitivity analysis, and robustness analysis), for variable importance measures of data, and for the interpretability of machine learning models. Most of the functions must be applied to scalar outputs, but several functions support multi-dimensional outputs.
Read General Transit Feed Specification (GTFS) zipfiles into a list of R dataframes. Validate the data structure against the specification. Analyze the headways and frequencies at routes and stops. Create maps and perform spatial analysis on the routes and stops. For more detail, see the GTFS documentation at <https://gtfs.org/>.
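A headway is the time between consecutive departures at the same stop. The following Python sketch illustrates that calculation; it is not the package's R interface, and the function names `to_seconds` and `headways_minutes` are hypothetical. It assumes departure times are given as GTFS-style "HH:MM:SS" strings, where hours may exceed 23 for service running past midnight.

```python
def to_seconds(hms: str) -> int:
    """Convert a GTFS time string 'HH:MM:SS' (hours may exceed 23) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def headways_minutes(departures: list[str]) -> list[float]:
    """Gaps in minutes between consecutive departures at a single stop."""
    # Sort by time of day so unordered input still yields consecutive gaps.
    secs = sorted(to_seconds(t) for t in departures)
    return [(b - a) / 60 for a, b in zip(secs, secs[1:])]
```

For example, departures at 07:50, 08:00, 08:15 and 08:40 give headways of 10, 15 and 25 minutes. Keeping times as seconds past the start of the service day, instead of wrapping at midnight, means a trip at "24:10:00" still sorts after one at "23:55:00".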
This package is a feature selection package of the mlr3 ecosystem. It selects the optimal feature set for any mlr3 learner. The package works with several optimization algorithms, e.g. random search, recursive feature elimination, and genetic search. Moreover, it can automatically optimize learners and estimate the performance of optimized feature sets with nested resampling.
This package contains functions to implement the methodology and considerations laid out by Marks et al. in the article "Measuring abnormality in high dimensional spaces: applications in biomechanical gait analysis". Outcomes research often requires using high-dimensional datasets to measure a subject's overall level of abnormality relative to a reference population.
Bit-level reading and writing are necessary when dealing with many file formats, e.g. compressed data and binary files. Currently, R connections are manipulated only at the byte level. This package wraps existing connections and raw vectors so that it is possible to read bits, bit sequences, unaligned bytes, and low-bit representations of integers.
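To illustrate what reading bits and unaligned bytes means, here is a minimal Python sketch of a bit reader over an in-memory buffer. It is not the package's R API: the class name `BitReader` is hypothetical, and it assumes most-significant-bit-first ordering, which real formats may or may not use.

```python
class BitReader:
    """Hypothetical MSB-first bit reader over an in-memory bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # absolute bit position from the start of the buffer

    def read_bits(self, n: int) -> int:
        """Read n bits (most significant first) as an unsigned integer."""
        if self.pos + n > len(self.data) * 8:
            raise EOFError("not enough bits left in the buffer")
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            # Pick the bit at the current offset, counting from the high bit.
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value
```

With the buffer `bytes([0b10110010, 0b01000000])`, reading 3 bits returns 5 (`0b101`), and a following 8-bit read returns 146 (`0b10010010`): an unaligned byte that straddles the two stored bytes.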
Create life tables with a Bayesian approach, which can be very useful for modelling a complex health process when considering multiple predisposing factors and multiple coexisting health conditions. Details for this method can be found in: Lynch, Scott, et al., (2022) <doi:10.1177/00811750221112398>; Zang, Emma, et al., (2022) <doi:10.1093/geronb/gbab149>.
Various functions to import, verify, process and plot high-resolution dendrometer data using daily and stem-cycle approaches, as described in Deslauriers et al. (2007) <doi:10.1016/j.dendro.2007.05.003>. For more details about the package, see Van der Maaten et al. (2016) <doi:10.1016/j.dendro.2016.06.001>.
Three methods are provided to estimate graphical models with latent variables: (1) Jin, Y., Ning, Y., and Tan, K. M. (2020) (preprint available); (2) Chandrasekaran, V., Parrilo, P. A. & Willsky, A. S. (2012) <doi:10.1214/11-AOS949>; (3) Tan, K. M., Ning, Y., Witten, D. M. & Liu, H. (2016) <doi:10.1093/biomet/asw050>.
Uses multiple AUCs to select a combination of predictors when the outcome has multiple (ordered) levels and the focus is on discriminating one particular level from the others. This method is most naturally applied to settings where the outcome has three levels. See Meisner, A., Parikh, C. R., and Kerr, K. F. (2017) <http://biostats.bepress.com/uwbiostat/paper423/>.
The pharmaverse is a set of packages that compose multiple pathways through clinical data generation and reporting in the pharmaceutical industry. This package is designed to guide users to our workspaces on GitHub, Slack and LinkedIn, as well as to our website and examples. Learn more about the pharmaverse at <https://pharmaverse.org>.
This package provides tools for exchanging pedigree data between the pedsuite packages and the Familias software for forensic kinship computations (Egeland et al. (2000) <doi:10.1016/s0379-0738(00)00147-x>). These functions were split out from the forrel package to streamline maintenance and provide a lightweight alternative for packages otherwise independent of forrel.
Price comparisons within or between countries provide an overall measure of the relative difference in prices, often denoted as price levels. This package provides index number methods for such price comparisons (e.g., The World Bank, 2011, <doi:10.1596/978-0-8213-9728-2>). Moreover, it contains functions for sampling and characterizing price data.
Corrects the spelling of a given word in English using a modification of Peter Norvig's spell correct algorithm (see <http://norvig.com/spell-correct.html>) which handles up to three edits. Among all candidate corrections generated from the original word, the algorithm selects the one with the highest probability of being the intended spelling.
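The candidate-generation idea from Norvig's essay can be sketched in Python as follows. This follows the original essay (deletions, transpositions, replacements and insertions, preferring the fewest edits and then the most frequent known word), not the package's modified R implementation. The word-frequency `Counter` is supplied by the caller, and the `max_edits` parameter is a hypothetical knob: Norvig's essay stops at two edits, while the package extends the search to three.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str, counts: Counter, max_edits: int = 2) -> str:
    """Return the most frequent known word within the fewest edits of `word`."""
    if word in counts:
        return word
    frontier = {word}
    for _ in range(max_edits):
        # Expand every candidate by one more edit, then look for known words.
        frontier = {e for w in frontier for e in edits1(w)}
        known = [w for w in frontier if w in counts]
        if known:
            return max(known, key=counts.get)
    return word  # no known candidate found: leave the word unchanged
```

Generating the candidate set naively grows very quickly with each additional edit, so a third tier built this way is far more expensive than the first two.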
Makes it easy to generate random numbers based on the underlying stats distribution functions. All data are returned in a tidy, structured format, which makes working with the data simple and straightforward. Because the data are returned in a tidy tibble, they lend themselves to working with the rest of the tidyverse.
Given a partition resulting from any clustering algorithm, the implemented tests allow valid post-clustering inference by testing if a given variable significantly separates two of the estimated clusters. Methods are detailed in: Hivert B, Agniel D, Thiebaut R & Hejblum BP (2022). "Post-clustering difference testing: valid inference and practical considerations", <arXiv:2210.13172>.
This package provides tools for analyzing R expressions or blocks of code and determining the dependencies between them. It focuses on R scripts, but can also be used on the bodies of functions. Its many facilities include summarizing code or getting a high-level view of it, determining dependencies between variables, and suggesting code improvements.
This package creates dummy columns from columns that have categorical variables (character or factor types). You can also specify which columns to make dummies out of, or which columns to ignore. It also creates dummy rows from character, factor, and Date columns. This package provides a significant speed increase over creating dummy variables through model.matrix().
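The core transformation, one 0/1 indicator column per observed level of a categorical column, can be sketched in Python on plain lists of dictionaries. This is an illustration only, not the package's R API: the function name `add_dummy_columns` is hypothetical, and the `column_level` naming of the new columns is an assumption chosen for readability.

```python
def add_dummy_columns(rows: list[dict], columns: list[str]) -> list[dict]:
    """Return copies of `rows` with one 0/1 indicator column per category level."""
    # Collect the sorted set of observed levels for each selected column.
    levels = {col: sorted({row[col] for row in rows}) for col in columns}
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not modified
        for col in columns:
            for level in levels[col]:
                new_row[f"{col}_{level}"] = int(row[col] == level)
        result.append(new_row)
    return result
```

Only the columns named in `columns` are expanded, which mirrors the idea of choosing which columns to make dummies out of while leaving the others untouched.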
The IPC::Run3 module allows you to run a subprocess and redirect stdin, stdout, and/or stderr to files and Perl data structures. It aims to satisfy 99% of the need for using system, qx, and open3 with a simple, extremely Perlish API and none of the bloat and rarely used features of IPC::Run.
The Autoregressive Integrated Moving Average (ARIMA) model is a very popular univariate time series model. Its application has been widened by incorporating exogenous variable(s) (X) into the model, yielding the ARIMAX model of Bierens (1987) <doi:10.1016/0304-4076(87)90086-8>. This package estimates the ARIMAX model using a Bayesian framework.
Adds the trendline and confidence interval of a linear or nonlinear regression model to a ggplot, and shows the model equation, as simply as possible. For a general overview of the methods used in this package, see Ritz and Streibig (2008) <doi:10.1007/978-0-387-09616-2> and Greenwell and Schubert Kabban (2014) <doi:10.32614/RJ-2014-009>.
Sports injury data analysis aims to identify and describe the magnitude of the injury problem and to gain more insight (e.g. to determine potential risk factors) through statistical modelling approaches. The injurytools package provides standardized routines and utilities that simplify such analyses. It offers functions for data preparation, informative visualizations, and descriptive and model-based analyses.
This package performs a variety of viral quasispecies diversity analyses [see Pamornchainavakul et al. (2024) <doi:10.21203/rs.3.rs-4637890/v1>] based on long-read sequence alignment. Main functions include (1) minimization of sequencing errors and other noise, and read sampling; (2) comparison of single nucleotide variant (SNV) profiles; and (3) comparison and visualization of viral quasispecies profiles.
This package implements methods to automate the Auer-Gervini graphical Bayesian approach for determining the number of significant principal components. Automation uses clustering, change points, or simple statistical models to distinguish "long" from "short" steps in a graph showing the posterior number of components as a function of a prior parameter. See <doi:10.1101/237883>.