This package makes it easy to generate random numbers based upon the underlying stats distribution functions. All data is returned in a tidy, structured format, which makes working with the data simple and straightforward. Because the data is returned as a tidy tibble, it lends itself to working with the rest of the tidyverse.
Given a partition resulting from any clustering algorithm, the implemented tests allow valid post-clustering inference by testing if a given variable significantly separates two of the estimated clusters. Methods are detailed in: Hivert B, Agniel D, Thiebaut R & Hejblum BP (2022). "Post-clustering difference testing: valid inference and practical considerations", <arXiv:2210.13172>.
In many analyses, a large number of variables have to be tested independently against the trait/endpoint of interest, and also adjusted for covariates and confounding factors at the same time. The major bottleneck is the time it takes to complete these analyses. With RegParallel, a large number of tests can be performed simultaneously. On a 12-core system, 144 variables can be tested simultaneously, with 1000s of variables processed in a matter of seconds via nested parallel processing. Works for logistic regression, linear regression, conditional logistic regression, Cox proportional hazards and survival models, and Bayesian logistic regression. Also caters for generalised linear models that utilise survey weights created by the survey CRAN package and that utilise survey::svyglm.
An R package providing extended biological annotations for the SomaScan Assay, a proteomics platform developed by SomaLogic Operating Co., Inc. The annotations in this package were assembled using data from public repositories. For more information about the SomaScan assay and its data, please reference the SomaLogic/SomaLogic-Data GitHub repository.
The Autoregressive Integrated Moving Average (ARIMA) model is a very popular univariate time series model. Its application has been widened by the incorporation of exogenous variable(s) (X) in the model, modified as ARIMAX by Bierens (1987) <doi:10.1016/0304-4076(87)90086-8>. This package estimates the ARIMAX model using a Bayesian framework.
Add the trendline and confidence interval of a linear or nonlinear regression model to a ggplot, and show the model equation, as simply as possible. For a general overview of the methods used in this package, see Ritz and Streibig (2008) <doi:10.1007/978-0-387-09616-2> and Greenwell and Schubert Kabban (2014) <doi:10.32614/RJ-2014-009>.
Sports Injury Data analysis aims to identify and describe the magnitude of the injury problem, and to gain more insights (e.g. determine potential risk factors) by statistical modelling approaches. The injurytools package provides standardized routines and utilities that simplify such analyses. It offers functions for data preparation, informative visualizations and descriptive and model-based analyses.
This package performs a variety of viral quasispecies diversity analyses [see Pamornchainavakul et al. (2024) <doi:10.21203/rs.3.rs-4637890/v1>] based on long-read sequence alignment. Main functions include 1) sequencing error and other noise minimization and read sampling, 2) single nucleotide variant (SNV) profile comparison, and 3) viral quasispecies profile comparison and visualization.
This package implements methods to automate the Auer-Gervini graphical Bayesian approach for determining the number of significant principal components. Automation uses clustering, change points, or simple statistical models to distinguish "long" from "short" steps in a graph showing the posterior number of components as a function of a prior parameter. See <doi:10.1101/237883>.
Suppose we have data that has so many series that it is hard to identify them by their colors as the differences are so subtle. With gghighlight we can highlight those lines that match certain criteria. The result is a usual ggplot object, so it is fully customizable and can be used with custom themes and facets.
This package performs angle-based outlier detection on a given data frame. It offers three methods to process data:
a full but slow implementation with cubic complexity that uses all the data;
a fully randomized method;
a method using k-nearest neighbours.
These algorithms are well suited for high dimensional data outlier detection.
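As an illustration of the full, cubic-cost variant, here is a minimal Python sketch of an angle-based outlier score. It is not the package's implementation: the function name `abof` is hypothetical, and the score is a simplified, unweighted form (the variance of distance-scaled inner products over all pairs of other points). Points that see the rest of the data within a narrow range of angles get a small score.

```python
from itertools import combinations
from statistics import pvariance


def abof(points, i):
    """Simplified (unweighted) angle-based outlier factor of points[i].

    For every pair (a, b) of other points, compute <pa, pb> / (|pa|^2 |pb|^2)
    and return the variance of these values. Small values suggest an outlier.
    Assumes no other point coincides with points[i] (no zero-length vectors).
    """
    p = points[i]
    others = [q for j, q in enumerate(points) if j != i]
    values = []
    for a, b in combinations(others, 2):
        pa = [x - y for x, y in zip(a, p)]
        pb = [x - y for x, y in zip(b, p)]
        dot = sum(x * y for x, y in zip(pa, pb))
        na2 = sum(x * x for x in pa)
        nb2 = sum(x * x for x in pb)
        values.append(dot / (na2 * nb2))
    return pvariance(values)


# Toy data: five points around the unit square and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
scores = [abof(points, i) for i in range(len(points))]
# The point with the smallest score is the strongest outlier candidate.
outlier_index = min(range(len(points)), key=scores.__getitem__)
```

Scoring each of the n points requires visiting all pairs of the remaining points, which is where the cubic cost comes from. The randomized and k-nearest-neighbour variants reduce it by restricting which pairs are considered.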
This package is for designing Crispr/Cas9 and Prime Editing experiments. It contains functions to (1) define and transform genomic targets, (2) find spacers, (4) count offtarget (mis)matches, and (5) compute Doench2016/2014 targeting efficiency. Care has been taken for multicrispr to scale well towards large target sets, enabling the design of large Crispr/Cas9 libraries.
This package provides tools for assessing and selecting auxiliary variables using LASSO. The package includes functions for variable selection and diagnostics, facilitating survey calibration analysis with emphasis on robust auxiliary vector selection. For more details see Tibshirani (1996) <doi:10.1111/j.2517-6161.1996.tb02080.x> and Caughey and Hartman (2017) <doi:10.2139/ssrn.3494436>.
This package implements Bayesian estimation and inference for alpha-mixture survival models, including Weibull- and exponential-based components, with tools for simulation and posterior summaries. The methods target applications in reliability and biomedical survival analysis. The alpha-mixture methodology was introduced in Asadi et al. (2019) <doi:10.1017/jpr.2019.72>.
Simulating bivariate survival data from copula models. Estimation of the association parameter in copula models. Two different ways to estimate the association parameter in copula models are implemented. A goodness-of-fit test for a given copula model is implemented. See Emura, Lin and Wang (2010) <doi:10.1016/j.csda.2010.03.013> for details.
Quick and easy access to datasets that let you replicate the empirical examples in Cameron and Trivedi (2005) "Microeconometrics: Methods and Applications" (ISBN: 9780521848053). The data are available as data frames as soon as you install and load the package (lazy-loading). The documentation includes references to the chapter sections and page numbers where the datasets are used.
Counts colors within color range(s) in images, and provides a masked version of the image with targeted pixels changed to a different color. Output includes the locations of the pixels in the images, and the proportion of the image within the target color range with optional background masking. Users can specify multiple color ranges for masking.
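The counting and masking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the package's code: the image is a nested list of RGB tuples, the color range is a rectangular per-channel range, and the function name `count_in_range` and the default mask color are hypothetical.

```python
def count_in_range(image, lower, upper, mask_color=(255, 0, 255)):
    """Find pixels whose channels all fall within [lower, upper].

    Returns the (row, column) locations of matching pixels, the fraction of
    the image they cover, and a masked copy with those pixels recolored.
    The input image is left unchanged.
    """
    hits = []
    masked = [list(row) for row in image]  # copy each row before recoloring
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if all(lo <= v <= hi for v, lo, hi in zip(pixel, lower, upper)):
                hits.append((r, c))
                masked[r][c] = mask_color
    total = sum(len(row) for row in image)
    return hits, len(hits) / total, masked


# A 2x2 toy image: two reddish pixels, one green, one blue.
image = [
    [(250, 10, 10), (10, 250, 10)],
    [(240, 20, 5), (10, 10, 250)],
]
hits, fraction, masked = count_in_range(image, lower=(200, 0, 0), upper=(255, 50, 50))
```

Handling several color ranges would amount to calling the function once per range, feeding the masked output of one call into the next.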
Set of functions for step-wise generation of (weighted) graphs. Aimed at research in the field of single- and multi-objective combinatorial optimization. Graphs are generated by adding nodes, edges and weights. Each step may be repeated multiple times with different predefined and custom generators, resulting in high flexibility regarding the graph topology and structure of edge weights.
This package is a sparklyr <https://spark.rstudio.com/> extension that provides an R interface for GraphFrames <https://graphframes.github.io/>. GraphFrames is a package for Apache Spark that provides a DataFrame-based API for working with graphs. Functionality includes motif finding and common graph algorithms, such as PageRank and breadth-first search.
This package provides a bridge between the loon and ggplot2 packages. Extends the grammar of ggplot to add clauses to create interactive loon plots. Existing ggplot(s) can be turned into interactive loon plots and loon plots into static ggplot(s); the function loon.ggplot() is the bridge from one plot structure to the other.
This package contains functions intended to facilitate the production of plant taxonomic monographs. The package includes functions to convert tables into taxonomic descriptions, lists of collectors, examined specimens, and identification keys (dichotomous and interactive), and to generate a monograph skeleton. Additionally, wrapper functions to batch the production of phenology histograms and distributional and diversity maps are available.
R Client for the Microsoft Cognitive Services Text Analytics REST API, including Sentiment Analysis, Topic Detection, Language Detection, and Key Phrase Extraction. An account MUST be registered at the Microsoft Cognitive Services website <https://www.microsoft.com/cognitive-services/> in order to obtain a (free) API key. Without an API key, this package will not work properly.
Balancing computational and statistical efficiency, subsampling techniques offer a practical solution for handling large-scale data analysis. Subsampling methods enhance statistical modeling for massive datasets by efficiently drawing representative subsamples from the full dataset based on tailored sampling probabilities. These probabilities are optimized for specific goals, such as minimizing the variance of coefficient estimates or reducing prediction error.
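To make the role of sampling probabilities concrete, the following Python sketch draws a subsample with unequal probabilities and reweights each draw by the inverse of its probability. It is a generic illustration, not the package's method: the function name `weighted_subsample` is hypothetical, sampling is with replacement, and the target is a simple mean rather than regression coefficients.

```python
import random


def weighted_subsample(data, probs, size, rng):
    """Draw `size` items with replacement using sampling probabilities `probs`.

    Each draw is paired with the weight 1 / (n * p_i), so that the weighted
    average of the subsample is an unbiased estimate of the full-data mean.
    """
    n = len(data)
    idx = rng.choices(range(n), weights=probs, k=size)
    return [(data[i], 1.0 / (n * probs[i])) for i in idx]


data = [1.0, 2.0, 3.0, 4.0, 10.0]  # full-data mean is 4.0
total = sum(data)
# Probabilities proportional to the values themselves: for a mean of positive
# numbers this choice removes the sampling variance of the estimate entirely.
probs = [x / total for x in data]

sample = weighted_subsample(data, probs, size=3, rng=random.Random(0))
estimate = sum(x * w for x, w in sample) / len(sample)
```

With these probabilities every weighted draw equals the full-data mean, so the estimate matches it regardless of which items are drawn. In realistic settings the ideal probabilities depend on unknown quantities and are approximated, for example from a pilot subsample.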
This package provides a collection of various oversampling techniques developed from SMOTE. SMOTE is an oversampling technique which synthesizes a new minority instance between a pair consisting of one minority instance and one of its K nearest neighbors. Other techniques adopt this concept with other criteria in order to generate a balanced dataset for the class imbalance problem.
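The interpolation step at the heart of SMOTE can be sketched in Python as follows. This is a minimal illustration, not the package's implementation: the function name `smote_sample` is hypothetical, neighbors are found by brute-force Euclidean distance, and at least two minority instances are assumed.

```python
import math
import random


def smote_sample(minority, k=2, rng=None):
    """Synthesize one minority point by interpolating between a randomly
    chosen minority instance and one of its k nearest minority neighbors."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    # Rank the remaining minority instances by distance to the base point.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))
    neighbor = rng.choice(others[:k])
    gap = rng.random()  # interpolation weight in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


# Minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = [smote_sample(minority, k=2, rng=random.Random(seed)) for seed in range(5)]
```

Because each synthetic point lies on the segment between two existing minority instances, it stays inside the region the minority class already occupies. Here, with k=2, every new point falls on an edge of the square.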