Automated performance of common transformations used to fulfill parametric assumptions of normality and identification of the best performing method for the user. Output for various normality tests (Thode, 2002) corresponding to the best performing method and a descriptive statistical report of the input data in its original units (5-number summary and mathematical moments) are also presented. Lastly, the Rankit, an empirical normal quantile transformation (ENQT) (Soloman & Sawilowsky, 2009), is provided to accommodate non-standard use cases and facilitate adoption. <DOI: 10.1201/9780203910894>. <DOI: 10.22237/jmasm/1257034080>.
Assists in statistical model building to find optimal and semi-optimal higher order interactions and best subsets. Uses the lm(), glm(), and other R functions to fit models generated from a feasible solution algorithm. Discussed in Subset Selection in Regression, A Miller (2002). Applied and explained for least median of squares in Hawkins (1993) <doi:10.1016/0167-9473(93)90246-P>. The feasible solution algorithm comes up with model forms of a specific type that can have fixed variables, higher order interactions and their lower order terms.
This package provides bindings to the OSQP solver. The OSQP solver is a numerical optimization package or solving convex quadratic programs written in C and based on the alternating direction method of multipliers. See <arXiv:1711.08013> for details.
This package provides linear models based on Theil-Sen single median and Siegel repeated medians. They are very robust (29 or 50 percent breakdown point, respectively), and if no outliers are present, the estimators are very similar to OLS.
The ability to tune models is important. tune contains functions and classes to be used in conjunction with other tidymodels packages for finding reasonable values of hyper-parameters in models, pre-processing methods, and post-processing steps.
This package contains functions that allow analysing and comparing omic data across various cancers/cancer subgroups easily. So far, it is compatible with RNA-seq, microRNA-seq, microarray and methylation datasets that are stored on cbioportal.org.
Toolkit for the analysis of multiple gene data (Jombart et al. 2017) <doi:10.1111/1755-0998.12567>. apex implements the new S4 classes multidna', multiphyDat and associated methods to handle aligned DNA sequences from multiple genes.
Calculate the area of triangles and polygons using the shoelace formula. Area may be signed, taking into account path orientation, or unsigned, ignoring path orientation. The shoelace formula is described at <https://en.wikipedia.org/wiki/Shoelace_formula>.
Simulate, estimate and forecast a wide range of regression based dynamic models for bounded time series, covering the most commonly applied models in the literature. The main calculations are done in FORTRAN, which translates into very fast algorithms.
The Citation File Format version 1.2.0 <doi:10.5281/zenodo.5171937> is a human and machine readable file format which provides citation metadata for software. This package provides core utilities to generate and validate this metadata.
Estimation of counterfactual outcomes for multiple values of continuous interventions at different time points, and plotting of causal dose-response curves. Details are given in Schomaker, McIlleron, Denti, Diaz (2024) <doi:10.48550/arXiv.2305.06645>.
Model fitting and evaluation tools for double generalized linear models (DGLMs). This class of models uses one generalized linear model (GLM) to fit the specified response and a second GLM to fit the deviance of the first model.
Applies dynamic structural equation models to time-series data with generic and simplified specification for simultaneous and lagged effects. Methods are described in Thorson et al. (2024) "Dynamic structural equation models synthesize ecosystem dynamics constrained by ecological mechanisms.".
This package implements the daily based Morgan-Morgan-Finney (DMMF) soil erosion model (Choi et al., 2017 <doi:10.3390/w9040278>) for estimating surface runoff and sediment budgets from a field or a catchment on a daily basis.
Presents a statistical method that uses a recursive algorithm for signal extraction. The method handles a non-parametric estimation for the correlation of the errors. See "Krivobokova", "Serra", "Rosales" and "Klockmann" (2021) <arXiv:1812.06948> for details.
Feature Ordering by Conditional Independence (FOCI) is a variable selection algorithm based on the measure of conditional dependence. For more information, see the paper: Azadkia and Chatterjee (2019),"A simple measure of conditional dependence" <arXiv:1910.12327>.
Estimating trait heritability and handling overfitting. This package includes a collection of functions for (1) estimating genetic variance-covariances and calculate trait heritability; and (2) handling overfitting by calculating the variance components and the heritability through cross validation.
Goodness-of-fit tests for skew-normal, gamma, inverse Gaussian, log-normal, Weibull', Frechet', Gumbel, normal, multivariate normal, Cauchy, Laplace or double exponential, exponential and generalized Pareto distributions. Parameter estimators for gamma, inverse Gaussian and generalized Pareto distributions.
Two main functionalities are provided. One of them is predicting values with k-nearest neighbors algorithm and the other is optimizing the parameters k and d of the algorithm. These are carried out in parallel using multiple threads.
This package provides a unified interface to large language models across multiple providers. Supports text generation, structured output with optional JSON Schema validation, and embeddings. Includes tidyverse-friendly helpers, chat session, consistent error handling, and parallel batch tools.
This package creates and manages a PostgreSQL database suitable for storing fisheries data and aggregating ready for use within a Gadget <https://gadget-framework.github.io/gadget2/> model. See <https://mareframe.github.io/mfdb/> for more information.
An efficient implementation of SCCI using Rcpp'. SCCI is short for the Stochastic Complexity-based Conditional Independence criterium (Marx and Vreeken, 2019). SCCI is an asymptotically unbiased and L2 consistent estimator of (conditional) mutual information for discrete data.
Agglomerative hierarchical clustering with a bespoke distance measure based on medication similarities in the Anatomical Therapeutic Chemical Classification System, medication timing and medication amount or dosage. Tools for summarizing, illustrating and manipulating the cluster objects are also available.
This package contains logic for single sample gene set testing of cancer transcriptomic data with adjustment for normal tissue-specificity. Frost, H. Robert (2023) "Tissue-adjusted pathway analysis of cancer (TPAC)" <doi:10.1101/2022.03.17.484779>.