This package provides a set of tools to streamline data analysis. Learning both R and introductory statistics at the same time can be challenging, and so we created rigr to facilitate common data analysis tasks and enable learners to focus on statistical concepts. We provide easy-to-use interfaces for descriptive statistics, one- and two-sample inference, and regression analyses. rigr output includes key information while omitting unnecessary details that can be confusing to beginners. Heteroscedasticity-robust ("sandwich") standard errors are returned by default, and multiple partial F-tests and tests for contrasts are easy to specify. A single regression function can fit both linear and generalized linear models, allowing students to more easily make connections between different classes of models.
Rlwrap is a 'readline wrapper', a small utility that uses the GNU readline library to allow the editing of keyboard input for any command. You should consider rlwrap especially when you need user-defined completion (by way of completion word lists) and persistent history, or if you want to program 'special effects' using the filter mechanism.
LACE is an algorithmic framework that processes single-cell somatic mutation profiles from cancer samples collected at different time points and in distinct experimental settings, to produce longitudinal models of cancer evolution. The approach solves a Boolean Matrix Factorization problem with phylogenetic constraints, by maximizing a weighted likelihood function computed on multiple time points.
This package provides tools and a workflow to choose design parameters in Bayesian adaptive single-arm phase II trial designs with a binary endpoint (response, success) and possible stopping for efficacy and futility at interim analyses. It also contains routines to determine and visualize operating characteristics. See Kopp-Schneider et al. (2018) <doi:10.1002/bimj.201700209>.
Partitions data points (variables) into communities/clusters, similar to clustering algorithms such as k-means and hierarchical clustering. This package implements a clustering algorithm based on a new metric CORD, defined for high-dimensional parametric or semiparametric distributions. For more details see Bunea et al. (2020), Annals of Statistics <doi:10.1214/18-AOS1794>.
Clean, decompose and aggregate univariate time series following the procedure "Cyclic/trend decomposition using bin interpolation" and the Logbox method for flagging outliers, both detailed in Ritter, F.: Technical note: A procedure to clean, decompose, and aggregate time series, Hydrol. Earth Syst. Sci., 27, 349–361, <doi:10.5194/hess-27-349-2023>, 2023.
Estimate common causal parameters using double/debiased machine learning as proposed by Chernozhukov et al. (2018) <doi:10.1111/ectj.12097>. ddml simplifies estimation based on (short-)stacking as discussed in Ahrens et al. (2024) <doi:10.1002/jae.3103>, which leverages multiple base learners to increase robustness to the underlying data generating process.
Estimation of fully and partially observed Exponential-Family Random Network Models (ERNM). Exponential-family Random Graph Models (ERGM) and Gibbs Fields are special cases of ERNMs and can also be estimated with the package. Please cite Fellows and Handcock (2012), "Exponential-family Random Network Models" available at <doi:10.48550/arXiv.1208.0121>.
When you want to install an R package or download a file from GitHub but cannot access GitHub, this package helps you do so via the proxy websites <https://gh-proxy.com/> or <https://ghfast.top/>, which are synced with GitHub in real time.
Impute observed values below the limit of detection (LOD) via censored likelihood multiple imputation (CLMI) in single-pollutant models, developed by Boss et al. (2019) <doi:10.1097/EDE.0000000000001052>. CLMI handles exposure detection limits that may change throughout the course of exposure assessment. lodi provides functions for imputing and pooling for this method.
This package implements the One Rule (OneR) Machine Learning classification algorithm (Holte, R.C. (1993) <doi:10.1023/A:1022631118932>) with enhancements for sophisticated handling of numeric data and missing values together with extensive diagnostic functions. It is useful as a baseline for machine learning models and the rules are often helpful heuristics.
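As a rough illustration of the basic One Rule idea described above, the following Python sketch picks the single feature whose per-value majority-class rule makes the fewest training errors. It is not the package's implementation and omits its handling of numeric data and missing values; the function name `one_rule` and the toy `weather` data are hypothetical.

```python
from collections import Counter, defaultdict


def one_rule(rows, target):
    """Return (best_feature, rule, errors) for the basic OneR algorithm.

    rows   -- list of dicts mapping column name -> categorical value
    target -- name of the class column
    """
    features = [name for name in rows[0] if name != target]
    best = None
    for feature in features:
        # Count class labels separately for each value of this feature.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[target]] += 1
        # The rule predicts the majority class for each feature value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the training rows that do not belong to the majority class.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


# Hypothetical toy data, for illustration only.
weather = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
feature, rule, errors = one_rule(weather, "play")
print(feature, rule, errors)
```

On this toy data the rule built on `outlook` misclassifies one row, while the rule built on `windy` misclassifies two, so `outlook` is selected.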
Given a project schedule and associated costs, this package calculates the earned value to date. It is an implementation of Project Management Body of Knowledge (PMBOK) methodologies (reference Project Management Institute. (2021). A guide to the Project Management Body of Knowledge (PMBOK guide) (7th ed.). Project Management Institute, Newtown Square, PA, ISBN 9781628256673 (pdf)).
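The standard earned value quantities can be sketched in a few lines of Python. This is only an illustration of the textbook formulas (planned value, earned value, cost and schedule variance, and the two performance indices), not the package's interface; the function name `earned_value`, its parameters, and the example figures are hypothetical.

```python
def earned_value(budget_at_completion, planned_fraction, actual_fraction, actual_cost):
    """Compute basic earned value metrics for a project at a status date.

    budget_at_completion -- total planned budget (BAC)
    planned_fraction     -- fraction of work scheduled to be done by now
    actual_fraction      -- fraction of work actually completed by now
    actual_cost          -- cost actually incurred so far (AC)
    """
    pv = budget_at_completion * planned_fraction  # planned value
    ev = budget_at_completion * actual_fraction   # earned value
    return {
        "PV": pv,
        "EV": ev,
        "CV": ev - actual_cost,   # cost variance (negative = over budget)
        "SV": ev - pv,            # schedule variance (negative = behind schedule)
        "CPI": ev / actual_cost,  # cost performance index
        "SPI": ev / pv,           # schedule performance index
    }


# Hypothetical project: 100,000 budget, 50% planned, 40% done, 45,000 spent.
metrics = earned_value(100_000, 0.5, 0.4, 45_000)
print(metrics)
```

In this example both variances are negative and both indices are below 1, indicating a project that is over budget and behind schedule.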
The portmanteau local feature discriminant approach first identifies the local discriminant features and their differential structures, then constructs the discriminant rule by pooling the identified local features together. This method is applicable to high-dimensional matrix-variate data. See the paper by Xu, Luo and Chen (2023, <doi:10.1007/s13171-021-00255-2>).
Quantile-based estimators (Q-estimators) can be used to fit any parametric distribution, using its quantile function. Q-estimators are usually more robust than standard maximum likelihood estimators. The method is described in: Sottile G. and Frumento P. (2022). Robust estimation and regression with parametric quantile functions. <doi:10.1016/j.csda.2022.107471>.
Offers Bayesian semiparametric density estimation and tail-index estimation for heavy tailed data, by using a parametric, tail-respecting transformation of the data to the unit interval and then modeling the transformed data with a purely nonparametric logistic Gaussian process density prior. Based on Tokdar et al. (2022) <doi:10.1080/01621459.2022.2104727>.
Sparsity Oriented Importance Learning (SOIL) provides a new variable importance measure for high dimensional linear regression and logistic regression from a sparse penalization perspective, by taking into account the variable selection uncertainty via the use of a sensible model weighting. The package is an implementation of Ye, C., Yang, Y., and Yang, Y. (2017+).
Strength training prescription using a percent-based approach requires numerous computations and assumptions. The STMr package allows users to estimate individual reps-max relationships, implement various progression tables, and create numerous set and rep schemes. The STMr package was originally created as a tool to help write Jovanović M. (2020) Strength Training Manual <ISBN:979-8604459898>.
This is an interactive statistical tool that provides multivariate statistical tests that are more powerful than the traditional Hotelling T2 test and LRT (likelihood ratio test) for the vector of means of normal populations, with and without contamination, and of non-normal populations (Henrique J. P. Alves & Daniel F. Ferreira (2019) <doi:10.1080/03610918.2019.1693596>).
Fast, reproducible detection and quantitative analysis of tertiary lymphoid structures (TLS) in multiplexed tissue imaging. Implements the Independent Component Analysis Trace (ICAT) index, local Ripley's K scanning, automated K Nearest Neighbor (KNN)-based TLS detection, and T-cell cluster identification as described in Amiryousefi et al. (2025) <doi:10.1101/2025.09.21.677465>.
This package provides a suite of routines for Weyl algebras. Notation follows Coutinho (1995, ISBN 0-521-55119-6, "A Primer of Algebraic D-Modules"). Uses disordR discipline (Hankin 2022 <doi:10.48550/arXiv.2210.03856>). To cite the package in publications, use Hankin 2022 <doi:10.48550/arXiv.2212.09230>.
Many modern biological datasets consist of small counts that are not well fit by standard linear-Gaussian methods such as principal component analysis. This package provides implementations of count-based feature selection and dimension reduction algorithms. These methods can be used to facilitate unsupervised analysis of any high-dimensional data such as single-cell RNA-seq.
MDQC is a multivariate quality assessment method for microarrays based on quality control (QC) reports. The Mahalanobis distance of an array's quality attributes is used to measure the similarity of the quality of that array against the quality of the other arrays. Then, arrays with unusually high distances can be flagged as potentially low-quality.
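The distance-based flagging described above can be sketched in Python for the simple case of two quality attributes per array. This sketch assumes the plain sample mean and sample covariance, which may differ from the estimators MDQC actually uses; the function name `mahalanobis_2d` and the `qc` values are hypothetical.

```python
from statistics import mean


def mahalanobis_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^{-1} [dx dy]^T, using the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(d2)
    return distances


# Hypothetical QC attributes for six arrays; the last one looks unusual.
qc = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.2), (1.2, 1.8), (3.0, 0.5)]
distances = mahalanobis_2d(qc)
flagged = distances.index(max(distances))  # index of the most distant array
print(distances, flagged)
```

Because the unusual array also inflates the sample covariance, its distance is understated here; a robust estimate of location and scatter would separate it from the rest more clearly.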
This package is an implementation of about 6 major classes of statistical regression models. Currently only fixed-effects models are implemented, i.e., no random-effects models. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE, using Fisher scoring. VGLMs can be loosely thought of as multivariate generalised linear models.
reptyr is a utility for taking an existing running program and attaching it to a new terminal. Started a long-running process over ssh, but have to leave and don't want to interrupt it? Just start a screen, use reptyr to grab it, and then kill the ssh session and head on home.