High-throughput analysis of growth curves and fluorescence data using three methods: linear regression, growth model fitting, and smoothing spline fitting. Analysis of dose-response relationships via smoothing splines or dose-response models. Complete data analysis workflows can be executed in a single step via user-friendly wrapper functions. The results of these workflows are summarized in detailed reports as well as intuitively navigable R data containers. A shiny application provides access to all features without requiring any programming knowledge. The package is described in further detail in Wirth et al. (2023) <doi:10.1038/s41596-023-00850-7>.
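To illustrate the linear-regression approach to growth curves, the sketch below fits a least-squares line to ln(OD) over every sliding window of consecutive time points and reports the steepest slope as the maximum specific growth rate. This is a minimal sketch under stated assumptions: the function name `max_growth_rate` and the fixed window size are hypothetical, and the package's own rules for locating the exponential phase may differ.

```python
import math

def max_growth_rate(times, od, window=5):
    """Largest least-squares slope of ln(OD) vs. time over sliding windows.

    Hypothetical helper; `window` is an assumed number of consecutive points.
    """
    if len(times) < window:
        raise ValueError("need at least `window` observations")
    log_od = [math.log(y) for y in od]  # exponential growth is linear on the log scale
    best = -math.inf
    for start in range(len(times) - window + 1):
        t = times[start:start + window]
        y = log_od[start:start + window]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        best = max(best, sxy / sxx)  # OLS slope for this window
    return best
```

For data that grow exponentially at rate 0.5 and then plateau, the steepest window recovers a rate of 0.5; the doubling time follows as ln(2) divided by that rate.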
This package implements the SISAL algorithm by Tikka and Hollmén. It is a sequential backward selection algorithm which uses a linear model in a cross-validation setting. Starting from the full model, one variable at a time is removed based on the regression coefficients. From this set of models, a parsimonious (sparse) model is found by choosing the model with the smallest number of variables among those models whose validation error is smaller than a threshold. The package also implements extensions which explore larger parts of the search space and/or use ridge regression instead of ordinary least squares.
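The selection logic described above can be sketched as follows: fit the current model, drop the variable with the weakest coefficient, repeat until no variables remain, and finally pick the smallest model whose validation error is below the threshold. This is a simplified sketch, assuming a user-supplied `fit` callable (hypothetical) that returns coefficients and a validation error; it ranks variables by absolute coefficient size, whereas the package's exact ranking statistic and threshold rule may differ.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: fit(variables) -> (coefficients, validation_error)
FitFn = Callable[[List[str]], Tuple[Dict[str, float], float]]

def backward_selection(variables: List[str], fit: FitFn, threshold: float) -> List[str]:
    """Sequential backward selection in the spirit of SISAL (simplified)."""
    current = list(variables)
    path = []  # (variable set, validation error) for every model visited
    while current:
        coefs, err = fit(current)
        path.append((list(current), err))
        # Remove the variable with the smallest absolute coefficient.
        weakest = min(current, key=lambda v: abs(coefs[v]))
        current.remove(weakest)
    # Among models with validation error below the threshold, keep the smallest.
    admissible = [vs for vs, err in path if err < threshold]
    if not admissible:
        raise ValueError("no model has validation error below the threshold")
    return min(admissible, key=len)
```

With a toy `fit` in which only variables `a` and `b` matter, the procedure returns `["a", "b"]`, the smallest model that stays under the error threshold.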
Rapidly build accurate genetic prediction models for genome-wide association or whole-genome sequencing study data using the smooth-threshold multivariate genetic prediction (STMGP) method. Variable selection is performed using marginal association test p-values, with an optimal p-value cutoff selected by a Cp-type criterion. Quantitative and binary traits are modeled via linear and logistic regression models, respectively. A function that works through the PLINK software (Purcell et al. 2007 <DOI:10.1086/519795>, Chang et al. 2015 <DOI:10.1186/s13742-015-0047-8>) <https://www.cog-genomics.org/plink2> is provided. Covariates can be included in the regression model.
Health research using data from electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error for association tests. Here, the assumed target of inference is the relationship between binary disease status and predictors modeled using a logistic regression model. SAMBA implements several methods for obtaining bias-corrected point estimates along with valid standard errors as proposed in Beesley and Mukherjee (2020) <doi:10.1101/2019.12.26.19015859>, currently under review.
High-throughput single-cell measurements of DNA methylomes can quantify methylation heterogeneity and uncover its role in gene regulation. However, technical limitations and sparse coverage can preclude this task. scMET is a hierarchical Bayesian model which overcomes sparsity, sharing information across cells and genomic features to robustly quantify genuine biological heterogeneity. scMET can identify highly variable features that drive epigenetic heterogeneity, and perform differential methylation and variability analyses. We illustrate how scMET facilitates the characterization of epigenetically distinct cell populations and how it enables the formulation of novel hypotheses on the epigenetic regulation of gene expression.
Our recently developed fully robust Bayesian semiparametric mixed-effect model for high-dimensional longitudinal studies with heterogeneous observations can be implemented through this package. This model can distinguish between time-varying interactions and constant-effect-only cases to avoid model misspecifications. Facilitated by spike-and-slab priors, this model leads to superior performance in estimation, identification and statistical inference. In particular, robust Bayesian inferences in terms of valid Bayesian credible intervals on both parametric and nonparametric effects can be validated on finite samples. The Markov chain Monte Carlo algorithms of the proposed and alternative models are efficiently implemented in C++.
This package provides a suite of loon-related packages providing data analytic tools for Direct Interactive Visual Exploration in R ('diveR'). These tools work with and complement those of the tidyverse suite, extending the grammar of ggplot2 to become a grammar of interactive graphics. The suite provides many visual tools designed for moderately high-dimensional data analysis (100s of variables) through zenplots and novel tools in loon, and extends the ggplot2 grammar to provide parallel coordinates, Andrews plots, and arbitrary glyphs through ggmulti. The diveR package gathers together and installs all these related packages in a single step.
An interface for fitting generalized additive models (GAMs) and generalized additive mixed models (GAMMs) using the lme4 package as the computational engine, as described in Helwig (2024) <doi:10.3390/stats7010003>. Supports default and formula methods for model specification, additive and tensor product splines for capturing nonlinear effects, and automatic determination of spline type based on the class of each predictor. Includes an S3 plot method for visualizing the (nonlinear) model terms, an S3 predict method for forming predictions from a fitted model, and an S3 summary method for conducting significance testing using the Bayesian interpretation of a smoothing spline.
Implementations of MOSUM-based statistical procedures and algorithms for detecting multiple changes in the mean. This comprises the MOSUM procedure for estimating multiple mean changes from Eichinger and Kirch (2018) <doi:10.3150/16-BEJ887> and the multiscale algorithmic extension from Cho and Kirch (2022) <doi:10.1007/s10463-021-00811-5>, as well as the bootstrap procedure for generating confidence intervals about the locations of change points as proposed in Cho and Kirch (2022) <doi:10.1016/j.csda.2022.107552>. See also Meier, Kirch and Cho (2021) <doi:10.18637/jss.v097.i08> which accompanies the R package.
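The core MOSUM detector compares the means of two adjacent windows of width G around each candidate location k, scaled by the noise level: T(k) = sqrt(G/2) |mean(x[k:k+G]) - mean(x[k-G:k])| / sigma, which equals |right sum - left sum| / (sigma sqrt(2G)). The sketch below computes this single-bandwidth statistic, assuming sigma is known; the function name is hypothetical, and the package additionally estimates sigma, applies critical values, and combines multiple bandwidths.

```python
import math

def mosum_statistic(x, G, sigma=1.0):
    """MOSUM detector T(k) for k = G..n-G, comparing x[k-G:k] with x[k:k+G].

    Hypothetical helper; sigma is assumed known here.
    """
    n = len(x)
    stats = {}
    for k in range(G, n - G + 1):
        left = sum(x[k - G:k])    # window ending just before k
        right = sum(x[k:k + G])   # window starting at k
        stats[k] = abs(right - left) / (sigma * math.sqrt(2 * G))
    return stats
```

For a series that jumps from 0 to 5 at index 50, the statistic with G = 10 peaks exactly at k = 50; in practice change points are taken as local maximisers of T(k) that exceed a critical value.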
This package provides a collection of statistical tools for objective (non-supervised) applications of the Regional Frequency Analysis methods in hydrology. The package refers to the index-value method and, more precisely, helps the hydrologist to: (1) regionalize the index-value; (2) form homogeneous regions with similar growth curves; (3) fit distribution functions to the empirical regional growth curves. Most of the methods are those described in the Flood Estimation Handbook (Centre for Ecology & Hydrology, 1999, ISBN:9781906698003). Homogeneity tests from Hosking and Wallis (1993) <doi:10.1029/92WR01980> and Viglione et al. (2007) <doi:10.1029/2006WR005095> are available.
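The Hosking and Wallis homogeneity tests are built on sample L-moment ratios computed at each site (L-CV, L-skewness), whose between-site dispersion is compared with that expected for a homogeneous region. The sketch below computes those per-site ratios from unbiased probability-weighted moments b0, b1, b2; the function name is hypothetical and the heterogeneity simulation itself is not shown.

```python
def l_moment_ratios(sample):
    """Return (l1, L-CV, L-skewness) from unbiased probability-weighted moments."""
    x = sorted(sample)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0                      # L-location (mean)
    l2 = 2 * b1 - b0             # L-scale
    l3 = 6 * b2 - 6 * b1 + b0    # third L-moment
    return l1, l2 / l1, l3 / l2
```

For the symmetric sample [1, 2, 3, 4], this gives l1 = 2.5, an L-CV of 1/3, and an L-skewness of 0.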
This package provides a tool for survival analysis using a discrete-time approach with ensemble binary classification. spect provides a simple interface consistent with commonly used R data analysis packages, such as caret; a variety of parameter options to help facilitate search automation; a high degree of transparency to the end user, with all intermediate data sets and parameters made available for further analysis; and useful, out-of-the-box visualizations of model performance. Methods for transforming survival data into discrete time are adapted from the autosurv package by Suresh et al. (2022) <doi:10.1186/s12874-022-01679-6>.
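The discrete-time approach turns each subject's (time, event) pair into one row per interval at risk, with a binary outcome that is 1 only in the interval where the event occurs; any binary classifier can then be trained on those rows. This is a minimal sketch: the function name is hypothetical, and the convention of keeping a censored subject at risk in the interval where censoring occurs is an assumption that may differ from the package.

```python
def to_person_period(time, event, cuts):
    """Expand one subject into (interval_index, y) rows for discrete-time modeling.

    cuts = [c0, c1, ..., cK] with c0 < c1 < ...; interval j covers (c[j-1], c[j]].
    Assumed convention: a censored subject counts as at risk (y = 0) in the
    interval containing the censoring time.
    """
    rows = []
    for j in range(1, len(cuts)):
        if time <= cuts[j - 1]:
            break  # subject is no longer at risk
        in_interval = time <= cuts[j]
        rows.append((j, int(event and in_interval)))
        if in_interval:
            break
    return rows
```

A subject with an event at time 2.5 and cut points [0, 1, 2, 3] contributes three rows, (1, 0), (2, 0), (3, 1).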
This package provides several functions implementing a method for analyzing randomized clinical trials with strata, with the following key features. A stratified Mann-Whitney estimator addresses the comparison between two randomized groups for a strictly ordinal response variable. The multivariate vector of such stratified Mann-Whitney estimators can be considered for one or more response variables, such as repeated measurements, and these can have data missing completely at random (MCAR). Non-parametric covariance adjustment is also considered, with the minimal assumption of randomization. P-values for hypothesis tests and confidence intervals are provided.
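The Mann-Whitney estimator for an ordinal response estimates theta = P(Y > X) + 0.5 P(Y = X), where X and Y are responses in the control and treatment groups; a stratified version combines per-stratum estimates with weights. In this sketch the function names are hypothetical, and the weights n_x n_y / (n_x + n_y) are an assumed choice for illustration, not necessarily the package's weighting; covariance adjustment and variance estimation are not shown.

```python
def mann_whitney_theta(x, y):
    """Estimate P(Y > X) + 0.5 * P(Y = X) for control sample x and treatment sample y."""
    total = 0.0
    for xi in x:
        for yj in y:
            total += 1.0 if yj > xi else 0.5 if yj == xi else 0.0
    return total / (len(x) * len(y))

def stratified_theta(strata):
    """Weighted combination of per-stratum estimates; strata is a list of (x, y) pairs.

    Assumed weights: n_x * n_y / (n_x + n_y) per stratum.
    """
    weights, thetas = [], []
    for x, y in strata:
        weights.append(len(x) * len(y) / (len(x) + len(y)))
        thetas.append(mann_whitney_theta(x, y))
    return sum(w * t for w, t in zip(weights, thetas)) / sum(weights)
```

A value of 0.5 indicates no difference between groups; combining a stratum with complete separation (theta = 1) and an identical-distribution stratum (theta = 0.5) of equal size gives 0.75.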
ISLET is a method to conduct signal deconvolution for general -omics data. It can estimate individual-specific and cell-type-specific reference panels when multiple samples are observed from each subject. It takes as input the observed mixture data (feature by sample matrix), the cell type mixture proportions (sample by cell type matrix), and the sample-to-subject information. It can solve for the reference panel on an individual basis and conduct tests to identify cell-type-specific differential expression (csDE) genes. It also improves the estimated cell type mixture proportions by integrating personalized reference panels.
This package provides comprehensive tools for blinded sample size re-estimation (BSSR) in two-arm clinical trials with binary endpoints. Unlike traditional fixed-sample designs, BSSR allows adaptive sample size adjustments during trials while maintaining statistical integrity and study blinding. Implements five exact statistical tests: Pearson chi-squared, Fisher exact, Fisher mid-p, Z-pooled exact unconditional, and Boschloo exact unconditional tests. Supports restricted, unrestricted, and weighted BSSR approaches with exact Type I error control. Statistical methods based on Mehrotra et al. (2003) <doi:10.1111/1541-0420.00051> and Kieser (2020) <doi:10.1007/978-3-030-49528-2_21>.
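The key idea of blinded re-estimation is that the pooled event rate can be observed at an interim look without unblinding, and the sample size is recomputed from it under the originally assumed treatment difference. The sketch below shows this idea with the standard normal-approximation formula for two proportions, assuming 1:1 allocation; it is a stand-in for illustration only, since the package works with the exact tests listed above, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def blinded_sample_size(pooled_rate, delta, alpha=0.05, power=0.8):
    """Per-arm sample size from a blinded pooled event rate (normal approximation).

    Assumes 1:1 allocation, so the implied arm rates are pooled -/+ delta/2.
    """
    p1 = pooled_rate - delta / 2
    p2 = pooled_rate + delta / 2
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("implied arm rates must lie in (0, 1)")
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_b = NormalDist().inv_cdf(power)
    num = (z_a * math.sqrt(2 * pooled_rate * (1 - pooled_rate))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(num ** 2 / delta ** 2)
```

For a blinded pooled rate of 0.5 and an assumed difference of 0.2, this gives 97 patients per arm at alpha = 0.05 and 80% power.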
This package implements a very fast C++ algorithm to bootstrap receiver operating characteristic (ROC) curves and derived performance metrics, including the area under the curve (AUC), the partial area under the curve, and the true and false positive rates. The analysis of paired ROC curves is supported as well, so that two predictors can be compared. You can also plot the results and calculate confidence intervals. On a typical desktop computer, calculating 100000 bootstrap replicates for 500 observations takes on the order of one second.
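The bootstrap for the AUC can be sketched as repeatedly resampling the positive and negative cases, recomputing the AUC as the Mann-Whitney probability that a positive scores above a negative, and taking percentiles of the replicates. This is a slow O(n^2) illustration with hypothetical function names, not the package's optimized C++ algorithm; resampling each class separately (a stratified bootstrap) is an assumed design choice that guarantees every replicate contains both classes.

```python
import random

def auc(pos, neg):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie)."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(pos, neg, n_boot=2000, level=0.95, seed=1):
    """Point estimate plus percentile bootstrap interval, resampling each class separately."""
    rng = random.Random(seed)
    reps = sorted(
        auc(rng.choices(pos, k=len(pos)), rng.choices(neg, k=len(neg)))
        for _ in range(n_boot)
    )
    lo = reps[int((1 - level) / 2 * n_boot)]
    hi = reps[int((1 + level) / 2 * n_boot) - 1]
    return auc(pos, neg), lo, hi
```

For perfectly separated scores every replicate has an AUC of 1, so the interval collapses to (1.0, 1.0).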
Density, distribution function, quantile function, and random generation for the generalized Beta and Beta prime distributions. The family of generalized Beta distributions is conjugate for the Bayesian binomial model, and the generalized Beta prime distribution is the posterior distribution of the relative risk in the Bayesian two Poisson samples model when a Gamma prior is assigned to the Poisson rate of the reference group and a Beta prime prior is assigned to the relative risk. References: Laurent (2012) <doi:10.1214/11-BJPS139>, Hamza & Vallois (2016) <doi:10.1016/j.spl.2016.03.014>, Chen & Novick (1984) <doi:10.3102/10769986009002163>.
Run simple direct gravitational N-body simulations. The package can access different external N-body simulators (e.g. GADGET-4 by Springel et al. (2021) <doi:10.48550/arXiv.2010.03567>), but also has a simple built-in simulator. This default simulator uses a variable block time step and lets the user choose from a range of integrators, including 4th and 6th order integrators for high-accuracy simulations. Basic top-hat smoothing is available as an option. The code also allows the definition of background particles that are fixed or in uniform motion, not subject to acceleration by other particles.
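Direct N-body simulation sums the pairwise gravitational accelerations and advances positions and velocities with an integrator. The sketch below uses a second-order kick-drift-kick leapfrog with a fixed step and no smoothing, in units where G = 1; this is a lower-order stand-in chosen for brevity, not the package's variable block time step or its 4th/6th order integrators, and the function names are hypothetical.

```python
import math

def accelerations(pos, mass, G=1.0):
    """Direct-summation gravitational accelerations (no smoothing)."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3  # pull of j on i
                acc[j][k] -= G * mass[i] * d[k] * inv_r3  # equal and opposite
    return acc

def leapfrog(pos, vel, mass, dt, steps, G=1.0):
    """Kick-drift-kick leapfrog with a fixed time step."""
    pos = [list(p) for p in pos]
    vel = [list(v) for v in vel]
    acc = accelerations(pos, mass, G)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]  # half kick
                pos[i][k] += dt * vel[i][k]        # full drift
        acc = accelerations(pos, mass, G)
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]  # half kick with new forces
    return pos, vel

def energy(pos, vel, mass, G=1.0):
    """Total kinetic plus potential energy."""
    kinetic = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(mass, vel))
    potential = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            potential -= G * mass[i] * mass[j] / math.dist(pos[i], pos[j])
    return kinetic + potential
```

For two unit masses on a circular orbit (separation 1, speeds sqrt(0.5)), the leapfrog keeps the total energy close to its initial value of -0.5 and conserves total momentum.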
This package provides small area estimation for count data, with the option to use covariates in the estimation or not. By implementing the Empirical Bayes (EB) Poisson-Gamma model, each function returns EB estimators and mean squared error (MSE) estimators for each area. The EB estimators without covariates are obtained using the model proposed by Clayton & Kaldor (1987) <doi:10.2307/2532003>, the EB estimators with covariates are obtained using the model proposed by Wakefield (2006) <doi:10.1093/biostatistics/kxl008>, and the MSE estimators are obtained using the jackknife method of Jiang et al. (2002) <doi:10.1214/aos/1043351257>.
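In the Poisson-Gamma model without covariates, observed counts satisfy y_i ~ Poisson(e_i theta_i) with expected counts e_i and theta_i ~ Gamma(alpha, beta) (shape, rate), so the EB estimate of each area's relative risk is the posterior mean (y_i + alpha) / (e_i + beta). The sketch below estimates alpha and beta with a simple method-of-moments rule; that estimator and the function name are assumptions for illustration, and the package's estimator and its jackknife MSE may differ.

```python
def eb_poisson_gamma(y, e):
    """EB relative-risk estimates (y_i + alpha) / (e_i + beta).

    alpha, beta: assumed method-of-moments estimates of the Gamma(shape, rate) prior.
    """
    m = sum(y) / sum(e)  # pooled relative risk = prior mean alpha / beta
    mean_e = sum(e) / len(e)
    s2 = sum(ei * (yi / ei - m) ** 2 for yi, ei in zip(y, e)) / sum(e)
    v = s2 - m / mean_e  # between-area variance, net of Poisson noise
    if v <= 0:
        return [m] * len(y)  # no extra-Poisson variation: shrink fully to m
    beta = m / v         # prior variance alpha / beta**2 equals v
    alpha = m * beta
    return [(yi + alpha) / (ei + beta) for yi, ei in zip(y, e)]
```

Each estimate is pulled from the raw ratio y_i / e_i toward the pooled rate; for counts [0, 10, 20] with expected counts of 10, the raw ratios 0 and 2 shrink to about 0.15 and 1.85.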
This package provides a suite of auxiliary functions that enhance time series estimation and forecasting, including a robust anomaly detection routine based on Chen and Liu (1993) <doi:10.2307/2290724> (imported and wrapped from the tsoutliers package), utilities for managing calendar and time conversions, performance metrics to assess both point forecasts and distributional predictions, advanced simulation that generates time series components (such as trend, seasonal, ARMA, irregular, and anomalies) in a modular fashion based on the innovations form of the state space model, and a number of transformation methods including Box-Cox, Logit, Softplus-Logit and Sigmoid.
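As an example of the transformation methods, the Box-Cox transform maps y > 0 to (y^lambda - 1) / lambda, with the natural log as the lambda = 0 limit, and is inverted exactly for back-transforming forecasts. This is a minimal sketch with hypothetical function names; it does not show how the package estimates lambda.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform for y > 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam

def inv_box_cox(z, lam):
    """Inverse Box-Cox transform."""
    return math.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)
```

For instance, box_cox(4.0, 0.5) is (2 - 1) / 0.5 = 2.0, and applying inv_box_cox recovers the original value.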
Suffix Array Kernel Smoothing (see https://academic.oup.com/bioinformatics/article-abstract/35/20/3944/5418797), or SArKS, identifies sequence motifs whose presence correlates with numeric scores (such as differential expression statistics) assigned to the sequences (such as gene promoters). SArKS smooths over sequence similarity, quantified by location within a suffix array based on the full set of input sequences. A second round of smoothing over spatial proximity within sequences reveals multi-motif domains (MMDs). Discovered motifs can then be merged or extended based on adjacency within MMDs. False positive rates are estimated and controlled by permutation testing.
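The first smoothing step can be sketched as follows: concatenate the sequences, sort all suffixes into a suffix array so that suffixes sharing a prefix (a candidate motif) become neighbors, label each suffix with its source sequence's score, and average those scores over a window of neighbors. This is a simplified sketch with hypothetical function names and a naive suffix array construction; the actual SArKS kernel, spatial smoothing, and permutation tests are not reproduced.

```python
def suffix_array(s):
    """Naive suffix array: start positions of suffixes in lexicographic order."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def smooth(values, half_width):
    """Moving average over positions i-half_width..i+half_width (truncated at the ends)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_width):i + half_width + 1]
        out.append(sum(window) / len(window))
    return out

def sarks_like_scores(seqs, scores, half_width):
    """Smooth per-sequence scores along the suffix array of the concatenated sequences.

    Returns (suffix start position, smoothed score) pairs in suffix-array order.
    """
    text = "".join(s + "$" for s in seqs)  # "$" separates the sequences
    owner = [k for k, s in enumerate(seqs) for _ in range(len(s) + 1)]
    sa = suffix_array(text)
    smoothed = smooth([scores[owner[i]] for i in sa], half_width)
    return list(zip(sa, smoothed))
```

A high smoothed value at some position in suffix-array order indicates that suffixes sharing that prefix tend to come from high-scoring sequences.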
Designed for simplicity, a mirai evaluates an R expression asynchronously in a parallel process, locally or distributed over the network. The result is automatically available upon completion. Modern networking and concurrency, built on nanonext and NNG (Nanomsg Next Gen), ensures reliable and efficient scheduling over fast inter-process communications or TCP/IP secured by TLS. Distributed computing can launch remote resources via SSH or cluster managers. An inherently queued architecture handles many more tasks than available processes, and requires no storage on the file system. Innovative features include support for otherwise non-exportable reference objects, event-driven promises, and asynchronous parallel map.
This package provides utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data.
Fit restricted mean models for the conditional association between an exposure and an outcome, given covariates. Three methods are implemented: O-estimation, where a nuisance model for the association between the covariates and the outcome is used; E-estimation, where a nuisance model for the association between the covariates and the exposure is used; and doubly robust (DR) estimation, where both nuisance models are used. In DR estimation, the estimates will be consistent when at least one of the nuisance models is correctly specified, not necessarily both. For more information, see Zetterqvist and Sjölander (2015) <doi:10.1515/em-2014-0021>.
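For the linear case E[Y | A, L] - E[Y | A = 0, L] = beta A, the DR estimating equation sum (A - m(L)) (Y - beta A - h(L)) = 0 uses an exposure model m(L) = E[A | L] and an outcome model h(L) = E[Y | A = 0, L], and it can be solved in closed form for beta. This is a minimal sketch, assuming an identity link and nuisance predictions supplied by the user; the function name is hypothetical and this is not the package's interface.

```python
def dr_estimate(y, a, exposure_pred, outcome_pred):
    """Doubly robust beta for the linear restricted mean model.

    exposure_pred: fitted E[A | L]; outcome_pred: fitted E[Y | A = 0, L].
    Solves sum (A - m) * (Y - beta*A - h) = 0 for beta.
    """
    num = sum((ai - mi) * (yi - hi)
              for yi, ai, mi, hi in zip(y, a, exposure_pred, outcome_pred))
    den = sum((ai - mi) * ai for ai, mi in zip(a, exposure_pred))
    return num / den
```

With Y = 2A + 3L, the estimate recovers beta = 2 when either nuisance model is correct and the other is replaced by zeros. In this toy sample the residuals A - m(L) happen to be exactly orthogonal to L, so the bias vanishes exactly rather than only asymptotically.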
This package provides a friendly (flexible) Markov chain Monte Carlo (MCMC) framework for implementing the Metropolis-Hastings algorithm in a modular way, allowing users to specify an automatic convergence checker and personalized transition kernels, and to run multiple MCMC chains out of the box using parallel computing. Most of the methods implemented in this package can be found in Brooks et al. (2011, ISBN 9781420079425). Among the methods included, we have: Haario et al. (2001) <doi:10.2307/3318737> Adaptive Metropolis, Vihola (2012) <doi:10.1007/s11222-011-9269-5> Robust Adaptive Metropolis, and Thawornwattana et al. (2018) <doi:10.1214/17-BA1084> Mirror transition kernels.
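The Metropolis-Hastings step that such a framework modularizes can be sketched with the simplest transition kernel, a symmetric Gaussian random walk: propose a move, then accept it with probability min(1, target(proposal) / target(current)). This is a plain random-walk kernel with a hypothetical function name, not the adaptive or mirror kernels listed above, and it has no convergence checking or parallel chains.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=42):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_prop = log_target(proposal)
        # Symmetric proposal, so the Hastings ratio reduces to the target ratio.
        # 1.0 - rng.random() lies in (0, 1], which keeps math.log well defined.
        if math.log(1.0 - rng.random()) < logp_prop - logp:
            x, logp = proposal, logp_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples
```

Targeting a standard normal (log density -x^2/2 up to a constant), the sample mean and variance approach 0 and 1 as the chain grows; swapping in a different kernel only changes how `proposal` is generated and how the acceptance ratio is computed.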