High-throughput single-cell measurements of DNA methylomes can quantify methylation heterogeneity and uncover its role in gene regulation. However, technical limitations and sparse coverage can preclude this task. scMET
is a hierarchical Bayesian model which overcomes sparsity, sharing information across cells and genomic features to robustly quantify genuine biological heterogeneity. scMET
can identify highly variable features that drive epigenetic heterogeneity, and perform differential methylation and variability analyses. We illustrate how scMET
facilitates the characterization of epigenetically distinct cell populations and how it enables the formulation of novel hypotheses on the epigenetic regulation of gene expression.
GLDEX
offers fitting algorithms corresponding to two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using the maximum likelihood and quantile matching estimation. Other methods such as moment matching, starship method, and L moment matching are also provided. Diagnostics on goodness of fit can be done via qqplots, KS-resample tests and comparing mean, variance, skewness and kurtosis of the data with the fitted distribution.
This package provides simulation methods for the evolution of antibody repertoires. The heavy and light chain variable region of both human and C57BL/6 mice can be simulated in a time-dependent fashion. Both single lineages using one set of V-, D-, and J-genes or full repertoires can be simulated. The algorithm begins with an initial V-D-J recombination event, starting the first phylogenetic tree. Upon completion, the main loop of the algorithm begins, with each iteration representing one simulated time step. Various mutation events are possible at each time step, contributing to a diverse final repertoire.
Our recently developed fully robust Bayesian semiparametric mixed-effect model for high-dimensional longitudinal studies with heterogeneous observations can be implemented through this package. This model can distinguish between time-varying interactions and constant-effect-only cases to avoid model misspecifications. Facilitated by spike-and-slab priors, this model leads to superior performance in estimation, identification and statistical inference. In particular, robust Bayesian inferences in terms of valid Bayesian credible intervals on both parametric and nonparametric effects can be validated on finite samples. The Markov chain Monte Carlo algorithms of the proposed and alternative models are efficiently implemented in C++'.
Gene expression profiles are commonly utilized to infer disease subtypes and many clustering methods can be adopted for this task. However, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering. To deal with these challenges, we develop a novel clustering method in the Bayesian setting. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering.
This package provides a suite of loon related packages providing data analytic tools for Direct Interactive Visual Exploration in R ('diveR
'). These tools work with and complement those of the tidyverse suite, extending the grammar of ggplot2 to become a grammar of interactive graphics. The suite provides many visual tools designed for moderately (100s of variables) high dimensional data analysis, through zenplots and novel tools in loon', and extends the ggplot2 grammar to provide parallel coordinates, Andrews plots, and arbitrary glyphs through ggmulti'. The diveR
package gathers together and installs all these related packages in a single step.
An interface for fitting generalized additive models (GAMs) and generalized additive mixed models (GAMMs) using the lme4 package as the computational engine, as described in Helwig (2024) <doi:10.3390/stats7010003>. Supports default and formula methods for model specification, additive and tensor product splines for capturing nonlinear effects, and automatic determination of spline type based on the class of each predictor. Includes an S3 plot method for visualizing the (nonlinear) model terms, an S3 predict method for forming predictions from a fit model, and an S3 summary method for conducting significance testing using the Bayesian interpretation of a smoothing spline.
Implementations of MOSUM-based statistical procedures and algorithms for detecting multiple changes in the mean. This comprises the MOSUM procedure for estimating multiple mean changes from Eichinger and Kirch (2018) <doi:10.3150/16-BEJ887> and the multiscale algorithmic extension from Cho and Kirch (2022) <doi:10.1007/s10463-021-00811-5>, as well as the bootstrap procedure for generating confidence intervals about the locations of change points as proposed in Cho and Kirch (2022) <doi:10.1016/j.csda.2022.107552>. See also Meier, Kirch and Cho (2021) <doi:10.18637/jss.v097.i08> which accompanies the R package.
This package provides a collection of statistical tools for objective (non-supervised) applications of the Regional Frequency Analysis methods in hydrology. The package refers to the index-value method and, more precisely, helps the hydrologist to: (1) regionalize the index-value; (2) form homogeneous regions with similar growth curves; (3) fit distribution functions to the empirical regional growth curves. Most of the methods are those described in the Flood Estimation Handbook (Centre for Ecology & Hydrology, 1999, ISBN:9781906698003). Homogeneity tests from Hosking and Wallis (1993) <doi:10.1029/92WR01980> and Viglione et al. (2007) <doi:10.1029/2006WR005095> are available.
There are several functions to implement the method for analysis in a randomized clinical trial with strata with following key features. A stratified Mann-Whitney estimator addresses the comparison between two randomized groups for a strictly ordinal response variable. The multivariate vector of such stratified Mann-Whitney estimators for multivariate response variables can be considered for one or more response variables such as in repeated measurements and these can have missing completely at random (MCAR) data. Non-parametric covariance adjustment is also considered with the minimal assumption of randomization. The p-value for hypothesis test and confidence interval are provided.
This package provides a tool for survival analysis using a discrete time approach with ensemble binary classification. spect provides a simple interface consistent with commonly used R data analysis packages, such as caret', a variety of parameter options to help facilitate search automation, a high degree of transparency to the end-user - all intermediate data sets and parameters are made available for further analysis and useful, out-of-the-box visualizations of model performance. Methods for transforming survival data into discrete-time are adapted from the autosurv package by Suresh et al., (2022) <doi:10.1186/s12874-022-01679-6>.
ISLET is a method to conduct signal deconvolution for general -omics data. It can estimate the individual-specific and cell-type-specific reference panels, when there are multiple samples observed from each subject. It takes the input of the observed mixture data (feature by sample matrix), and the cell type mixture proportions (sample by cell type matrix), and the sample-to-subject information. It can solve for the reference panel on the individual-basis and conduct test to identify cell-type-specific differential expression (csDE
) genes. It also improves estimated cell type mixture proportions by integrating personalized reference panels.
This package implements a very fast C++ algorithm to quickly bootstrap receiver operating characteristics (ROC) curves and derived performance metrics, including the area under the curve (AUC) and the partial area under the curve as well as the true and false positive rate. The analysis of paired receiver operating curves is supported as well, so that a comparison of two predictors is possible. You can also plot the results and calculate confidence intervals. On a typical desktop computer the time needed for the calculation of 100000 bootstrap replicates given 500 observations requires time on the order of magnitude of one second.
Density, distribution function, quantile function, and random generation for the generalized Beta and Beta prime distributions. The family of generalized Beta distributions is conjugate for the Bayesian binomial model, and the generalized Beta prime distribution is the posterior distribution of the relative risk in the Bayesian two Poisson samples model when a Gamma prior is assigned to the Poisson rate of the reference group and a Beta prime prior is assigned to the relative risk. References: Laurent (2012) <doi:10.1214/11-BJPS139>, Hamza & Vallois (2016) <doi:10.1016/j.spl.2016.03.014>, Chen & Novick (1984) <doi:10.3102/10769986009002163>.
Run simple direct gravitational N-body simulations. The package can access different external N-body simulators (e.g. GADGET-4 by Springel et al. (2021) <doi:10.48550/arXiv.2010.03567>
), but also has a simple built-in simulator. This default simulator uses a variable block time step and lets the user choose between a range of integrators, including 4th and 6th order integrators for high-accuracy simulations. Basic top-hat smoothing is available as an option. The code also allows the definition of background particles that are fixed or in uniform motion, not subject to acceleration by other particles.
This package provides small area estimation for count data type and gives option whether to use covariates in the estimation or not. By implementing Empirical Bayes (EB) Poisson-Gamma model, each function returns EB estimators and mean squared error (MSE) estimators for each area. The EB estimators without covariates are obtained using the model proposed by Clayton & Kaldor (1987) <doi:10.2307/2532003>, the EB estimators with covariates are obtained using the model proposed by Wakefield (2006) <doi:10.1093/biostatistics/kxl008> and the MSE estimators are obtained using Jackknife method by Jiang et. al. (2002) <doi:10.1214/aos/1043351257>.
This package provides a suite of auxiliary functions that enhance time series estimation and forecasting, including a robust anomaly detection routine based on Chen and Liu (1993) <doi:10.2307/2290724> (imported and wrapped from the tsoutliers package), utilities for managing calendar and time conversions, performance metrics to assess both point forecasts and distributional predictions, advanced simulation by allowing the generation of time series componentsâ such as trend, seasonal, ARMA, irregular, and anomaliesâ in a modular fashion based on the innovations form of the state space model and a number of transformation methods including Box-Cox, Logit, Softplus-Logit and Sigmoid.
Suffix Array Kernel Smoothing (see https://academic.oup.com/bioinformatics/article-abstract/35/20/3944/5418797), or SArKS
, identifies sequence motifs whose presence correlates with numeric scores (such as differential expression statistics) assigned to the sequences (such as gene promoters). SArKS
smooths over sequence similarity, quantified by location within a suffix array based on the full set of input sequences. A second round of smoothing over spatial proximity within sequences reveals multi-motif domains. Discovered motifs can then be merged or extended based on adjacency within MMDs. False positive rates are estimated and controlled by permutation testing.
Fit restricted mean models for the conditional association between an exposure and an outcome, given covariates. Three methods are implemented: O-estimation, where a nuisance model for the association between the covariates and the outcome is used; E-estimation where a nuisance model for the association between the covariates and the exposure is used, and doubly robust (DR) estimation where both nuisance models are used. In DR-estimation, the estimates will be consistent when at least one of the nuisance models is correctly specified, not necessarily both. For more information, see Zetterqvist and Sjölander (2015) <doi:10.1515/em-2014-0021>.
This package provides a friendly (flexible) Markov Chain Monte Carlo (MCMC) framework for implementing Metropolis-Hastings algorithm in a modular way allowing users to specify automatic convergence checker, personalized transition kernels, and out-of-the-box multiple MCMC chains using parallel computing. Most of the methods implemented in this package can be found in Brooks et al. (2011, ISBN 9781420079425). Among the methods included, we have: Haario (2001) <doi:10.1007/s11222-011-9269-5> Adaptive Metropolis, Vihola (2012) <doi:10.1007/s11222-011-9269-5> Robust Adaptive Metropolis, and Thawornwattana et al. (2018) <doi:10.1214/17-BA1084> Mirror transition kernels.
This is a set of functions to retrieve information about GIMMS NDVI3g files currently available online; download (and re-arrange, in the case of NDVI3g.v0) the half-monthly data sets; import downloaded files from ENVI binary (NDVI3g.v0) or NetCDF
format (NDVI3g.v1) directly into R based on the widespread raster package; conduct quality control; and generate monthly composites (e.g., maximum values) from the half-monthly input data. As a special gimmick, a method is included to conveniently apply the Mann-Kendall trend test upon Raster* images, optionally featuring trend-free pre-whitening to account for lag-1 autocorrelation.
This package provides a method for detecting outliers with a Kalman filter on impulsed noised outliers and prediction on cleaned data. kfino is a robust sequential algorithm allowing to filter data with a large number of outliers. This algorithm is based on simple latent linear Gaussian processes as in the Kalman Filter method and is devoted to detect impulse-noised outliers. These are data points that differ significantly from other observations. ML (Maximization Likelihood) and EM (Expectation-Maximization algorithm) algorithms were implemented in kfino'. The method is described in full details in the following arXiv
e-Print: <arXiv:2208.00961>
.
This package provides methods for estimating and utilizing the multivariate generalized propensity score (mvGPS
) for multiple continuous exposures described in Williams, J.R, and Crespi, C.M. (2020) <arxiv:2008.13767>. The methods allow estimation of a dose-response surface relating the joint distribution of multiple continuous exposure variables to an outcome. Weights are constructed assuming a multivariate normal density for the marginal and conditional distribution of exposures given a set of confounders. Confounders can be different for different exposure variables. The weights are designed to achieve balance across all exposure dimensions and can be used to estimate dose-response surfaces.
This R package provides an implementation of multivariate extensions of a well-known fractal analysis technique, Detrended Fluctuations Analysis (DFA; Peng et al., 1995<doi:10.1063/1.166141>), for multivariate time series: multivariate DFA (mvDFA
). Several coefficients are implemented that take into account the correlation structure of the multivariate time series to varying degrees. These coefficients may be used to analyze long memory and changes in the dynamic structure that would by univariate DFA. Therefore, this R package aims to extend and complement the original univariate DFA (Peng et al., 1995) for estimating the scaling properties of nonstationary time series.