HiCDOC normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experimental conditions. It provides a collection of functions assembled into a pipeline to filter and normalize the data, predict the compartments and visualize the results. It accepts several types of data: tabular `.tsv` files, Cooler `.cool` or `.mcool` files, Juicer `.hic` files, or HiC-Pro `.matrix` and `.bed` files.
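A rough sketch of the intended workflow follows; the data set and function names are taken from my reading of the package vignette and should be treated as assumptions rather than a verified API:

```r
library(HiCDOC)

## Object and function names below are assumptions based on the package
## vignette; check the vignette for the authoritative workflow.
data(exampleHiCDOCDataSet)
hic <- HiCDOC(exampleHiCDOCDataSet)  # assumed one-call wrapper: filter, normalize, detect
compartments(hic)                    # assumed accessor: predicted A/B compartments
differences(hic)                     # assumed accessor: significant compartment changes
```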
Prognostic Enrichment is a clinical trial strategy of evaluating an intervention in a patient population with a higher rate of the unwanted event than the broader patient population (R. Temple (2010) <DOI:10.1038/clpt.2010.233>). A higher event rate translates to a lower sample size for the clinical trial, which can have both practical and ethical advantages. This package is a tool to help evaluate biomarkers for prognostic enrichment of clinical trials.
Infrastructure for estimating probabilistic distributional regression models in a Bayesian framework. The distribution parameters may capture location, scale, shape, etc., and every parameter may depend on complex additive terms (fixed, random, smooth, spatial, etc.), similar to a generalized additive model. The conceptual and computational framework is introduced in Umlauf, Klein, Zeileis (2019) <doi:10.1080/10618600.2017.1407325> and the R package in Umlauf, Klein, Simon, Zeileis (2021) <doi:10.18637/jss.v100.i04>.
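For instance, a Gaussian location-scale model in which both the mean and the standard deviation depend on a smooth term can be fitted roughly as follows (a minimal sketch using the bamlss() list-of-formulas interface; optimizer and sampler defaults are assumed):

```r
library(bamlss)

set.seed(1)
d <- data.frame(x = runif(500, -3, 3))
d$y <- sin(d$x) + rnorm(500, sd = exp(-1 + 0.5 * d$x))

## One formula per distribution parameter: the mean and sigma of a Gaussian.
f <- list(y ~ s(x), sigma ~ s(x))
b <- bamlss(f, family = "gaussian", data = d)

summary(b)   # posterior summaries of the additive terms
plot(b)      # estimated effects for both parameters
```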
For identifying, estimating, and plotting descriptive multidimensional item response theory models, restricted to three dimensions and to dichotomous or polytomous data that fit the two-parameter logistic model or the graded response model. The method is primarily exploratory and centered on the plot function, which exposes item characteristics and constructs, represented by vector arrows, in a three-dimensional interactive latent space. The results can be useful for item-level analysis as well as test development.
This package provides methods for fitting nonstationary Gaussian process models by spatial deformation, as introduced by Sampson and Guttorp (1992) <doi:10.1080/01621459.1992.10475181>, and by dimension expansion, as introduced by Bornn et al. (2012) <doi:10.1080/01621459.2011.646919>. Low-rank thin-plate regression splines, as developed in Wood, S.N. (2003) <doi:10.1111/1467-9868.00374>, are used to either transform co-ordinates or create new latent dimensions.
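In both approaches the nonstationary covariance arises from applying a stationary correlation function to transformed coordinates (a one-line summary of the cited papers, in my own notation):

```latex
\operatorname{Cov}\{Z(s), Z(s')\} \;=\; \sigma^2 \,\rho\bigl(\lVert f(s) - f(s') \rVert\bigr)
```

For spatial deformation, f maps the geographic coordinates to a deformed plane (Sampson and Guttorp, 1992); for dimension expansion, f(s) = (s, z(s)) appends learned latent coordinates z(s) (Bornn et al., 2012). In this package both maps are built from low-rank thin-plate regression splines.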
This package provides easy access to tidy education finance data using Bellwether's methodology to combine NCES F-33 Survey, Census Bureau Small Area Income Poverty Estimates (SAIPE), and community data from the ACS 5-Year Estimates. The package simplifies downloading, caching, and filtering education finance data by year and state, enabling researchers and analysts to explore K-12 education funding patterns, revenue sources, expenditure categories, and demographic factors across U.S. school districts.
Quantify the serial correlation across lags of a given functional time series using the autocorrelation and partial autocorrelation functions for functional time series proposed in Mestre et al. (2021) <doi:10.1016/j.csda.2020.107108>. The autocorrelation functions are based on the L2 norm of the lagged covariance operators of the series. Functions are available for estimating the distribution of the autocorrelation functions under the assumption of strong functional white noise.
An R API to MET Norway's Frost API <https://frost.met.no/index.html> to retrieve data as data frames. The Frost API and the underlying data are made available by the Norwegian Meteorological Institute (MET Norway). The data and products are distributed under the Norwegian License for Open Data 2.0 (NLOD) <https://data.norge.no/nlod/en/2.0> and the Creative Commons Attribution 4.0 license <https://creativecommons.org/licenses/by/4.0/>.
Routines that allow the user to run goodness-of-fit tests based on empirical distribution functions (EDF) for formal model evaluation in a general likelihood model. In addition, functions are provided to test whether a sample follows a Normal or Gamma distribution, to validate the normality assumptions in a linear model, and to examine the appropriateness of a Gamma distribution in generalized linear models with various link functions. See Stephens (1976) <http://www.jstor.org/stable/2958206>.
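As a concrete illustration of the EDF idea (base R only; this is not the package's interface), an Anderson-Darling statistic of the kind studied by Stephens (1976) for a normal model with estimated mean and standard deviation can be computed as:

```r
## Anderson-Darling statistic for H0: the sample is normal, with mu and sigma
## estimated from the data. Base-R illustration of the EDF-statistic idea only;
## critical values with estimated parameters require the Stephens adjustments.
ad_normal <- function(x) {
  n <- length(x)
  u <- pnorm(sort(x), mean = mean(x), sd = sd(x))   # EDF transform of the order statistics
  i <- seq_len(n)
  -n - mean((2 * i - 1) * (log(u) + log(1 - rev(u))))
}

set.seed(1)
ad_normal(rnorm(100))   # small value: little evidence against normality
ad_normal(rexp(100))    # large value: clear lack of fit
```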
Organizes a so-called ragged array as a generalized array, which is simply an array with sub-dimensions denoting the subdivision of dimensions (grouping of members within dimensions). Using the margins (names of dimensions and sub-dimensions) of generalized arrays, the operators and utility functions provided in this package automatically match margins and perform map-reduce style parallel computation along them. Generalized arrays also cooperate with R's native functions that work on simple arrays.
Easy wrangling and model-free analysis of microbial growth curve data, as commonly output by plate readers. Tools for reshaping common plate reader outputs into tidy formats and merging them with design information, making data easy to work with using gcplyr and other packages. Also streamlines common growth curve processing steps, like smoothing and calculating derivatives, and facilitates model-free characterization and analysis of growth data. See methods at <https://mikeblazanin.github.io/gcplyr/>.
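A typical workflow groups the tidy data by well and pipes it through smoothing and differentiation; the function and argument names below follow my recollection of the gcplyr vignette and should be checked against the current documentation:

```r
library(dplyr)
library(gcplyr)

## Toy tidy growth-curve data; column names are arbitrary.
d <- data.frame(Well = rep(c("A1", "A2"), each = 49),
                Time = rep(seq(0, 24, by = 0.5), 2))
d$Measurements <- 0.05 + 1 / (1 + exp(10 - d$Time)) + rnorm(nrow(d), sd = 0.01)

d <- d %>%
  group_by(Well) %>%
  mutate(smoothed = smooth_data(x = Time, y = Measurements,
                                sm_method = "moving-average", window_width_n = 5),
         deriv    = calc_deriv(x = Time, y = smoothed))

summarise(d, max_growth_rate = max(deriv, na.rm = TRUE))  # per-well maximum slope
```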
This package provides methods for processing spatial data for decision-making. It is an R implementation of methods provided by the open source software GeoFIS <https://www.geofis.org> (Leroux et al. 2018) <doi:10.3390/agriculture8060073>. The main functionalities are management zone delineation (Pedroso et al. 2010) <doi:10.1016/j.compag.2009.10.007> and data aggregation (Mora-Herrera et al. 2020) <doi:10.1016/j.compag.2020.105624>.
This package provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data. Ren, B., Wu, X., Braun, D., Pillai, N., & Dominici, F. (2021). "Bayesian modeling for exposure response curve via Gaussian processes: Causal effects of exposure to air pollution on health outcomes." arXiv preprint <doi:10.48550/arXiv.2105.03454>.
It gathers information, metadata and scripts from a two-part Henry Stewart talk by Zhao (2009, <doi:10.69645/DCRY5578>), which showcases analysis in aspects such as testing polymorphic variant(s) for Hardy-Weinberg equilibrium, association with traits using genetic and statistical models as well as Bayesian implementation, power calculation in study design, and genetic annotation. It also covers R integration with the Linux environment, GitHub, package creation and web applications.
The model is a high-dimensional vector autoregression with measurement error, also known as a linear Gaussian state-space model. A provable sparse expectation-maximization algorithm is provided for estimating the transition matrix and noise variances. Global and simultaneous tests for the transition matrix are implemented with false discovery rate control. For more information, see the accompanying paper: Lyu, X., Kang, J., & Li, L. (2023). "Statistical inference for high-dimensional vector autoregression with measurement error", Statistica Sinica.
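Written out, this is the familiar linear Gaussian state-space form (notation mine):

```latex
\begin{aligned}
X_t &= A\,X_{t-1} + e_t, & e_t &\sim \mathcal{N}_p(0,\ \sigma_e^2 I_p),\\
Y_t &= X_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}_p(0,\ \sigma_\varepsilon^2 I_p),
\end{aligned}
```

where only the noisy observations Y_t are available, A is the sparse transition matrix being estimated and tested, and the sigma squared terms are the noise variances.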
Calculates 3D lacunarity from voxel data. It is designed for use with point clouds generated from Light Detection And Ranging (LiDAR) scans in order to measure the spatial heterogeneity of three-dimensional structures such as forest stands. It provides fast C++ functions to efficiently bin point cloud data into voxels and to calculate lacunarity using different variants of the gliding-box algorithm introduced by Allain & Cloitre (1991) <doi:10.1103/PhysRevA.44.3552>.
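The gliding-box definition being computed is simple: slide a cubic box of side r through the voxel grid, record the box "mass" (number of occupied voxels) at each position, and take lacunarity as the second moment of the masses divided by the squared first moment. A naive base-R sketch on a toy occupancy grid (for intuition only; it is not the package's C++ implementation):

```r
## Naive gliding-box lacunarity for a 3D binary array (illustration only).
lacunarity_3d <- function(vox, r) {
  d <- dim(vox)
  idx <- expand.grid(i = seq_len(d[1] - r + 1),
                     j = seq_len(d[2] - r + 1),
                     k = seq_len(d[3] - r + 1))
  masses <- mapply(function(i, j, k)
    sum(vox[i:(i + r - 1), j:(j + r - 1), k:(k + r - 1)]),
    idx$i, idx$j, idx$k)
  mean(masses^2) / mean(masses)^2   # Lambda(r) = E[M^2] / E[M]^2
}

set.seed(1)
vox <- array(rbinom(20^3, 1, 0.1), dim = c(20, 20, 20))  # toy 20 x 20 x 20 occupancy grid
lacunarity_3d(vox, r = 3)
```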
Extension of the mgcv package, providing visual tools for Generalized Additive Models that exploit the additive structure of such models, scale to large data sets and can be used in conjunction with a wide range of response distributions. The focus is on providing visual methods for better understanding the model output and for aiding model checking and development beyond simple exponential family regression. The graphical framework is based on the layering system provided by ggplot2.
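A typical round trip wraps a fitted mgcv model with getViz() and then layers ggplot2-based plots on it (a minimal sketch; plotting arguments such as allTerms and pages follow my recollection of the package documentation):

```r
library(mgcv)
library(mgcViz)

set.seed(1)
d <- data.frame(x1 = runif(500), x2 = runif(500))
d$y <- sin(2 * pi * d$x1) + 0.5 * d$x2 + rnorm(500, sd = 0.3)

m <- gam(y ~ s(x1) + s(x2), data = d)
v <- getViz(m)                               # wrap the mgcv fit for plotting
print(plot(v, allTerms = TRUE), pages = 1)   # ggplot2-based smooth-effect plots
check(v)                                     # residual and model-checking plots
```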
Solve scalar-on-function linear models, including generalized linear mixed effect models and quantile linear regression models, with bias-correction estimation methods that account for measurement error. For details about the measurement error bias-correction methods, see Luan et al. (2023) <doi:10.48550/arXiv.2305.12624>, Tekwe et al. (2022) <doi:10.1093/biostatistics/kxac017>, Zhang et al. (2023) <doi:10.5705/ss.202021.0246>, and Tekwe et al. (2019) <doi:10.1002/sim.8179>.
This package provides a suite of diagnostic tools for univariate point processes. This includes tools for simulating and fitting both common and more complex temporal point processes. It also includes functions to visualise these point processes and collects existing diagnostic tools of Brown et al. (2002) <doi:10.1162/08997660252741149> and Wu et al. (2021) <doi:10.1002/9781119821588.ch7>, which can be used to assess the fit of a chosen point process model.
Evaluation of the pdf and the cdf of the univariate, noncentral, p-generalized normal distribution. Sampling from the univariate, noncentral, p-generalized normal distribution using either the p-generalized polar method, the p-generalized rejecting polar method, the Monty Python method, the Ziggurat method or the method of Nardon and Pianca. The package also includes routines for the simulation of the bivariate, p-generalized uniform distribution and the simulation of the corresponding angular distribution.
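For reference, the univariate p-generalized normal density being evaluated is, in the parameterization that reduces to the standard normal at p = 2 (stated from the literature, not extracted from the package code):

```latex
f_p(x \mid \mu, \sigma)
  = \frac{p^{\,1-1/p}}{2\,\sigma\,\Gamma(1/p)}
    \exp\!\left( -\frac{|x-\mu|^p}{p\,\sigma^p} \right),
\qquad p > 0 .
```

Setting p = 1 gives the Laplace distribution and p = 2 the normal, with mu the location and sigma the scale parameter.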
This package provides a robust and powerful empirical Bayesian approach for replicability analysis of two large-scale experimental studies. The method controls the false discovery rate by using the joint local false discovery rate based on the replicability null as the test statistic. An EM algorithm combined with a shape-constrained nonparametric method is used to estimate unknown parameters and functions. See Li, Y. et al. (2024) <doi:10.1371/journal.pgen.1011423>.
This package creates classifiers for binary outcomes using the Adaptive Boosting (AdaBoost) algorithm on decision stumps, with a fast C++ implementation. For a description of AdaBoost, see Freund and Schapire (1997) <doi:10.1006/jcss.1997.1504>. This type of classifier is nonlinear but easy to interpret and visualize. Feature vectors may be a combination of continuous (numeric) and categorical (string, factor) elements. Methods for classifier assessment, predictions, and cross-validation are also included.
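To make the algorithm concrete, here is a compact base-R sketch of AdaBoost on decision stumps over a single numeric feature (an illustration of the method itself, not this package's C++ implementation or its interface):

```r
## AdaBoost with decision stumps on one numeric feature (illustration only).
## Labels y must be coded as -1/+1.
fit_stump <- function(x, y, w) {
  best <- list(err = Inf)
  for (t in unique(x)) for (s in c(-1, 1)) {
    pred <- ifelse(x > t, s, -s)
    err  <- sum(w * (pred != y))              # weighted misclassification error
    if (err < best$err) best <- list(thr = t, sign = s, err = err)
  }
  best
}

adaboost_stumps <- function(x, y, M = 20) {
  n <- length(y); w <- rep(1 / n, n); model <- vector("list", M)
  for (m in seq_len(M)) {
    st    <- fit_stump(x, y, w)
    pred  <- ifelse(x > st$thr, st$sign, -st$sign)
    alpha <- 0.5 * log((1 - st$err) / max(st$err, 1e-10))
    w     <- w * exp(-alpha * y * pred); w <- w / sum(w)   # up-weight mistakes
    model[[m]] <- c(thr = st$thr, sign = st$sign, alpha = alpha)
  }
  model
}

predict_adaboost <- function(model, x) {
  score <- Reduce(`+`, lapply(model, function(m)
    m["alpha"] * ifelse(x > m["thr"], m["sign"], -m["sign"])))
  ifelse(score > 0, 1, -1)                    # sign of the weighted stump votes
}

set.seed(1)
x <- runif(200)
y <- ifelse(x > 0.5, 1, -1) * sample(c(1, -1), 200, TRUE, c(0.9, 0.1))  # 10% label noise
fit <- adaboost_stumps(x, y, M = 25)
mean(predict_adaboost(fit, x) == y)           # training accuracy
```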
Separate a data frame into two based on key columns. The function unjoin() provides an inside-out version of a nested data frame. This is used to identify duplication and normalize it (in the database sense) by linking two tables with the redundancy removed. Detecting topology within spatial structures requires this kind of normalization, which motivated this package as a building block for workflows within more applied projects.
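A minimal sketch of the idea using a built-in data set (the names of the returned components are determined by the package and not asserted here):

```r
library(unjoin)

## Split mtcars into two linked tables keyed on cyl and gear: one table of the
## distinct key combinations and one with the remaining columns, both carrying
## a generated key so the original data frame can be reassembled by a join.
out <- unjoin(mtcars, cyl, gear)
str(out, max.level = 1)
```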
This package provides a toolbox of common robust statistical tests, including robust descriptives, robust t-tests, and robust ANOVA. It is also available as a module for jamovi (see <https://www.jamovi.org> for more information). Walrus is based on the WRS2 package by Patrick Mair, which is in turn based on the scripts and work of Rand Wilcox. These analyses are described in depth in the book "Introduction to Robust Estimation & Hypothesis Testing".