Impute the survival times for censored observations based on their conditional survival distributions derived from the Kaplan-Meier estimator. CondiS can replace the censored observations with the best approximations from the statistical model, allowing for direct application of machine learning-based methods. When covariates are available, CondiS is extended by incorporating the covariate information through machine learning-based regression modeling ('CondiS_X'), which can further improve the imputed survival time.
Individual gene expression patterns are encoded into a series of eigenvector patterns ('WGCNA package). Using the framework of linear model-based differential expression comparisons ('limma package), time-course expression patterns for genes in different conditions are compared and analyzed for significant pattern changes. For reference, see: Greenham K, Sartor RC, Zorich S, Lou P, Mockler TC and McClung CR. eLife. 2020 Sep 30;9(4). <doi:10.7554/eLife.58993>.
This package implements a simple, likelihood-based estimation of the reproduction number (R0) using a branching process with a Poisson likelihood. This model requires knowledge of the serial interval distribution, and dates of symptom onsets. Infectiousness is determined by weighting R0 by the probability mass function of the serial interval on the corresponding day. It is a simplified version of the model introduced by Cori et al. (2013) <doi:10.1093/aje/kwt133>.
This package contains logic for computing sparse principal components via the EESPCA method, which is based on an approximation of the eigenvector/eigenvalue identity. Includes logic to support execution of the TPower and rifle sparse PCA methods, as well as logic to estimate the sparsity parameters used by EESPCA, TPower and rifle via cross-validation to minimize the out-of-sample reconstruction error. H. Robert Frost (2021) <doi:10.1080/10618600.2021.1987254>.
Computes the sample probability value (p-value) for the estimated coefficient from a standard genome-wide univariate regression. It computes the exact finite-sample p-value under the assumption that the measured phenotype (the dependent variable in the regression) has a known Bernoulli-normal mixture distribution. Finite-sample genome-wide regression p-values (Gwrpv) with a non-normally distributed phenotype (Gregory Connor and Michael O'Neill, bioRxiv 204727 <doi:10.1101/204727>).
Implement a coherent and flexible protocol for animal color tagging. GenTag provides a simple computational routine with low CPU usage to create color sequences for animal tag. First, a single-color tag sequence is created from an algorithm selected by the user, followed by verification of the combination uniqueness. Three methods to produce color tag sequences are provided. Users can modify the main function core to allow a wide range of applications.
This package provides tools for the development of packages related to General Transit Feed Specification (GTFS) files. Establishes a standard for representing GTFS feeds using R data types. Provides fast and flexible functions to read and write GTFS feeds while sticking to this standard. Defines a basic gtfs class which is meant to be extended by packages that depend on it. And offers utility functions that support checking the structure of GTFS objects.
Semiparametric regression models on the cumulative incidence function for interval-censored competing risks data as described in Bakoyannis, Yu, & Yiannoutsos (2017) /doi10.1002/sim.7350 and the models with missing event types as described in Park, Bakoyannis, Zhang, & Yiannoutsos (2021) \doi10.1093/biostatistics/kxaa052. The proportional subdistribution hazards model (Fine-Gray model), the proportional odds model, and other models that belong to the class of semiparametric generalized odds rate transformation models.
This package provides functions implementing multivariate state space models for purposes of time series analysis and forecasting. The focus of the package is on multivariate models, such as Vector Exponential Smoothing, Vector ETS (Error-Trend-Seasonal model) etc. It currently includes Vector Exponential Smoothing (VES, de Silva et al., 2010, <doi:10.1177/1471082X0901000401>), Vector ETS (Svetunkov et al., 2023, <doi:10.1016/j.ejor.2022.04.040>) and simulation function for VES.
This package provides tools to create an interactive web-based visualization of a topic model that has been fit to a corpus of text data using Latent Dirichlet Allocation (LDA). Given the estimated parameters of the topic model, it computes various summary statistics as input to an interactive visualization built with D3.js that is accessed via a browser. The goal is to help users interpret the topics in their LDA topic model.
This package provides a framework which should improve reproducibility and transparency in data processing. It provides functionality such as automatic meta data creation and management, rudimentary quality management, data caching, work-flow management and data aggregation. * The title is a wish not a promise. By no means we expect this package to deliver everything what is needed to achieve full reproducibility and transparency, but we believe that it supports efforts in this direction.
This package provides statistical methods for network meta-analysis of 1â 5 diagnostic tests to simultaneously compare multiple tests within a missing data framework, including: - Bayesian hierarchical model for network meta-analysis of multiple diagnostic tests (Ma, Lian, Chu, Ibrahim, and Chen (2018) <doi:10.1093/biostatistics/kxx025>) - Bayesian Hierarchical Summary Receiver Operating Characteristic Model for Network Meta-Analysis of Diagnostic Tests (Lian, Hodges, and Chu (2019) <doi:10.1080/01621459.2018.1476239>).
This package provides a framework for estimating difference-in-differences with unpoolable data, based on Karim, Webb, Austin, and Strumpf (2025) <doi:10.48550/arXiv.2403.15910>. Supports common or staggered adoption, multiple groups, and the inclusion of covariates. Also computes p-values for the aggregate average treatment effect on the treated via the randomization inference procedure described in MacKinnon and Webb (2020) <doi:10.1016/j.jeconom.2020.04.024>.
This package provides bindings to ImageMagick, a comprehensive image processing library. It supports many common formats (PNG, JPEG, TIFF, PDF, etc.) and manipulations (rotate, scale, crop, trim, flip, blur, etc). All operations are vectorized via the Magick++ STL meaning they operate either on a single frame or a series of frames for working with layers, collages, or animation. In RStudio, images are automatically previewed when printed to the console, resulting in an interactive editing environment.
This package represents a collection of plotting and table output functions for data visualization. Results of various statistical analyses (that are commonly used in social sciences) can be visualized using this package, including simple and cross tabulated frequencies, histograms, box plots, (generalized) linear models, mixed effects models, principal component analysis and correlation matrices, cluster analyses, scatter plots, stacked scales, effects plots of regression models (including interaction terms) and much more. This package supports labelled data.
HiCDOC normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experiment conditions. It provides a collection of functions assembled into a pipeline to filter and normalize the data, predict the compartments and visualize the results. It accepts several type of data: tabular `.tsv` files, Cooler `.cool` or `.mcool` files, Juicer `.hic` files or HiC-Pro `.matrix` and `.bed` files.
This package provides a collection of tools for regression analysis of non-negative data, including strictly positive and zero-inflated observations, based on the class of the Box-Cox symmetric (BCS) distributions and its zero-adjusted extension. The BCS distributions are a class of flexible probability models capable of describing different levels of skewness and tail-heaviness. The package offers a comprehensive regression modeling framework, including estimation and tools for evaluating goodness-of-fit.
Prognostic Enrichment is a clinical trial strategy of evaluating an intervention in a patient population with a higher rate of the unwanted event than the broader patient population (R. Temple (2010) <DOI:10.1038/clpt.2010.233>). A higher event rate translates to a lower sample size for the clinical trial, which can have both practical and ethical advantages. This package is a tool to help evaluate biomarkers for prognostic enrichment of clinical trials.
Infrastructure for estimating probabilistic distributional regression models in a Bayesian framework. The distribution parameters may capture location, scale, shape, etc. and every parameter may depend on complex additive terms (fixed, random, smooth, spatial, etc.) similar to a generalized additive model. The conceptual and computational framework is introduced in Umlauf, Klein, Zeileis (2019) <doi:10.1080/10618600.2017.1407325> and the R package in Umlauf, Klein, Simon, Zeileis (2021) <doi:10.18637/jss.v100.i04>.
Calculate various cardiovascular disease risk scores from the Framingham Heart Study (FHS), the American College of Cardiology (ACC), and the American Heart Association (AHA) as described in Dâ agostino, et al (2008) <doi:10.1161/circulationaha.107.699579>, Goff, et al (2013) <doi:10.1161/01.cir.0000437741.48606.98>, and Mclelland, et al (2015) <doi:10.1016/j.jacc.2015.08.035>, and Khan, et al (2024) <doi:10.1161/CIRCULATIONAHA.123.067626>.
For identifying, estimating, and plotting descriptive multidimensional item response theory models, restricted to 3D and dichotomous or polytomous data that fit the two-parameter logistic model or the graded response model. The method is foremost explorative and centered around the plot function that exposes item characteristics and constructs, represented by vector arrows, located in a three-dimensional interactive latent space. The results can be useful for item-level analysis as well as test development.
This package provides methods for fitting nonstationary Gaussian process models by spatial deformation, as introduced by Sampson and Guttorp (1992) <doi:10.1080/01621459.1992.10475181>, and by dimension expansion, as introduced by Bornn et al. (2012) <doi:10.1080/01621459.2011.646919>. Low-rank thin-plate regression splines, as developed in Wood, S.N. (2003) <doi:10.1111/1467-9868.00374>, are used to either transform co-ordinates or create new latent dimensions.
This package provides easy access to tidy education finance data using Bellwether's methodology to combine NCES F-33 Survey, Census Bureau Small Area Income Poverty Estimates (SAIPE), and community data from the ACS 5-Year Estimates. The package simplifies downloading, caching, and filtering education finance data by year and state, enabling researchers and analysts to explore K-12 education funding patterns, revenue sources, expenditure categories, and demographic factors across U.S. school districts.
Quantify the serial correlation across lags of a given functional time series using the autocorrelation function and a partial autocorrelation function for functional time series proposed in Mestre et al. (2021) <doi:10.1016/j.csda.2020.107108>. The autocorrelation functions are based on the L2 norm of the lagged covariance operators of the series. Functions are available for estimating the distribution of the autocorrelation functions under the assumption of strong functional white noise.