Semiparametric regression models on the cumulative incidence function for interval-censored competing risks data as described in Bakoyannis, Yu, & Yiannoutsos (2017) /doi10.1002/sim.7350 and the models with missing event types as described in Park, Bakoyannis, Zhang, & Yiannoutsos (2021) \doi10.1093/biostatistics/kxaa052. The proportional subdistribution hazards model (Fine-Gray model), the proportional odds model, and other models that belong to the class of semiparametric generalized odds rate transformation models.
This package provides functions implementing multivariate state space models for purposes of time series analysis and forecasting. The focus of the package is on multivariate models, such as Vector Exponential Smoothing, Vector ETS (Error-Trend-Seasonal model) etc. It currently includes Vector Exponential Smoothing (VES, de Silva et al., 2010, <doi:10.1177/1471082X0901000401>), Vector ETS (Svetunkov et al., 2023, <doi:10.1016/j.ejor.2022.04.040>) and simulation function for VES.
This package provides tools to create an interactive web-based visualization of a topic model that has been fit to a corpus of text data using Latent Dirichlet Allocation (LDA). Given the estimated parameters of the topic model, it computes various summary statistics as input to an interactive visualization built with D3.js that is accessed via a browser. The goal is to help users interpret the topics in their LDA topic model.
This package provides a framework which should improve reproducibility and transparency in data processing. It provides functionality such as automatic meta data creation and management, rudimentary quality management, data caching, work-flow management and data aggregation. * The title is a wish not a promise. By no means we expect this package to deliver everything what is needed to achieve full reproducibility and transparency, but we believe that it supports efforts in this direction.
Probabilistic Regression Trees (PRTree). Functions for fitting and predicting PRTree models with some adaptations to handle missing values. The main calculations are performed in FORTRAN', resulting in highly efficient algorithms. This package's implementation is based on the PRTree methodology described in Alkhoury, S.; Devijver, E.; Clausel, M.; Tami, M.; Gaussier, E.; Oppenheim, G. (2020) - "Smooth And Consistent Probabilistic Regression Trees" <https://proceedings.neurips.cc/paper_files/paper/2020/file/8289889263db4a40463e3f358bb7c7a1-Paper.pdf>.
This package provides a framework for estimating difference-in-differences with unpoolable data, based on Karim, Webb, Austin, and Strumpf (2024) <doi:10.48550/arXiv.2403.15910>
. Supports common or staggered adoption, multiple groups, and the inclusion of covariates. Also computes p-values for the aggregate average treatment effect on the treated via the randomization inference procedure described in MacKinnon
and Webb (2020) <doi:10.1016/j.jeconom.2020.04.024>.
BSeq-sc is a bioinformatics analysis pipeline that leverages single-cell sequencing data to estimate cell type proportion and cell type-specific gene expression differences from RNA-seq data from bulk tissue samples. This is a companion package to the publication "A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure." Baron et al. Cell Systems (2016) https://www.ncbi.nlm.nih.gov/pubmed/27667365.
This package provides e-statistics (energy) tests and statistics for multivariate and univariate inference, including distance correlation, one-sample, two-sample, and multi-sample tests for comparing multivariate distributions, are implemented. Measuring and testing multivariate independence based on distance correlation, partial distance correlation, multivariate goodness-of-fit tests, clustering based on energy distance, testing for multivariate normality, distance components (disco) for non-parametric analysis of structured data, and other energy statistics/methods are implemented.
This a package containing diverse spatial datasets for demonstrating, benchmarking and teaching spatial data analysis. It includes R data of class sf
, Spatial
, and nb
. It also contains data stored in a range of file formats including GeoJSON, ESRI Shapefile and GeoPackage. Some of the datasets are designed to illustrate specific analysis techniques. cycle_hire()
and cycle_hire_osm()
, for example, are designed to illustrate point pattern analysis techniques.
This package provides an implementation of sparse linear discriminant analysis, which is a supervised classification method for multiple classes. Various novel optimization approaches to this problem are implemented including alternating direction method of multipliers (ADMM), proximal gradient (PG) and accelerated proximal gradient (APG). Functions for performing cross validation are also supplied along with basic prediction and plotting functions. Sparse zero variance discriminant (SZVD) analysis is also included in the package.
Prognostic Enrichment is a clinical trial strategy of evaluating an intervention in a patient population with a higher rate of the unwanted event than the broader patient population (R. Temple (2010) <DOI:10.1038/clpt.2010.233>). A higher event rate translates to a lower sample size for the clinical trial, which can have both practical and ethical advantages. This package is a tool to help evaluate biomarkers for prognostic enrichment of clinical trials.
Infrastructure for estimating probabilistic distributional regression models in a Bayesian framework. The distribution parameters may capture location, scale, shape, etc. and every parameter may depend on complex additive terms (fixed, random, smooth, spatial, etc.) similar to a generalized additive model. The conceptual and computational framework is introduced in Umlauf, Klein, Zeileis (2019) <doi:10.1080/10618600.2017.1407325> and the R package in Umlauf, Klein, Simon, Zeileis (2021) <doi:10.18637/jss.v100.i04>.
For identifying, estimating, and plotting descriptive multidimensional item response theory models, restricted to 3D and dichotomous or polytomous data that fit the two-parameter logistic model or the graded response model. The method is foremost explorative and centered around the plot function that exposes item characteristics and constructs, represented by vector arrows, located in a three-dimensional interactive latent space. The results can be useful for item-level analysis as well as test development.
This package provides methods for fitting nonstationary Gaussian process models by spatial deformation, as introduced by Sampson and Guttorp (1992) <doi:10.1080/01621459.1992.10475181>, and by dimension expansion, as introduced by Bornn et al. (2012) <doi:10.1080/01621459.2011.646919>. Low-rank thin-plate regression splines, as developed in Wood, S.N. (2003) <doi:10.1111/1467-9868.00374>, are used to either transform co-ordinates or create new latent dimensions.
An R API to MET Norway's Frost API <https://frost.met.no/index.html> to retrieve data as data frames. The Frost API, and the underlying data, is made available by the Norwegian Meteorological Institute (MET Norway). The data and products are distributed under the Norwegian License for Open Data 2.0 (NLOD) <https://data.norge.no/nlod/en/2.0> and Creative Commons 4.0 <https://creativecommons.org/licenses/by/4.0/>.
Quantify the serial correlation across lags of a given functional time series using the autocorrelation function and a partial autocorrelation function for functional time series proposed in Mestre et al. (2021) <doi:10.1016/j.csda.2020.107108>. The autocorrelation functions are based on the L2 norm of the lagged covariance operators of the series. Functions are available for estimating the distribution of the autocorrelation functions under the assumption of strong functional white noise.
Easy wrangling and model-free analysis of microbial growth curve data, as commonly output by plate readers. Tools for reshaping common plate reader outputs into tidy formats and merging them with design information, making data easy to work with using gcplyr and other packages. Also streamlines common growth curve processing steps, like smoothing and calculating derivatives, and facilitates model-free characterization and analysis of growth data. See methods at <https://mikeblazanin.github.io/gcplyr/>.
This package provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data. Ren, B., Wu, X., Braun, D., Pillai, N., & Dominici, F.(2021). "Bayesian modeling for exposure response curve via gaussian processes: Causal effects of exposure to air pollution on health outcomes." arXiv
preprint <doi:10.48550/arXiv.2105.03454>
.
It gathers information, meta-data and scripts in a two-part Henry-Stewart talk by Zhao (2009, <doi:10.69645/DCRY5578>), which showcases analysis in aspects such as testing of polymorphic variant(s) for Hardy-Weinberg equilibrium, association with trait using genetic and statistical models as well as Bayesian implementation, power calculation in study design and genetic annotation. It also covers R integration with the Linux environment, GitHub
, package creation and web applications.
This package provides methods for processing spatial data for decision-making. This package is an R implementation of methods provided by the open source software GeoFIS
<https://www.geofis.org> (Leroux et al. 2018) <doi:10.3390/agriculture8060073>. The main functionalities are the management zone delineation (Pedroso et al. 2010) <doi:10.1016/j.compag.2009.10.007> and data aggregation (Mora-Herrera et al. 2020) <doi:10.1016/j.compag.2020.105624>.
Organize a so-called ragged array as generalized arrays, which is simply an array with sub-dimensions denoting the subdivision of dimensions (grouping of members within dimensions). By the margins (names of dimensions and sub-dimensions) in generalized arrays, operators and utility functions provided in this package automatically match the margins, doing map-reduce style parallel computation along margins. Generalized arrays are also cooperative to R's native functions that work on simple arrays.
Routines that allow the user to run goodness of fit tests based on empirical distribution functions for formal model evaluation in a general likelihood model. In addition, functions are provided to test if a sample follows Normal or Gamma distributions, validate the normality assumptions in a linear model, and examine the appropriateness of a Gamma distribution in generalized linear models with various link functions. Michael Arthur Stephens (1976) <http://www.jstor.org/stable/2958206>.
The model is high-dimensional vector autoregression with measurement error, also known as linear gaussian state-space model. Provable sparse expectation-maximization algorithm is provided for the estimation of transition matrix and noise variances. Global and simultaneous testings are implemented for transition matrix with false discovery rate control. For more information, see the accompanying paper: Lyu, X., Kang, J., & Li, L. (2023). "Statistical inference for high-dimensional vector autoregression with measurement error", Statistica Sinica.
Calculates 3D lacunarity from voxel data. It is designed for use with point clouds generated from Light Detection And Ranging (LiDAR
) scans in order to measure the spatial heterogeneity of 3-dimensional structures such as forest stands. It provides fast C++ functions to efficiently bin point cloud data into voxels and calculate lacunarity using different variants of the gliding-box algorithm originated by Allain & Cloitre (1991) <doi:10.1103/PhysRevA.44.3552>
.