Calculate multiple statistics with confidence intervals for matched case-control data, including the risk difference, risk ratio, relative difference, and odds ratio. Results are equivalent to those from Stata, and you can choose how to format your input data. The methods used are those described on page 56 of the Stata documentation for "Epitab - Tables for Epidemiologists" <https://www.stata.com/manuals/repitab.pdf>.
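As a rough illustration of the point estimates named above (not the package's own code), the following Python sketch computes them from the four cells of a matched-pair table. The function name `matched_pair_estimates` and the cell labels are assumptions made for this example, and confidence intervals are left out.

```python
def matched_pair_estimates(a, b, c, d):
    """Point estimates for a matched case-control 2x2 table.

    Cell labels (assumed for this sketch):
      a: pairs where case and control are both exposed
      b: pairs where only the case is exposed
      c: pairs where only the control is exposed
      d: pairs where neither is exposed
    """
    n = a + b + c + d
    p_cases = (a + b) / n      # proportion of cases exposed
    p_controls = (a + c) / n   # proportion of controls exposed
    return {
        "difference": p_cases - p_controls,
        "ratio": p_cases / p_controls,
        "relative_difference": (p_cases - p_controls) / (1 - p_controls),
        # The matched-pair odds ratio uses only the discordant pairs.
        "odds_ratio": b / c,
    }
```

The sketch does not guard against empty cells: it raises `ZeroDivisionError` when `c` is zero or when no control is exposed.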
Visualizes panel data. It has three main functionalities: (1) it plots the treatment status and missing values in a panel dataset; (2) it visualizes the temporal dynamics of a main variable of interest; (3) it depicts the bivariate relationships between a treatment variable and an outcome variable either by unit or in aggregate. For details, see <doi:10.18637/jss.v107.i07>.
This package provides functions to load Research Patient Data Registry ('RPDR') text queries from Partners Healthcare institutions into R. The package also provides helper functions to manipulate data and execute common procedures such as finding the closest radiological exams considering a given timepoint, or creating a DICOM header database from the downloaded images. All functionalities are parallelized for fast and efficient analyses.
It is often useful when developing an R package to track the relationship between functions in order to appropriately test and track changes. This package generates a graph of the relationship between all R functions in a package. It can also be used on any directory containing .R files which can be very useful for shiny apps or other non-package workflows.
These are miscellaneous functions that I find useful for my research and teaching. The contents include themes for plots, functions for simulating quantities of interest from regression models, functions for simulating various forms of fake data for instructional/research purposes, and many more. All told, the functions provided here are broadly useful for data organization, data presentation, data recoding, and data simulation.
The superdiag package provides a comprehensive test suite for testing Markov Chain nonconvergence. It integrates five standard empirical MCMC convergence diagnostics (Gelman-Rubin, Geweke, Heidelberger-Welch, Raftery-Lewis, and Hellinger distance) and plotting functions for trace plots and density histograms. The functions of the package can be used to present all diagnostic statistics and graphs at once for conveniently checking MCMC nonconvergence.
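To give a sense of what one of these diagnostics measures, here is a minimal Python sketch of the basic Gelman-Rubin potential scale reduction factor for a single parameter. It is not taken from superdiag; the function name `gelman_rubin` is hypothetical, chains are assumed to have equal length, and the degrees-of-freedom correction that some implementations apply is omitted.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])                                 # draws per chain
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])   # W
    between = n * variance(chain_means)                     # B
    pooled = (n - 1) / n * within + between / n             # pooled variance estimate
    return (pooled / within) ** 0.5
```

Values well above 1 suggest the chains have not yet mixed; values near 1 are necessary but not sufficient evidence of convergence.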
Efficient tabulation with Stata-like output. For each unique value of the variable, it shows the number of observations with that value, proportion of observations with that value, and cumulative proportion, in descending order of frequency. Accepts data.table, tibble, or data.frame as input. Efficient with big data: if you give it a data.table, tab() uses data.table syntax.
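The tabulation described above can be sketched in a few lines of Python. This is only an illustration of the output columns, not the package's implementation; the function name `tabulate_values` is hypothetical.

```python
from collections import Counter


def tabulate_values(values):
    """Return (value, count, proportion, cumulative proportion) rows,
    ordered from most to least frequent."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for value, n in counts.most_common():  # descending order of frequency
        cumulative += n
        rows.append((value, n, n / total, cumulative / total))
    return rows
```

Ties keep the order in which the values were first seen, which is how `Counter.most_common` behaves.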
This package contains several utility functions for manipulating tensor-valued data (centering, multiplication from a single mode etc.) and the implementations of the following blind source separation methods for tensor-valued data: tPCA, tFOBI, tJADE, k-tJADE, tgFOBI, tgJADE, tSOBI, tNSS.SD, tNSS.JD, tNSS.TD.JD, tPP and tTUCKER.
Determine the path of the executing script. Compatible with several popular GUIs: Rgui, RStudio, Positron, VSCode, Jupyter, Emacs, and Rscript (shell). Compatible with several functions and packages: source(), sys.source(), debugSource() in RStudio, compiler::loadcmp(), utils::Sweave(), box::use(), knitr::knit(), plumber::plumb(), shiny::runApp(), package:targets, and testthat::source_file().
This package provides a collection of high-performance functions for the triangular distribution that consists of the probability density function, cumulative distribution function, quantile function, random variate generator, moment generating function, characteristic function, and expected shortfall function. References: Samuel Kotz, Johan René van Dorp (2004) <doi:10.1142/5720> and Acerbi, Carlo and Tasche, Dirk. (2002) <doi:10.1111/1468-0300.00091>.
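For readers unfamiliar with the triangular distribution, the following Python sketch shows three of the functions listed above for a distribution with minimum `a`, maximum `b`, and mode `c` (assuming `a < b` and `a <= c <= b`). The function names are hypothetical and the code is not derived from the package; it only restates the standard closed-form expressions.

```python
from math import sqrt


def tri_pdf(x, a, b, c):
    """Density of the triangular distribution on [a, b] with mode c."""
    if x < a or x > b:
        return 0.0
    if x < c:
        return 2 * (x - a) / ((b - a) * (c - a))
    if x == c:
        return 2 / (b - a)
    return 2 * (b - x) / ((b - a) * (b - c))


def tri_cdf(x, a, b, c):
    """Cumulative distribution function."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    return 1 - (b - x) ** 2 / ((b - a) * (b - c))


def tri_quantile(p, a, b, c):
    """Inverse of tri_cdf for 0 <= p <= 1."""
    if p < (c - a) / (b - a):
        return a + sqrt(p * (b - a) * (c - a))
    return b - sqrt((1 - p) * (b - a) * (b - c))
```

A random variate generator follows directly by passing a uniform draw on [0, 1] to `tri_quantile`.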
Non- and semiparametric regression for generalized additive, partial linear, and varying coefficient models as well as their combinations via smoothed backfitting. Based on Roca-Pardinas J and Sperlich S (2010) <doi:10.1007/s11222-009-9130-2>; Mammen E, Linton O and Nielsen J (1999) <doi:10.1214/aos/1017939138>; Lee YK, Mammen E, Park BU (2012) <doi:10.1214/12-AOS1026>.
This package provides functions for inferring continuous, branching lineage structures in low-dimensional data. Slingshot was designed to model developmental trajectories in single-cell RNA sequencing data and serve as a component in an analysis pipeline after dimensionality reduction and clustering. It is flexible enough to handle arbitrarily many branching events and allows for the incorporation of prior knowledge through supervised graph construction.
PiGX RNAseq is an analysis pipeline for preprocessing and reporting for RNA sequencing experiments. It is easy to use and produces high quality reports. The inputs are reads files from the sequencing experiment, and a configuration file which describes the experiment. In addition to quality control of the experiment, the pipeline produces a differential expression report comparing samples in an easily configurable manner.
juliex is a concurrent executor for Rust futures. It is implemented as a threadpool executor using a single, shared queue. Algorithmically, it is very similar to the Threadpool executor provided by the futures crate. The main difference is that juliex uses a crossbeam channel and performs a single allocation per spawned future, whereas the futures Threadpool uses std concurrency primitives and multiple allocations.
The Readline library provides a set of functions for use by applications that allow users to edit command lines as they are typed in. Both Emacs and vi editing modes are available. The Readline library includes additional functions to maintain a list of previously-entered command lines, to recall and perhaps reedit those lines, and perform csh-like history expansion on previous commands.
Implementations of algorithms for data analysis based on the rough set theory (RST) and the fuzzy rough set theory (FRST). We not only provide implementations for the basic concepts of RST and FRST but also popular algorithms that derive from those theories. The methods included in the package can be divided into several categories based on their functionality: discretization, feature selection, instance selection, rule induction and classification based on nearest neighbors. RST was introduced by Zdzisław Pawlak in 1982 as a sophisticated mathematical tool to model and process imprecise or incomplete information. By using the indiscernibility relation for objects/instances, RST does not require additional parameters to analyze the data. FRST is an extension of RST. It combines concepts of vagueness and indiscernibility that are expressed with fuzzy sets (as proposed by Zadeh in 1965) and RST.
Estimates heterogeneous effects in factorial (and conjoint) models. The methodology employs a Bayesian finite mixture of regularized logistic regressions, where moderators can affect each observation's probability of group membership and a sparsity-inducing prior fuses together levels of each factor while respecting ANOVA-style sum-to-zero constraints. Goplerud, Imai, and Pashley (2024) <doi:10.48550/ARXIV.2201.01357> provide further details.
Two-Step Lasso (TS-Lasso) and compound minimum methods to recover the abundance of missing peaks in mass spectrum analysis. TS-Lasso is an imputation method that handles various types of missing peaks simultaneously. This package provides the procedure to generate missing peaks (or data) for simulation study, as well as a tool to estimate and visualize the proportion of missing at random.
Estimate natural mortality (M) throughout the life history of organisms, mainly fish and invertebrates, based on the gnomonic interval approach proposed by Caddy (1996) <doi:10.1051/alr:1996023> and Martinez-Aguilar et al. (2005) <doi:10.1016/j.fishres.2004.04.008>. It includes estimation of the duration of each gnomonic interval (life stage), the constant probability of death (G), and some basic plots.
This package implements a one-sector Armington-CES gravity model with general equilibrium (GE) effects. This model is designed to analyze international and domestic trade by capturing the impacts of trade costs and policy changes within a general equilibrium framework. Additionally, it includes a local parameter to run simulations on productivity. The package provides functions for calibration, simulation, and analysis of the model.
Fast scalable Gaussian process approximations, particularly well suited to spatial (aerial, remote-sensed) and environmental data, described in more detail in Katzfuss and Guinness (2017) <arXiv:1708.06302>. Package also contains a fast implementation of the incomplete Cholesky decomposition (IC0), based on Schaefer et al. (2019) <arXiv:1706.02205> and MaxMin ordering proposed in Guinness (2018) <arXiv:1609.05372>.
Computes bilateral and multilateral index numbers. It has support for many standard bilateral indexes as well as multilateral index number methods such as GEKS, GEKS-Tornqvist (or CCDI), Geary-Khamis and the weighted time product dummy (for details on these methods see Diewert and Fox (2020) <doi:10.1080/07350015.2020.1816176>). It also supports updating of multilateral indexes using several splicing methods.
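To make the distinction between bilateral and multilateral indexes concrete, here is a small Python sketch of the Fisher bilateral index and a GEKS index built from it. The function names and the list-of-lists data layout (one list of prices and one of quantities per period, same products in the same order) are assumptions for this example, not the package's interface.

```python
from math import prod, sqrt


def fisher(p0, q0, p1, q1):
    """Fisher price index from period 0 to period 1."""
    laspeyres = sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))
    paasche = sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))
    return sqrt(laspeyres * paasche)


def geks(prices, quantities, base, current):
    """GEKS index: geometric mean, over every link period, of the
    Fisher index chained from base to current through that link."""
    links = [
        fisher(prices[base], quantities[base], prices[link], quantities[link])
        * fisher(prices[link], quantities[link], prices[current], quantities[current])
        for link in range(len(prices))
    ]
    return prod(links) ** (1 / len(links))
```

Unlike a chained bilateral index, the GEKS index is transitive: the index from period 0 to 2 equals the product of the indexes from 0 to 1 and from 1 to 2 when all are computed over the same window.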
An implementation of the corrected sandwich variance (CSV) estimation method for inference on marginal hazard ratios (HR) in the inverse probability weighted (IPW) Cox model, with or without clustered data, proposed by Shu, Young, Toh, and Wang (2019) in their paper under revision for Biometrics. Both conventional inverse probability weights and stabilized weights are implemented. A logistic regression model is assumed for the propensity score model.
Analysis of DNA copy number in single cells using custom genome-wide targeted DNA sequencing panels for the Mission Bio Tapestri platform. Users can easily parse, manipulate, and visualize datasets produced from the automated Tapestri Pipeline, with support for normalization, clustering, and copy number calling. Functions are also available to deconvolute multiplexed samples by genotype and parse barcoded reads from exogenous lentiviral constructs.