Example data sets for running the example problems from causal inference textbooks. It currently contains data sets for Huntington-Klein, Nick (2021 and 2025) "The Effect" <https://theeffectbook.net>, first and second editions; Cunningham, Scott (2021 and 2025, ISBN-13: 978-0-300-25168-5) "Causal Inference: The Mixtape"; and Hernán, Miguel and James Robins (2020) "Causal Inference: What If" <https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/>.
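A minimal sketch of loading one of the bundled data sets, assuming the package is named causaldata and that nhefs (the NHEFS data used in "Causal Inference: What If") is among the included data sets:

    # Attach the package and load one bundled data set
    # (nhefs is an assumed data set name, not verified here).
    library(causaldata)
    data("nhefs")
    head(nhefs)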
Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allows for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
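A minimal sketch of the stepwise construction workflow, assuming this describes the DiagrammeR package and its create_graph(), add_node(), add_edge(), and render_graph() functions:

    # Build a two-node graph step by step, then render it
    # (function names as in the DiagrammeR package, which this
    # description appears to match).
    library(DiagrammeR)
    graph <- create_graph() |>
      add_node(label = "a") |>
      add_node(label = "b") |>
      add_edge(from = 1, to = 2)
    render_graph(graph)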
Exploring, analyzing, and manipulating General Transit Feed Specification (GTFS) files, which represent public transportation schedules and geographic data. The package allows users to filter data by routes, trips, stops, and time, generate spatial visualizations, and perform detailed analyses of transit networks, including headway, dwell times, and route frequencies. Designed for transit planners, researchers, and data analysts, GTFSwizard integrates functionalities from popular packages to enable efficient GTFS data manipulation and visualization.
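A hypothetical workflow sketch; the function names read_gtfs(), get_headways(), and get_dwelltimes() are assumptions about the GTFSwizard interface, not verified:

    # Hypothetical sketch; function names are assumed, not verified.
    library(GTFSwizard)
    gtfs <- read_gtfs("feed.zip")    # read a GTFS feed (assumed reader name)
    hw   <- get_headways(gtfs)       # headway analysis (assumed function name)
    dw   <- get_dwelltimes(gtfs)     # dwell-time analysis (assumed function name)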
This package provides tools for plotting gene clusters and transcripts by importing data from GenBank, FASTA, and GFF files. It performs BLASTP and MUMmer alignments [Altschul et al. (1990) <doi:10.1016/S0022-2836(05)80360-2>; Delcher et al. (1999) <doi:10.1093/nar/27.11.2369>] and displays results on gene arrow maps. Extensive customization options are available, including legends, labels, annotations, scales, colors, tooltips, and more.
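A hypothetical sketch of charting a small gene cluster, assuming a geneviewer-style GC_chart() entry point; the function name and the data frame columns below are illustrative assumptions:

    # Hypothetical sketch; GC_chart() and the column names are assumed.
    library(geneviewer)
    genes <- data.frame(
      start   = c(100, 600, 1200),
      end     = c(500, 1100, 1700),
      cluster = "cluster_1"
    )
    GC_chart(genes, cluster = "cluster")   # draw a gene arrow map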
Reads annual financial reports including assets, liabilities, dividend history, stockholder composition, and much more from Bovespa's DFP, FRE, and FCA systems <http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/empresas-listadas.htm>. These are web-based interfaces for all financial reports of companies traded at Bovespa. The package is specially designed for large-scale data importation, keeping a tabular (long) structure for easier processing.
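A sketch of a bulk download call, assuming the GetDFPData package and its gdfpd.GetDFPData() entry point; the company name and date range are illustrative:

    # Sketch assuming GetDFPData's main entry point; values are illustrative.
    library(GetDFPData)
    df_reports <- gdfpd.GetDFPData(
      name.companies = "PETROBRAS",
      first.date     = "2018-01-01",
      last.date      = "2020-12-31"
    )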
Computes individual causes of death and population cause-specific mortality fractions using the InSilicoVA algorithm from McCormick et al. (2016) <doi:10.1080/01621459.2016.1152191>. It uses data derived from verbal autopsy (VA) interviews, in a format similar to the input of the widely used InterVA method. This package provides general model fitting and customization for the InSilicoVA algorithm and basic graphical visualization of the output.
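A minimal model-fitting sketch, assuming the package exports insilico() and ships an InterVA-format example data set named RandomVA1:

    # Fit the InSilicoVA model on bundled example data
    # (RandomVA1 and the Nsim argument are assumptions about the package).
    library(InSilicoVA)
    data(RandomVA1)
    fit <- insilico(RandomVA1, Nsim = 1000)
    summary(fit)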
Multiple tools are now available for inferring the personalised germ line set from an adaptive immune receptor repertoire. Output from these tools is converted to a single format and supplemented with rich data such as usage and characterisation of novel germ line alleles. These data can be particularly useful when considering the validity of novel inferences. Use of the analysis provided is described in <doi:10.3389/fimmu.2019.00435>.
Web application using shiny for the SSD (Species Sensitivity Distribution) module of the MOSAIC (MOdeling and StAtistical tools for ecotoxICology) platform. It estimates the Hazardous Concentration for x% of the species (HCx) from toxicity values that can be censored and provides various plotting options for a better understanding of the results. See our companion paper Kon Kam King et al. (2014) <doi:10.48550/arXiv.1311.5772>.
The accumulation of single-cell RNA-seq ('scRNA-seq') studies highlights the potential benefits of integrating multiple datasets. By augmenting sample sizes and enhancing analytical robustness, integration can lead to more insightful biological conclusions. However, challenges arise due to the inherent diversity and batch discrepancies within and across studies. 'SCIntRuler', a novel R package, addresses these challenges by guiding the integration of multiple scRNA-seq datasets.
Tidal analysis of evenly spaced observed time series (time step 1 to 60 min) with or without shorter gaps, using the harmonic representation of inequalities. The analysis should preferably cover an observation period of at least 19 years. For shorter periods, low-frequency constituents are not taken into account, in accordance with the Rayleigh criterion. The main objective of this package is to synthesize or predict a tidal time series.
This package provides tools for the statistical analysis of regular vine copula models; see Aas et al. (2009) <doi:10.1016/j.insmatheco.2007.02.001> and Dissmann et al. (2013) <doi:10.1016/j.csda.2012.08.010>. The package includes tools for parameter estimation, model selection, simulation, goodness-of-fit tests, and visualization. Tools for estimation, selection, and exploratory data analysis of bivariate copula models are also provided.
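A minimal sketch of structure selection and simulation, assuming the VineCopula package API (RVineStructureSelect() and RVineSim()):

    # Select a regular vine structure and pair-copula families from
    # copula-scale data, then simulate from the fitted model.
    library(VineCopula)
    u <- matrix(runif(3000), ncol = 3)               # toy data on [0,1]^3
    fit <- RVineStructureSelect(u, familyset = c(1, 3, 4, 5))
    sim <- RVineSim(500, fit)                        # 500 simulated rows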
This package implements inferential methods to compare gene lists in terms of their biological meaning as expressed in the Gene Ontology (GO). The compared gene lists are characterized by cross-tabulation frequency tables of enriched GO items. Dissimilarity between gene lists is evaluated using the Sorensen-Dice index. The fundamental guiding principle is that two gene lists are taken as similar if they share a large proportion of common enriched GO items.
This package provides a random-effects stochastic model that allows quick detection of clonal dominance events from clonal tracking data collected in gene therapy studies. Starting from the Ito-type equation describing the dynamics of cell duplication, death, and differentiation at the clonal level, we first considered its local linear approximation as the base model. The parameters of the base model, which are inferred using a maximum likelihood approach, are assumed to be shared across the clones. Although this assumption makes inference easier, in some cases it can be too restrictive and does not take into account possible scenarios of clonal dominance. Therefore, we extended the base model by introducing random effects for the clones. In this extended formulation the dynamic parameters are estimated using a tailor-made expectation-maximization algorithm. Further details on the methods can be found in L. Del Core et al. (2022) <doi:10.1101/2022.05.31.494100>.
Imputation of missing numerical outcomes for a longitudinal trial with protocol deviations. The package uses distinct treatment arm-based assumptions for the unobserved data, following the general algorithm of Carpenter, Roger, and Kenward (2013) <doi:10.1080/10543406.2013.834911>, and the causal model of White, Royes and Best (2020) <doi:10.1080/10543406.2019.1684308>. Sensitivity analyses to departures from these assumptions can be done by the Delta method of Roger. The program uses the same algorithm as the mimix Stata package written by Suzie Cro, with additional coding for the causal model and delta method. The reference-based methods are jump to reference (J2R), copy increments in reference (CIR), copy reference (CR), and the causal model, all of which must specify the reference treatment arm. Other methods are missing at random (MAR) and the last mean carried forward (LMCF). Individual-specific imputation methods (and their reference groups) can be specified.
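A hypothetical call sketch; the package name RefBasedMI, the function name, and its argument names are all assumptions about this interface, not verified:

    # Hypothetical sketch; function and argument names are assumed.
    library(RefBasedMI)
    imp <- RefBasedMI(
      data      = trial_long,   # long-format trial data (illustrative)
      depvar    = outcome,
      treatvar  = arm,
      idvar     = id,
      timevar   = visit,
      method    = "J2R",        # jump to reference
      reference = 1,            # reference treatment arm
      M         = 50            # number of imputations
    )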
Implementations of several robust nonparametric two-sample tests for location or scale differences. The test statistics are based on robust location and scale estimators, e.g. the sample median or the Hodges-Lehmann estimators as described in Fried & Dehling (2011) <doi:10.1007/s10260-011-0164-1>. The p-values can be computed via the permutation principle, the randomization principle, or by using the asymptotic distributions of the test statistics under the null hypothesis, which ensures (approximate) distribution independence of the test decision. To test for a difference in scale, we apply the tests for location difference to transformed observations; see Fried (2012) <doi:10.1016/j.csda.2011.02.012>. Random noise on a small range can be added to the original observations in order to hold the significance level on data from discrete distributions. The location tests assume homoscedasticity and the scale tests require the location parameters to be zero.
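A minimal sketch of a two-sample location test, assuming the robnptests package exports hl1_test() with a method argument selecting the permutation p-value:

    # Two-sample test based on the one-sample Hodges-Lehmann estimator
    # (hl1_test and its arguments are assumed from the robnptests package).
    library(robnptests)
    x <- rnorm(20)
    y <- rnorm(20, mean = 1)
    hl1_test(x, y, alternative = "two.sided", method = "permutation")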
Casting metadata for REDCap database creation and handling of castellated data using repeated instruments and longitudinal projects in REDCap. Keeps a focused data export approach by allowing the export of only the required data from the database. Also supports casting new REDCap databases based on datasets from other sources. Originally forked from the R part of REDCapRITS by Paul Egeler; see <https://github.com/pegeler/REDCapRITS>. REDCap (Research Electronic Data Capture) is a secure, web-based software platform designed to support data capture for research studies, providing 1) an intuitive interface for validated data capture; 2) audit trails for tracking data manipulation and export procedures; 3) automated export procedures for seamless data downloads to common statistical packages; and 4) procedures for data integration and interoperability with external sources (Harris et al. (2009) <doi:10.1016/j.jbi.2008.08.010>; Harris et al. (2019) <doi:10.1016/j.jbi.2019.103208>).
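A sketch of a focused export, assuming the package exports read_redcap_tables() and redcap_wider(); the URI, token handling, and field names are illustrative:

    # Export only selected fields and cast the castellated (long) records
    # to one row per subject (function names assumed from this package).
    library(REDCapCAST)
    tables <- read_redcap_tables(
      uri    = "https://redcap.example.org/api/",
      token  = keyring::key_get("redcap_token"),
      fields = c("record_id", "age", "bp_systolic")
    )
    wide <- redcap_wider(tables)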
Offers metaprogramming-style tools to generate configurable R functions that produce HTML forms based on table input and SQL metadata, and also generates functions for collecting the parameters of those HTML forms after they are submitted. Useful for quickly generating HTML forms based on existing SQL tables. To use the resultant functions, the output files containing those functions must be read into the R environment (perhaps using base::source()).
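A generic sketch of the intended round trip; make_form_functions() and the generated names below are purely illustrative stand-ins for this package's generator, not its actual API:

    # Purely illustrative: generate form/collector functions to a file,
    # then read them back with base::source() before use.
    make_form_functions(table = "patients", out_file = "patient_forms.R")
    source("patient_forms.R")    # load the generated functions
    html <- patients_form()      # render the HTML form (generated name)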
Generates DNA sequences based on Markov model techniques for matched sequences. This can be generalized to several sequences. The sequences (taxa) are then arranged in an evolutionary tree (phylogenetic tree) depicting how the taxa diverge from their common ancestors. The package provides tests and estimation methods for the parameters of different models. Standard phylogenetic methods assume stationarity, homogeneity, and reversibility for the Markov processes, and often impose further restrictions on the parameters.
In competing risks regression, the proportional subdistribution hazards (PSH) model is popular for its direct assessment of covariate effects on the cumulative incidence function. This package allows for both penalized and unpenalized PSH regression in linear time using a novel forward-backward scan. Penalties include Ridge, Least Absolute Shrinkage and Selection Operator (LASSO), Smoothly Clipped Absolute Deviation (SCAD), Minimax Concave Penalty (MCP), and elastic net <doi:10.32614/RJ-2021-010>.
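A minimal sketch of unpenalized and penalized PSH fits, assuming the fastcmprsk package API (a Crisk() response with fastCrr() and fastCrrp()):

    # Unpenalized and LASSO-penalized PSH regression; Crisk() builds the
    # competing-risks response (fastcmprsk API, assumed; data are toy values).
    library(fastcmprsk)
    d <- data.frame(time = rexp(200), status = sample(0:2, 200, TRUE),
                    x1 = rnorm(200), x2 = rnorm(200))
    fit  <- fastCrr(Crisk(d$time, d$status) ~ d$x1 + d$x2)
    pfit <- fastCrrp(Crisk(d$time, d$status) ~ d$x1 + d$x2,
                     penalty = "LASSO",
                     lambda = seq(0.01, 0.10, 0.01))   # illustrative grid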
This package implements fast, scalable optimization algorithms for fitting generalized principal components analysis (GLM-PCA) models, as described in "A Generalization of Principal Components Analysis to the Exponential Family" Collins M, Dasgupta S, Schapire RE (2002, ISBN:9780262271738), and subsequently "Feature Selection and Dimension Reduction for Single-Cell RNA-Seq Based on a Multinomial Model" Townes FW, Hicks SC, Aryee MJ, Irizarry RA (2019) <doi:10.1186/s13059-019-1861-6>.
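A minimal sketch, assuming the glmpca package's main entry point glmpca(Y, L) with a Poisson likelihood:

    # Fit a 2-dimensional Poisson GLM-PCA to a toy count matrix
    # (rows = features, columns = cells), as in the glmpca package.
    library(glmpca)
    Y <- matrix(rpois(1000, lambda = 3), nrow = 50)
    res <- glmpca(Y, L = 2, fam = "poi")
    head(res$factors)   # low-dimensional representation of the columns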
Given a landscape resistance surface, creates minimum planar graph (Fall et al. (2007) <doi:10.1007/s10021-007-9038-7>) and grains of connectivity (Galpern et al. (2012) <doi:10.1111/j.1365-294X.2012.05677.x>) models that can be used to calculate effective distances for landscape connectivity at multiple scales. Documentation is provided by several vignettes, and a paper (Chubaty, Galpern & Doctolero (2020) <doi:10.1111/2041-210X.13350>).
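A minimal sketch following the pattern documented for grainscape's MPG() and GOC() functions; the resistance raster below is a toy stand-in:

    # Extract the minimum planar graph from a resistance surface, then
    # build grains of connectivity at several link thresholds.
    library(grainscape)
    library(raster)
    cost <- raster(matrix(sample(c(1, 5, 10), 100, TRUE), 10))
    mpg  <- MPG(cost = cost, patch = (cost == 1))   # patches where cost == 1
    goc  <- GOC(mpg, nThresh = 5)                   # 5 scales of connectivity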
This package provides plotting functions for visualizing pedigrees in behavior genetics and kinship research. The package complements BGmisc [Garrison et al. (2024) <doi:10.21105/joss.06203>] by rendering pedigrees using the ggplot2 framework and offers a modern alternative to the base-graphics pedigree plot in kinship2 [Sinnwell et al. (2014) <doi:10.1159/000363105>]. Features include support for duplicated individuals, complex mating structures, integration with simulated pedigrees, and layout customization.
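A hypothetical sketch; the ggpedigree() entry point and the expected column names below are assumptions about this package's interface:

    # Hypothetical sketch; function and column names are assumed.
    library(ggpedigree)
    ped <- data.frame(
      personID = 1:4,
      momID    = c(NA, NA, 2, 2),
      dadID    = c(NA, NA, 1, 1),
      sex      = c("male", "female", "male", "female")
    )
    ggpedigree(ped)   # render the pedigree with ggplot2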
Kernel density estimation with a hexagonal grid for bivariate data. The hexagonal grid has many beneficial properties, such as equidistant neighbours and less edge bias, making it better suited for spatial analyses than the more commonly used rectangular grid. See Carr, D. B. et al. (1987) <doi:10.2307/2289444>; Diggle, P. J. (2010) <doi:10.1201/9781420072884>; Hill, B. (2017) <https://blog.bruce-hill.com/meandering-triangles>; Jones, M. C. (1993) <doi:10.1007/BF00147776>.
Predicts any variable in any categorical dataset for given values of predictor variables. If a dataset contains four variables, then any one of them can be predicted from the user-supplied values of the other three. Users can upload their own datasets and select which variable they want to predict. A handsontable is provided for entering the predictor values, and the accuracy of the prediction is also displayed.