Estimation and inference methods for bounding average treatment effects (on the treated) that are valid under an unconfoundedness assumption. The bounds are designed to be robust in challenging situations, for example, when the conditioning variables take on a large number of different values in the observed sample, or when the overlap condition is violated. This robustness is achieved by only using limited "pooling" of information across observations. For more details, see the paper by Lee and Weidner (2021), "Bounding Treatment Effects by Pooling Limited Information across Observations," <arXiv:2111.05243>.
Analysis of agreement for nominal data between two raters using the Delta model. This model is proposed as an alternative to the widely used Cohen's kappa coefficient, which performs poorly when the marginal distributions are very asymmetric (Martin-Andres and Femia-Marzo (2004) <doi:10.1348/000711004849268>; Martin-Andres and Femia-Marzo (2008) <doi:10.1080/03610920701669884>). The package also contains a function to perform a massive analysis of multiple raters against a gold standard. A shiny app is also provided to obtain the measures of nominal agreement between two raters.
We provide an efficient implementation of two-step multi-source transfer learning algorithms in high-dimensional generalized linear models (GLMs). The elastic-net penalized GLM can be fitted with three popular families: linear, logistic and Poisson regression models. To avoid negative transfer, a transferable source detection algorithm is proposed. We also provide visualization of the transferable source detection results. Details of the methods can be found in Tian, Y., & Feng, Y. (2023), "Transfer learning under high-dimensional generalized linear models", Journal of the American Statistical Association, 118(544), 2684-2697.
The American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) BioPharma Collaborative represents a multi-year, multi-institution effort to build a pan-cancer repository of linked clinico-genomic data. The genomic and clinical data are provided in multiple releases (separate releases for each cancer cohort with updates following data corrections), which are stored on the data sharing platform Synapse <https://www.synapse.org/>. The genieBPC package provides a seamless way to obtain the data corresponding to each release from Synapse and to prepare datasets for analysis.
Data sets related to the Islas Malvinas. The Argentine Nation ratifies its legitimate and imprescriptible sovereignty over the Malvinas, Georgias del Sur and Sándwich del Sur islands and the corresponding maritime and insular areas, as they are an integral part of the national territory. The recovery of these territories and the full exercise of sovereignty, respecting the way of life of their inhabitants and in accordance with the principles of International Law, constitute a permanent and unwavering objective of the Argentine people.
Modelling Multivariate Binary Data with Blocks of Specific One-Factor Distribution. Variables are grouped into independent blocks. Each variable is described by two continuous parameters (its marginal probability and its dependency strength with the other block variables), and one binary parameter (positive or negative dependency). Model selection consists of estimating the partition of the variables into blocks. It is carried out by maximizing the BIC criterion with either a deterministic (faster) algorithm or a stochastic (more time-consuming but optimal) algorithm. Tool functions facilitate the model interpretation.
This package provides a collection of data structures and methods for handling volumetric brain imaging data, with a focus on functional magnetic resonance imaging (fMRI). Provides efficient representations for three-dimensional and four-dimensional neuroimaging data through sparse and dense array implementations, memory-mapped file access for large datasets, and spatial transformation capabilities. Implements methods for image resampling, spatial filtering, region of interest analysis, and connected component labeling. A general introduction to fMRI analysis can be found in Poldrack et al. (2024, "Handbook of functional MRI data analysis", <ISBN:9781108795760>).
This package provides a suite of tools for the comprehensive visualization of multi-omics data, including genomics, transcriptomics, and proteomics. Offers user-friendly functions to generate publication-quality plots, thereby facilitating the exploration and interpretation of complex biological datasets. Supports seamless integration with popular R visualization frameworks and is well-suited for both exploratory data analysis and the presentation of final results. Key formats and methods are presented in Huang, S., et al. (2024) "The Born in Guangzhou Cohort Study enables generational genetic discoveries" <doi:10.1038/s41586-023-06988-4>.
The use of overparameterization is proposed with combinatorial analysis to test a broader spectrum of possible ARIMA models. In the selection of ARIMA models, traditional methods such as correlograms do not usually cover many alternatives for the number of coefficients to be estimated in the model, which can lead to a suboptimal choice. The popstudy package contains several tools for statistical analysis in demography and time series based on the work of Shryock (Shryock et al. (1980) <https://books.google.co.cr/books?id=8Oo6AQAAMAAJ>).
We present Platypus, an open-source software platform providing a user-friendly interface to investigate B-cell receptor and T-cell receptor repertoires from scSeq experiments. Platypus provides a framework to automate and ease the analysis of single-cell immune repertoires while also incorporating transcriptional information involving unsupervised clustering, gene expression and gene ontology. This R version of Platypus is part of the ePlatypus ecosystem for computational analysis of immunogenomics data: Yermanos et al. (2021) <doi:10.1093/nargab/lqab023>, Cotet et al. (2023) <doi:10.1093/bioinformatics/btad553>.
Assessment of the distributions of baseline continuous and categorical variables in randomised trials. This method is based on the Carlisle-Stouffer method with Monte Carlo simulations. It calculates p-values for each baseline variable of a trial, as well as a combined p-value for each trial. These p-values measure how compatible the distributions of a trial's baseline variables are with random sampling. This package also allows for plotting the cumulative frequencies of the computed p-values. Please note that the code was partly adapted from Carlisle JB, Loadsman JA. (2017) <doi:10.1111/anae.13650>.
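The combination step can be illustrated with a minimal Python sketch of Stouffer's Z-score method for merging per-variable p-values into one p-value per trial. This is only an illustration under stated assumptions, not the package's code: the function name `stouffer_combine` and the clamping constant are hypothetical, the p-values are treated as one-sided and independent, and the Monte Carlo part of the Carlisle-Stouffer approach is not shown.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combine(p_values, eps=1e-12):
    """Combine independent one-sided p-values with Stouffer's Z-score method.

    `stouffer_combine` and `eps` are hypothetical names used for illustration.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    std_normal = NormalDist()
    # Clamp to the open interval (0, 1) so the inverse CDF is defined.
    clamped = [min(max(p, eps), 1.0 - eps) for p in p_values]
    # Convert each p-value to a Z-score: small p gives large positive Z.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in clamped]
    # The sum of k independent standard normals has variance k.
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - std_normal.cdf(z_combined)


# Hypothetical baseline-variable p-values for a single trial.
trial_p_values = [0.40, 0.75, 0.10, 0.55]
combined = stouffer_combine(trial_p_values)
```

A combined p-value very close to 0 or to 1 would suggest baseline distributions that are, respectively, less similar or more similar than random allocation would typically produce.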
Processor for selected ion flow tube mass spectrometer (SIFT-MS) output file from breath analysis. It allows the filtering of the SIFT output file (i.e., variation over time of the target analyte concentration) and the following analysis for the determination of: maximum, average, and standard deviation value of target concentration measured at each exhalation, and the respiratory rate over the measurement. Additionally, it is possible to align the SIFT-MS data with other on-line techniques such as cardio pulmonary exercise test (CPET) for a comprehensive characterization of breath samples.
The purpose of this package is to identify traits in a dataset that can separate groups. This is done on two levels. First, clustering is performed, using an implementation of sparse K-means. Secondly, the generated clusters are used to predict outcomes of groups of individuals based on their distribution of observations in the different clusters. As certain clusters with separating information will be identified, and these clusters are defined by a sparse number of variables, this method can reduce the complexity of data, to only emphasize the data that actually matters.
Circular layout is an efficient way to visualise huge amounts of information. This package provides an implementation of circular layout generation in R as well as an enhancement of available software. Its flexibility is based on the usage of low-level graphics functions such that self-defined high-level graphics can be easily implemented by users for specific purposes. Together with the seamless connection between the powerful computational and visual environment in R, it gives users more convenience and freedom to design figures for better understanding complex patterns behind multi-dimensional data.
This package provides a user-friendly, easy-to-understand way of doing event history regression for marginal estimands of interest, including the cumulative incidence and the restricted mean survival, using the pseudo observation framework for estimation. For a review of the methodology, see Andersen and Pohar Perme (2010) <doi:10.1177/0962280209105020> or Sachs and Gabriel (2022) <doi:10.18637/jss.v102.i09>. The interface uses the well-known formulation of a generalized linear model and allows for features including plotting of residuals, the use of sampling weights, and corrected variance estimation.
Quickly and easily computes a Euclidean Minimum Spanning Tree (EMST) from data, relying on the R API for mlpack, the C++ Machine Learning Library (Curtin et al., 2013). emstreeR uses the Dual-Tree Boruvka algorithm (March, Ram, Gray, 2010, <doi:10.1145/1835804.1835882>), which is theoretically and empirically the fastest algorithm for computing an EMST. This package also provides functions and an S3 method for readily visualizing Minimum Spanning Trees (MST) using the style of the base, scatterplot3d, or ggplot2 libraries, and functions to export the MST output to shapefiles.
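To make the notion of a Euclidean Minimum Spanning Tree concrete, here is a minimal Python sketch. It deliberately swaps in the simple O(n^2) Prim's algorithm rather than the Dual-Tree Boruvka algorithm named above, and the function name `euclidean_mst` is hypothetical; it only shows what an EMST is, not how the package computes it.

```python
from math import dist, inf


def euclidean_mst(points):
    """Return EMST edges as (from_index, to_index, length) using Prim's algorithm.

    Illustrative only: a dense O(n^2) method, not Dual-Tree Boruvka.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [inf] * n   # cheapest known connection of each point to the tree
    best_from = [-1] * n    # tree vertex that provides that connection
    best_dist[0] = 0.0      # start growing the tree from point 0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Update the cheapest connections using the newly added point.
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


sample_points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
mst_edges = euclidean_mst(sample_points)
total_length = sum(length for _, _, length in mst_edges)
```

A spanning tree on n points always has n - 1 edges, so the sketch returns three edges for the four sample points.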
This package implements a Fellegi-Sunter probabilistic record linkage model that allows for missing data and the inclusion of auxiliary information. This includes functionalities to conduct a merge of two datasets under the Fellegi-Sunter model using the Expectation-Maximization algorithm. In addition, tools for preparing, adjusting, and summarizing data merges are included. The package implements methods described in Enamorado, Fifield, and Imai (2019), "Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records", <doi:10.1017/S0003055418000783> and is available at <https://imai.fas.harvard.edu/research/linkage.html>.
An interval-valued extension of ordinary and simple kriging. Optimization of the function is based on a generalized interval distance. This creates a non-differentiable cost function that requires a differentiable approximation to the absolute value function. This differentiable approximation is optimized using a Newton-Raphson algorithm with a penalty function to impose the constraints. Analyses in the package are driven by the intsp and intgrd classes, which are interval-valued extensions of SpatialPointsDataFrame and SpatialPixelsDataFrame respectively. The package includes several wrappers to functions in the gstat and sp packages.
This package provides functions to standardize and whiten data, and to perform Principal Component Analysis (PCA). The main advantage of this package over alternatives like prcomp() is that jvcoords makes it easy to convert (additional) data between the original and the transformed coordinates. The package also provides a class coords, which can represent affine coordinate transformations. This class forms the basis of the transformations provided by the package, but can also be used independently. The implementation has been optimized to be of comparable speed to (and sometimes even faster than) existing alternatives.
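The idea of converting additional data between original and transformed coordinates can be sketched in Python with the simplest affine transformation, column-wise standardization. This is a hedged illustration only: the class `Standardizer` and its methods `from_original` and `to_original` are hypothetical names and do not reproduce the jvcoords API, and every column is assumed to be non-constant so that its standard deviation is nonzero.

```python
from statistics import mean, stdev


class Standardizer:
    """Affine map x -> (x - shift) / scale, fitted column by column.

    Hypothetical illustration; assumes each column has nonzero spread.
    """

    def __init__(self, rows):
        columns = list(zip(*rows))
        self.shift = [mean(col) for col in columns]
        self.scale = [stdev(col) for col in columns]

    def from_original(self, row):
        # Original coordinates -> standardized coordinates.
        return [(x - m) / s for x, m, s in zip(row, self.shift, self.scale)]

    def to_original(self, row):
        # Standardized coordinates -> original coordinates (the inverse map).
        return [z * s + m for z, m, s in zip(row, self.shift, self.scale)]


training_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
transform = Standardizer(training_rows)

# A new observation that was not used to fit the transformation.
new_point = [4.0, 0.0]
in_new_coords = transform.from_original(new_point)
round_trip = transform.to_original(in_new_coords)
```

Because the fitted shift and scale are stored on the object, the same transformation can be applied to new data and inverted later, which is the convenience the description above highlights.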
Implementation of a theoretically supported alternative to k-nearest neighbors for functional data to solve problems of estimating unobserved segments of a partially observed functional data sample, functional classification and outlier detection. The approximating neighbor curves are piecewise functions built from a functional sample. Instead of a distance on a function space we use a locally defined distance function that satisfies stabilization criteria. The package allows the implementation of the methodology and the replication of the results in Elías, A., Jiménez, R. and Yukich, J. (2020) <arXiv:2007.16059>.
Comprehensive analytical tools are provided to characterize infectious disease superspreading from contact tracing surveillance data. The underlying theoretical frameworks of this toolkit include branching process with transmission heterogeneity (Lloyd-Smith et al. (2005) <doi:10.1038/nature04153>), case cluster size distribution (Nishiura et al. (2012) <doi:10.1016/j.jtbi.2011.10.039>, Blumberg et al. (2014) <doi:10.1371/journal.ppat.1004452>, and Kucharski and Althaus (2015) <doi:10.2807/1560-7917.ES2015.20.25.21167>), and decomposition of reproduction number (Zhao et al. (2022) <doi:10.1371/journal.pcbi.1010281>).
Traditional model evaluation metrics fail to capture model performance under less than ideal conditions. This package employs techniques to evaluate models "under stress". This includes testing a model's extrapolation ability, or testing accuracy on specific sub-samples of the overall model space. Details describing the stress-testing methods in this package are provided in Haycock (2023) <doi:10.26076/2am5-9f67>. The other primary contribution of this package is to provide R users with access to the Python library PyCaret <https://pycaret.org/> for quick and easy access to auto-tuned machine learning models.
The main objective of cooperative games is to allocate a good among the agents involved. This package includes the most well-known allocation rules, i.e., the Shapley value, the Banzhaf value, the egalitarian rule, and the equal surplus division value. In addition, it considers the point of view of a priori unions (situations in which agents can form coalitions). For this purpose, the package includes the Owen value, the Banzhaf-Owen value, and the corresponding extensions of the egalitarian rules. All these values can be calculated exactly or estimated by sampling.
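As an illustration of one of the allocation rules listed above, the following Python sketch computes the Shapley value exactly by averaging each player's marginal contribution over all orderings. It is a minimal sketch under stated assumptions, not the package's implementation: the function name `shapley_values` and the three-player "glove game" are hypothetical, and enumerating all orderings is only feasible for a handful of players.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating every ordering of the players.

    `value` maps a frozenset of players to the worth of that coalition.
    """
    players = list(players)
    totals = {p: Fraction(0) for p in players}
    n_orders = 0
    for order in permutations(players):
        n_orders += 1
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the players before it.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: totals[p] / n_orders for p in players}


def glove_game(coalition):
    """Hypothetical game: players 1 and 2 hold left gloves, player 3 a right glove.

    A coalition is worth 1 if it can form at least one pair, else 0.
    """
    has_left = bool(coalition & {1, 2})
    has_right = 3 in coalition
    return 1 if has_left and has_right else 0


allocation = shapley_values([1, 2, 3], glove_game)
# allocation == {1: Fraction(1, 6), 2: Fraction(1, 6), 3: Fraction(2, 3)}
```

The allocation sums to the worth of the grand coalition, and the scarce right glove receives the largest share. For larger games the same average can be approximated by drawing random orderings instead of enumerating all of them, which matches the estimation by sampling mentioned above.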
A latent, quasi-independent truncation time is assumed to be linked with the observed dependent truncation time, the event time, and an unknown transformation parameter via a structural transformation model. The transformation parameter is chosen to minimize the conditional Kendall's tau (Martin and Betensky, 2005) <doi:10.1198/016214504000001538> or the regression coefficient estimates (Jones and Crowley, 1992) <doi:10.2307/2336782>. The marginal distributions of the truncation time and the event time are left completely unspecified. The methodology is applied to survival curve estimation and regression analysis.