The American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) BioPharma Collaborative represents a multi-year, multi-institution effort to build a pan-cancer repository of linked clinico-genomic data. The genomic and clinical data are provided in multiple releases (separate releases for each cancer cohort, with updates following data corrections), which are stored on the data sharing platform Synapse <https://www.synapse.org/>. The genieBPC package provides a seamless way to obtain the data corresponding to each release from Synapse and to prepare datasets for analysis.
We provide an efficient implementation of two-step multi-source transfer learning algorithms in high-dimensional generalized linear models (GLMs). The elastic-net penalized GLM can be fitted with three popular families: linear, logistic and Poisson regression models. To avoid negative transfer, a transferable source detection algorithm is proposed. We also provide visualization of the transferable source detection results. The details of the methods can be found in Tian, Y., & Feng, Y. (2023), "Transfer learning under high-dimensional generalized linear models", Journal of the American Statistical Association, 118(544), 2684-2697.
Modelling Multivariate Binary Data with Blocks of Specific One-Factor Distribution. Variables are grouped into independent blocks. Each variable is described by two continuous parameters (its marginal probability and its dependency strength with the other block variables), and one binary parameter (positive or negative dependency). Model selection consists of estimating the partition of the variables into blocks. It is carried out by maximizing the BIC with either a deterministic (faster) algorithm or a stochastic (more time-consuming but optimal) algorithm. Tool functions facilitate the model interpretation.
Data sets related to the Islas Malvinas. The Argentine Nation ratifies its legitimate and imprescriptible sovereignty over the Islas Malvinas, Georgias del Sur and Sándwich del Sur and the corresponding maritime and insular areas, as they are an integral part of the national territory. The recovery of these territories and the full exercise of sovereignty, respecting the way of life of their inhabitants and in accordance with the principles of International Law, constitute a permanent and unrelinquishable objective of the Argentine people.
This package provides a collection of data structures and methods for handling volumetric brain imaging data, with a focus on functional magnetic resonance imaging (fMRI). Provides efficient representations for three-dimensional and four-dimensional neuroimaging data through sparse and dense array implementations, memory-mapped file access for large datasets, and spatial transformation capabilities. Implements methods for image resampling, spatial filtering, region of interest analysis, and connected component labeling. A general introduction to fMRI analysis can be found in Poldrack et al. (2024, "Handbook of functional MRI data analysis", <ISBN:9781108795760>).
This package provides a suite of tools for the comprehensive visualization of multi-omics data, including genomics, transcriptomics, and proteomics. Offers user-friendly functions to generate publication-quality plots, thereby facilitating the exploration and interpretation of complex biological datasets. Supports seamless integration with popular R visualization frameworks and is well-suited for both exploratory data analysis and the presentation of final results. Key formats and methods are presented in Huang, S., et al. (2024) "The Born in Guangzhou Cohort Study enables generational genetic discoveries" <doi:10.1038/s41586-023-06988-4>.
The use of overparameterization is proposed with combinatorial analysis to test a broader spectrum of possible ARIMA models. In the selection of ARIMA models, the most traditional methods, such as correlograms, do not usually cover many alternatives for the number of coefficients to be estimated in the model, so the resulting choice of model is often not the best one available. The popstudy package contains several tools for statistical analysis in demography and time series based on the research of Shryock (Shryock et al. (1980) <https://books.google.co.cr/books?id=8Oo6AQAAMAAJ>).
Pharmacometric tools for common data analytical tasks; closed-form solutions for calculating concentrations at given times after dosing based on compartmental PK models (1-compartment, 2-compartment and 3-compartment, covering infusions, zero- and first-order absorption, and lag times, after single doses and at steady state, per Bertrand & Mentre (2008) <https://www.facm.ucl.ac.be/cooperation/Vietnam/WBI-Vietnam-October-2011/Modelling/Monolix32_PKPD_library.pdf>); parametric simulation from NONMEM-generated parameter estimates and other output; and parsing, tabulating and plotting results generated by Perl-speaks-NONMEM (PsN).
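As an illustration of what a closed-form solution looks like in the simplest case, the following is a minimal sketch of the one-compartment model after an intravenous bolus, for a single dose and at steady state. The function names and the parameterization by clearance `cl` and volume `v` are assumptions made for this example; they are not the package's API, and the bolus case is chosen only because it is the shortest formula.

```python
import math


def conc_1cmt_bolus_single(dose, cl, v, t):
    """Concentration at time t after a single IV bolus (one-compartment model).

    C(t) = (dose / v) * exp(-k * t), with elimination rate constant k = cl / v.
    """
    k = cl / v
    return (dose / v) * math.exp(-k * t)


def conc_1cmt_bolus_steady_state(dose, cl, v, t, tau):
    """Concentration at time t (0 <= t <= tau) after a dose at steady state.

    Repeated dosing every tau time units accumulates as a geometric series,
    which gives the accumulation factor 1 / (1 - exp(-k * tau)).
    """
    k = cl / v
    return (dose / v) * math.exp(-k * t) / (1.0 - math.exp(-k * tau))


if __name__ == "__main__":
    # Hypothetical values: dose 100 mg, CL 5 L/h, V 50 L, dosing interval 12 h.
    for t in (0.0, 2.0, 6.0, 12.0):
        single = conc_1cmt_bolus_single(100, 5, 50, t)
        steady = conc_1cmt_bolus_steady_state(100, 5, 50, t, tau=12)
        print(f"t={t:5.1f} h  single={single:.3f}  steady-state={steady:.3f}")
```

The multi-compartment, infusion, absorption and lag-time cases mentioned above follow the same pattern with longer expressions.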
Processor for selected ion flow tube mass spectrometer (SIFT-MS) output files from breath analysis. It allows filtering of the SIFT output file (i.e., the variation over time of the target analyte concentration) and subsequent analysis to determine the maximum, average, and standard deviation of the target concentration measured at each exhalation, and the respiratory rate over the measurement. Additionally, it is possible to align the SIFT-MS data with other on-line techniques such as the cardiopulmonary exercise test (CPET) for a comprehensive characterization of breath samples.
Assessment of the distributions of baseline continuous and categorical variables in randomised trials. This method is based on the Carlisle-Stouffer method with Monte Carlo simulations. It calculates p-values for each trial baseline variable, as well as combined p-values for each trial. These p-values measure how compatible the distributions of a trial's baseline variables are with random sampling. This package also allows the cumulative frequencies of the computed p-values to be plotted. Please note that the code was partly adapted from Carlisle JB, Loadsman JA. (2017) <doi:10.1111/anae.13650>.
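The step of combining per-variable p-values into one p-value per trial can be sketched with Stouffer's Z-score method. This is a minimal, unweighted, one-sided version written for illustration; the function name is hypothetical, and it does not reproduce the package's Monte Carlo procedure or any adjustments it may apply.

```python
import math
from statistics import NormalDist


def stouffer_combine(p_values):
    """Combine independent one-sided p-values with Stouffer's method.

    Each p-value is mapped to a standard normal score z_i = Phi^{-1}(1 - p_i);
    the combined score is sum(z_i) / sqrt(n), which is again standard normal
    under the null hypothesis. p-values must lie strictly between 0 and 1.
    """
    nd = NormalDist()
    z_scores = [nd.inv_cdf(1.0 - p) for p in p_values]
    z_combined = sum(z_scores) / math.sqrt(len(z_scores))
    return 1.0 - nd.cdf(z_combined)


if __name__ == "__main__":
    # Hypothetical p-values for four baseline variables of one trial.
    print(stouffer_combine([0.40, 0.62, 0.15, 0.88]))
```

Because the scores are summed, several moderately small p-values can combine into a small trial-level p-value even when no single variable stands out.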
Identification of Latent Patient Phenotype from Electronic Health Records (EHR) Data using Variational Bayes Gaussian Mixture Model for Latent Class Analysis and Variational Bayes regression for Biomarker level shifts, both implemented by Coordinate Ascent Variational Inference algorithms. Variational methods are used to enable Bayesian analysis of very large Electronic Health Records data. For VB GMM details see Bishop (2006, <ISBN:9780387310732>). For Logistic VB see Jaakkola and Jordan (2000) <doi:10.1023/A:1008932416310>. Please see the preprint of the JSS-submitted paper <doi:10.48550/arXiv.2512.14272>.
This package provides a flexible method for fitting regression models that can be used to find genes that are differentially expressed along one or multiple lineages in a trajectory. Based on the fitted models, it uses a variety of tests suited to answer different questions of interest, e.g. the discovery of genes for which expression is associated with pseudotime, or which are differentially expressed (in a specific region) along the trajectory. It fits a negative binomial generalized additive model (GAM) for each gene, and performs inference on the parameters of the GAM.
Dropout events make lowly expressed genes indistinguishable from true zero expression and different from the low expression present in cells of the same type. This issue makes any subsequent downstream analysis difficult. ccImpute is an imputation algorithm that uses cell similarity established by consensus clustering to impute the most probable dropout events in scRNA-seq datasets. ccImpute demonstrated performance exceeding that of existing imputation approaches while introducing the least amount of new noise, as measured by clustering performance characteristics on datasets with known cell identities.
Automates the process of containerizing R projects. The core function of containr is generate_dockerfile(), which analyzes an R project's environment and dependencies via an renv lock file and generates a ready-to-use Dockerfile that encapsulates the computational setup. The package helps researchers build portable and consistent workflows so that analyses can be reliably shared, archived, and rerun across systems. See R Core Team (2025) <https://www.R-project.org/>, Ushey et al. (2025) <https://CRAN.R-project.org/package=renv>, and Docker Inc. (2025) <https://www.docker.com/>.
Fast and easy computation of Euclidean Minimum Spanning Trees (EMST) from data, relying on the R API for mlpack - the C++ Machine Learning Library (Curtin et al., 2013). emstreeR uses the Dual-Tree Boruvka algorithm (March, Ram, Gray, 2010, <doi:10.1145/1835804.1835882>), which is theoretically and empirically the fastest algorithm for computing an EMST. This package also provides functions and an S3 method for readily visualizing Minimum Spanning Trees (MST) using the style of the base, scatterplot3d, or ggplot2 libraries, and functions to export the MST output to shapefiles.
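To make concrete what an EMST is, here is a minimal sketch that builds one with Prim's algorithm in O(n^2) time. It is deliberately not the Dual-Tree Boruvka algorithm named above, which is considerably more involved; the function name and the edge format `(from_index, to_index, distance)` are assumptions made for this example.

```python
import math


def euclidean_mst(points):
    """Return the edges of a Euclidean minimum spanning tree (Prim's algorithm).

    points: list of equal-length coordinate tuples.
    Returns a list of (from_index, to_index, distance) with len(points) - 1 edges.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_from = [-1] * n         # tree vertex providing that connection
    best_dist[0] = 0.0           # start growing the tree from point 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Update connection costs for the vertices still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


if __name__ == "__main__":
    for edge in euclidean_mst([(0, 0), (1, 0), (1, 1), (5, 5)]):
        print(edge)
```

The quadratic cost is acceptable for small inputs; tree-based methods such as the one used by the package exist precisely to avoid comparing every pair of points.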
This package provides a user-friendly, easy-to-understand way of doing event history regression for marginal estimands of interest, including the cumulative incidence and the restricted mean survival, using the pseudo observation framework for estimation. For a review of the methodology, see Andersen and Pohar Perme (2010) <doi:10.1177/0962280209105020> or Sachs and Gabriel (2022) <doi:10.18637/jss.v102.i09>. The interface uses the well-known formulation of a generalized linear model and allows for features including plotting of residuals, the use of sampling weights, and corrected variance estimation.
This package implements a Fellegi-Sunter probabilistic record linkage model that allows for missing data and the inclusion of auxiliary information. This includes functionalities to conduct a merge of two datasets under the Fellegi-Sunter model using the Expectation-Maximization algorithm. In addition, tools for preparing, adjusting, and summarizing data merges are included. The package implements methods described in Enamorado, Fifield, and Imai (2019) "Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records" <doi:10.1017/S0003055418000783> and is available at <https://imai.fas.harvard.edu/research/linkage.html>.
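The core quantity in a Fellegi-Sunter model is the posterior probability that a pair of records is a match, given which fields agree. The sketch below computes it for binary agreement patterns under conditional independence, assuming the parameters are already known; in practice they would be estimated, for example with the Expectation-Maximization algorithm. The function name, the argument layout, and the choice to skip missing comparisons (coded as `None`) are assumptions made for this example, not the package's interface.

```python
def match_probability(agreements, m, u, prior):
    """Posterior probability that a record pair is a true match.

    agreements: per-field comparison, True (agree), False (disagree), None (missing).
    m[k]: probability that field k agrees given the pair is a match.
    u[k]: probability that field k agrees given the pair is a non-match.
    prior: prior probability that a randomly chosen pair is a match.
    """
    like_match = prior
    like_nonmatch = 1.0 - prior
    for agree, m_k, u_k in zip(agreements, m, u):
        if agree is None:
            # Assumption: a missing comparison carries no information and is skipped.
            continue
        like_match *= m_k if agree else 1.0 - m_k
        like_nonmatch *= u_k if agree else 1.0 - u_k
    return like_match / (like_match + like_nonmatch)


if __name__ == "__main__":
    # Hypothetical parameters for three fields (e.g. first name, last name, birth year).
    m = [0.95, 0.90, 0.85]
    u = [0.02, 0.05, 0.10]
    print(match_probability([True, True, None], m, u, prior=0.01))
    print(match_probability([True, False, False], m, u, prior=0.01))
```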
An interval-valued extension of ordinary and simple kriging. Optimization of the function is based on a generalized interval distance. This creates a non-differentiable cost function that requires a differentiable approximation to the absolute value function. This differentiable approximation is optimized using a Newton-Raphson algorithm with a penalty function to impose the constraints. Analyses in the package are driven by the intsp and intgrd classes, which are interval-valued extensions of SpatialPointsDataFrame and SpatialPixelsDataFrame respectively. The package includes several wrappers to functions in the gstat and sp packages.
This package provides functions to standardize and whiten data, and to perform Principal Component Analysis (PCA). The main advantage of this package over alternatives like prcomp() is that jvcoords makes it easy to convert (additional) data between the original and the transformed coordinates. The package also provides a class coords, which can represent affine coordinate transformations. This class forms the basis of the transformations provided by the package, but can also be used independently. The implementation has been optimized to be comparable in speed to (and sometimes even faster than) existing alternatives.
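The idea of converting additional data between original and transformed coordinates can be sketched with the simplest affine transformation, per-column standardization. The class and method names below are hypothetical and are not the interface of jvcoords or its coords class; whitening and PCA would add a rotation on top of the shift and scale shown here.

```python
from statistics import mean, stdev


class Standardizer:
    """Affine map that shifts each column to mean 0 and scales it to sd 1."""

    def __init__(self, rows):
        columns = list(zip(*rows))
        self.shift = [mean(col) for col in columns]
        # Assumption: every column has at least two rows and non-zero variance.
        self.scale = [stdev(col) for col in columns]

    def to_transformed(self, row):
        """Map one row from original to standardized coordinates."""
        return [(x - m) / s for x, m, s in zip(row, self.shift, self.scale)]

    def from_transformed(self, row):
        """Map one row from standardized back to original coordinates."""
        return [z * s + m for z, m, s in zip(row, self.shift, self.scale)]


if __name__ == "__main__":
    training = [(1.0, 10.0), (2.0, 30.0), (3.0, 50.0)]
    coords = Standardizer(training)
    new_point = (4.0, 20.0)                  # data not used to fit the transform
    z = coords.to_transformed(new_point)
    print(z, coords.from_transformed(z))
```

Keeping the fitted shift and scale in one object is what makes it straightforward to apply the same transformation, or its inverse, to data that was not part of the original fit.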
Implementation of a theoretically supported alternative to k-nearest neighbors for functional data to solve problems of estimating unobserved segments of a partially observed functional data sample, functional classification and outlier detection. The approximating neighbor curves are piecewise functions built from a functional sample. Instead of a distance on a function space we use a locally defined distance function that satisfies stabilization criteria. The package allows the implementation of the methodology and the replication of the results in Elías, A., Jiménez, R. and Yukich, J. (2020) <arXiv:2007.16059>.
Comprehensive analytical tools are provided to characterize infectious disease superspreading from contact tracing surveillance data. The underlying theoretical frameworks of this toolkit include branching process with transmission heterogeneity (Lloyd-Smith et al. (2005) <doi:10.1038/nature04153>), case cluster size distribution (Nishiura et al. (2012) <doi:10.1016/j.jtbi.2011.10.039>, Blumberg et al. (2014) <doi:10.1371/journal.ppat.1004452>, and Kucharski and Althaus (2015) <doi:10.2807/1560-7917.ES2015.20.25.21167>), and decomposition of reproduction number (Zhao et al. (2022) <doi:10.1371/journal.pcbi.1010281>).
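Transmission heterogeneity in a branching process is commonly modeled with a negative binomial offspring distribution with mean R (the reproduction number) and dispersion parameter k, where smaller k means more superspreading. The sketch below evaluates that distribution; the function names are assumptions made for this example and are not the toolkit's API.

```python
import math


def nb_offspring_pmf(x, R, k):
    """P(a case causes exactly x secondary cases), negative binomial(mean R, dispersion k).

    pmf(x) = Gamma(k + x) / (Gamma(k) * x!) * (k / (k + R))^k * (R / (k + R))^x,
    evaluated on the log scale for numerical stability.
    """
    log_p = (math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
             + k * math.log(k / (k + R)) + x * math.log(R / (k + R)))
    return math.exp(log_p)


def prob_no_transmission(R, k):
    """P(a case causes no secondary cases) = (1 + R / k)^(-k)."""
    return (1.0 + R / k) ** (-k)


if __name__ == "__main__":
    # Hypothetical values: same mean R, different degrees of heterogeneity.
    for k in (0.1, 0.5, 5.0):
        print(f"k={k}: P(no transmission) = {prob_no_transmission(2.0, k):.3f}")
```

With R held fixed, lowering k raises the share of cases that transmit to nobody, which means the same average is sustained by a small number of highly infectious cases.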
Pupillometric data collected using SR Research Eyelink eye trackers requires significant preprocessing. This package contains functions for preparing pupil dilation data for visualization and statistical analysis. Specifically, it provides a pipeline of functions which aid in data validation, the removal of blinks/artifacts, downsampling, and baselining, among others. Additionally, plotting functions for creating grand average and conditional average plots are provided. See the vignette for samples of the functionality. The package is designed for handling data collected with SR Research Eyelink eye trackers using Sample Reports created in SR Research Data Viewer.
This package provides functions for conducting power analysis in ANOVA designs, including between-, within-, and mixed-factor designs, with full support for both main effects and interactions. The package allows calculation of statistical power, required total sample size, significance level, and minimal detectable effect sizes expressed as partial eta squared or Cohen's f for ANOVA terms and planned contrasts. In addition, complementary functions are included for common related tests such as t-tests and correlation tests, making the package a convenient toolkit for power analysis in experimental psychology and related fields.
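The two effect size scales mentioned above are related by a simple identity, f = sqrt(eta_p^2 / (1 - eta_p^2)). A minimal sketch of the conversion in both directions follows; the function names are hypothetical and are not taken from the package.

```python
import math


def cohens_f_from_partial_eta_sq(eta_sq):
    """Convert partial eta squared (0 <= eta_sq < 1) to Cohen's f."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))


def partial_eta_sq_from_cohens_f(f):
    """Convert Cohen's f (f >= 0) to partial eta squared."""
    return f * f / (1.0 + f * f)


if __name__ == "__main__":
    for eta_sq in (0.01, 0.06, 0.14):
        print(eta_sq, round(cohens_f_from_partial_eta_sq(eta_sq), 3))
```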
The statistical tools in this package do one of four things: 1) enhance basic statistical functions with more flexible inputs, smarter defaults, and richer, clearer, and ready-to-use output (e.g., t.test2()); 2) produce publication-ready, commonly needed figures with one line of code (e.g., plot_cdf()); 3) implement novel analytical tools developed by the authors (e.g., twolines()); 4) deliver niche functions of high value to the authors that are not easily available elsewhere (e.g., clear(), convert_to_sql(), resize_images()).