Traditional model evaluation metrics fail to capture model performance under less than ideal conditions. This package employs techniques to evaluate models "under-stress". This includes testing models extrapolation ability, or testing accuracy on specific sub-samples of the overall model space. Details describing stress-testing methods in this package are provided in Haycock (2023) <doi:10.26076/2am5-9f67>. The other primary contribution of this package is provided to R users access to the Python library PyCaret <https://pycaret.org/> for quick and easy access to auto-tuned machine learning models.
This package provides a latent, quasi-independent truncation time is assumed to be linked with the observed dependent truncation time, the event time, and an unknown transformation parameter via a structural transformation model. The transformation parameter is chosen to minimize the conditional Kendall's tau (Martin and Betensky, 2005) <doi:10.1198/016214504000001538> or the regression coefficient estimates (Jones and Crowley, 1992) <doi:10.2307/2336782>. The marginal distribution for the truncation time and the event time are completely left unspecified. The methodology is applied to survival curve estimation and regression analysis.
This package provides a graphical R package designed to visualize behavioral observations over time. Based on raw time data extracted from video recorded sessions of experimental observations, ViSiElse grants a global overview of a process by combining the visualization of multiple actions timestamps for all participants in a single graph. Individuals and/or group behavior can easily be assessed. Supplementary features allow users to further inspect their data by adding summary statistics (mean, standard deviation, quantile or statistical test) and/or time constraints to assess the accuracy of the realized actions.
Create adjacency matrices of vocalisation graphs from dataframes containing sequences of speech and silence intervals, transforming these matrices into Markov diagrams, and generating datasets for classification of these diagrams by flattening them and adding global properties (functionals) etc. Vocalisation diagrams date back to early work in psychiatry (Jaffe and Feldstein, 1970) and social psychology (Dabbs and Ruback, 1987) but have only recently been employed as a data representation method for machine learning tasks including meeting segmentation (Luz, 2012) <doi:10.1145/2328967.2328970> and classification (Luz, 2013) <doi:10.1145/2522848.2533788>.
Quickly install Java Development Kit (JDK) without administrative privileges and set environment variables in current R session or project to solve common issues with Java environment management in R'. Recommended to users of Java'/'rJava'-dependent R packages such as r5r', opentripplanner', xlsx', openNLP', rWeka', RJDBC', tabulapdf', and many more. rJavaEnv prevents common problems like Java not found, Java version conflicts, missing Java installations, and the inability to install Java due to lack of administrative privileges. rJavaEnv automates the download, installation, and setup of the Java on a per-project basis by setting the relevant JAVA_HOME in the current R session or the current working directory (via .Rprofile', with the user's consent). Similar to what renv does for R packages, rJavaEnv allows different Java versions to be used across different projects, but can also be configured to allow multiple versions within the same project (e.g. with the help of targets package). Note: there are a few extra steps for Linux users, who don't have any Java previously installed in their system, and who prefer package installation from source, rather then installing binaries from Posit Package Manager'. See documentation for details.
This package provides a fast and accurate analysis toolkit for single cell ATAC-seq (Assay for transposase-accessible chromatin using sequencing). Single cell ATAC-seq can resolve the heterogeneity of a complex tissue and reveal cell-type specific regulatory landscapes. However, the exceeding data sparsity has posed unique challenges for the data analysis. This package r-snapatac is an end-to-end bioinformatics pipeline for analyzing large- scale single cell ATAC-seq data which includes quality control, normalization, clustering analysis, differential analysis, motif inference and exploration of single cell ATAC-seq sequencing data.
This package provides a self-tuning spectral clustering method for single or multi-view data. Spectrum uses a new type of adaptive density aware kernel that strengthens connections in the graph based on common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to integrate different data sources and reduce noise. Spectrum uses either the eigengap or multimodality gap heuristics to determine the number of clusters. The method is sufficiently flexible so that a wide range of Gaussian and non-Gaussian structures can be clustered with automatic selection of K.
This package provides utilities for computation and analysis of correlation/covariation in multiple sequence alignments and in side chain motions during molecular dynamics simulations. Features include the computation of correlation/covariation scores using a variety of scoring functions between either sequence positions in alignments or side chain dihedral angles in molecular dynamics simulations and utilities to analyze the correlation/covariation matrix through a variety of tools including network representation and principal components analysis. In addition, several utility functions are based on the R graphical environment to provide friendly tools for help in data interpretation.
This package provides tools for raster georeferencing, grid affine transforms, and general raster logic. These functions provide converters between raster specifications, world vector, geotransform, RasterIO window, and RasterIO window in sf package list format. There are functions to offset a matrix by padding any of four corners (useful for vectorizing neighbourhood operations), and helper functions to harvesting user clicks on a graphics device to use for simple georeferencing of images. Methods used are available from <https://en.wikipedia.org/wiki/World_file> and <https://gdal.org/user/raster_data_model.html>.
This package provides JAR to perform Markov chain Monte Carlo (MCMC) inference using the popular Bayesian Evolutionary Analysis by Sampling Trees BEAST X software library of Baele et al (2025) <doi:10.1038/s41592-025-02751-x>. BEAST X supports auto-tuning Metropolis-Hastings, slice, Hamiltonian Monte Carlo and Sequential Monte Carlo sampling for a large variety of composable standard and phylogenetic statistical models using high performance computing. By placing the BEAST X JAR in this package, we offer an efficient distribution system for BEAST X use by other R packages using CRAN.
This package provides a client for the Environmental Data Initiative repository REST API. The EDI data repository <https://portal.edirepository.org/nis/home.jsp> is for publication and reuse of ecological data with emphasis on metadata accuracy and completeness. It is built upon the PASTA+ software stack <https://pastaplus-core.readthedocs.io/en/latest/index.html#> and was developed in collaboration with the US LTER Network <https://lternet.edu/>. EDIutils includes functions to search and access existing data, evaluate and upload new data, and assist other data management tasks common to repository users.
S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented to hydrological modelling tasks. The focus of this package has been put in providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bugs / comments / questions / collaboration of any kind are very welcomed, and in particular, datasets that can be included in this package for academic purposes.
This package provides a general-purpose workflow for image segmentation using TensorFlow models based on the U-Net architecture by Ronneberger et al. (2015) <arXiv:1505.04597> and the U-Net++ architecture by Zhou et al. (2018) <arXiv:1807.10165>. We provide pre-trained models for assessing canopy density and understory vegetation density from vegetation photos. In addition, the package provides a workflow for easily creating model input and model architectures for general-purpose image segmentation based on grayscale or color images, both for binary and multi-class image segmentation.
Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.
This package provides a shiny application estimating the operating characteristics of the Student's t-test by Student (1908) <doi:10.1093/biomet/6.1.1>, Welch's t-test by Welch (1947) <doi:10.1093/biomet/34.1-2.28>, and Wilcoxon test by Wilcoxon (1945) <doi:10.2307/3001968> in one-sample or two-sample cases, in settings defined by the user (conditional distribution, sample size per group, location parameter per group, nuisance parameter per group), using Monte Carlo simulations Malvin H. Kalos, Paula A. Whitlock (2008) <doi:10.1002/9783527626212>.
Simulate genotypes for case-parent triads, case-control, and quantitative trait samples with realistic linkage diequilibrium structure and allele frequency distribution. For studies of epistasis one can simulate models that involve specific SNPs at specific sets of loci, which we will refer to as "pathways". TriadSim generates genotype data by resampling triad genotypes from existing data. The details of the method is described in the manuscript under preparation "Simulating Autosomal Genotypes with Realistic Linkage Disequilibrium and a Spiked in Genetic Effect" Shi, M., Umbach, D.M., Wise A.S., Weinberg, C.R.
To make the semiparametric transformation models easier to apply in real studies, we introduce this R package, in which the MLE in transformation models via an EM algorithm proposed by Zeng D, Lin DY(2007) <doi:10.1111/j.1369-7412.2007.00606.x> and adaptive lasso method in transformation models proposed by Liu XX, Zeng D(2013) <doi:10.1093/biomet/ast029> are implemented. C++ functions are used to compute complex loops. The coefficient vector and cumulative baseline hazard function can be estimated, along with the corresponding standard errors and P values.
Outliers virtually exist in any datasets of any application field. To avoid the impact of outliers, we need to use robust estimators. Classical estimators of multivariate mean and covariance matrix are the sample mean and the sample covariance matrix. Outliers will affect the sample mean and the sample covariance matrix, and thus they will affect the classical factor analysis which depends on the classical estimators (Pison, G., Rousseeuw, P.J., Filzmoser, P. and Croux, C. (2003) <doi:10.1016/S0047-259X(02)00007-6>). So it is necessary to use the robust estimators of the sample mean and the sample covariance matrix. There are several robust estimators in the literature: Minimum Covariance Determinant estimator, Orthogonalized Gnanadesikan-Kettenring, Minimum Volume Ellipsoid, M, S, and Stahel-Donoho. The most direct way to make multivariate analysis more robust is to replace the sample mean and the sample covariance matrix of the classical estimators to robust estimators (Maronna, R.A., Martin, D. and Yohai, V. (2006) <doi:10.1002/0470010940>) (Todorov, V. and Filzmoser, P. (2009) <doi:10.18637/jss.v032.i03>), which is our choice of robust factor analysis. We created an object oriented solution for robust factor analysis based on new S4 classes.
This package implements two methods for performing a constrained principal component analysis (PCA), where non-negativity and/or sparsity constraints are enforced on the principal axes (PAs). The function nsprcomp computes one principal component (PC) after the other. Each PA is optimized such that the corresponding PC has maximum additional variance not explained by the previous components. In contrast, the function nscumcomp jointly computes all PCs such that the cumulative variance is maximal. Both functions have the same interface as the prcomp function from the stats package (plus some extra parameters).
The CytoGLMM R package implements two multiple regression strategies: A bootstrapped generalized linear model (GLM) and a generalized linear mixed model (GLMM). Most current data analysis tools compare expressions across many computationally discovered cell types. CytoGLMM focuses on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier to control statistical guarantees. As a result, CytoGLMM finds differential proteins in flow and mass cytometry data while reducing biases arising from marker correlations and safeguarding against false discoveries induced by patient heterogeneity.
This package implements a bias-aware framework for evidence synthesis in systematic reviews and health technology assessments, as described in Kabali (2025) <doi:10.1111/jep.70272>. The package models study-level effect estimates by explicitly accounting for multiple sources of bias through prior distributions and propagates uncertainty using posterior simulation. Evidence across studies is combined using posterior mixture distributions rather than a single pooled likelihood, enabling probabilistic inference on clinically or policy-relevant thresholds. The methods are designed to support transparent decision-making when study relevance and bias vary across the evidence base.
Intelligently assign samples to batches in order to reduce batch effects. Batch effects can have a significant impact on data analysis, especially when the assignment of samples to batches coincides with the contrast groups being studied. By defining a batch container and a scoring function that reflects the contrasts, this package allows users to assign samples in a way that minimizes the potential impact of batch effects on the comparison of interest. Among other functionality, we provide an implementation for OSAT score by Yan et al. (2012, <doi:10.1186/1471-2164-13-689>).
The dfmirroR package allows users to input a data frame, simulate some number of observations based on specified columns of that data frame, and then outputs a string that contains the code to re-create the simulation. The goal is to both provide workable test data sets and provide users with the information they need to set up reproducible examples with team members. This package was created out of a need to share examples in cases where data are private and where a full data frame is not needed for testing or coordinating.
This package provides a ggplot2 based implementation of biplots, giving a representation of a dataset in a two dimensional space accounting for the greatest variance, together with variable vectors showing how the data variables relate to this space. It provides a replacement for stats::biplot(), but with many enhancements to control the analysis and graphical display. It implements biplot and scree plot methods which can be used with the results of prcomp(), princomp(), FactoMineR::PCA(), ade4::dudi.pca() or MASS::lda() and can be customized using ggplot2 techniques.