Dropout events make the lowly expressed genes indistinguishable from true zero expression and different than the low expression present in cells of the same type. This issue makes any subsequent downstream analysis difficult. ccImpute is an imputation algorithm that uses cell similarity established by consensus clustering to impute the most probable dropout events in the scRNA-seq datasets. ccImpute demonstrated performance which exceeds the performance of existing imputation approaches while introducing the least amount of new noise as measured by clustering performance characteristics on datasets with known cell identities.
This package provides a flexible method for fitting regression models that can be used to find genes that are differentially expressed along one or multiple lineages in a trajectory. Based on the fitted models, it uses a variety of tests suited to answer different questions of interest, e.g. the discovery of genes for which expression is associated with pseudotime, or which are differentially expressed (in a specific region) along the trajectory. It fits a negative binomial generalized additive model (GAM) for each gene, and performs inference on the parameters of the GAM.
Quickly install Java Development Kit (JDK) without administrative privileges and set environment variables in current R session or project to solve common issues with Java environment management in R'. Recommended to users of Java'/'rJava'-dependent R packages such as r5r', opentripplanner', xlsx', openNLP', rWeka', RJDBC', tabulapdf', and many more. rJavaEnv prevents common problems like Java not found, Java version conflicts, missing Java installations, and the inability to install Java due to lack of administrative privileges. rJavaEnv automates the download, installation, and setup of the Java on a per-project basis by setting the relevant JAVA_HOME in the current R session or the current working directory (via .Rprofile', with the user's consent). Similar to what renv does for R packages, rJavaEnv allows different Java versions to be used across different projects, but can also be configured to allow multiple versions within the same project (e.g. with the help of targets package). Note: there are a few extra steps for Linux users, who don't have any Java previously installed in their system, and who prefer package installation from source, rather then installing binaries from Posit Package Manager'. See documentation for details.
This package provides tools for raster georeferencing, grid affine transforms, and general raster logic. These functions provide converters between raster specifications, world vector, geotransform, RasterIO window, and RasterIO window in sf package list format. There are functions to offset a matrix by padding any of four corners (useful for vectorizing neighbourhood operations), and helper functions to harvesting user clicks on a graphics device to use for simple georeferencing of images. Methods used are available from <https://en.wikipedia.org/wiki/World_file> and <https://gdal.org/user/raster_data_model.html>.
This package provides JAR to perform Markov chain Monte Carlo (MCMC) inference using the popular Bayesian Evolutionary Analysis by Sampling Trees BEAST X software library of Baele et al (2025) <doi:10.1038/s41592-025-02751-x>. BEAST X supports auto-tuning Metropolis-Hastings, slice, Hamiltonian Monte Carlo and Sequential Monte Carlo sampling for a large variety of composable standard and phylogenetic statistical models using high performance computing. By placing the BEAST X JAR in this package, we offer an efficient distribution system for BEAST X use by other R packages using CRAN.
This package provides a client for the Environmental Data Initiative repository REST API. The EDI data repository <https://portal.edirepository.org/nis/home.jsp> is for publication and reuse of ecological data with emphasis on metadata accuracy and completeness. It is built upon the PASTA+ software stack <https://pastaplus-core.readthedocs.io/en/latest/index.html#> and was developed in collaboration with the US LTER Network <https://lternet.edu/>. EDIutils includes functions to search and access existing data, evaluate and upload new data, and assist other data management tasks common to repository users.
S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented to hydrological modelling tasks. The focus of this package has been put in providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bugs / comments / questions / collaboration of any kind are very welcomed, and in particular, datasets that can be included in this package for academic purposes.
This package provides a general-purpose workflow for image segmentation using TensorFlow models based on the U-Net architecture by Ronneberger et al. (2015) <arXiv:1505.04597> and the U-Net++ architecture by Zhou et al. (2018) <arXiv:1807.10165>. We provide pre-trained models for assessing canopy density and understory vegetation density from vegetation photos. In addition, the package provides a workflow for easily creating model input and model architectures for general-purpose image segmentation based on grayscale or color images, both for binary and multi-class image segmentation.
Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.
This package provides a shiny application estimating the operating characteristics of the Student's t-test by Student (1908) <doi:10.1093/biomet/6.1.1>, Welch's t-test by Welch (1947) <doi:10.1093/biomet/34.1-2.28>, and Wilcoxon test by Wilcoxon (1945) <doi:10.2307/3001968> in one-sample or two-sample cases, in settings defined by the user (conditional distribution, sample size per group, location parameter per group, nuisance parameter per group), using Monte Carlo simulations Malvin H. Kalos, Paula A. Whitlock (2008) <doi:10.1002/9783527626212>.
This package provides a variety of original and flexible user-friendly statistical latent variable models and unsupervised learning algorithms to segment and represent time-series data (univariate or multivariate), and more generally, longitudinal data, which include regime changes. samurais is built upon the following packages, each of them is an autonomous time-series segmentation approach: Regression with Hidden Logistic Process ('RHLP'), Hidden Markov Model Regression ('HMMR'), Multivariate RHLP ('MRHLP'), Multivariate HMMR ('MHMMR'), Piece-Wise regression ('PWR'). For the advantages/differences of each of them, the user is referred to our mentioned paper references.
To make the semiparametric transformation models easier to apply in real studies, we introduce this R package, in which the MLE in transformation models via an EM algorithm proposed by Zeng D, Lin DY(2007) <doi:10.1111/j.1369-7412.2007.00606.x> and adaptive lasso method in transformation models proposed by Liu XX, Zeng D(2013) <doi:10.1093/biomet/ast029> are implemented. C++ functions are used to compute complex loops. The coefficient vector and cumulative baseline hazard function can be estimated, along with the corresponding standard errors and P values.
Simulate genotypes for case-parent triads, case-control, and quantitative trait samples with realistic linkage diequilibrium structure and allele frequency distribution. For studies of epistasis one can simulate models that involve specific SNPs at specific sets of loci, which we will refer to as "pathways". TriadSim generates genotype data by resampling triad genotypes from existing data. The details of the method is described in the manuscript under preparation "Simulating Autosomal Genotypes with Realistic Linkage Disequilibrium and a Spiked in Genetic Effect" Shi, M., Umbach, D.M., Wise A.S., Weinberg, C.R.
This package provides a fast and accurate analysis toolkit for single cell ATAC-seq (Assay for transposase-accessible chromatin using sequencing). Single cell ATAC-seq can resolve the heterogeneity of a complex tissue and reveal cell-type specific regulatory landscapes. However, the exceeding data sparsity has posed unique challenges for the data analysis. This package r-snapatac is an end-to-end bioinformatics pipeline for analyzing large- scale single cell ATAC-seq data which includes quality control, normalization, clustering analysis, differential analysis, motif inference and exploration of single cell ATAC-seq sequencing data.
This package provides a self-tuning spectral clustering method for single or multi-view data. Spectrum uses a new type of adaptive density aware kernel that strengthens connections in the graph based on common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to integrate different data sources and reduce noise. Spectrum uses either the eigengap or multimodality gap heuristics to determine the number of clusters. The method is sufficiently flexible so that a wide range of Gaussian and non-Gaussian structures can be clustered with automatic selection of K.
This package provides utilities for computation and analysis of correlation/covariation in multiple sequence alignments and in side chain motions during molecular dynamics simulations. Features include the computation of correlation/covariation scores using a variety of scoring functions between either sequence positions in alignments or side chain dihedral angles in molecular dynamics simulations and utilities to analyze the correlation/covariation matrix through a variety of tools including network representation and principal components analysis. In addition, several utility functions are based on the R graphical environment to provide friendly tools for help in data interpretation.
Outliers virtually exist in any datasets of any application field. To avoid the impact of outliers, we need to use robust estimators. Classical estimators of multivariate mean and covariance matrix are the sample mean and the sample covariance matrix. Outliers will affect the sample mean and the sample covariance matrix, and thus they will affect the classical factor analysis which depends on the classical estimators (Pison, G., Rousseeuw, P.J., Filzmoser, P. and Croux, C. (2003) <doi:10.1016/S0047-259X(02)00007-6>). So it is necessary to use the robust estimators of the sample mean and the sample covariance matrix. There are several robust estimators in the literature: Minimum Covariance Determinant estimator, Orthogonalized Gnanadesikan-Kettenring, Minimum Volume Ellipsoid, M, S, and Stahel-Donoho. The most direct way to make multivariate analysis more robust is to replace the sample mean and the sample covariance matrix of the classical estimators to robust estimators (Maronna, R.A., Martin, D. and Yohai, V. (2006) <doi:10.1002/0470010940>) (Todorov, V. and Filzmoser, P. (2009) <doi:10.18637/jss.v032.i03>), which is our choice of robust factor analysis. We created an object oriented solution for robust factor analysis based on new S4 classes.
Intelligently assign samples to batches in order to reduce batch effects. Batch effects can have a significant impact on data analysis, especially when the assignment of samples to batches coincides with the contrast groups being studied. By defining a batch container and a scoring function that reflects the contrasts, this package allows users to assign samples in a way that minimizes the potential impact of batch effects on the comparison of interest. Among other functionality, we provide an implementation for OSAT score by Yan et al. (2012, <doi:10.1186/1471-2164-13-689>).
The dfmirroR package allows users to input a data frame, simulate some number of observations based on specified columns of that data frame, and then outputs a string that contains the code to re-create the simulation. The goal is to both provide workable test data sets and provide users with the information they need to set up reproducible examples with team members. This package was created out of a need to share examples in cases where data are private and where a full data frame is not needed for testing or coordinating.
This package provides a ggplot2 based implementation of biplots, giving a representation of a dataset in a two dimensional space accounting for the greatest variance, together with variable vectors showing how the data variables relate to this space. It provides a replacement for stats::biplot(), but with many enhancements to control the analysis and graphical display. It implements biplot and scree plot methods which can be used with the results of prcomp(), princomp(), FactoMineR::PCA(), ade4::dudi.pca() or MASS::lda() and can be customized using ggplot2 techniques.
This package contains datasets and several smaller functions suitable for analysis of interval-censored data. The package complements the book Bogaerts, Komárek and Lesaffre (2017, ISBN: 978-1-4200-7747-6) "Survival Analysis with Interval-Censored Data: A Practical Approach" <https://www.routledge.com/Survival-Analysis-with-Interval-Censored-Data-A-Practical-Approach-with/Bogaerts-Komarek-Lesaffre/p/book/9781420077476>. Full R code related to the examples presented in the book can be found at <https://ibiostat.be/online-resources/icbook/supplemental>. Packages mentioned in the "Suggests" section are used in those examples.
Fits multivariate Ornstein-Uhlenbeck types of models to continues trait data from species related by a common evolutionary history. See K. Bartoszek, J, Pienaar, P. Mostad, S. Andersson, T. F. Hansen (2012) <doi:10.1016/j.jtbi.2012.08.005> and K. Bartoszek, and J. Tredgett Clarke, J. Fuentes-Gonzalez, V. Mitov, J. Pienaar, M. Piwczynski, R. Puchalka, K. Spalik, K. L. Voje (2024) <doi:10.1111/2041-210X.14376>. The suggested PCMBaseCpp package (which significantly speeds up the likelihood calculations) can be obtained from <https://github.com/venelin/PCMBaseCpp/>.
It computes arbitrary products moments (mean vector and variance-covariance matrix), for some double truncated (and folded) multivariate distributions. These distributions belong to the family of selection elliptical distributions, which includes well known skewed distributions as the unified skew-t distribution (SUT) and its particular cases as the extended skew-t (EST), skew-t (ST) and the symmetric student-t (T) distribution. Analogous normal cases unified skew-normal (SUN), extended skew-normal (ESN), skew-normal (SN), and symmetric normal (N) are also included. Density, probabilities and random deviates are also offered for these members.
The unique function of this package allows representing in a single graph the relative occurrence and co-occurrence of events measured in a sample. As examples, the package was applied to describe the occurrence and co-occurrence of different species of bacterial or viral symbionts infecting arthropods at the individual level. The graphics allows determining the prevalence of each symbiont and the patterns of multiple infections (i.e. how different symbionts share or not the same individual hosts). We named the package after the famous painter as the graphical output recalls Mondrianâ s paintings.