This package provides the JAR needed to perform Markov chain Monte Carlo (MCMC) inference using the popular Bayesian Evolutionary Analysis by Sampling Trees (BEAST X) software library of Baele et al. (2025) <doi:10.1038/s41592-025-02751-x>. BEAST X supports auto-tuning Metropolis-Hastings, slice, Hamiltonian Monte Carlo and Sequential Monte Carlo sampling for a large variety of composable standard and phylogenetic statistical models using high-performance computing. By placing the BEAST X JAR in this package, we offer an efficient distribution system for use of BEAST X by other R packages on CRAN.
This package provides a client for the Environmental Data Initiative (EDI) repository REST API. The EDI data repository <https://portal.edirepository.org/nis/home.jsp> is for publication and reuse of ecological data with emphasis on metadata accuracy and completeness. It is built upon the PASTA+ software stack <https://pastaplus-core.readthedocs.io/en/latest/index.html#> and was developed in collaboration with the US LTER Network <https://lternet.edu/>. EDIutils includes functions to search and access existing data, evaluate and upload new data, and assist with other data management tasks common to repository users.
S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented to hydrological modelling tasks. The focus of this package is on providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bug reports, comments, questions and collaboration of any kind are very welcome, in particular datasets that can be included in this package for academic purposes.
This package provides a general-purpose workflow for image segmentation using TensorFlow models based on the U-Net architecture by Ronneberger et al. (2015) <arXiv:1505.04597> and the U-Net++ architecture by Zhou et al. (2018) <arXiv:1807.10165>. We provide pre-trained models for assessing canopy density and understory vegetation density from vegetation photos. In addition, the package provides a workflow for easily creating model input and model architectures for general-purpose image segmentation based on grayscale or color images, both for binary and multi-class image segmentation.
Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.
This package provides a shiny application estimating the operating characteristics of the Student's t-test by Student (1908) <doi:10.1093/biomet/6.1.1>, Welch's t-test by Welch (1947) <doi:10.1093/biomet/34.1-2.28>, and Wilcoxon test by Wilcoxon (1945) <doi:10.2307/3001968> in one-sample or two-sample cases, in settings defined by the user (conditional distribution, sample size per group, location parameter per group, nuisance parameter per group), using Monte Carlo simulations as described in Kalos and Whitlock (2008) <doi:10.1002/9783527626212>.
To make semiparametric transformation models easier to apply in real studies, we introduce this R package, which implements maximum likelihood estimation (MLE) in transformation models via the EM algorithm proposed by Zeng D, Lin DY (2007) <doi:10.1111/j.1369-7412.2007.00606.x> and the adaptive lasso method in transformation models proposed by Liu XX, Zeng D (2013) <doi:10.1093/biomet/ast029>. C++ functions are used to compute complex loops. The coefficient vector and cumulative baseline hazard function can be estimated, along with the corresponding standard errors and P values.
Simulate genotypes for case-parent triads, case-control, and quantitative trait samples with realistic linkage disequilibrium structure and allele frequency distribution. For studies of epistasis one can simulate models that involve specific SNPs at specific sets of loci, which we will refer to as "pathways". TriadSim generates genotype data by resampling triad genotypes from existing data. The details of the method are described in the manuscript under preparation "Simulating Autosomal Genotypes with Realistic Linkage Disequilibrium and a Spiked in Genetic Effect" Shi, M., Umbach, D.M., Wise, A.S., Weinberg, C.R.
Outliers exist in virtually every dataset in every application field. To avoid the impact of outliers, we need to use robust estimators. The classical estimators of the multivariate mean and covariance matrix are the sample mean and the sample covariance matrix. Outliers affect both, and thus they affect classical factor analysis, which depends on these classical estimators (Pison, G., Rousseeuw, P.J., Filzmoser, P. and Croux, C. (2003) <doi:10.1016/S0047-259X(02)00007-6>). It is therefore necessary to use robust estimators of the mean and the covariance matrix. There are several robust estimators in the literature: Minimum Covariance Determinant estimator, Orthogonalized Gnanadesikan-Kettenring, Minimum Volume Ellipsoid, M, S, and Stahel-Donoho. The most direct way to make multivariate analysis more robust is to replace the sample mean and the sample covariance matrix with robust estimators (Maronna, R.A., Martin, D. and Yohai, V. (2006) <doi:10.1002/0470010940>) (Todorov, V. and Filzmoser, P. (2009) <doi:10.18637/jss.v032.i03>), which is our choice for robust factor analysis. We created an object-oriented solution for robust factor analysis based on new S4 classes.
This package implements two methods for performing a constrained principal component analysis (PCA), where non-negativity and/or sparsity constraints are enforced on the principal axes (PAs). The function nsprcomp computes one principal component (PC) after the other. Each PA is optimized such that the corresponding PC has maximum additional variance not explained by the previous components. In contrast, the function nscumcomp jointly computes all PCs such that the cumulative variance is maximal. Both functions have the same interface as the prcomp function from the stats package (plus some extra parameters).
The CytoGLMM R package implements two multiple regression strategies: a bootstrapped generalized linear model (GLM) and a generalized linear mixed model (GLMM). Most current data analysis tools compare expressions across many computationally discovered cell types. CytoGLMM focuses on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier-to-control statistical guarantees. As a result, CytoGLMM finds differential proteins in flow and mass cytometry data while reducing biases arising from marker correlations and safeguarding against false discoveries induced by patient heterogeneity.
This package implements a bias-aware framework for evidence synthesis in systematic reviews and health technology assessments, as described in Kabali (2025) <doi:10.1111/jep.70272>. The package models study-level effect estimates by explicitly accounting for multiple sources of bias through prior distributions and propagates uncertainty using posterior simulation. Evidence across studies is combined using posterior mixture distributions rather than a single pooled likelihood, enabling probabilistic inference on clinically or policy-relevant thresholds. The methods are designed to support transparent decision-making when study relevance and bias vary across the evidence base.
Intelligently assign samples to batches in order to reduce batch effects. Batch effects can have a significant impact on data analysis, especially when the assignment of samples to batches coincides with the contrast groups being studied. By defining a batch container and a scoring function that reflects the contrasts, this package allows users to assign samples in a way that minimizes the potential impact of batch effects on the comparison of interest. Among other functionality, we provide an implementation of the OSAT score by Yan et al. (2012, <doi:10.1186/1471-2164-13-689>).
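The idea of a scoring function that reflects the contrasts can be illustrated with a minimal Python sketch. It is not the package's API and not the exact OSAT definition: the hypothetical function `imbalance_score` simply sums the squared differences between the observed and the expected number of samples of each contrast group in each batch, so a lower score means the groups are spread more evenly across batches.

```python
from collections import Counter


def imbalance_score(batches, groups):
    """Sum of squared deviations between observed and expected group counts per batch.

    Illustrative assumption only: this is a simple balance measure, not the
    scoring function shipped with the package.
    """
    n = len(batches)
    batch_sizes = Counter(batches)
    group_sizes = Counter(groups)
    observed = Counter(zip(batches, groups))
    score = 0.0
    for batch, batch_size in batch_sizes.items():
        for group, group_size in group_sizes.items():
            # Count expected if this group were spread proportionally over batches.
            expected = batch_size * group_size / n
            score += (observed[(batch, group)] - expected) ** 2
    return score


groups = ["case"] * 4 + ["control"] * 4
confounded = [1, 1, 1, 1, 2, 2, 2, 2]  # batch coincides with the contrast
balanced = [1, 1, 2, 2, 1, 1, 2, 2]    # each batch holds two cases, two controls

print(imbalance_score(confounded, groups))
print(imbalance_score(balanced, groups))
```

An assignment search would then look for the sample-to-batch mapping with the lowest score; the search strategy itself is left out of this sketch.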
The dfmirroR package allows users to input a data frame, simulate some number of observations based on specified columns of that data frame, and then output a string that contains the code to re-create the simulation. The goal is to both provide workable test data sets and provide users with the information they need to set up reproducible examples with team members. This package was created out of a need to share examples in cases where data are private and where a full data frame is not needed for testing or coordinating.
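The workflow described above (simulate from a column, then emit code that re-creates the simulation) can be sketched in Python. This is an illustration under stated assumptions, not the package's implementation: the helper `mirror_column` is hypothetical, and it assumes the column is numeric and adequately summarised by a normal distribution with the column's mean and standard deviation.

```python
import random
import statistics


def mirror_column(name, values, n, seed=1):
    """Simulate n values resembling `values` and return (data, code string).

    Hypothetical helper: assumes a normal distribution fitted by the sample
    mean and standard deviation, which is an assumption of this sketch.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    rng = random.Random(seed)
    simulated = [rng.gauss(mu, sigma) for _ in range(n)]
    # The returned code contains only summary statistics, never the raw data.
    code = (
        "import random\n"
        f"rng = random.Random({seed})\n"
        f"{name} = [rng.gauss({mu!r}, {sigma!r}) for _ in range({n})]\n"
    )
    return simulated, code


private_heights = [1.0, 2.0, 3.0, 4.0]
simulated, code = mirror_column("height", private_heights, n=5)
print(code)
```

Because the generated string carries the seed and the fitted parameters, running it elsewhere reproduces the simulated column without access to the private data.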
This package provides a ggplot2-based implementation of biplots, giving a representation of a dataset in a two-dimensional space accounting for the greatest variance, together with variable vectors showing how the data variables relate to this space. It provides a replacement for stats::biplot(), but with many enhancements to control the analysis and graphical display. It implements biplot and scree plot methods which can be used with the results of prcomp(), princomp(), FactoMineR::PCA(), ade4::dudi.pca() or MASS::lda() and can be customized using ggplot2 techniques.
This package contains datasets and several smaller functions suitable for analysis of interval-censored data. The package complements the book Bogaerts, Komárek and Lesaffre (2017, ISBN: 978-1-4200-7747-6) "Survival Analysis with Interval-Censored Data: A Practical Approach" <https://www.routledge.com/Survival-Analysis-with-Interval-Censored-Data-A-Practical-Approach-with/Bogaerts-Komarek-Lesaffre/p/book/9781420077476>. Full R code related to the examples presented in the book can be found at <https://ibiostat.be/online-resources/icbook/supplemental>. Packages mentioned in the "Suggests" section are used in those examples.
Comparative evaluation of families and candidate variants in rare-variant association studies. The package can be used for two methodologically overlapping but distinct purposes. First, prior to any genetic or genomic evaluation, evaluating the relative detection power of pedigrees can direct recruitment efforts by showing which individuals not yet sampled would be the most meaningful additions to a study. Second, after sequencing and analysis, evaluating variants based on their association with disease status and the familial relationships of individuals aids in variant prioritization. Methodology is described in Nugent (2025) <doi:10.1101/2025.10.06.25337426>.
The sole function of this package represents, in a single graph, the relative occurrence and co-occurrence of events measured in a sample. As an example, the package was applied to describe the occurrence and co-occurrence of different species of bacterial or viral symbionts infecting arthropods at the individual level. The graphic allows determining the prevalence of each symbiont and the patterns of multiple infections (i.e. how different symbionts share or not the same individual hosts). We named the package after the famous painter as the graphical output recalls Mondrian's paintings.
Computes arbitrary product moments (mean vector and variance-covariance matrix) for some doubly truncated (and folded) multivariate distributions. These distributions belong to the family of selection elliptical distributions, which includes well-known skewed distributions such as the unified skew-t distribution (SUT) and its particular cases: the extended skew-t (EST), skew-t (ST) and the symmetric Student-t (T) distributions. The analogous normal cases, unified skew-normal (SUN), extended skew-normal (ESN), skew-normal (SN), and symmetric normal (N), are also included. Density, probabilities and random deviates are also offered for these members.
Fits multivariate Ornstein-Uhlenbeck types of models to continuous trait data from species related by a common evolutionary history. See K. Bartoszek, J. Pienaar, P. Mostad, S. Andersson, T. F. Hansen (2012) <doi:10.1016/j.jtbi.2012.08.005> and K. Bartoszek, J. Tredgett Clarke, J. Fuentes-Gonzalez, V. Mitov, J. Pienaar, M. Piwczynski, R. Puchalka, K. Spalik, K. L. Voje (2024) <doi:10.1111/2041-210X.14376>. The suggested PCMBaseCpp package (which significantly speeds up the likelihood calculations) can be obtained from <https://github.com/venelin/PCMBaseCpp/>.
DNA methylation is an important epigenetic process that regulates gene activity through chemical modifications of DNA without changing its sequence. OpEnCAST is a plant-specific ensemble-based prediction package that identifies 4mC, 5mC and 6mA methylation sites directly from DNA sequences. It combines multiple machine learning algorithms trained on monocot (Oryza sp.) and dicot (Arabidopsis sp.) reference models to deliver accurate predictions. The methodology is inspired by the ensemble algorithm for methylation prediction developed by Wang et al. (2022) <doi:10.1186/s12859-022-04756-1>.
Fitting models for, and simulation of, trend locally stationary wavelet (TLSW) time series models, which take account of time-varying trend and dependence structure in a univariate time series. The TLSW model, and its estimation, is described in McGonigle, Killick and Nunes (2022a) <doi:10.1111/jtsa.12643>, (2022b) <doi:10.1214/22-EJS2044>. Further information regarding the use of the package, along with detailed examples, can be found in McGonigle, Killick and Nunes (2025) <doi:10.18637/jss.v115.i10>. New users will likely want to start with the TLSW function.
SBGNview is a tool set for pathway-based data visualization, integration and analysis. SBGNview is similar and complementary to the widely used Pathview, with the following key features: 1. Pathway definition by the widely adopted Systems Biology Graphical Notation (SBGN); 2. Supports multiple major pathway databases beyond KEGG (Reactome, MetaCyc, SMPDB, PANTHER, METACROP) and user-defined pathways; 3. Covers 5,200 reference pathways and over 3,000 species by default; 4. Extensive graphics controls, including glyph and edge attributes, graph layout and sub-pathway highlighting; 5. SBGN pathway data manipulation, processing, extraction and analysis.
Site polymorphism is one way to cluster DNA/protein sequences, but sequences that share the same polymorphism at a single site can still be genetically distant. This package clusters sequences using both site polymorphism and the corresponding phylogenetic tree. By considering their location on the tree, only structurally adjacent sequences are clustered together. However, adjacent sequences do not necessarily have the same polymorphism, so a branch-and-bound-like algorithm is used to minimize the entropy that represents the purity of site polymorphism in each cluster.
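How entropy can serve as a purity measure for a cluster is shown in the minimal Python sketch below. It is an assumption-laden illustration, not the package's algorithm: the functions `site_entropy` and `clustering_entropy` are hypothetical, Shannon entropy in bits is assumed, and the tree-constrained branch-and-bound-like search is not reproduced here.

```python
from collections import Counter
from math import log2


def site_entropy(residues):
    """Shannon entropy (bits) of the residues observed at one site in one cluster."""
    total = len(residues)
    counts = Counter(residues)
    # p * log2(1 / p) summed over residues; 0 for a perfectly pure cluster.
    return sum((c / total) * log2(total / c) for c in counts.values())


def clustering_entropy(clusters):
    """Size-weighted mean entropy over clusters (an assumed aggregate, for illustration)."""
    n = sum(len(cluster) for cluster in clusters)
    return sum(len(cluster) / n * site_entropy(cluster) for cluster in clusters)


pure = [["A", "A"], ["G", "G"]]   # each cluster carries a single residue
mixed = [["A", "G"], ["A", "G"]]  # each cluster is an even mix

print(clustering_entropy(pure))
print(clustering_entropy(mixed))
```

A partition in which every cluster carries a single residue at the site scores lowest, which is the direction in which an entropy-minimizing search would move.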