This package provides a simple mechanism to specify symmetric block diagonal matrices (often used for covariance matrices). It is based on the domain-specific language implemented in nlmixr2, but expanded to create matrices in R generally instead of specifying parts of matrices to estimate. It has since been expanded to include some matrix manipulation functions that are generally useful for rxode2 and nlmixr2.
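As a rough illustration of what a symmetric block diagonal matrix looks like, the sketch below assembles one from row-wise lower triangles. It is written in Python rather than R and does not reproduce the package's domain-specific language; the helper names `sym_from_lower` and `block_diag` are hypothetical.

```python
def sym_from_lower(lower):
    """Build a symmetric matrix from its lower triangle given row by row.

    Hypothetical helper for illustration; not part of the package.
    """
    # Find n such that n(n+1)/2 equals the number of supplied values.
    n = 0
    while n * (n + 1) // 2 < len(lower):
        n += 1
    if n * (n + 1) // 2 != len(lower):
        raise ValueError("length is not a triangular number")
    m = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            # Mirror each entry across the diagonal.
            m[i][j] = m[j][i] = float(lower[k])
            k += 1
    return m


def block_diag(*blocks):
    """Place square blocks along the diagonal and fill the rest with zeros."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(b)
    return out


# A 2x2 block (variances 1 and 2, covariance 0.5) next to a 1x1 block.
cov = block_diag(sym_from_lower([1, 0.5, 2]), sym_from_lower([3]))
```

Here `cov` is a 3x3 matrix whose off-block entries are zero, which is the structure typically assumed for independent groups of random effects.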
The MCC-F1 analysis is a method to evaluate the performance of binary classifications. The MCC-F1 curve is more reliable than the Receiver Operating Characteristic (ROC) curve and the Precision-Recall (PR) curve under imbalanced ground truth. The MCC-F1 analysis also provides the MCC-F1 metric, which integrates classifier performance over varying thresholds, and the best threshold for binary classification.
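To make the two quantities concrete, the following Python sketch computes a single point of an MCC-F1 style curve at one threshold. It assumes the curve pairs the unit-normalized MCC, (MCC + 1) / 2, with the F1 score; the function names are hypothetical and this is not the package's implementation.

```python
import math


def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif not predicted and y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc_f1_point(scores, labels, threshold):
    """Return (unit-normalized MCC, F1) at one threshold (illustrative only)."""
    tp, fp, tn, fn = confusion(scores, labels, threshold)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: an undefined MCC (zero denominator) is treated as 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return (mcc + 1) / 2, f1


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
point = mcc_f1_point(scores, labels, 0.5)
```

Sweeping the threshold over the observed scores and collecting these points would trace out the full curve.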
Model stability and variable inclusion plots [Mueller and Welsh (2010, <doi:10.1111/j.1751-5823.2010.00108.x>); Murray, Heritier and Mueller (2013, <doi:10.1002/sim.5855>)] as well as the adaptive fence [Jiang et al. (2008, <doi:10.1214/07-AOS517>); Jiang et al. (2009, <doi:10.1016/j.spl.2008.10.014>)] for linear and generalised linear models.
This package provides functions for manipulating nested data frames in a list-column using dplyr <https://dplyr.tidyverse.org/> syntax. Rather than unnesting, then manipulating a data frame, nplyr allows users to manipulate each nested data frame directly. nplyr is a wrapper for dplyr functions that provide tools for common data manipulation steps: filtering rows, selecting columns, summarising grouped data, among others.
This package implements the Phylogeny-Guided Microbiome OTU-Specific Association Test method, which boosts the testing power by adaptively borrowing information from phylogenetically close OTUs (operational taxonomic units) of the target OTU. This method is built on a kernel machine regression framework, allows for flexible modeling of complex microbiome effects and adjustment for covariates, and can accommodate both continuous and binary outcomes.
This package implements statistical methods for detecting evolutionary shifts in both the optimal trait value (mean) and evolutionary diffusion variance. The method uses an L1-penalized optimization framework to identify branches where shifts occur, and the shift magnitudes. It also supports the inclusion of measurement error. For more details, see Zhang, Ho, and Kenney (2023) <doi:10.48550/arXiv.2312.17480>.
Efficient implementations for Sorted L-One Penalized Estimation (SLOPE): generalized linear models regularized with the sorted L1-norm (Bogdan et al. 2015). Supported models include ordinary least-squares regression, binomial regression, multinomial regression, and Poisson regression. Both dense and sparse predictor matrices are supported. In addition, the package features predictor screening rules that enable fast and efficient solutions to high-dimensional problems.
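For readers unfamiliar with the penalty, the sorted L1-norm pairs the largest coefficient magnitude with the largest penalty weight, the second largest with the second largest weight, and so on. A minimal Python sketch of that definition follows; the function name is hypothetical and it is not taken from the package.

```python
def sorted_l1_norm(beta, lam):
    """Sorted L1-norm: sum of lam[i] times the i-th largest |beta|.

    lam must be non-negative and non-increasing (illustrative helper only).
    """
    if len(beta) != len(lam):
        raise ValueError("beta and lam must have the same length")
    if any(a < b for a, b in zip(lam, lam[1:])) or (lam and lam[-1] < 0):
        raise ValueError("lam must be non-increasing and non-negative")
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lam, magnitudes))


penalty = sorted_l1_norm([1, -3, 2], [3, 2, 1])   # 3*3 + 2*2 + 1*1
lasso_like = sorted_l1_norm([1, -3, 2], [1, 1, 1])  # equal weights give the plain L1-norm
```

With all weights equal the penalty reduces to the ordinary L1-norm (the lasso penalty), which is why SLOPE can be viewed as a generalization of it.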
This package provides a pipeline for short tandem repeat instability analysis from fragment analysis data. Inputs are fsa files or peak tables, together with a user-supplied metadata data frame. The package identifies ladders, calls peaks, identifies the modal peaks, calls repeats, then calculates repeat instability metrics (e.g. expansion index from Lee et al. (2010) <doi:10.1186/1752-0509-4-29>).
Download, prepare and analyze data from large-scale assessments and surveys with complex sampling and assessment design (see Rutkowski, 2010 <doi:10.3102/0013189X10363170>). Such studies are, for example, international assessments like TIMSS, PIRLS and PISA. A graphical interface is available for the non-technical user. The package includes functions to convert the original data from SPSS into R data sets keeping the user-defined missing values, merge data from different respondents and/or countries, generate variable dictionaries, modify data, produce descriptive statistics (percentages, means, percentiles, benchmarks) and multivariate statistics (correlations, linear regression, binary logistic regression). The number of supported studies and analysis types will increase in the future. For a general presentation of the package, see Mirazchiyski, 2021a (<doi:10.1186/s40536-021-00114-4>). For detailed technical aspects of the package, see Mirazchiyski, 2021b (<doi:10.3390/psych3020018>).
This package provides a framework for estimating ensembles of meta-analytic, meta-regression, and multilevel models (assuming either presence or absence of the effect, heterogeneity, publication bias, and moderators). The RoBMA framework uses Bayesian model-averaging to combine the competing meta-analytic models into a model ensemble, weights the posterior parameter distributions based on posterior model probabilities and uses Bayes factors to test for the presence or absence of the individual components (e.g., effect vs. no effect; Bartoš et al., 2022, <doi:10.1002/jrsm.1594>; Maier, Bartoš & Wagenmakers, 2022, <doi:10.1037/met0000405>; Bartoš et al., 2025, <doi:10.1037/met0000737>). Users can define a wide range of prior distributions for the effect size, heterogeneity, publication bias (including selection models and PET-PEESE), and moderator components. The package provides convenient functions for summary, visualizations, and fit diagnostics.
BLASE is a method for finding where bulk RNA-seq data lies on a single-cell pseudotime trajectory. It uses a fast and understandable approach based on Spearman correlation, with bootstrapping to provide confidence. BLASE can be used to "date" bulk RNA-seq data, annotate cell types in scRNA-seq, and help correct for developmental phenotype differences in bulk RNA-seq experiments.
Base-resolution copy number analysis of viral genomes. Utilizes base-resolution read depth data over the viral genome to find copy number segments with a two-dimensional segmentation approach. Provides publication-ready figures, including histograms of read depths, coverage line plots over the viral genome annotated with copy number change events and viral genes, and heatmaps showing multiple types of data with integrative clustering of samples.
An R package that tests for enrichment and depletion of user-defined pathways using Fisher's exact test. The method is designed for versatile pathway annotation formats (e.g., gmt, txt, xlsx) to allow the user to run pathway analysis on custom annotations. This package is also integrated with Cytoscape to provide network-based pathway visualization that enhances the interpretability of the results.
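The idea behind such a test can be sketched with the hypergeometric distribution, which gives the one-sided Fisher's exact p-values for a 2x2 table of list membership against pathway membership. The Python sketch below is an illustration under that assumption only; `fisher_tails` and its arguments are hypothetical names, not the package's interface.

```python
from math import comb


def fisher_tails(hits, list_size, pathway_size, universe):
    """One-sided Fisher's exact p-values for pathway over-representation.

    hits: genes in both the user list and the pathway
    list_size: genes in the user list
    pathway_size: genes annotated to the pathway
    universe: all genes considered
    Returns (p_enrichment, p_depletion). Illustrative helper only.
    """
    total = comb(universe, list_size)

    def weight(k):
        # Number of ways to draw exactly k pathway genes in the list.
        return comb(pathway_size, k) * comb(universe - pathway_size, list_size - k)

    upper = min(list_size, pathway_size)
    p_enrichment = sum(weight(k) for k in range(hits, upper + 1)) / total
    p_depletion = sum(weight(k) for k in range(0, hits + 1)) / total
    return p_enrichment, p_depletion


# All 3 listed genes fall in a 3-gene pathway out of a 10-gene universe.
p_enriched, p_depleted = fisher_tails(3, 3, 3, 10)
```

In this toy case only one of the 120 possible gene lists contains the whole pathway, so the enrichment p-value is 1/120.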
This package defines interfaces from R to scvi-tools. A vignette works through the totalVI tutorial for analyzing CITE-seq data. Another vignette compares outputs of Chapter 12 of the OSCA book with analogous outputs based on totalVI quantifications. Future work will address other components of scvi-tools, with a focus on building understanding of probabilistic methods based on variational autoencoders.
SimBu can be used to simulate bulk RNA-seq datasets with known cell type fractions. You can either use your own single-cell study for the simulation or the sfaira database. Different pre-defined simulation scenarios exist, as do options to run custom simulations. Additionally, expression values can be adapted by adding an mRNA bias, which produces more biologically relevant simulations.
The TMSig package contains tools to prepare, analyze, and visualize named lists of sets, with an emphasis on molecular signatures (such as gene or kinase sets). It includes fast, memory efficient functions to construct sparse incidence and similarity matrices and filter, cluster, invert, and decompose sets. Additionally, bubble heatmaps can be created to visualize the results of any differential or molecular signatures analysis.
R functions for "The Basics of Item Response Theory Using R" by Frank B. Baker and Seock-Ho Kim (Springer, 2017, ISBN-13: 978-3-319-54204-1) including iccplot(), icccal(), icc(), iccfit(), groupinv(), tcc(), ability(), tif(), and rasch(). For example, iccplot() plots an item characteristic curve under the two-parameter logistic model.
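The item characteristic curve under the two-parameter logistic model has a simple closed form, P(theta) = 1 / (1 + exp(-a (theta - b))), with discrimination a and difficulty b. The short Python sketch below evaluates it on a grid of ability values; it assumes the plain logistic form without the 1.7 scaling constant that some texts use, and `icc_2pl` is a hypothetical name rather than the signature of the package's functions.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (illustrative)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Evaluate the curve for an item with discrimination 1.2 and difficulty 0.5.
abilities = [-3, -2, -1, 0, 1, 2, 3]
curve = [icc_2pl(t, a=1.2, b=0.5) for t in abilities]
```

The curve rises monotonically in ability and passes through 0.5 exactly where ability equals the item difficulty.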
Arrays of structured data types can require large volumes of disk space to store. Blosc is a library that provides a fast and efficient way to compress such data. It is often applied in storage of n-dimensional arrays, such as in the case of the geo-spatial zarr file format. This package can be used to compress and decompress data using Blosc.
Estimation and interpretation of Bayesian distributed lag interaction models (BDLIMs). A BDLIM regresses a scalar outcome on repeated measures of exposure and allows for modification by a categorical variable under four specific patterns of modification. The main function is bdlim(). There are also summary and plotting files. Details on methodology are described in Wilson et al. (2017) <doi:10.1093/biostatistics/kxx002>.
Different tools for describing and analysing paired comparison data are presented. The main methods estimate product scores according to the Bradley-Terry-Luce model. A segmentation of the individuals can be conducted on the basis of a mixture distribution approach. The number of classes can be tested by the use of Monte Carlo simulations. This package also deals with multi-criteria paired comparison data.
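As a rough sketch of how scores can be estimated under a Bradley-Terry type model, the Python code below uses the classical minorization-maximization (MM) update on a matrix of win counts. This is a generic textbook technique chosen for illustration, not necessarily the estimation method used by the package, and the function name is hypothetical. It assumes every product has been compared at least once and has at least one win.

```python
def bradley_terry(wins, iters=200):
    """Estimate Bradley-Terry scores with MM updates (illustrative only).

    wins[i][j] is the number of times product i was preferred to product j.
    Returns scores normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Each pairing contributes its comparison count over the summed scores.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        p = [x / norm for x in updated]
    return p


# Three products, each pair compared four times.
wins = [
    [0, 3, 4],
    [1, 0, 3],
    [0, 1, 0],
]
bt_scores = bradley_terry(wins)
```

Under the model, the probability that product i is preferred to product j is the score of i divided by the sum of both scores, so a higher score means a more frequently preferred product.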
Ecological Metadata Language or EML is a long-established format for describing ecological datasets to facilitate sharing and re-use. Because EML is effectively a modified xml schema, however, it is challenging for non-expert users to write and manipulate. delma helps users write metadata statements in R Markdown or Quarto markdown format, and parse them to EML and (optionally) back again.
Analysis of items and persons in data. Identifies and removes person misfit in polytomous item-response data using either mokken or a graded response model (GRM, via mirt). Provides automatic thresholds, visual diagnostics (2D/3D), and export utilities. Methods build on Mokken scaling as in Mokken (1971, ISBN:9789027968821) and on the graded response model of Samejima (1969) <doi:10.1007/BF03372160>.
Allows calculation on, and sampling from, Gibbs Random Fields, and more precisely the general homogeneous Potts model. The primary tool is the exact computation of the intractable normalising constant for small rectangular lattices. Besides this function, the package contains methods that draw exact samples from the likelihood for small enough rectangular lattices, or approximate samples from the likelihood using MCMC samplers for large lattices.
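To show why the normalising constant is intractable in general, the Python sketch below computes it by brute-force enumeration for a tiny lattice. It assumes a homogeneous Potts model with a single interaction parameter beta, a four-neighbour lattice, and no external field; it is a naive illustration with a hypothetical function name, not the exact algorithm the package uses.

```python
import math
from itertools import product


def potts_z(height, width, n_states, beta):
    """Brute-force normalising constant of a Potts model (illustrative only).

    Sums exp(beta * number of agreeing neighbour pairs) over all
    n_states ** (height * width) configurations, so it is only feasible
    for very small lattices.
    """
    # Collect horizontal and vertical neighbour pairs of the lattice.
    edges = []
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:
                edges.append((i, i + 1))
            if r + 1 < height:
                edges.append((i, i + width))
    z = 0.0
    for x in product(range(n_states), repeat=height * width):
        agreeing = sum(x[i] == x[j] for i, j in edges)
        z += math.exp(beta * agreeing)
    return z


z_small = potts_z(2, 2, 2, 0.0)  # no interaction: every configuration has weight 1
```

The number of configurations grows exponentially with the lattice size, which is why exact computation is restricted to small lattices and larger ones call for MCMC.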
This package creates and plots 2D and 3D hive plots. Hive plots are a unique method of displaying networks of many types in which node properties are mapped to axes using meaningful properties rather than being arbitrarily positioned. The hive plot concept was invented by Martin Krzywinski at the Genome Science Center (www.hiveplot.net/). Keywords: networks, food webs, linnet, systems biology, bioinformatics.