The aggregateBioVar package contains tools to summarize single cell gene expression profiles at the level of subject for single cell RNA-seq data collected from more than one subject (e.g. biological samples or technical replicates). A SingleCellExperiment object is taken as input and converted to a list of SummarizedExperiment objects, where each list element corresponds to an assigned cell type. The SummarizedExperiment objects contain aggregate gene-by-subject count matrices and inter-subject column metadata for individual subjects that can be processed using downstream bulk RNA-seq tools.
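The aggregation step described above can be illustrated with a minimal sketch in plain Python. It is not the aggregateBioVar implementation: the function name `aggregate_by_subject` and the dictionary-based input layout are assumptions made for illustration, and the sketch only sums per-cell counts into gene-by-subject totals within each cell type.

```python
from collections import defaultdict


def aggregate_by_subject(counts, subjects, cell_types):
    """Sum per-cell counts into gene-by-subject totals, one table per cell type.

    counts:     dict mapping gene name -> list of counts, one entry per cell
    subjects:   list of subject labels, one per cell
    cell_types: list of cell-type labels, one per cell
    Returns {cell_type: {gene: {subject: summed count}}}.
    (Hypothetical layout; the real package works on SingleCellExperiment objects.)
    """
    n_cells = len(subjects)
    if len(cell_types) != n_cells:
        raise ValueError("subjects and cell_types must have one entry per cell")

    totals = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for gene, row in counts.items():
        if len(row) != n_cells:
            raise ValueError(f"gene {gene!r} does not have one count per cell")
        for value, subject, cell_type in zip(row, subjects, cell_types):
            totals[cell_type][gene][subject] += value

    # Convert nested defaultdicts to plain dicts for a predictable return type.
    return {
        cell_type: {gene: dict(by_subject) for gene, by_subject in genes.items()}
        for cell_type, genes in totals.items()
    }


# Four cells from two subjects, assigned to two cell types.
counts = {"GeneA": [1, 2, 3, 4], "GeneB": [0, 5, 0, 1]}
subjects = ["s1", "s1", "s2", "s2"]
cell_types = ["T", "B", "T", "T"]
pseudo_bulk = aggregate_by_subject(counts, subjects, cell_types)
```

Each entry of `pseudo_bulk` plays the role of one gene-by-subject count matrix for a single cell type.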
This package provides generic data structures and algorithms for use with forest mensuration data in a consistent framework. The functions and objects included are a collection of broadly applicable tools. More specialized applications should be implemented in separate packages that build on this foundation. Documentation about ForestElementsR is provided by three vignettes included in this package. For an introduction to the field of forest mensuration, refer to the textbooks by Kershaw et al. (2017) <doi:10.1002/9781118902028>, and van Laar and Akca (2007) <doi:10.1007/978-1-4020-5991-9>.
This package provides automated methods for generating initial parameter estimates in population pharmacokinetic modeling. The pipeline integrates adaptive single-point methods, naive pooled graphic approaches, noncompartmental analysis methods, and parameter sweeping across pharmacokinetic models. It estimates residual unexplained variability using either data-driven or fixed-fraction approaches and assigns pragmatic initial values for inter-individual variability. These strategies are designed to improve model robustness and convergence in nlmixr2 workflows. For more details see Huang Z, Fidler M, Lan M, Cheng IL, Kloprogge F, Standing JF (2025) <doi:10.1007/s10928-025-10000-z>.
This package provides methods for analysis of compositional data including robust methods (<doi:10.1007/978-3-319-96422-5>), imputation of missing values (<doi:10.1016/j.csda.2009.11.023>), methods to replace rounded zeros (<doi:10.1080/02664763.2017.1410524>, <doi:10.1016/j.chemolab.2016.04.011>, <doi:10.1016/j.csda.2012.02.012>), count zeros (<doi:10.1177/1471082X14535524>), methods to deal with essential zeros (<doi:10.1080/02664763.2016.1182135>), (robust) outlier detection for compositional data, (robust) principal component analysis for compositional data, (robust) factor analysis for compositional data, (robust) discriminant analysis for compositional data (Fisher rule), robust regression with compositional predictors, functional data analysis (<doi:10.1016/j.csda.2015.07.007>) and p-splines (<doi:10.1016/j.csda.2015.07.007>), contingency (<doi:10.1080/03610926.2013.824980>) and compositional tables (<doi:10.1111/sjos.12326>, <doi:10.1111/sjos.12223>, <doi:10.1080/02664763.2013.856871>) and (robust) Anderson-Darling normality tests for compositional data as well as popular log-ratio transformations (addLR, cenLR, isomLR, and their inverse transformations). In addition, visualisation and diagnostic tools are implemented as well as high and low-level plot functions for the ternary diagram.
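As a small illustration of the log-ratio transformations mentioned above, the following sketch implements the standard centered log-ratio transform and its inverse in plain Python. The function names `clr` and `clr_inverse` are chosen for this example and are not the package's cenLR interface; the sketch assumes strictly positive parts.

```python
import math


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    if any(part <= 0 for part in composition):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def clr_inverse(coordinates):
    """Map clr coordinates back to a composition that sums to 1."""
    exps = [math.exp(value) for value in coordinates]
    total = sum(exps)
    return [value / total for value in exps]


x = [0.2, 0.3, 0.5]
z = clr(x)              # coordinates sum to zero by construction
x_back = clr_inverse(z) # recovers the original composition
```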
This package provides implementations of functions that can be used to test multivariate integration routines. The package covers six different integration domains (unit hypercube, unit ball, unit sphere, standard simplex, non-negative real numbers and R^n). For each domain several functions with different properties (smooth, non-differentiable, ...) are available. The functions are available in all dimensions n >= 1. For each function the exact value of the integral is known and implemented to allow testing the accuracy of multivariate integration routines. Details on the available test functions can be found on the development website.
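The idea of pairing a test integrand with its exact integral can be sketched as follows. This is an illustrative example in plain Python, not code from the package: the integrand (sum of squared coordinates on the unit hypercube, with exact integral n/3) and the tensor-product midpoint rule used as the routine under test are both assumptions chosen for simplicity.

```python
import itertools


def midpoint_rule(f, n_dims, points_per_axis):
    """Tensor-product midpoint rule on the unit hypercube [0, 1]^n."""
    h = 1.0 / points_per_axis
    nodes = [(k + 0.5) * h for k in range(points_per_axis)]
    total = 0.0
    for x in itertools.product(nodes, repeat=n_dims):
        total += f(x)
    return total * h ** n_dims


def sum_of_squares(x):
    """Smooth test integrand: x_1^2 + ... + x_n^2."""
    return sum(xi * xi for xi in x)


def sum_of_squares_exact(n_dims):
    """Exact integral over [0, 1]^n: each coordinate contributes 1/3."""
    return n_dims / 3.0


n = 3
estimate = midpoint_rule(sum_of_squares, n, 20)
error = abs(estimate - sum_of_squares_exact(n))
```

Comparing `estimate` with the known exact value gives a direct accuracy measure for the integration routine.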
In population management, data come at more or less regular intervals over time in sampling batches (bouts) and decisions should be made with the minimum number of samples and as quickly as possible. This package provides tools to implement, produce charts with stop lines, summarize results and assess sequential analyses that test hypotheses about population sizes. Two approaches are included: the sequential test of Bayesian posterior probabilities (Rincon, D.F. et al. 2025 <doi:10.1111/2041-210X.70053>), and the sequential probability ratio test (Wald, A. 1945 <http://www.jstor.org/stable/2235829>).
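To make the sequential probability ratio test concrete, here is a minimal sketch of Wald's test with stop lines in plain Python. It is not the package's code: it assumes Poisson-distributed counts with one sample per step, and the function name `sprt_poisson` and its parameters are hypothetical. The stop lines are Wald's approximate thresholds log((1 - beta) / alpha) and log(beta / (1 - alpha)).

```python
import math


def sprt_poisson(counts, mean_h0, mean_h1, alpha=0.05, beta=0.05):
    """Wald's SPRT for Poisson counts, testing mean_h0 against mean_h1.

    Returns (decision, number of samples used), where decision is
    "accept_h0", "accept_h1", or "continue" if neither stop line was crossed.
    """
    upper = math.log((1 - beta) / alpha)   # crossing this favours H1
    lower = math.log(beta / (1 - alpha))   # crossing this favours H0
    log_ratio_step = math.log(mean_h1 / mean_h0)
    llr = 0.0
    for n_seen, x in enumerate(counts, start=1):
        # Poisson log-likelihood ratio contribution of one observation.
        llr += x * log_ratio_step - (mean_h1 - mean_h0)
        if llr >= upper:
            return "accept_h1", n_seen
        if llr <= lower:
            return "accept_h0", n_seen
    return "continue", len(counts)


high = sprt_poisson([7, 8], mean_h0=2.0, mean_h1=6.0)
low = sprt_poisson([1, 1], mean_h0=2.0, mean_h1=6.0)
undecided = sprt_poisson([3], mean_h0=2.0, mean_h1=6.0)
```

In this sketch a decision is reached as soon as the cumulative log-likelihood ratio crosses either stop line, which is what keeps the number of samples small.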
The package uses collectbox to define variants of common box-related macros which read the content as a real box and not as a macro argument. This enables the use of verbatim or other special material as part of this content. The provided macros have the same names as the original versions but start with an upper-case letter instead. The long-form macros, like \Makebox, can also be used as environments, but not the short-form macros, like \Mbox. However, normally the long form uses the short form anyway when no optional arguments are used.
Modular and unified R6-based interface for counterfactual explanation methods. The following methods are currently implemented: Burghmans et al. (2022) <doi:10.48550/arXiv.2104.07411>, Dandl et al. (2020) <doi:10.1007/978-3-030-58112-1_31> and Wexler et al. (2019) <doi:10.1109/TVCG.2019.2934619>. Optional extensions allow these methods to be applied to a variety of models and use cases. Once generated, the counterfactuals can be analyzed and visualized by provided functionalities. The package is described in detail in Dandl et al. (2025) <doi:10.18637/jss.v115.i09>.
Tests for block-diagonal structure in symmetric matrices (e.g. correlation matrices) under the null hypothesis of exchangeable off-diagonal elements. As described in Segal et al. (2019), these tests can be useful for construct validation either by themselves or as a complement to confirmatory factor analysis. Monte Carlo methods are used to approximate the permutation p-value with Hubert's Gamma (Hubert, 1976) and a t-statistic. This package also implements the chi-squared statistic described by Steiger (1980). Please see Segal et al. (2019) <doi:10.1007/s11336-018-9647-4> for more information.
This package compares genomic positions and genomic ranges from multiple experiments to extract common regions. The size of the analyzed region is adjustable, as is the number of experiments in which a feature must be present in a potential region to tag this region as a consensus region. In genomic analysis where feature identification generates a position value surrounded by a genomic range, such as ChIP-Seq peaks and nucleosome positions, the replication of an experiment may result in slight differences between predicted values. This package enables the reconciliation of the results into consensus regions.
The systemPipeTools package extends the widely used systemPipeR (SPR) workflow environment with an enhanced toolkit for data visualization, including utilities to automate the data visualization for analysis of differentially expressed genes (DEGs). systemPipeTools provides data transformation and data exploration functions via scatterplots, hierarchical clustering heatmaps, principal component analysis, multidimensional scaling, generalized principal components, t-Distributed Stochastic Neighbor embedding (t-SNE), and MA and volcano plots. All these utilities can be integrated with the modular design of the systemPipeR environment, which allows users to easily substitute any of these features with custom alternatives.
Discrete event simulation using both R and C++ (Karlsson et al. 2016; <doi:10.1109/eScience.2016.7870915>). The C++ code is adapted from the SSIM library <https://www.inf.usi.ch/carzaniga/ssim/>, allowing for event-oriented simulation. The code includes a SummaryReport class for reporting events and costs by age and other covariates. The C++ code is available as a static library for linking to other packages. A priority queue implementation is given in C++ together with an S3 closure and a reference class implementation. Finally, some tools are provided for cost-effectiveness analysis.
Takes the outputs of a caret confusion matrix and allows for the quick conversion of these list items to lists. The intended usage is to allow the tool to work with the outputs of machine learning classification models. This tool works with both binary and multi-class classification problems and allows for the record-level conversion of the confusion matrix outputs. This is useful, as it allows quick conversion of these objects for storage in database systems and to track ML model performance over time. Traditionally, this approach has been used for highlighting model representation and feature slippage.
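The record-level conversion described above can be sketched in plain Python as flattening a confusion matrix into one flat record suitable for storing as a database row. This is an illustration only, not the package's code: the function name `confusion_to_record` and the key naming scheme are hypothetical, and the matrix is assumed to have predictions in rows and reference classes in columns.

```python
def confusion_to_record(labels, matrix):
    """Flatten a square confusion matrix into a single flat record.

    labels: class labels, in the order used for both rows and columns
    matrix: matrix[i][j] = number of cases predicted as labels[i]
            whose reference class is labels[j] (assumed layout)
    """
    record = {}
    total = 0
    correct = 0
    for i, predicted in enumerate(labels):
        for j, reference in enumerate(labels):
            count = matrix[i][j]
            record[f"pred_{predicted}_ref_{reference}"] = count
            total += count
            if i == j:
                correct += count
    # Overall accuracy is the share of cases on the diagonal.
    record["accuracy"] = correct / total if total else None
    return record


row = confusion_to_record(["yes", "no"], [[50, 10], [5, 35]])
```

Appending one such record per model run, together with a timestamp, is one way to track classification performance over time.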
Precompiled and processed miRNA-overexpression fold-changes from 84 Gene Expression Omnibus (GEO) series corresponding to 6 platforms, 77 human cells or tissues, and 113 distinct miRNAs. Alongside the data, this package also includes the sequence feature scores from TargetScanHuman 6.1, including the context+ score and the probabilities of conserved targeting for each miRNA-mRNA interaction. Thus, the user can use these static sequence-based scores together with user-supplied tissue/cell-specific fold-changes due to miRNA overexpression to predict miRNA targets using the package TargetScore (download separately).
This package provides functions for estimating parameters in linear spatial models with censored or missing responses using the Expectation-Maximization (EM), Stochastic Approximation EM (SAEM), and Monte Carlo EM (MCEM) algorithms. These methods are widely used to obtain maximum likelihood (ML) estimates in the presence of incomplete data. The EM algorithm computes ML estimates when a closed-form expression for the conditional expectation of the complete-data log-likelihood is available. The MCEM algorithm replaces this expectation with a Monte Carlo approximation based on independent simulations of the missing data. In contrast, the SAEM algorithm decomposes the E-step into simulation and stochastic approximation steps, improving computational efficiency in complex settings. In addition, the package provides standard error estimation based on the Louis method. It also includes functionality for spatial prediction at new locations. References used for this package: Galarza, C. E., Matos, L. A., Castro, L. M., & Lachos, V. H. (2022). Moments of the doubly truncated selection elliptical distributions with emphasis on the unified multivariate skew-t distribution. Journal of Multivariate Analysis, 189, 104944 <doi:10.1016/j.jmva.2021.104944>; Valeriano, K. A., Galarza, C. E., & Matos, L. A. (2023). Moments and random number generation for the truncated elliptical family of distributions. Statistics and Computing, 33(1), 32 <doi:10.1007/s11222-022-10200-4>.
The normal process of creating clinical study slides is that one statistician manually types in the numbers from outputs and a second statistician double-checks the typed-in numbers. This process is time consuming, resource intensive, and error prone. Automatic slide generation is a solution to address these issues. It reduces the amount of work and the required time when creating slides, and reduces the risk of errors from manually typing or copying numbers from the output to slides. It also helps users to avoid unnecessary stress when creating large numbers of slide decks in a short time window.
Machine learning algorithms for settings where the predictor variables are compositional data and the response variable is either continuous or categorical. Specifically, the Boruta variable selection algorithm, random forest, support vector machines and projection pursuit regression are included. Relevant papers include: Tsagris M.T., Preston S. and Wood A.T.A. (2011). "A data-based power transformation for compositional data". Fourth International Workshop on Compositional Data Analysis. <doi:10.48550/arXiv.1106.1451> and Alenazi, A. (2023). "A review of compositional data analysis and recent advances". Communications in Statistics--Theory and Methods, 52(16): 5535--5567. <doi:10.1080/03610926.2021.2014890>.
Convert one biological ID of rice (Oryza sativa) to another. The rice genome has more than one form of gene ID. The two main gene IDs for the rice genome are the RAP (The Rice Annotation Project, <https://rapdb.dna.affrc.go.jp/>) and the MSU (The Rice Genome Annotation Project, <http://rice.plantbiology.msu.edu/>) IDs. All RAP rice gene IDs are of the form Os##g####### as explained on the website <https://rapdb.dna.affrc.go.jp/>. All MSU rice gene IDs are of the form LOC_Os##g##### as explained on the website <http://rice.plantbiology.msu.edu/analyses_nomenclature.shtml>. All SYMBOL rice gene IDs are the unique names on the NCBI (National Center for Biotechnology Information, <https://www.ncbi.nlm.nih.gov/>). The TRANSCRIPTID is the transcript ID of rice and is of the form Os##t#######. Researchers often need to convert between these IDs, for example from RAP to SYMBOL for function searching on NCBI. Many websites can convert RAP to MSU or MSU to RAP, such as ID Converter <https://rapdb.dna.affrc.go.jp/tools/converter>, but it is difficult to convert very large numbers of IDs on these websites. The package can convert all IDs between the three IDs (RAP, MSU and SYMBOL) regardless of the number.
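The ID forms listed above can be recognized with simple patterns. The following plain-Python sketch is not part of the package: the function name `classify_rice_id` is hypothetical, and the patterns are a literal reading of the forms given in the description, with each # taken to be one decimal digit. IDs that match none of the patterns are reported as unrecognized rather than assumed to be SYMBOL names.

```python
import re

# Patterns transcribed from the forms in the description (# = one digit).
ID_PATTERNS = {
    "RAP": re.compile(r"Os\d{2}g\d{7}"),           # Os##g#######
    "MSU": re.compile(r"LOC_Os\d{2}g\d{5}"),       # LOC_Os##g#####
    "TRANSCRIPTID": re.compile(r"Os\d{2}t\d{7}"),  # Os##t#######
}


def classify_rice_id(identifier):
    """Return the ID type whose pattern matches the whole string, else None."""
    for id_type, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(identifier):
            return id_type
    return None


kinds = [
    classify_rice_id(value)
    for value in ["Os01g0100100", "LOC_Os01g01010", "Os01t0100100", "unknown"]
]
```

A check of this kind can be used to sort a mixed list of identifiers by type before conversion.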
The goal of automatedRecLin is to perform record linkage (also known as entity resolution) in unsupervised or supervised settings. It compares pairs of records from two datasets using selected comparison functions to estimate the probability or density ratio between matched and non-matched records. Based on these estimates, it predicts a set of matches that maximizes entropy. For details see: Lee et al. (2022) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2022001/article/00007-eng.htm>, Vo et al. (2023) <https://ideas.repec.org/a/eee/csdana/v179y2023ics0167947322002365.html>, Sugiyama et al. (2008) <doi:10.1007/s10463-008-0197-x>.
Perform fast and memory efficient time-weighted averaging of values measured over intervals into new arbitrary intervals. This package is useful in the context of data measured or represented as constant values over intervals on a one-dimensional discrete axis (e.g. time-integrated averages of a curve over defined periods). This package was written specifically to deal with air pollution data recorded or predicted as averages over sampling periods. Data in this format often needs to be shifted to non-aligned periods or averaged up to periods of longer duration (e.g. averaging data measured over sequential non-overlapping periods to calendar years).
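The core calculation, averaging interval-valued data into a new, non-aligned interval, can be sketched in plain Python as an overlap-weighted mean. This is an illustration of the idea and not the package's algorithm: the function name `interval_average` is hypothetical, and the sketch assumes non-overlapping source intervals treated as half-open ranges [start, end).

```python
def interval_average(sources, target_start, target_end):
    """Time-weighted average of interval values over a target interval.

    sources: list of (start, end, value) tuples, each value constant on
             the half-open interval [start, end); assumed non-overlapping
    Returns (average, covered_length). The average is taken over the part
    of the target that is covered by source data; it is None if no source
    interval overlaps the target.
    """
    if target_end <= target_start:
        raise ValueError("target interval must have positive length")
    weighted_sum = 0.0
    covered = 0.0
    for start, end, value in sources:
        overlap = min(end, target_end) - max(start, target_start)
        if overlap > 0:
            weighted_sum += value * overlap
            covered += overlap
    if covered == 0:
        return None, 0.0
    return weighted_sum / covered, covered


# Two sequential sampling periods with constant values 1.0 and 3.0.
measurements = [(0, 10, 1.0), (10, 20, 3.0)]
shifted = interval_average(measurements, 5, 15)   # straddles both periods
longer = interval_average(measurements, 8, 20)    # mostly the second period
outside = interval_average(measurements, 30, 40)  # no data available
```

Returning the covered length alongside the average lets the caller decide whether a target period has enough data to be reported.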
An implementation of the additive (Gurevitch et al., 2000 <doi:10.1086/303337>) and multiplicative (Lajeunesse, 2011 <doi:10.1890/11-0423.1>) factorial null models for multiple stressor data (Burgess et al., 2021 <doi:10.1101/2021.07.21.453207>). Effect sizes can be calculated for either null model, and subsequently classified into one of four different interaction classifications (e.g., antagonistic or synergistic interactions). Analyses can be conducted on data ranging from single experiments to large meta-analytical datasets. Minimal input (or statistical knowledge) is required, and the output is easy to understand. Summary figures can also be generated easily.
The price action at any given time is determined by investor sentiment and market conditions. Although there is no established principle, over a long period of time, things often move with a certain periodicity. This is sometimes referred to as an anomaly. The seasonPlot() function in this package calculates and visualizes the average value of price movements over a year for any given period. In addition, the monthly increase or decrease in price movement is represented with a colored background. The seasonPlot() function can use the same symbols as the quantmod package (e.g. ^IXIC, ^DJI, SPY, BTC-USD, and ETH-USD).
This package performs sensitivity analysis for publication bias in meta-analyses (per Mathur & VanderWeele, 2020 [<doi:10.31219/osf.io/s9dp6>]). These analyses enable statements such as: "For publication bias to shift the observed point estimate to the null, significant results would need to be at least 30-fold more likely to be published than negative or nonsignificant results." Comparable statements can be made regarding shifting to a chosen non-null value or shifting the confidence interval. Provides a worst-case meta-analytic point estimate under maximal publication bias obtained simply by conducting a standard meta-analysis of only the negative and "nonsignificant" studies.
This package provides a system for batch-marking data analysis to estimate survival probabilities, capture probabilities, and enumerate the population abundance for both marked and unmarked individuals. Estimation for marked individuals only can be achieved through the batchMarkOptim() function. Similarly, estimation for marked and unmarked individuals combined can be achieved through the batchMarkUnmarkOptim() function. The algorithm was also implemented for the hidden Markov model encapsulated in batchMarkUnmarkOptim() to estimate the abundance of both marked and unmarked individuals in the population. The package is based on the paper: "Hidden Markov Models for Extended Batch Data" of Cowen et al. (2017) <doi:10.1111/biom.12701>.