This package provides methods for statistical analysis of count data and quantal data. For the analysis of count data an implementation of the Closure Principle Computational Approach Test ("CPCAT") is provided (Lehmann, R et al. (2016) <doi:10.1007/s00477-015-1079-4>), as well as an implementation of a "Dunnett GLM" approach using a Quasi-Poisson regression (Hothorn, L, Kluxen, F (2020) <doi:10.1101/2020.01.15.907881>). For the analysis of quantal data an implementation of the Closure Principle Fisherâ Freemanâ Halton test ("CPFISH") is provided (Lehmann, R et al. (2018) <doi:10.1007/s00477-017-1392-1>). P-values and no/lowest observed (adverse) effect concentration values are calculated. All implemented methods include further functions to evaluate the power and the minimum detectable difference using a bootstrapping approach.
`SPOTlight` provides a method to deconvolute spatial transcriptomics spots using a seeded NMF approach along with visualization tools to assess the results. Spatially resolved gene expression profiles are key to understand tissue organization and function. However, novel spatial transcriptomics (ST) profiling techniques lack single-cell resolution and require a combination with single-cell RNA sequencing (scRNA-seq) information to deconvolute the spatially indexed datasets. Leveraging the strengths of both data types, we developed SPOTlight, a computational tool that enables the integration of ST with scRNA-seq data to infer the location of cell types and states within a complex tissue. SPOTlight is centered around a seeded non-negative matrix factorization (NMF) regression, initialized using cell-type marker genes and non-negative least squares (NNLS) to subsequently deconvolute ST capture locations (spots).
Analysis of complex plant root system architectures (RSA) using the output files created by Data Analysis of Root Tracings (DART), an open-access software dedicated to the study of plant root architecture and development across time series (Le Bot et al (2010) "DART: a software to analyse root system architecture and development from captured images", Plant and Soil, <DOI:10.1007/s11104-009-0005-2>), and RSA data encoded with the Root System Markup Language (RSML) (Lobet et al (2015) "Root System Markup Language: toward a unified root architecture description language", Plant Physiology, <DOI:10.1104/pp.114.253625>). More information can be found in Delory et al (2016) "archiDART: an R package for the automated computation of plant root architectural traits", Plant and Soil, <DOI:10.1007/s11104-015-2673-4>.
The primary goal of phase I clinical trials is to find the maximum tolerated dose (MTD). To reach this objective, we introduce a new design for phase I clinical trials, the posterior predictive (PoP) design. The PoP design is an innovative model-assisted design that is as simply as the conventional algorithmic designs as its decision rules can be pre-tabulated prior to the onset of trial, but is of more flexibility of selecting diverse target toxicity rates and cohort sizes. The PoP design has desirable properties, such as coherence and consistency. Moreover, the PoP design provides better empirical performance than the BOIN and Keyboard design with respect to high average probabilities of choosing the MTD and slightly lower risk of treating patients at subtherapeutic or overly toxic doses.
This package provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues, such as automatically re-simulating non-convergent results, prevents inadvertently overwriting simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>.
Fit Generalized Linear Models to continuous and count outcomes, as well as estimate the prevalence of misrepresentation of an important binary predictor. Misrepresentation typically arises when there is an incentive for the binary factor to be misclassified in one direction (e.g., in insurance settings where policy holders may purposely deny a risk status in order to lower the insurance premium). This is accomplished by treating a subset of the response variable as resulting from a mixture distribution. Model parameters are estimated via the Expectation Maximization algorithm and standard errors of the estimates are obtained from closed forms of the Observed Fisher Information. For an introduction to the models and the misrepresentation framework, see Xia et. al., (2023) <https://variancejournal.org/article/73151-maximum-likelihood-approaches-to-misrepresentation-models-in-glm-ratemaking-model-comparisons>.
Implementation of various estimation methods for dynamic factor models (DFMs) including principal components analysis (PCA) Stock and Watson (2002) <doi:10.1198/016214502388618960>, 2Stage Giannone et al. (2008) <doi:10.1016/j.jmoneco.2008.05.010>, expectation-maximisation (EM) Banbura and Modugno (2014) <doi:10.1002/jae.2306>, and the novel EM-sparse approach for sparse DFMs Mosley et al. (2023) <arXiv:2303.11892>. Options to use classic multivariate Kalman filter and smoother (KFS) equations from Shumway and Stoffer (1982) <doi:10.1111/j.1467-9892.1982.tb00349.x> or fast univariate KFS equations from Koopman and Durbin (2000) <doi:10.1111/1467-9892.00186>, and options for independent and identically distributed (IID) white noise or auto-regressive (AR(1)) idiosyncratic errors. Algorithms coded in C++ and linked to R via RcppArmadillo'.
ExCluster flattens Ensembl and GENCODE GTF files into GFF files, which are used to count reads per non-overlapping exon bin from BAM files. This read counting is done using the function featureCounts from the package Rsubread. Library sizes are normalized across all biological replicates, and ExCluster then compares two different conditions to detect signifcantly differentially spliced genes. This process requires at least two independent biological repliates per condition, and ExCluster accepts only exactly two conditions at a time. ExCluster ultimately produces false discovery rates (FDRs) per gene, which are used to detect significance. Exon log2 fold change (log2FC) means and variances may be plotted for each significantly differentially spliced gene, which helps scientists develop hypothesis and target differential splicing events for RT-qPCR validation in the wet lab.
Collection of routines for efficient scientific computations in physics and astrophysics. These routines include utility functions, numerical computation tools, as well as visualisation tools. They can be used, for example, for generating random numbers from spherical and custom distributions, information and entropy analysis, special Fourier transforms, two-point correlation estimation (e.g. as in Landy & Szalay (1993) <doi:10.1086/172900>), binning & gridding of point sets, 2D interpolation, Monte Carlo integration, vector arithmetic and coordinate transformations. Also included is a non-exhaustive list of important constants and cosmological conversion functions. The graphics routines can be used to produce and export publication-ready scientific plots and movies, e.g. as used in Obreschkow et al. (2020, MNRAS Vol 493, Issue 3, Pages 4551â 4569). These routines include special color scales, projection functions, and bitmap handling routines.
Create lipidome-wide heatmaps of statistics with the lipidomeR'. The lipidomeR provides a streamlined pipeline for the systematic interpretation of the lipidome through publication-ready visualizations of regression models fitted on lipidomics data. With lipidomeR', associations between covariates and the lipidome can be interpreted systematically and intuitively through heatmaps, where lipids are categorized by the lipid class and are presented on two-dimensional maps organized by the lipid size and level of saturation. This way, the lipidomeR helps you gain an immediate understanding of the multivariate patterns in the lipidome already at first glance. You can create lipidome-wide heatmaps of statistical associations, changes, differences, variation, or other lipid-specific values. The heatmaps are provided with publication-ready quality and the results behind the visualizations are based on rigorous statistical models.
The Mutual Information Index (M) introduced to social science literature by Theil and Finizza (1971) <doi:10.1080/0022250X.1971.9989795> is a multigroup segregation measure that is highly decomposable and that according to Frankel and Volij (2011) <doi:10.1016/j.jet.2010.10.008> and Mora and Ruiz-Castillo (2011) <doi:10.1111/j.1467-9531.2011.01237.x> satisfies the Strong Unit Decomposability and Strong Group Decomposability properties. This package allows computing and decomposing the total index value into its "between" and "within" terms. These last terms can also be decomposed into their contributions, either by group or unit characteristics. The factors that produce each "within" term can also be displayed at the user's request. The results can be computed considering a variable or sets of variables that define separate clusters.
The biases introduced in association measures, particularly mutual information, are influenced by factors such as tumor purity, mutation burden, and hypermethylation. This package provides the estimation of conditional mutual information (CMI) and its statistical significance with a focus on its application to multi-omics data. Utilizing B-spline functions (inspired by Daub et al. (2004) <doi:10.1186/1471-2105-5-118>), the package offers tools to estimate the association between heterogeneous multi- omics data, while removing the effects of confounding factors. This helps to unravel complex biological interactions. In addition, it includes methods to evaluate the statistical significance of these associations, providing a robust framework for multi-omics data integration and analysis. This package is ideal for researchers in computational biology, bioinformatics, and systems biology seeking a comprehensive tool for understanding interdependencies in omics data.
This package provides a metric called Density-Based Clustering Validation index (DBCV) index to evaluate clustering results, following the <https://github.com/pajaskowiak/clusterConfusion/blob/main/R/dbcv.R> R implementation by Pablo Andretta Jaskowiak. Original DBCV index article: Moulavi, D., Jaskowiak, P. A., Campello, R. J., Zimek, A., and Sander, J. (April 2014), "Density-based clustering validation", Proceedings of SDM 2014 -- the 2014 SIAM International Conference on Data Mining (pp. 839-847), <doi:10.1137/1.9781611973440.96>. A more recent article on the DBCV index: Chicco, D., Sabino, G.; Oneto, L.; Jurman, G. (August 2025), "The DBCV index is more informative than DCSI, CDbw, and VIASCKDE indices for unsupervised clustering internal assessment of concave-shaped and density-based clusters", PeerJ Computer Science 11:e3095 (pp. 1-), <doi:10.7717/peerj-cs.3095>.
This is the first package allowing for the estimation, visualization and prediction of the most well-known football models: double Poisson, bivariate Poisson, Skellam, student_t, diagonal-inflated bivariate Poisson, and zero-inflated Skellam. It supports both maximum likelihood estimation (MLE, for static models only) and Bayesian inference. For Bayesian methods, it incorporates several techniques: MCMC sampling with Hamiltonian Monte Carlo, variational inference using either the Pathfinder algorithm or Automatic Differentiation Variational Inference (ADVI), and the Laplace approximation. The package compiles all the CmdStan models once during installation using the instantiate package. The model construction relies on the most well-known football references, such as Dixon and Coles (1997) <doi:10.1111/1467-9876.00065>, Karlis and Ntzoufras (2003) <doi:10.1111/1467-9884.00366> and Egidi, Pauli and Torelli (2018) <doi:10.1177/1471082X18798414>.
Facilitates modeling species ecological niches and geographic distributions based on occurrences and environments that have a vertical as well as horizontal component, and projecting models into three-dimensional geographic space. Working in three dimensions is useful in an aquatic context when the organisms one wishes to model can be found across a wide range of depths in the water column. The package also contains functions to automatically generate marine training model training regions using machine learning, and interpolate and smooth patchily sampled environmental rasters using thin plate splines. Davis Rabosky AR, Cox CL, Rabosky DL, Title PO, Holmes IA, Feldman A, McGuire JA (2016) <doi:10.1038/ncomms11484>. Nychka D, Furrer R, Paige J, Sain S (2021) <doi:10.5065/D6W957CT>. Pateiro-Lopez B, Rodriguez-Casal A (2022) <https://CRAN.R-project.org/package=alphahull>.
Multidimensional scaling models and methods for the visualization and analysis of asymmetric proximity data. An asymmetric data matrix has the same number of rows and columns, and these rows and columns refer to the same set of objects. At least some elements in the upper-triangle are different from the corresponding elements in the lower triangle. An example of an asymmetric matrix is a student migration table, where the rows correspond to the countries of origin of the students and the columns to the destination countries. This package provides algorithms for three multidimensional scaling models, the slide-vector model, a scaling model with unique dimensions and the asymscal model.Furthermore, some other procedures, such as a heat map for skew-symmetric data, and the decomposition of asymmetry are also provided for the exploratory analysis of asymmetric tables.
Maximum likelihood estimation of copula-based zero-inflated (and non-inflated) Poisson and negative binomial count models, based on the article <doi:10.18637/jss.v109.i01>. Supports Frank and Gaussian copulas. Allows for mixed margins (e.g., one margin Poisson, the other zero-inflated negative binomial), and several marginal link functions. Built-in methods for publication-quality tables using texreg', post-estimation diagnostics using DHARMa', and testing for marginal zero-modification via <doi:10.1177/0962280217749991>. For information on copula regression for count data, see Genest and Nešlehová (2007) <doi:10.1017/S0515036100014963> as well as Nikoloulopoulos (2013) <doi:10.1007/978-3-642-35407-6_11>. For information on zero-inflated count regression generally, see Lambert (1992) <https://www.jstor.org/stable/1269547>. The author acknowledges support by NSF DMS-1925119 and DMS-212324.
This package contains a suite of functions for health economic evaluations with missing outcome data. The package can fit different types of statistical models under a fully Bayesian approach using the software JAGS (which should be installed locally and which is loaded in missingHE via the R package R2jags'). Three classes of models can be fitted under a variety of missing data assumptions: selection models, pattern mixture models and hurdle models. In addition to model fitting, missingHE provides a set of specialised functions to assess model convergence and fit, and to summarise the statistical and economic results using different types of measures and graphs. The methods implemented are described in Mason (2018) <doi:10.1002/hec.3793>, Molenberghs (2000) <doi:10.1007/978-1-4419-0300-6_18> and Gabrio (2019) <doi:10.1002/sim.8045>.
This package provides tools for the integration, visualisation, and modelling of spatial epidemiological data using the method described in Azeez, A., & Noel, C. (2025). Predictive Modelling and Spatial Distribution of Pancreatic Cancer in Africa Using Machine Learning-Based Spatial Model <doi:10.5281/zenodo.16529986> and <doi:10.5281/zenodo.16529016>. It facilitates the analysis of geographic health data by combining modern spatial mapping tools with advanced machine learning (ML) algorithms. mlspatial enables users to import and pre-process shapefile and associated demographic or disease incidence data, generate richly annotated thematic maps, and apply predictive models, including Random Forest, XGBoost', and Support Vector Regression, to identify spatial patterns and risk factors. It is suited for spatial epidemiologists, public health researchers, and GIS analysts aiming to uncover hidden geographic patterns in health-related outcomes and inform evidence-based interventions.
This package implements confidence interval and sample size methods that are especially useful in psychological research. The methods can be applied in 1-group, 2-group, paired-samples, and multiple-group designs and to a variety of parameters including means, medians, proportions, slopes, standardized mean differences, standardized linear contrasts of means, plus several measures of correlation and association. Confidence interval and sample size functions are given for single parameters as well as differences, ratios, and linear contrasts of parameters. The sample size functions can be used to approximate the sample size needed to estimate a parameter or function of parameters with desired confidence interval precision or to perform a variety of hypothesis tests (directional two-sided, equivalence, superiority, noninferiority) with desired power. For details see: Statistical Methods for Psychologists, Volumes 1 â 4, <https://dgbonett.sites.ucsc.edu/>.
Package test2norm contains functions to generate formulas for normative standards applied to cognitive tests. It takes raw test scores (e.g., number of correct responses) and converts them to scaled scores and demographically adjusted scores, using methods described in Heaton et al. (2003) <doi:10.1016/B978-012703570-3/50010-9> & Heaton et al. (2009, ISBN:9780199702800). The scaled scores are calculated as quantiles of the raw test scores, scaled to have the mean of 10 and standard deviation of 3, such that higher values always correspond to better performance on the test. The demographically adjusted scores are calculated from the residuals of a model that regresses scaled scores on demographic predictors (e.g., age). The norming procedure makes use of the mfp2() function from the mfp2 package to explore nonlinear associations between cognition and demographic variables.
This package provides a dataframe-friendly implementation of ComBat Harmonization which uses an empirical Bayesian framework to remove batch effects. Johnson WE & Li C (2007) <doi:10.1093/biostatistics/kxj037> "Adjusting batch effects in microarray expression data using empirical Bayes methods." Fortin J-P, Cullen N, Sheline YI, Taylor WD, Aselcioglu I, Cook PA, Adams P, Cooper C, Fava M, McGrath PJ, McInnes M, Phillips ML, Trivedi MH, Weissman MM, & Shinohara RT (2017) <doi:10.1016/j.neuroimage.2017.11.024> "Harmonization of cortical thickness measurements across scanners and sites." Fortin J-P, Parker D, Tun<e7> B, Watanabe T, Elliott MA, Ruparel K, Roalf DR, Satterthwaite TD, Gur RC, Gur RE, Schultz RT, Verma R, & Shinohara RT (2017) <doi:10.1016/j.neuroimage.2017.08.047> "Harmonization of multi-site diffusion tensor imaging data.".
Generates blocked designs for mixed-level factorial experiments for a given block size. Internally, it uses finite-field based, collapsed, and heuristic methods to construct block structures that minimize confounding between block effects and factorial effects. The package creates the full treatment combination table, partitions runs into blocks, and computes detailed confounding diagnostics for main effects and two-factor interactions. It also checks orthogonal factorial structure (OFS) and computes efficiencies of factorial effects using the methods of Nair and Rao (1948) <doi:10.1111/j.2517-6161.1948.tb00005.x>. When OFS is not satisfied but the design has equal treatment replications and equal block sizes, a general method based on the C-matrix and custom contrast vectors is used to compute efficiencies. The output includes the generated design, finite-field metadata, confounding summaries, OFS diagnostics, and efficiency results.
Implementation of the Generalized Pairwise Comparisons (GPC) as defined in Buyse (2010) <doi:10.1002/sim.3923> for complete observations, and extended in Peron (2018) <doi:10.1177/0962280216658320> to deal with right-censoring. GPC compare two groups of observations (intervention vs. control group) regarding several prioritized endpoints to estimate the probability that a random observation drawn from one group performs better/worse/equivalently than a random observation drawn from the other group. Summary statistics such as the net treatment benefit, win ratio, or win odds are then deduced from these probabilities. Confidence intervals and p-values are obtained based on asymptotic results (Ozenne 2021 <doi:10.1177/09622802211037067>), non-parametric bootstrap, or permutations. The software enables the use of thresholds of minimal importance difference, stratification, non-prioritized endpoints (O Brien test), and can handle right-censoring and competing-risks.