Returns an edit-distance-based clustering of an input vector of strings. Each cluster contains a set of strings with small mutual edit distance (e.g., Levenshtein, optimum-sequence-alignment, Damerau-Levenshtein), as computed by stringdist::stringdist(). The set of all mutual edit distances is then used by graph algorithms (from package 'igraph') to single out subsets of high connectivity.
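The idea described above can be sketched in a few lines of Python. This is only an illustration, not the package's implementation: it assumes plain Levenshtein distance, a hypothetical threshold parameter `max_dist`, and uses connected components as a simple stand-in for whatever 'igraph' algorithm the package actually applies.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cluster_strings(strings, max_dist=1):
    """Connected components of the graph whose edges join strings with
    edit distance <= max_dist (max_dist is a hypothetical parameter)."""
    parent = list(range(len(strings)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(strings)), 2):
        if levenshtein(strings[i], strings[j]) <= max_dist:
            parent[find(i)] = find(j)

    clusters = {}
    for i, s in enumerate(strings):
        clusters.setdefault(find(i), []).append(s)
    return sorted(sorted(c) for c in clusters.values())


clusters = cluster_strings(
    ["kitten", "mitten", "sitten", "apple", "apply"], max_dist=1
)
print(clusters)
```

Connected components are the loosest notion of "high connectivity"; a stricter criterion (cliques or community detection, for instance) would give tighter clusters.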
This package performs the cross-match test, an exact, distribution-free test of equality of two high-dimensional multivariate distributions. The input is a distance matrix and the labels of the two groups to be compared; the output is the number of cross-matches and a p-value. See Rosenbaum (2005) <doi:10.1111/j.1467-9868.2005.00513.x>.
Computes a range of scatterplot diagnostics (scagnostics) on pairs of numerical variables in a data set. A range of scagnostics can be computed, including graph-based and association-based scagnostics described by Leland Wilkinson and Graham Wills (2008) <doi:10.1198/106186008X320465> and association-based scagnostics described by Katrin Grimm (2016, ISBN:978-3-8439-3092-5). Summary and plotting functions are provided.
This package provides a set of extensions for the ergm package to fit multilayer/multiplex/multirelational networks and samples of multiple networks. ergm.multi is a part of the Statnet suite of packages for network analysis. See Krivitsky, Koehly, and Marcum (2020) <doi:10.1007/s11336-020-09720-7> and Krivitsky, Coletti, and Hens (2023) <doi:10.1080/01621459.2023.2242627>.
Fits extreme value mixture models, which are models for tails that do not require selection of a threshold, for continuous data. It includes functions for model comparison, estimation of quantities of interest in extreme value analysis, and plotting. References: CN Behrens, HF Lopes, D Gamerman (2004) <doi:10.1191/1471082X04st075oa>; FF do Nascimento, D Gamerman, HF Lopes <doi:10.1007/s11222-011-9270-z>.
Small set of functions designed to speed up the computation of certain matrix operations that are commonly used in statistics and econometrics. It provides efficient implementations for the computation of several structured matrices, matrix decompositions and statistical procedures, many of which have minimal memory overhead. Furthermore, the package provides interfaces to its C code that can be called from C code in other R packages.
Fits hidden Markov models with discrete non-parametric observation distributions to data sets. The observations may be univariate or bivariate. Simulates data from such models. Finds most probable underlying hidden states, the most probable sequences of such states, and the log likelihood of a collection of observations given the parameters of the model. Auxiliary predictors are accommodated in the univariate setting.
Multi-fidelity emulator for data from computer simulations of the same underlying system but at different input locations and fidelity levels, where both the input locations and the fidelity level can be continuous. Active learning can be performed with an implementation of the Integrated Mean Square Prediction Error (IMSPE) criterion developed by Boutelet and Sung (2025, <doi:10.48550/arXiv.2503.23158>).
Fits a Bayesian regression model for multivariate count data. The model assumes that the data are distributed according to the Conway-Maxwell-Poisson distribution, and each response variable can be associated with different covariates. The model accounts for correlations between the counts by using latent effects based on the proposal of Chib and Winkelmann (2001) <http://www.jstor.org/stable/1392277>.
Multivariate generalized Gaussian distribution, Multivariate Cauchy distribution, Multivariate t distribution. Distance between two distributions (see N. Bouhlel and A. Dziri (2019): <doi:10.1109/LSP.2019.2915000>, N. Bouhlel and D. Rousseau (2022): <doi:10.3390/e24060838>, N. Bouhlel and D. Rousseau (2023): <doi:10.1109/LSP.2023.3324594>). Manipulation of these multivariate probability distributions. This package replaces 'mggd', 'mcauchyd' and 'mstudentd'.
This package provides deterministic approximation methods for use with the nimble package. These include Laplace approximation and higher-order extension of Laplace approximation using adaptive Gauss-Hermite quadrature (AGHQ), plus nested deterministic approximation methods related to the INLA approach. Additional information is available in the NIMBLE User Manual and a nimbleQuad tutorial, both available at <https://r-nimble.org/documentation.html>.
This package provides functions to design and simulate optimal two-stage randomized controlled trials (RCTs) with ordered categorical outcomes, supporting rank-based tests and group-sequential decision rules. Methods build on classical and modern rank tests and two-stage/group-sequential designs, e.g., Park (2025) <doi:10.1371/journal.pone.0318211>. Please see the package reference manual and vignettes for details.
Discovery of spatial patterns with Hidden Markov Random Field. This package is designed for spatial transcriptomic data and single molecule fluorescent in situ hybridization (FISH) data such as sequential fluorescence in situ hybridization (seqFISH) and multiplexed error-robust fluorescence in situ hybridization (MERFISH). The methods implemented in this package are described in Zhu et al. (2018) <doi:10.1038/nbt.4260>.
This package provides a pipeline of tools for analysing circadian time-series data using functional data analysis (FDA). The package supports smoothing of rhythmic time series, functional principal component analysis (FPCA), and extraction of group-level traits from functional representations. Analyses can incorporate multiple curve derivatives and optional temporal segmentation, enabling comparative analysis of circadian dynamics across experimental groups and time windows.
This package provides a tool to fit and compare wind turbine power curves with successful curve fitting techniques. It facilitates examining and comparing the performance of user-defined power curve fitting techniques. It also provides features to generate discrete power curve points from graphical power curves. Data on the power curves of wind turbines from major manufacturers are provided.
Ranked set sampling (RSS), introduced by McIntyre (1952) (reprinted in 2005) <doi:10.1198/000313005X54180>, is an advanced data collection method of substantial value for statistical and methodological analysis in scientific studies. This is the first package that implements RSS and its modified versions for sampling. With 'RSSampling', researchers can sample with basic RSS and the modified versions, namely Median RSS, Extreme RSS, Percentile RSS, Balanced groups RSS, Double RSS, L-RSS, Truncation-based RSS and Robust extreme RSS. 'RSSampling' also allows imperfect ranking using an auxiliary variable (concomitant), which is widely used in real-life applications. Users can also apply this package for parametric and nonparametric inference, such as mean, median and variance estimation, regression analysis and some distribution-free tests, where the samples are obtained via basic RSS.
This package provides various statistical methods for designing and analyzing randomized experiments. One functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides the tools to analyze various randomized experiments including cluster randomized experiments, two-stage randomized experiments, randomized experiments with noncompliance, and randomized experiments with missing data.
An implementation of revised functional regression models for multiple genetic variation data, such as single nucleotide polymorphism (SNP) data. It provides revised functional linear regression models, partially functional interaction regression analysis with penalty-based techniques, and corresponding plotting functions. See Ruzong Fan, Yifan Wang, James L. Mills, Alexander F. Wilson, Joan E. Bailey-Wilson, and Momiao Xiong (2013) <doi:10.1002/gepi.21757>.
Aggregates a set of trees with the same leaves to create a consensus tree. The trees are typically obtained via hierarchical clustering, hence the hclust format is used to encode both the aggregated trees and the final consensus tree. The method is exact and proven to be O(nq log(n)), n being the number of individuals and q being the number of trees to aggregate.
Given a set of data points, a clustering is defined as a disjoint partition, in which no two sets share any elements. This package provides 25 methods that play a role somewhat similar to a distance or metric, measuring the similarity of two clusterings (or partitions). For a more detailed description, see Meila, M. (2005) <doi:10.1145/1102351.1102424>.
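To make the notion of comparing two clusterings concrete, here is a minimal Python sketch of the Rand index, a well-known pair-counting similarity between partitions. It is a generic illustration only; whether the Rand index is among the 25 methods in the package is an assumption, not something stated above, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree, i.e. the
    pair is together in both or separated in both (1.0 = identical)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one point: the partitions trivially agree
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Same partition under different label names: perfect agreement.
same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Partitions that cut across each other: low agreement.
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

The index depends only on which points are grouped together, not on the label values themselves, which is the property any partition-comparison measure needs.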
The second version (0.2.0) contains an implementation of exact matching, which is an alternative to propensity score matching (see Glimm & Yau (2025)). The initial version (0.1.2) contains a collection of easy-to-implement tools for checking whether a MAIC can be conducted, as well as an alternative way of calculating weights (see Glimm & Yau (2021) <doi:10.1002/pst.2210>).
Conduct a noncompartmental analysis with industrial strength. Some features are: 1) use of CDISC SDTM terms; 2) automatic or manual slope selection; 3) support for both the linear-up linear-down and the linear-up log-down methods; 4) interval (partial) AUCs with linear or log interpolation. Reference: Gabrielsson J, Weiner D. Pharmacokinetic and Pharmacodynamic Data Analysis - Concepts and Applications. 5th ed. 2016. (ISBN:9198299107).
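For readers unfamiliar with the linear-up log-down method mentioned above, the following Python sketch shows the standard textbook rule: linear trapezoids where the concentration rises or stays flat, logarithmic trapezoids where it falls. It is not the package's code; the function name and the fallback to the linear rule when a concentration is zero are assumptions made for this illustration.

```python
import math


def auc_linear_up_log_down(times, concs):
    """AUC over the sampled interval using the linear-up log-down rule."""
    if len(times) != len(concs):
        raise ValueError("times and concs must have the same length")
    auc = 0.0
    for k in range(len(times) - 1):
        t1, t2 = times[k], times[k + 1]
        c1, c2 = concs[k], concs[k + 1]
        dt = t2 - t1
        if 0 < c2 < c1:
            # Declining segment: logarithmic trapezoid.
            auc += (c1 - c2) * dt / math.log(c1 / c2)
        else:
            # Rising or flat segment (or a zero value, where the log
            # rule is undefined): linear trapezoid.
            auc += 0.5 * (c1 + c2) * dt
    return auc


# Hypothetical concentration-time profile, for illustration only.
times = [0.0, 1.0, 2.0]
concs = [0.0, 10.0, 5.0]
auc = auc_linear_up_log_down(times, concs)
print(round(auc, 4))
```

On a declining segment the log trapezoid gives a smaller area than the linear one, which matches the convex shape of first-order elimination.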
This package implements ordered beta regression models, which are for modeling continuous variables with upper and lower bounds, such as survey sliders, dose-response relationships and indexes. For more information, see Kubinec (2023) <doi:10.31235/osf.io/2sx6y>. The package is a front-end to the R package 'brms', which facilitates a range of regression specifications, including hierarchical, dynamic and multivariate modeling.
This package implements statistical methods for estimating disease penetrance in family-based studies. Penetrance refers to the probability of disease manifestation in individuals carrying specific genetic variants. The package provides tools for age-specific penetrance estimation, handling missing data, and accounting for ascertainment bias in family studies. Cite as: Kubista, N., Braun, D. & Parmigiani, G. (2024) <doi:10.48550/arXiv.2411.18816>.