Scalable Bayesian clustering of categorical datasets. The package implements a hierarchical Dirichlet (Process) mixture of multinomial distributions. It is thus a probabilistic latent class model (LCM) and can be used to reduce the dimensionality of hierarchical data and cluster individuals into latent classes. It can automatically infer an appropriate number of latent classes or find k classes, as defined by the user. The model is based on a paper by Dunson and Xing (2009) <doi:10.1198/jasa.2009.tm08439>, but implements a scalable variational inference algorithm so that it is applicable to large datasets. It is described and tested in the accompanying paper by Ahlmann-Eltze and Yau (2018) <doi:10.1109/DSAA.2018.00068>.
Carries out model-based clustering, classification and discriminant analysis using five different models, all based on the generalized hyperbolic distribution. The first model, MGHD (Browne and McNicholas (2015) <doi:10.1002/cjs.11246>), is the classical mixture of generalized hyperbolic distributions. The MGHFA (Tortora et al. (2016) <doi:10.1007/s11634-015-0204-z>) is the mixture of generalized hyperbolic factor analyzers for high-dimensional data sets. The MSGHD is the mixture of multiple scaled generalized hyperbolic distributions, the cMSGHD is an MSGHD with convex contour plots, and the MCGHD, the mixture of coalesced generalized hyperbolic distributions, is a new, more flexible model (Tortora et al. (2019) <doi:10.1007/s00357-019-09319-3>). The paper related to the software can be found at <doi:10.18637/jss.v098.i03>.
Tool for exploring DNA and amino acid variation and inferring the presence of target lineages from microbial high-throughput genomic DNA samples that potentially contain mixtures of variants/lineages. MixviR was originally created to help analyze SARS-CoV-2/Covid-19 samples from environmental sources such as wastewater or dust, but can be applied to any microbial group. Inputs include reference genome information in commonly used file formats (fasta, bed) and one or more variant call format (VCF) files, which can be generated with programs such as Illumina's DRAGEN, the Genome Analysis Toolkit, or bcftools. See DePristo et al. (2011) <doi:10.1038/ng.806> for the Genome Analysis Toolkit and Danecek et al. (2021) <doi:10.1093/gigascience/giab008> for bcftools. Available outputs include a table of mutations observed in the sample(s), estimates of proportions of target lineages in the sample(s), and an R Shiny dashboard to interactively explore the data.
We provide detailed functions for the univariate Mixed Tempered Stable distribution.
Implementation of parametric and semiparametric mixture cure models based on existing R packages. See details of the models in Peng and Yu (2020) <ISBN: 9780367145576>.
Fit finite mixture distribution models to grouped data and conditional data by maximum likelihood using a combination of a Newton-type algorithm and the EM algorithm.
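As a rough illustration of what maximum likelihood on grouped data means, the sketch below evaluates the log-likelihood of a mixture when only bin counts are observed. It assumes normal components, and the names `normal_cdf` and `grouped_mixture_loglik` are hypothetical; it is not the package's interface, and it only evaluates the objective rather than running the Newton-type or EM steps.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def grouped_mixture_loglik(edges, counts, weights, means, sds):
    """Log-likelihood (up to an additive constant) of binned data.

    edges   : bin boundaries, len(counts) + 1 values in increasing order
    counts  : number of observations that fell into each bin
    weights : mixing proportions of the components (should sum to 1)
    means, sds : parameters of the (assumed normal) components
    """
    loglik = 0.0
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        if n == 0:
            continue  # empty bins contribute nothing
        # Probability that one observation from the mixture lands in this bin.
        p = sum(
            w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
            for w, m, s in zip(weights, means, sds)
        )
        loglik += n * math.log(p)
    return loglik


# Made-up bimodal bin counts, for illustration only.
edges = [-math.inf, 1, 2, 3, 4, 5, 6, 7, math.inf]
counts = [5, 20, 30, 10, 8, 25, 35, 12]

good = grouped_mixture_loglik(edges, counts, [0.45, 0.55], [2.3, 6.3], [1.0, 1.0])
bad = grouped_mixture_loglik(edges, counts, [0.5, 0.5], [0.0, 8.0], [1.0, 1.0])
print(good > bad)
```

An optimiser would search over the weights, means and standard deviations to make this quantity as large as possible.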
This package provides a collection of functions to perform various meta-analytical models through a unified mixed-effects framework, including standard univariate fixed and random-effects meta-analysis and meta-regression, and non-standard extensions such as multivariate, multilevel, longitudinal, and dose-response models.
Statistical framework for comparing sets of trees using hypothesis testing methods. Designed for transmission trees, phylogenetic trees, and directed acyclic graphs (DAGs), the package implements chi-squared tests to compare edge frequencies between sets and PERMANOVA to analyse topological dissimilarities with customisable distance metrics, following Anderson (2001) <doi:10.1111/j.1442-9993.2001.01070.pp.x>.
This package provides functions to fit finite mixtures of scale mixtures of skew-normal (FM-SMSN) distributions. For details see Prates, Lachos and Cabral (2013) <doi:10.18637/jss.v054.i12>, Cabral, Lachos and Prates (2012) <doi:10.1016/j.csda.2011.06.026> and Basso, Lachos, Cabral and Ghosh (2010) <doi:10.1016/j.csda.2009.09.031>.
Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates.
An implementation of 14 parsimonious mixture models for model-based clustering or model-based classification. Gaussian, Student's t, generalized hyperbolic, variance-gamma or skew-t mixtures are available. All approaches work with missing data. See Celeux and Govaert (1995) <doi:10.1016/0031-3203(94)00125-6>, Browne and McNicholas (2014) <doi:10.1007/s11634-013-0139-1>, and Browne and McNicholas (2015) <doi:10.1002/cjs.11246>.
This package creates and runs Bayesian mixing models to analyze biological tracer data (e.g. stable isotopes, fatty acids), which estimate the proportions of source (prey) contributions to a mixture (consumer). MixSIAR is not one model, but a framework that allows a user to create a mixing model based on their data structure and research questions, via options for fixed/random effects, source data types, priors, and error terms. MixSIAR incorporates several years of advances since MixSIR and SIAR.
Unsupervised model-based clustering for functional magnetic resonance imaging (fMRI) data. The developed methods (Chen and Maitra (2023) <doi:10.1002/hbm.26425>) include 2D and 3D clustering analyses (for p-values with voxel locations) and segmentation analyses (for p-values alone) for fMRI data, where p-values indicate the significance level of activation in response to the stimulus of interest. The analyses mainly identify active voxels/signals associated with normal brain behaviors. Analysis pipelines (R scripts) utilizing this package (see examples in inst/workflow/) are also implemented with high performance techniques.
This package provides a facility to generate various classes of fractional designs for order-of-addition experiments, namely fractional order-of-addition orthogonal arrays; see Voelkel, Joseph G. (2019). "The design of order-of-addition experiments." Journal of Quality Technology 51:3, 230-241, <doi:10.1080/00224065.2019.1569958>. It also provides a facility to construct component orthogonal arrays; see Jian-Feng Yang, Fasheng Sun and Hongquan Xu (2020). "A Component Position Model, Analysis and Design for Order-of-Addition Experiments." Technometrics, <doi:10.1080/00401706.2020.1764394>. Generation of fractional designs for order-of-addition mixture experiments and analysis of data from such experiments are also supported.
Deconvolution of thermal decay curves allows you to quantify proportions of biomass components in plant litter. Thermal decay curves derived from thermogravimetric analysis (TGA) are imported, modified, and then modelled in a three- or four-part mixture model using the Fraser-Suzuki function. The output is estimates for weights of pseudo-components corresponding to hemicellulose, cellulose, and lignin. For more information see: Müller-Hagedorn, M. and Bockhorn, H. (2007) <doi:10.1016/j.jaap.2006.12.008>, Órfão, J. J. M. and Figueiredo, J. L. (2001) <doi:10.1016/S0040-6031(01)00634-7>, and Yang, H. and Yan, R. and Chen, H. and Zheng, C. and Lee, D. H. and Liang, D. T. (2006) <doi:10.1021/ef0580117>.
This package provides a collection of R functions for analyzing finite mixture models.
Inference on stochastic differential models (Ornstein-Uhlenbeck or Cox-Ingersoll-Ross) with one or two random effects in the drift function.
Simple tools to perform mixture optimization based on the desirability package by Max Kuhn. It also provides a plot routine using ggplot2 and patchwork.
Mixed, low-rank, and sparse multivariate regression (mixedLSR) provides tools for performing mixture regression when the coefficient matrix is low-rank and sparse. mixedLSR allows subgroup identification by alternating optimization with simulated annealing to encourage global optimum convergence. This method is data-adaptive, automatically performing parameter selection to identify low-rank substructures in the coefficient matrix.
Semi-parametric approach for sparse canonical correlation analysis which can handle mixed data types: continuous, binary and truncated continuous. Bridge functions are provided to connect Kendall's tau to latent correlation under the Gaussian copula model. The methods are described in Yoon, Carroll and Gaynanova (2020) <doi:10.1093/biomet/asaa007> and Yoon, Mueller and Gaynanova (2021) <doi:10.1080/10618600.2021.1882468>.
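For a pair of continuous variables under the Gaussian copula, the bridge from Kendall's tau to the latent correlation is the classical relation r = sin(pi * tau / 2). The sketch below illustrates only that continuous case; the bridge functions for binary and truncated variables are more involved and are not shown. The function names `kendall_tau` and `bridge_continuous` are hypothetical and are not the package's interface.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau (tau-a) by direct pair counting; assumes no ties."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                score += 1  # concordant pair
            elif s < 0:
                score -= 1  # discordant pair
    return score / (n * (n - 1) / 2)


def bridge_continuous(tau):
    """Map Kendall's tau to the latent correlation for two continuous variables."""
    return math.sin(math.pi * tau / 2.0)


x = [1, 2, 3]
y = [1, 3, 2]
tau = kendall_tau(x, y)             # two concordant pairs, one discordant
latent_r = bridge_continuous(tau)   # estimated latent correlation
print(tau, latent_r)
```

Because Kendall's tau depends only on ranks, the estimate is unchanged by monotone transformations of either variable, which is what makes it usable under the copula model.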
This package implements large-scale hypothesis testing by variance mixing. It takes two statistics per testing unit -- an estimated effect and its associated squared standard error -- and fits a nonparametric, shape-constrained mixture separately on two latent parameters. It reports local false discovery rates (lfdr) and local false sign rates (lfsr). The MixTwice algorithm is described in Zheng et al. (2021) <doi:10.1093/bioinformatics/btab162>.
This package provides tools for the analysis of psychophysical data in R. It estimates the Point of Subjective Equivalence (PSE) and the Just Noticeable Difference (JND), either from a psychometric function or from a Generalized Linear Mixed Model (GLMM). Additionally, the package supports plotting the fitted models and the response data, simulating psychometric functions of different shapes, and simulating data sets. For a description of the use of GLMMs applied to psychophysical data, refer to Moscatelli et al. (2012).
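To make the two quantities concrete, the sketch below derives PSE and JND from the coefficients of a probit psychometric function P(x) = Phi(b0 + b1 * x). It assumes one common convention: the PSE is the stimulus at which P = 0.5, and the JND is the distance from the PSE to the stimulus at which P = 0.75. The function name `pse_jnd_probit` and the coefficient values are hypothetical, and this is not the package's interface.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def pse_jnd_probit(b0, b1):
    """PSE and JND for a probit psychometric function Phi(b0 + b1 * x).

    Assumes b1 > 0 (response probability increases with the stimulus).
    """
    pse = -b0 / b1                          # Phi(b0 + b1 * x) = 0.5
    jnd = STD_NORMAL.inv_cdf(0.75) / b1     # shift needed to reach 0.75
    return pse, jnd


# Illustrative coefficients, not taken from any data set.
pse, jnd = pse_jnd_probit(b0=-5.0, b1=0.5)
print(pse, jnd)
```

A steeper function (larger b1) gives a smaller JND, that is, finer discrimination.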
Fits mixed membership models with discrete multivariate data (with or without repeated measures) following the general framework of Erosheva et al. (2004). This package uses a Variational EM approach by approximating the posterior distribution of latent memberships and selecting hyperparameters through a pseudo-MLE procedure. Currently supported data types are Bernoulli, multinomial and rank (Plackett-Luce). The extended GoM model with fixed stayers from Erosheva et al. (2007) is now also supported. See Airoldi et al. (2014) for other examples of mixed membership models.
mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data.