Estimation/multiple imputation programs for mixed categorical and continuous data.
Specification and estimation of multinomial logit models. Large datasets and complex models are supported, with an intuitive syntax. Multinomial logit models, mixed models, random coefficients and hybrid choice models are all supported. For more information, see Molloy et al. (2021) <https://www.research-collection.ethz.ch/handle/20.500.11850/477416>.
This package performs maximum likelihood estimation for finite mixture models for families including Normal, Weibull, Gamma and Lognormal using the EM algorithm, together with the Newton-Raphson algorithm or the bisection method when necessary. It also conducts mixture model selection using information criteria or the bootstrap likelihood ratio test. The data used for mixture model fitting can be raw data or binned data. The model fitting process is accelerated using the R package 'Rcpp'.
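As an illustration of the EM iteration mentioned in the entry above, here is a minimal sketch for a univariate normal mixture. It is written in plain Python rather than R, it is not the package's API, and the function names, starting values and simulated sample are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_normal_mixture(data, mus, sigmas, weights, n_iter=200):
    """Fit a univariate normal mixture by EM from the given starting values.

    Hypothetical helper for illustration; runs a fixed number of iterations
    instead of checking convergence.
    """
    k = len(mus)
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of proportions, means and standard deviations
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-12))  # guard against collapse
    return mus, sigmas, weights


# Simulated two-component sample (assumed values, for demonstration only)
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(300)]
sample += [random.gauss(6.0, 1.0) for _ in range(300)]
mus, sigmas, weights = em_normal_mixture(sample, [1.0, 4.0], [2.0, 2.0], [0.5, 0.5])
```

For families such as Weibull or Gamma the M-step has no closed form, which is where a numerical root finder like Newton-Raphson or bisection would come in.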
It offers random-forest-based functions to impute clustered incomplete data. The package is tailored for but not limited to imputing multitissue expression data, in which a gene's expression is measured on the collected tissues of an individual but missing on the uncollected tissues.
Model time series using mixture autoregressive (MAR) models. Implemented are frequentist (EM) and Bayesian methods for estimation, prediction and model evaluation. See Wong and Li (2002) <doi:10.1111/1467-9868.00222>, Boshnakov (2009) <doi:10.1016/j.spl.2009.04.009>, and the extensive references in the documentation.
The main functions perform mixed models analysis by least squares or REML by adding the function r() to formulas of lm() and glm(). A collection of text-book statistics for higher education is also included, e.g. modifications of the functions lm(), glm() and associated summaries from the package 'stats'.
Multiple imputation using 'XGBoost', subsampling, and predictive mean matching as described in Deng and Lumley (2023) <doi:10.1080/10618600.2023.2252501>. The package supports various types of variables, offers flexible settings, and enables saving an imputation model to impute new data. Data processing and memory usage have been optimised to speed up the imputation process.
This package contains a mixture of statistical methods including the MCMC methods to analyze normal mixtures. Additionally, model based clustering methods are implemented to perform classification based on (multivariate) longitudinal (or otherwise correlated) data. The basis for such clustering is a mixture of multivariate generalized linear mixed models. The package is primarily related to the publications Komárek (2009, Comp. Stat. and Data Anal.) <doi:10.1016/j.csda.2009.05.006> and Komárek and Komárková (2014, J. of Stat. Soft.) <doi:10.18637/jss.v059.i12>. It also implements methods published in Komárek and Komárková (2013, Ann. of Appl. Stat.) <doi:10.1214/12-AOAS580>, Hughes, Komárek, Bonnett, Czanner, García-Fiñana (2017, Stat. in Med.) <doi:10.1002/sim.7397>, Jaspers, Komárek, Aerts (2018, Biom. J.) <doi:10.1002/bimj.201600253> and Hughes, Komárek, Czanner, García-Fiñana (2018, Stat. Meth. in Med. Res) <doi:10.1177/0962280216674496>.
This package provides functions for creating designs for mixture experiments, making ternary contour plots, and making mixture effect plots.
Mixed variable optimization for non-linear functions. Can optimize functions whose inputs are a combination of continuous, ordered, and unordered variables.
Evaluation and optimization of the Fisher Information Matrix in Nonlinear Mixed Effect Models using Markov Chain Monte Carlo for continuous and discrete data.
Mixed effects cumulative and baseline logit link models for the analysis of ordinal or nominal responses, with non-parametric distribution for the random effects.
Curve fitting of monotonic (sigmoidal) and non-monotonic (J-shaped) dose-response data. Predicting mixture toxicity based on reference models such as 'concentration addition', 'independent action', and 'generalized concentration addition'.
Highly variable gene selection methods, including popular publicly available methods, and also mixtures of multiple highly variable gene selection methods, <https://github.com/RuzhangZhao/mixhvg>. Reference: <doi:10.1101/2024.08.25.608519>.
Developed for model-based clustering using the finite mixtures of skewed sub-Gaussian stable distributions developed by Teimouri (2022) <arXiv:2205.14067> and estimating parameters of the symmetric stable distribution within the Bayesian framework.
Mixtures of skewed and elliptical distributions are implemented using mixtures of multivariate skew power exponential and power exponential distributions, respectively. A generalized expectation-maximization framework is used for parameter estimation. See citation() for how to cite.
This package provides a function for the estimation of mixtures of longitudinal factor analysis models using the iterative expectation-maximization algorithm (Ounajim, Slaoui, Louis, Billot, Frasca, Rigoard (2023) <doi:10.1002/sim.9804>) and several tools for visualizing and interpreting the model parameters.
Algorithms and methods for model-based clustering and classification. It supports various types of data (continuous, categorical and count) and can handle mixed data of these types. It can fit Gaussian (with diagonal covariance structure), gamma, categorical and Poisson models. The algorithms also support missing values.
Fits multiple variable mixtures of various parametric proportional hazard models using the EM-Algorithm. Proportionality restrictions can be imposed on the latent groups and/or on the variables. Several survival distributions can be specified. Missing values and censored values are allowed. Independence is assumed over the single variables.
This package provides a set of utility functions for analysing and modelling data from continuous report short-term memory experiments using either the 2-component mixture model of Zhang and Luck (2008) <doi:10.1038/nature06860> or the 3-component mixture model of Bays et al. (2009) <doi:10.1167/9.10.7>. Users are also able to simulate from these models.
The current version of the MixSAL package allows users to generate data from a multivariate SAL distribution or a mixture of multivariate SAL distributions, evaluate the probability density function of a multivariate SAL distribution or a mixture of multivariate SAL distributions, and fit a mixture of multivariate SAL distributions using the Expectation-Maximization (EM) algorithm (see Franczak et al., 2014, <doi:10.1109/TPAMI.2013.216>, for details).
Developed for the following tasks. 1- simulating realizations from the canonical, restricted, and unrestricted finite mixture models. 2- Monte Carlo approximation for density function of the finite mixture models. 3- Monte Carlo approximation for the observed Fisher information matrix, asymptotic standard error, and the corresponding confidence intervals for parameters of the mixture models using the method proposed by Basford et al. (1997) <https://espace.library.uq.edu.au/view/UQ:57525>.
This package provides an optimization method based on sequential quadratic programming for maximum likelihood estimation of the mixture proportions in a finite mixture model where the component densities are known. The algorithm is expected to obtain solutions that are at least as accurate as those of the state-of-the-art MOSEK interior-point solver, and it is expected to arrive at solutions more quickly when the number of samples is large and the number of mixture components is not too large.
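To make the estimation problem in the entry above concrete: given an n-by-k matrix of known component densities evaluated at each sample, the goal is to find proportions that maximize the mixture log-likelihood. The sketch below solves it with the classical EM fixed-point update, which is a deliberately simpler technique than the sequential quadratic programming method the entry describes. The function names and the small likelihood matrix are hypothetical.

```python
import math


def em_mixture_proportions(lik, n_iter=500):
    """Estimate mixture proportions by EM when component densities are known.

    lik[i][j] is the (known) density of component j evaluated at sample i.
    Plain EM is used here for illustration only; it is not the SQP algorithm.
    """
    n, k = len(lik), len(lik[0])
    w = [1.0 / k] * k  # start from uniform proportions
    for _ in range(n_iter):
        new_w = [0.0] * k
        for row in lik:
            total = sum(w[j] * row[j] for j in range(k))
            for j in range(k):
                # accumulate posterior membership probabilities
                new_w[j] += w[j] * row[j] / total
        w = [v / n for v in new_w]
    return w


def log_likelihood(lik, w):
    """Mixture log-likelihood for proportions w."""
    return sum(math.log(sum(wj * lij for wj, lij in zip(w, row))) for row in lik)


# Hypothetical likelihood matrix: 4 samples, 2 components
lik = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]
w_hat = em_mixture_proportions(lik)
```

EM never decreases the log-likelihood, but it can converge slowly, which is part of the motivation for faster second-order approaches such as the one the entry describes.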
The utility of this package is in simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of 'MixSim' include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models.
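The pairwise overlap defined in the entry above has a closed form in the simplest case: two univariate Gaussian components that share one standard deviation. The sketch below covers only that special case and is not the package's general multivariate computation. The function names and defaults are assumptions made for the example.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pairwise_overlap(mu1, mu2, sigma, pi1=0.5, pi2=0.5):
    """Sum of the two misclassification probabilities for two univariate
    Gaussian components with a common standard deviation (special case only).
    """
    if mu1 == mu2:
        raise ValueError("means must differ")
    if mu1 > mu2:
        # relabel so that component 1 has the smaller mean
        mu1, mu2, pi1, pi2 = mu2, mu1, pi2, pi1
    # Point where pi1 * phi1(x) == pi2 * phi2(x); x below it is assigned to 1
    boundary = 0.5 * (mu1 + mu2) + sigma ** 2 * math.log(pi1 / pi2) / (mu2 - mu1)
    omega_2_given_1 = 1.0 - std_normal_cdf((boundary - mu1) / sigma)
    omega_1_given_2 = std_normal_cdf((boundary - mu2) / sigma)
    return omega_2_given_1 + omega_1_given_2


close = pairwise_overlap(0.0, 2.0, 1.0)  # means two standard deviations apart
far = pairwise_overlap(0.0, 4.0, 1.0)    # means four standard deviations apart
```

With equal proportions the boundary is the midpoint of the two means, so the overlap reduces to twice the standard normal CDF at minus half the standardized distance, and it shrinks as the means move apart.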