This package provides an optimization method based on sequential quadratic programming for maximum likelihood estimation of the mixture proportions in a finite mixture model where the component densities are known. The algorithm is expected to obtain solutions that are at least as accurate as the state-of-the-art MOSEK interior-point solver, and they are expected to arrive at solutions more quickly when the number of samples is large and the number of mixture components is not too large.
Addons for the mice package to perform multiple imputation using chained equations with two-level data. Includes imputation methods dedicated to sporadically and systematically missing values. Imputation of continuous, binary or count variables are available. Following the recommendations of Audigier, V. et al (2018) <doi:10.1214/18-STS646>, the choice of the imputation method for each variable can be facilitated by a default choice tuned according to the structure of the incomplete dataset. Allows parallel calculation and overimputation for mice'.
This package offers three important components: (1) to construct a use-defined linear mixed model, (2) to employ one of linear mixed model approaches: minimum norm quadratic unbiased estimation (MINQUE) (Rao, 1971) for variance component estimation and random effect prediction; and (3) to employ a jackknife resampling technique to conduct various statistical tests. In addition, this package provides the function for model or data evaluations.This R package offers fast computations for large data sets analyses for various irregular data structures.
This package provides a framework that boosts the imputation of missForest
by Stekhoven, D.J. and Bühlmann, P. (2012) <doi:10.1093/bioinformatics/btr597> by harnessing parallel processing and through the fast Gradient Boosted Decision Trees (GBDT) implementation LightGBM
by Ke, Guolin et al.(2017) <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision>. misspi has the following main advantages: 1. Allows embrassingly parallel imputation on large scale data. 2. Accepts a variety of machine learning models as methods with friendly user portal. 3. Supports multiple initializations methods. 4. Supports early stopping that prohibits unnecessary iterations.
Test for overall association between microbiome composition data and phenotypes via phylogenetic kernels. The phenotype can be univariate continuous or binary (Zhao et al. (2015) <doi:10.1016/j.ajhg.2015.04.003>), survival outcomes (Plantinga et al. (2017) <doi:10.1186/s40168-017-0239-9>), multivariate (Zhan et al. (2017) <doi:10.1002/gepi.22030>) and structured phenotypes (Zhan et al. (2017) <doi:10.1111/biom.12684>). The package can also use robust regression (unpublished work) and integrated quantile regression (Wang et al. (2021) <doi:10.1093/bioinformatics/btab668>). In each case, the microbiome community effect is modeled nonparametrically through a kernel function, which can incorporate phylogenetic tree information.
The utility of this package is in simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of MixSim
', there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models.
Scalable Bayesian clustering of categorical datasets. The package implements a hierarchical Dirichlet (Process) mixture of multinomial distributions. It is thus a probabilistic latent class model (LCM) and can be used to reduce the dimensionality of hierarchical data and cluster individuals into latent classes. It can automatically infer an appropriate number of latent classes or find k classes, as defined by the user. The model is based on a paper by Dunson and Xing (2009) <doi:10.1198/jasa.2009.tm08439>, but implements a scalable variational inference algorithm so that it is applicable to large datasets. It is described and tested in the accompanying paper by Ahlmann-Eltze and Yau (2018) <doi:10.1109/DSAA.2018.00068>.
Carries out model-based clustering, classification and discriminant analysis using five different models. The models are all based on the generalized hyperbolic distribution. The first model MGHD (Browne and McNicholas
(2015) <doi:10.1002/cjs.11246>) is the classical mixture of generalized hyperbolic distributions. The MGHFA (Tortora et al. (2016) <doi:10.1007/s11634-015-0204-z>) is the mixture of generalized hyperbolic factor analyzers for high dimensional data sets. The MSGHD is the mixture of multiple scaled generalized hyperbolic distributions, the cMSGHD
is a MSGHD with convex contour plots and the MCGHD', mixture of coalesced generalized hyperbolic distributions is a new more flexible model (Tortora et al. (2019)<doi:10.1007/s00357-019-09319-3>. The paper related to the software can be found at <doi:10.18637/jss.v098.i03>.
The rapid screening of effective and optimal therapies from large numbers of candidate combinations, as well as exploring subgroup efficacy, remains challenging, which necessitates innovative, integrated, and efficient trial designs(Yuan, Y., et al. (2016) <doi:10.1002/sim.6971>). MIDAS-2 package enables quick and continuous screening of promising combination strategies and exploration of their subgroup effects within a unified platform design framework. We used a regression model to characterize the efficacy pattern in subgroups. Information borrowing was applied through Bayesian hierarchical model to improve trial efficiency considering the limited sample size in subgroups(Cunanan, K. M., et al. (2019) <doi:10.1177/1740774518812779>). MIDAS-2 provides an adaptive drug screening and subgroup exploring framework to accelerate immunotherapy development in an efficient, accurate, and integrated fashion(Wathen, J. K., & Thall, P. F. (2017) <doi: 10.1177/1740774517692302>).
Tool for exploring DNA and amino acid variation and inferring the presence of target lineages from microbial high-throughput genomic DNA samples that potentially contain mixtures of variants/lineages. MixviR
was originally created to help analyze environmental SARS-CoV-2/Covid-19
samples from environmental sources such as wastewater or dust, but can be applied to any microbial group. Inputs include reference genome information in commonly-used file formats (fasta, bed) and one or more variant call format (VCF) files, which can be generated with programs such as Illumina's DRAGEN, the Genome Analysis Toolkit, or bcftools. See DePristo
et al (2011) <doi:10.1038/ng.806> and Danecek et al (2021) <doi:10.1093/gigascience/giab008> for these tools, respectively. Available outputs include a table of mutations observed in the sample(s), estimates of proportions of target lineages in the sample(s), and an R Shiny dashboard to interactively explore the data.
mistyR
is an implementation of the Multiview Intercellular SpaTialmodeling
framework (MISTy). MISTy is an explainable machine learning framework for knowledge extraction and analysis of single-cell, highly multiplexed, spatially resolved data. MISTy facilitates an in-depth understanding of marker interactions by profiling the intra- and intercellular relationships. MISTy is a flexible framework able to process a custom number of views. Each of these views can describe a different spatial context, i.e., define a relationship among the observed expressions of the markers, such as intracellular regulation or paracrine regulation, but also, the views can also capture cell-type specific relationships, capture relations between functional footprints or focus on relations between different anatomical regions. Each MISTy view is considered as a potential source of variability in the measured marker expressions. Each MISTy view is then analyzed for its contribution to the total expression of each marker and is explained in terms of the interactions with other measurements that led to the observed contribution.
The tools for MicroRNA
Set Enrichment Analysis can identify risk pathways(or prior gene sets) regulated by microRNA
set in the context of microRNA
expression data. (1) This package constructs a correlation profile of microRNA
and pathways by the hypergeometric statistic test. The gene sets of pathways derived from the three public databases (Kyoto Encyclopedia of Genes and Genomes ('KEGG'); Reactome'; Biocarta') and the target gene sets of microRNA
are provided by four databases('TarBaseV6.0
'; mir2Disease'; miRecords
'; miRTarBase
';). (2) This package can quantify the change of correlation between microRNA
for each pathway(or prior gene set) based on a microRNA
expression data with cases and controls. (3) This package uses the weighted Kolmogorov-Smirnov statistic to calculate an enrichment score (ES) of a microRNA
set that co-regulate to a pathway , which reflects the degree to which a given pathway is associated with the specific phenotype. (4) This package can provide the visualization of the results.
Easily visualize and inspect microarrays for spatial artifacts.
Package for combined miRNA
- and mRNA-testing
.
We provide detailed functions for univariate Mixed Tempered Stable distribution.
This package provides functions for estimating structural equation models using instrumental variables.
This package provides tools to perform analyses and combine results from multiple-imputation datasets.
Various reliability analysis methods for rare event inference (computing failure probability and quantile from model/function outputs).
Create tile grid maps, which are like choropleth maps except each region is represented with equal visual space.
Quick and simple Tcl/Tk Graphical User Interface to call functions. Also comprises a very simple experimental GUI framework.
Generic functions to produce area/bar/box/line plots of data following IAMC (Integrated Assessment Modeling Consortium) submission format.
The minimax family of distributions is a two-parameter family like the beta family, but computationally a lot more tractible.
Implementation of parametric and semiparametric mixture cure models based on existing R packages. See details of the models in Peng and Yu (2020) <ISBN: 9780367145576>.
Fit finite mixture distribution models to grouped data and conditional data by maximum likelihood using a combination of a Newton-type algorithm and the EM algorithm.