Multivariate Information-based Inductive Causation, better known by its acronym MIIC, is a causal discovery method based on information theory principles. It learns a large class of causal or non-causal graphical models from purely observational data, while including the effects of unobserved latent variables. Starting from a complete graph, the method iteratively removes dispensable edges by uncovering significant information contributions from indirect paths, and assesses edge-specific confidences from randomization of available data. The remaining edges are then oriented based on the signature of causality in observational data. The recent, more interpretable MIIC extension (iMIIC) further distinguishes genuine causes from putative and latent causal effects, while scaling to very large datasets (hundreds of thousands of samples). Since version 2.0, MIIC also includes a temporal mode (tMIIC) to learn temporal causal graphs from stationary time series data. MIIC has been applied to a wide range of biological and biomedical data, such as single cell gene expression data, genomic alterations in tumors, live-cell time-lapse imaging data (CausalXtract), as well as medical records of patients. MIIC brings unique insights based on causal interpretation and could be used in a broad range of other data science domains (technology, climatology, economy, ...). For more information, you can refer to: Simon et al., eLife 2024, <doi:10.1101/2024.02.06.579177>, Ribeiro-Dantas et al., iScience 2024, <doi:10.1016/j.isci.2024.109736>, Cabeli et al., NeurIPS 2021, <https://why21.causalai.net/papers/WHY21_24.pdf>, Cabeli et al., PLoS Comput. Biol. 2020, <doi:10.1371/journal.pcbi.1007866>, Li et al., NeurIPS 2019, <https://papers.nips.cc/paper/9573-constraint-based-causal-structure-learning-with-consistent-separating-sets>, Verny et al., PLoS Comput. Biol. 2017, <doi:10.1371/journal.pcbi.1005662>, Affeldt et al., UAI 2015, <https://auai.org/uai2015/proceedings/papers/293.pdf>. Changes from the previous 1.5.3 release on CRAN are available at <https://github.com/miicTeam/miic_R_package/blob/master/NEWS.md>.
This package provides tools to analyze and visualize Illumina Infinium methylation arrays.
This package implements various algorithms for inferring mutual information networks from data.
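As a rough illustration of what a mutual information network is, the sketch below estimates pairwise mutual information between discrete variables from empirical frequencies and keeps the pairs above a cutoff. This is a plain thresholded network written for illustration only; it is not one of the package's inference algorithms, and the names `mutual_information`, `mi_network` and `threshold` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def mi_network(columns, threshold):
    """Return (a, b, mi) for every variable pair whose MI exceeds the threshold."""
    names = sorted(columns)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            mi = mutual_information(columns[a], columns[b])
            if mi > threshold:
                edges.append((a, b, mi))
    return edges


# Toy data: B copies A, while C is independent of A and B in this sample.
data = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
}
edges = mi_network(data, threshold=0.5)
print(edges)
```

With this toy data only the A-B pair survives the cutoff, since C carries no information about the other two variables.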
Various functions for random number generation, density estimation, classification, curve fitting, and spatial data analysis.
This package is intended to help users to efficiently analyze genomic data resulting from various experiments.
This package provides a flexible computational framework for mixture distributions with the focus on the composite models.
Generates multivariate imputations using sequential regression with L2 penalty. For more details see Zahid and Heumann (2018) <doi:10.1177/0962280218755574>.
This GUI for the mi package walks the user through the steps of multiple imputation and the analysis of completed data.
This package provides a derivative-free optimization by quadratic approximation based on an interface to Fortran implementations by M. J. D. Powell.
This is a port of the type guesser from the readr package, the so-called readr first edition parsing engine, now superseded by vroom.
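To give a feel for what a type guesser does, here is a minimal sketch that tries progressively more general types on a column of strings and returns the first one that fits every non-missing value. The candidate order, the accepted spellings, the missing-value markers and the name `guess_type` are assumptions made for this example; they do not reproduce the rules of readr or of this package.

```python
import re

# Hypothetical patterns chosen for this sketch.
_INTEGER = re.compile(r"[+-]?\d+")
_DOUBLE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")
_LOGICAL = {"TRUE", "FALSE", "T", "F", "true", "false"}


def guess_type(values, na=("", "NA")):
    """Guess 'logical', 'integer', 'double' or 'character' for a column of strings."""
    present = [v for v in values if v not in na]
    if not present:
        # Nothing to inspect; this sketch reports that rather than picking a type.
        return "unknown"
    if all(v in _LOGICAL for v in present):
        return "logical"
    if all(_INTEGER.fullmatch(v) for v in present):
        return "integer"
    if all(_DOUBLE.fullmatch(v) for v in present):
        return "double"
    return "character"


print(guess_type(["1", "2", "NA"]))
print(guess_type(["1.5", "2", "3e4"]))
print(guess_type(["TRUE", "F"]))
print(guess_type(["a", "1"]))
```

Trying the most specific type first means that a single non-conforming value demotes the whole column to a more general type.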
Implementation of methods for minimizing ill-conditioned problems. Currently only includes regularized (quasi-)Newton optimization (Kanzow and Steck (2023), <doi:10.1007/s12532-023-00238-4>).
This package provides a set of classes and methods to set up and run multi-species, trait based and community size spectrum ecological models, focused on the marine environment.
Milo performs single-cell differential abundance testing. Cell states are modelled as representative neighbourhoods on a nearest neighbour graph. Hypothesis testing is performed using a negative binomial generalized linear model.
This package contains functions for converting existing HTML/JavaScript source into equivalent shiny functions. Bootstraps the process of making new shiny functions by allowing us to turn HTML snippets directly into R functions.
Imputes missing values of an incomplete data matrix by minimizing the Mahalanobis distance of each sample from the overall mean [Labita, GJ.D. and Tubo, B.F. (2024) <doi:10.24412/1932-2321-2024-278-115-123>].
This package provides tools for multiple imputation of missing data in multilevel modeling. It includes a user-friendly interface to the packages pan and jomo, and several functions for visualization, data management and the analysis of multiply imputed data sets.
It offers random-forest-based functions to impute clustered incomplete data. The package is tailored for but not limited to imputing multitissue expression data, in which a gene's expression is measured on the collected tissues of an individual but missing on the uncollected tissues.
Model time series using mixture autoregressive (MAR) models. Implemented are frequentist (EM) and Bayesian methods for estimation, prediction and model evaluation. See Wong and Li (2002) <doi:10.1111/1467-9868.00222>, Boshnakov (2009) <doi:10.1016/j.spl.2009.04.009>, and the extensive references in the documentation.
An implementation of the iterative proportional fitting (IPFP), maximum likelihood, minimum chi-square and weighted least squares procedures for updating an N-dimensional array with respect to given target marginal distributions (which, in turn, can be multidimensional). The package also provides an application of the IPFP to simulate multivariate Bernoulli distributions.
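The iterative proportional fitting idea can be sketched for the simplest case, a two-dimensional table with one-dimensional row and column targets. The code alternately rescales rows and columns until the row sums agree with their targets. It assumes a strictly positive seed table and targets with equal totals; the function name `ipf_2d` and its parameters are hypothetical and unrelated to the package's interface.

```python
def ipf_2d(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale a positive 2-D table so its margins match the given targets."""
    fitted = [row[:] for row in table]  # work on a copy of the seed table
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            factor = target / sum(fitted[i])
            fitted[i] = [value * factor for value in fitted[i]]
        # Column step: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in fitted)
            for row in fitted:
                row[j] *= factor
        # Columns now match exactly, so convergence is judged on the rows.
        error = max(abs(sum(fitted[i]) - t) for i, t in enumerate(row_targets))
        if error < tol:
            break
    return fitted


seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf_2d(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
for row in fitted:
    print([round(value, 3) for value in row])
```

Because every step multiplies whole rows or whole columns by a constant, the fitted table keeps the odds ratios of the seed table while taking on the new margins.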
The main functions perform mixed models analysis by least squares or REML by adding the function r() to formulas of lm() and glm(). A collection of text-book statistics for higher education is also included, e.g. modifications of the functions lm(), glm() and associated summaries from the package stats.
Multiple imputation using XGBoost, subsampling, and predictive mean matching as described in Deng and Lumley (2023) <doi:10.1080/10618600.2023.2252501>. The package supports various types of variables, offers flexible settings, and enables saving an imputation model to impute new data. Data processing and memory usage have been optimised to speed up the imputation process.
Mica is a server application used to create data web portals for large-scale epidemiological studies or multiple-study consortia. Mica helps studies to provide scientifically robust data visibility and web presence without significant information technology effort. Mica provides a structured description of consortia, studies, annotated and searchable data dictionaries, and data access request management. This Mica client allows users to perform data extraction for reporting purposes.
Extract, transform and load MITRE standards. This package gives you an approach to cybersecurity data sets. All data sets are built at runtime by downloading raw data from MITRE public services. MITRE <https://www.mitre.org/> is a government-funded research organization based in Bedford and McLean. The current version includes the most used standards as data frames. It also provides a list of nodes and edges with all relationships.
This package contains functions for the analysis of repeated measurement data using GEE. Data may contain missing values in the response and covariates. Parameters are estimated through the Fisher scoring algorithm; when values are missing in the covariates or response, the mean score and inverse probability weighted methods are combined with multiple imputation. The reference for the mean score and inverse probability weighted methods is Wang et al. (2007) <doi:10.1093/biostatistics/kxl024>.