Fits mixed membership models with discrete multivariate data (with or without repeated measures) following the general framework of Erosheva et al. (2004). This package uses a Variational EM approach by approximating the posterior distribution of latent memberships and selecting hyperparameters through a pseudo-MLE procedure. Currently supported data types are Bernoulli, multinomial and rank (Plackett-Luce). The extended GoM model with fixed stayers from Erosheva et al. (2007) is now also supported. See Airoldi et al. (2014) for other examples of mixed membership models.
mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data.
Penalized regression methods, such as lasso and elastic net, are used in many biomedical applications when simultaneous regression coefficient estimation and variable selection are desired. However, missing data complicates the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm to each imputed dataset will likely lead to different sets of selected predictors, making it difficult to ascertain a final active set without resorting to ad hoc combination rules. miselect presents Stacked Adaptive Elastic Net (saenet) and Grouped Adaptive LASSO (galasso) for continuous and binary outcomes, developed by Du et al. (2022) <doi:10.1080/10618600.2022.2035739>. By construction, they force selection of the same variables across multiply imputed data. miselect also provides cross-validated variants of these methods.
Efficient procedures for fitting conditional graphical lasso models that link a set of predictor variables to a set of response variables (or tasks), even when the response data may contain missing values. missoNet simultaneously estimates the predictor coefficients for all tasks by leveraging information from one another, in order to provide more accurate predictions in comparison to modeling them individually. Additionally, missoNet estimates the response network structure influenced by conditioning predictor variables using an L1-regularized conditional Gaussian graphical model. Unlike most penalized multi-task regression methods (e.g., MRCE), missoNet is capable of obtaining estimates even when the response data is corrupted by missing values. The method automatically enjoys the theoretical and computational benefits of convexity, and returns solutions that are comparable to the estimates obtained without missingness.
This package contains functions for multiple imputation which complement existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).
This package provides a complete and dedicated analytical toolbox for quality control and diagnosis based on subject-related measurements of micro-RNA (miRNA) expressions. The package consists of a set of functions that allow users to train, optimize and use a Bayesian classifier that relies on multiplets of measured miRNA expressions. The package also implements the quality control tools required to preprocess input datasets. In addition, the package provides a function to carry out a statistical analysis of miRNA expressions, which can give insights to improve the classifier's performance. The method implemented in the package was first introduced in L. Ricci, V. Del Vescovo, C. Cantaloni, M. Grasso, M. Barbareschi and M. A. Denti, "Statistical analysis of a Bayesian classifier based on the expression of miRNAs", BMC Bioinformatics 16:287, 2015 <doi:10.1186/s12859-015-0715-9>. The package is thoroughly described in M. Castelluzzo, A. Perinelli, S. Detassis, M. A. Denti and L. Ricci, "MiRNA-QC-and-Diagnosis: An R package for diagnosis based on MiRNA expression", SoftwareX 12:100569, 2020 <doi:10.1016/j.softx.2020.100569>. Please cite both these works if you use the package for your analysis. DISCLAIMER: The software in this package is for general research purposes only and is thus provided WITHOUT ANY WARRANTY. It is NOT intended to form the basis of clinical decisions. Please refer to the GNU General Public License 3.0 (GPLv3) for further information.
This package provides data from 6 samples across 2 groups from 450k methylation arrays.
Fitting recurrent events survival models for left-censored data with multiple imputation of the number of previous episodes. See Hernández-Herrera G, Moriña D, Navarro A. (2020) <arXiv:2007.15031>.
Implements various things, including functions for LaTeX tables, the Kalman filter, QQ-plots with simulation-based confidence intervals, linear regression diagnostics, web scraping, development tools, relative risk and odds ratio, and GARCH(1,1) forecasting.
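For readers unfamiliar with the relative risk and odds ratio mentioned above, the following minimal sketch computes both from a 2x2 table. It is written in Python purely for illustration and is not taken from the package; the function names and the example counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group.

    a, b: events and non-events among the exposed.
    c, d: events and non-events among the unexposed.
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the event among the exposed divided by odds among the unexposed."""
    return (a * d) / (b * c)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed had the event.
rr = relative_risk(20, 80, 10, 90)
oratio = odds_ratio(20, 80, 10, 90)
print(rr, oratio)
```

With these made-up counts the risk is twice as high in the exposed group, while the odds ratio is somewhat larger, which is the usual pattern when the event is not rare.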
This package enhances the statistical analysis procedures available in R by providing simple functions to analyze and visualize 16S rRNA data. Here we present a tutorial with minimum working examples to demonstrate usage and dependencies.
This package generates a Miami plot with centered chromosome labels. The output is a ggplot2 object. Users can specify which data they want plotted on top vs. bottom, whether to display significance line(s), what colors to give chromosomes, and what points to label.
This package provides a variety of association tests for microbiome data analysis including Quasi-Conditional Association Tests (QCAT) described in Tang Z.-Z. et al. (2017) <doi:10.1093/bioinformatics/btw804> and Zero-Inflated Generalized Dirichlet Multinomial (ZIGDM) tests described in Tang Z.-Z. & Chen G. (2017, submitted).
This package provides miscellaneous small tools and utilities. Many of them facilitate the work with matrices, e.g. inserting rows or columns, creating symmetric matrices, or checking for semidefiniteness. Other tools facilitate the work with regression models, e.g. extracting the standard errors, obtaining the number of (estimated) parameters, or calculating R-squared values.
Stand-alone, HTTP-capable R package repository that fully supports R's install.packages() and available.packages(). It also contains API endpoints for end-users to add/update packages. This package can supplement miniCRAN, which has functions for maintaining a local (partial) copy of CRAN. The current version is bare-minimum, without any access control or much security.
The nonparametric two-stage Bayesian adaptive design is a novel phase II clinical trial design for finding the minimum effective dose (MinED). This design is motivated by the top priority and concern of clinicians when testing a new drug, which is to effectively treat patients and minimize the chance of exposing them to subtherapeutic or overly toxic doses. It is used to design single-agent trials.
This package provides a nature-inspired metaheuristic algorithm based on the echolocation behavior of microbats that uses frequency tuning to optimize problems in both continuous and discrete dimensions. This R package makes it easy to implement the standard bat algorithm on any user-supplied function. The algorithm was first developed by Xin-She Yang in 2010 (<DOI:10.1007/978-3-642-12538-6_6>, <DOI:10.1109/CINTI.2014.7028669>).
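To give a feel for how frequency tuning, loudness and pulse rate interact in a bat-style search, here is a simplified sketch for continuous problems only. It is written in Python for illustration and is not the package's implementation; the function name `bat_minimize`, every parameter default, and the local-walk step are assumptions made for this example.

```python
import math
import random


def bat_minimize(f, bounds, n_bats=20, n_iter=200, f_min=0.0, f_max=2.0,
                 alpha=0.9, gamma=0.9, r0=0.5, seed=0):
    """Simplified bat-style minimizer (illustrative, continuous variables only)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        # Keep every coordinate inside its [lo, hi] interval.
        return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [f(x) for x in pos]
    loud = [1.0] * n_bats   # loudness A_i, decays after each accepted move
    rate = [0.0] * n_bats   # pulse rate r_i, grows towards r0 after accepted moves
    best_i = min(range(n_bats), key=fit.__getitem__)
    best_x, best_val = pos[best_i][:], fit[best_i]

    for t in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Frequency tuning: a random frequency scales the pull towards the best bat.
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] = [v + (x - b) * freq
                      for v, x, b in zip(vel[i], pos[i], best_x)]
            cand = [x + v for x, v in zip(pos[i], vel[i])]
            if rng.random() > rate[i]:
                # Local random walk around the current best solution.
                cand = [b + rng.uniform(-1.0, 1.0) * mean_loud for b in best_x]
            cand = clip(cand)
            val = f(cand)
            if val <= fit[i] and rng.random() < loud[i]:
                # Accept the move, then get quieter and pulse more often.
                pos[i], fit[i] = cand, val
                loud[i] *= alpha
                rate[i] = r0 * (1.0 - math.exp(-gamma * t))
            if val < best_val:
                best_x, best_val = cand[:], val
    return best_x, best_val


def sphere(x):
    return sum(v * v for v in x)


best_x, best_val = bat_minimize(sphere, [(-5.0, 5.0)] * 2)
print(best_x, best_val)
```

The fixed seed makes runs repeatable. Handling discrete dimensions, which the package description also mentions, would need an additional rounding or encoding step that this sketch leaves out.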
This package provides sampling and density functions for matrix variate normal, t, and inverted t distributions; ML estimation for matrix variate normal and t distributions using the EM algorithm, including some restrictions on the parameters; and classification by linear and quadratic discriminant analysis for matrix variate normal and t distributions described in Thompson et al. (2019) <doi:10.1080/10618600.2019.1696208>. Performs clustering with matrix variate normal and t mixture models.
This package provides tools for training, selecting, and evaluating maximum entropy (and standard logistic regression) distribution models. It also provides tools for user-controlled transformation of explanatory variables, selection of variables by nested model comparison, and flexible model evaluation and projection. It follows principles based on the maximum-likelihood interpretation of maximum entropy modeling, and uses infinitely-weighted logistic regression for model fitting. The package is described in Vollering et al. (2019; <doi:10.1002/ece3.5654>).
Fast approximate methods for mixed logistic regression in genome-wide association studies (GWAS). Two computationally efficient methods are proposed for obtaining effect size estimates (beta) in mixed logistic regression in GWAS: the Approximate Maximum Likelihood Estimate (AMLE), and the Offset method. The Wald test obtained with AMLE is identical to the score test. Data can be genotype matrices in plink format, or dosage (VCF files). The methods are described in detail in Milet et al. (2020) <doi:10.1101/2020.01.17.910109>.
Vitamin and mineral deficiencies continue to be a significant public health problem. This is particularly critical in developing countries where deficiencies in vitamin A, iron, iodine, and other micronutrients lead to adverse health consequences. Cross-sectional surveys are helpful in answering questions related to the magnitude and distribution of deficiencies of selected vitamins and minerals. This package provides tools for calculating and determining select vitamin and mineral deficiencies based on World Health Organization (WHO) guidelines found at <https://www.who.int/teams/nutrition-and-food-safety/databases/vitamin-and-mineral-nutrition-information-system>.
This package provides tools and demonstrates methods for working with individual undergraduate student-level records (registrar's data) in R. Tools include filters for program codes, data sufficiency, and timely completion. Methods include gathering blocs of records, computing quantitative metrics such as graduation rate, and creating charts to visualize comparisons. midfieldr interacts with practice data provided in midfielddata, an R data package available at <https://midfieldr.github.io/midfielddata/>. midfieldr also interacts with the full MIDFIELD database for users who have access. This work is supported by the US National Science Foundation through grant numbers 1545667 and 2142087.
Kernel-based methods are powerful methods for integrating heterogeneous types of data. mixKernel aims at providing methods to combine kernels for unsupervised exploratory analysis. Different solutions are provided to compute a meta-kernel, in a consensus way or in a way that best preserves the original topology of the data. mixKernel also integrates kernel PCA to visualize similarities between samples in a nonlinear space and from the multiple source point of view <doi:10.1093/bioinformatics/btx682>. A method to select (as well as functions to display) important variables is also provided <doi:10.1093/nargab/lqac014>.
Statistical Analyses and Pooling after Multiple Imputation. A large variety of repeated statistical analyses can be performed and finally pooled. Available statistical analyses include, among others, Levene's test, odds and risk ratios, one-sample proportions, differences between proportions, and linear and logistic regression models. Functions can also be used in combination with the pipe operator. More statistical analyses and pooling functions will be added over time. Heymans (2007) <doi:10.1186/1471-2288-7-33>. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>. Sidi (2021) <doi:10.1080/00031305.2021.1898468>. Lott (2018) <doi:10.1080/00031305.2018.1473796>. Grund (2021) <doi:10.31234/osf.io/d459g>.
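As a rough illustration of what pooling after multiple imputation involves, the sketch below applies Rubin's rules to a set of per-imputation estimates. It is written in Python for illustration only and is not code from the package; the function name `pool_rubin` and the input values are hypothetical, and the package's own pooling methods for specific tests may differ.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching inputs")
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, total


# Hypothetical regression coefficients and squared standard errors from 3 imputed datasets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var ** 0.5)
```

The pooled standard error is the square root of the total variance. The between-imputation term grows when the imputed datasets disagree, which is how the uncertainty due to missing data enters the final result.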
This package contains a suite of functions for health economic evaluations with missing outcome data. The package can fit different types of statistical models under a fully Bayesian approach using the software JAGS (which should be installed locally and which is loaded in missingHE via the R package R2jags). Three classes of models can be fitted under a variety of missing data assumptions: selection models, pattern mixture models and hurdle models. In addition to model fitting, missingHE provides a set of specialised functions to assess model convergence and fit, and to summarise the statistical and economic results using different types of measures and graphs. The methods implemented are described in Mason (2018) <doi:10.1002/hec.3793>, Molenberghs (2000) <doi:10.1007/978-1-4419-0300-6_18> and Gabrio (2019) <doi:10.1002/sim.8045>.