Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <https://www.bnlearn.com/>.
This package provides a generic, easy-to-use and intuitive pharmacokinetic/pharmacodynamic (PK/PD) simulation platform based on R packages rxode2 and mrgsolve'. CAMPSIS provides an abstraction layer over the underlying processes of writing a PK/PD model, assembling a custom dataset and running a simulation. CAMPSIS has a strong dependency to the R package campsismod', which allows to read/write a model from/to files and adapt it further on the fly in the R environment. Package campsis allows the user to assemble a dataset in an intuitive manner. Once the userâ s dataset is ready, the package is in charge of preparing the simulation, calling rxode2 or mrgsolve (at the user's choice) and returning the results, for the given model, dataset and desired simulation settings.
Computes 138 standard climate indices at monthly, seasonal and annual resolution. These indices were selected, based on their direct and significant impacts on target sectors, after a thorough review of the literature in the field of extreme weather events and natural hazards. Overall, the selected indices characterize different aspects of the frequency, intensity and duration of extreme events, and are derived from a broad set of climatic variables, including surface air temperature, precipitation, relative humidity, wind speed, cloudiness, solar radiation, and snow cover. The 138 indices have been classified as follow: Temperature based indices (42), Precipitation based indices (22), Bioclimatic indices (21), Wind-based indices (5), Aridity/ continentality indices (10), Snow-based indices (13), Cloud/radiation based indices (6), Drought indices (8), Fire indices (5), Tourism indices (5).
Estimation of multivariate normal (MVN) and student-t data of arbitrary dimension where the pattern of missing data is monotone. See Pantaleo and Gramacy (2010) <doi:10.48550/arXiv.0907.2135>
. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and student-t errors (from Geweke) is also provided.
Implementation of the Phoenix and Phoenix-8 Sepsis Criteria as described in "Development and Validation of the Phoenix Criteria for Pediatric Sepsis and Septic Shock" by Sanchez-Pinto, Bennett, DeWitt
, Russell et al. (2024) <doi:10.1001/jama.2024.0196> (Drs. Sanchez-Pinto and Bennett contributed equally to this manuscript; Dr. DeWitt
and Mr. Russell contributed equally to the manuscript), "International Consensus Criteria for Pediatric Sepsis and Septic Shock" by Schlapbach, Watson, Sorce, Argent, et al. (2024) <doi:10.1001/jama.2024.0179> (Drs Schlapbach, Watson, Sorce, and Argent contributed equally) and the application note "phoenix: an R package and Python module for calculating the Phoenix pediatric sepsis score and criteria" by DeWitt
, Russell, Rebull, Sanchez-Pinto, and Bennett (2024) <doi:10.1093/jamiaopen/ooae066>.
Estimation of dark diversity and site-specific species pools using species co-occurrences. It includes implementations of probabilistic dark diversity based on the Hypergeometric distribution, as well as estimations based on the Beals index, which can be transformed to binary predictions using different thresholds, or transformed into a favorability index. All methods include the possibility of using a calibration dataset that is used to estimate the indication matrix between pairs of species, or to estimate dark diversity directly on a single dataset. See De Caceres and Legendre (2008) <doi:10.1007/s00442-008-1017-y>, Lewis et al. (2016) <doi:10.1111/2041-210X.12443>, Partel et al. (2011) <doi:10.1016/j.tree.2010.12.004>, Real et al. (2017) <doi:10.1093/sysbio/syw072> for further information.
Simulation, estimation and inference for univariate and multivariate TV(s)-GARCH(p,q,r)-X models, where s indicates the number and shape of the transition functions, p is the ARCH order, q is the GARCH order, r is the asymmetry order, and X indicates that covariates can be included; see Campos-Martins and Sucarrat (2024) <doi:10.18637/jss.v108.i09>. In the multivariate case, variances are estimated equation by equation and dynamic conditional correlations are allowed. The TV long-term component of the variance as in the multiplicative TV-GARCH model of Amado and Terasvirta (2013) <doi:10.1016/j.jeconom.2013.03.006> introduces non-stationarity whereas the GARCH-X short-term component describes conditional heteroscedasticity. Maximisation by parts leads to consistent and asymptotically normal estimates.
Additional options for making graphics in the context of analyzing high-throughput data are available here. This includes automatic segmenting of the current device (eg window) to accommodate multiple new plots, automatic checking for optimal location of legends in plots, small histograms to insert as legends, histograms re-transforming axis labels to linear when plotting log2-transformed data, a violin-plot <doi:10.1080/00031305.1998.10480559> function for a wide variety of input-formats, principal components analysis (PCA) <doi:10.1080/14786440109462720> with bag-plots <doi:10.1080/00031305.1999.10474494> to highlight and compare the center areas for groups of samples, generic MA-plots (differential- versus average-value plots) <doi:10.1093/nar/30.4.e15>, staggered count plots and generation of mouse-over interactive html pages.
This package provides methods for fitting the Mixture of Factor Analyzers (MFA) model automatically. The MFA model is a mixture model where each sub-population is assumed to follow the Factor Analysis model. The Factor Analysis (FA) model is a latent variable model which assumes that observations are normally distributed, but imposes constraints on their covariance matrix. The MFA model contains two hyperparameters; g (the number of components in the mixture) and q (the number of factors in each component Factor Analysis model). Usually, the Expectation-Maximisation algorithm would be used to fit the MFA model, but this requires g and q to be known. This package treats g and q as unknowns and provides several methods which infer these values with as little input from the user as possible.
Implementation of the Mode Jumping Markov Chain Monte Carlo algorithm from Hubin, A., Storvik, G. (2018) <doi:10.1016/j.csda.2018.05.020>, Genetically Modified Mode Jumping Markov Chain Monte Carlo from Hubin, A., Storvik, G., & Frommlet, F. (2020) <doi:10.1214/18-BA1141>, Hubin, A., Storvik, G., & Frommlet, F. (2021) <doi:10.1613/jair.1.13047>, and Hubin, A., Heinze, G., & De Bin, R. (2023) <doi:10.3390/fractalfract7090641>, and Reversible Genetically Modified Mode Jumping Markov Chain Monte Carlo from Hubin, A., Frommlet, F., & Storvik, G. (2021) <doi:10.48550/arXiv.2110.05316>
, which allow for estimating posterior model probabilities and Bayesian model averaging across a wide set of Bayesian models including linear, generalized linear, generalized linear mixed, generalized nonlinear, generalized nonlinear mixed, and logic regression models.
Preregistrations, or more generally, registrations, enable explicit timestamped and (often but not necessarily publicly) frozen documentation of plans and expectations as well as decisions and justifications. In research, preregistrations are commonly used to clearly document plans and facilitate justifications of deviations from those plans, as well as decreasing the effects of publication bias by enabling identification of research that was conducted but not published. Like reporting guidelines, (pre)registration forms often have specific structures that facilitate systematic reporting of important items. The preregr package facilitates specifying (pre)registrations in R and exporting them to a human-readable format (using R Markdown partials or exporting to an HTML file) as well as human-readable embedded data (using JSON'), as well as importing such exported (pre)registration specifications from such embedded JSON'.
This package provides tools to simulate alphanumeric alleles, impute genetic missing data and reconstruct non-recombinant haplotypes from pedigree databases in a deterministic way. Allelic simulations can be implemented taking into account many factors (such as number of families, markers, alleles per marker, probability and proportion of missing genotypes, recombination rate, etc). Genotype imputation can be used with simulated datasets or real databases (previously loaded in .ped format). Haplotype reconstruction can be carried out even with missing data, since the program firstly imputes each family genotype (without a reference panel), to later reconstruct the corresponding haplotypes for each family member. All this considering that each individual (due to meiosis) should unequivocally have two alleles per marker (one inherited from each parent) and thus imputation and reconstruction results can be deterministically calculated.
It is very common nowadays for a study to collect multiple features and appropriately integrating multiple longitudinal features simultaneously for defining individual clusters becomes increasingly crucial to understanding population heterogeneity and predicting future outcomes. BCClong implements a Bayesian consensus clustering (BCC) model for multiple longitudinal features via a generalized linear mixed model. Compared to existing packages, several key features make the BCClong package appealing: (a) it allows simultaneous clustering of mixed-type (e.g., continuous, discrete and categorical) longitudinal features, (b) it allows each longitudinal feature to be collected from different sources with measurements taken at distinct sets of time points (known as irregularly sampled longitudinal data), (c) it relaxes the assumption that all features have the same clustering structure by estimating the feature-specific (local) clusterings and consensus (global) clustering.
Standard generalized additive models assume a response function, which induces an assumption on the shape of the distribution of the response. However, miss-specifying the response function results in biased estimates. Therefore in Spiegel et al. (2017) <doi:10.1007/s11222-017-9799-6> we propose to estimate the response function jointly with the covariate effects. This package provides the underlying functions to estimate these generalized additive models with flexible response functions. The estimation is based on an iterative algorithm. In the outer loop the response function is estimated, while in the inner loop the covariate effects are determined. For the response function a strictly monotone P-spline is used while the covariate effects are estimated based on a modified Fisher-Scoring algorithm. Overall the estimation relies on the mgcv'-package.
Several yield stability analyses are mentioned in this package: variation and regression based yield stability analyses. Resampling techniques are integrated with these stability analyses. The function stab.mean()
provides the genotypic means and ranks including their corresponding confidence intervals. The function stab.var()
provides the genotypic variances over environments including their corresponding confidence intervals. The function stab.fw()
is an extended method from the Finlay-Wilkinson method (1963). This method can include several other factors that might impact yield stability. Resampling technique is integrated into this method. A few missing data points or unbalanced data are allowed too. The function stab.fw.check()
is an extended method from the Finlay-Wilkinson method (1963). The yield stability is evaluated via common check line(s). Resampling technique is integrated.
The meta-analysis is performed to increase the statistical power by integrating the results from several experiments. The p-values are often combined in meta-analysis when the effect sizes are not available. The metapro R package provides not only traditional methods (Becker BJ (1994, ISBN:0-87154-226-9), Mosteller, F. & Bush, R.R. (1954, ISBN:0201048523) and Lancaster HO (1949, ISSN:00063444)), but also new method named weighted Fisherâ s method we developed. While the (weighted) Z-method is suitable for finding features effective in most experiments, (weighted) Fisherâ s method is useful for detecting partially associated features. Thus, the users can choose the function based on their purpose. Yoon et al. (2021) "Powerful p-value combination methods to detect incomplete association" <doi:10.1038/s41598-021-86465-y>.
Logistic-normal Multinomial (LNM) models are common in problems with multivariate count data. This package gives a simple implementation with a 30 line Stan script. This lightweight implementation makes it an easy starting point for other projects, in particular for downstream tasks that require analysis of "compositional" data. It can be applied whenever a multinomial probability parameter is thought to depend linearly on inputs in a transformed, log ratio space. Additional utilities make it easy to inspect, create predictions, and draw samples using the fitted models. More about the LNM can be found in Xia et al. (2013) "A Logistic Normal Multinomial Regression Model for Microbiome Compositional Data Analysis" <doi:10.1111/biom.12079> and Sankaran and Holmes (2023) "Generative Models: An Interdisciplinary Perspective" <doi:10.1146/annurev-statistics-033121-110134>.
Calculates and depicts probabilistic long-term effects in binary models with temporal dependence variables. The package performs two tasks. First, it calculates the change in the probability of the event occurring given a change in a theoretical variable. Second, it calculates the rolling difference in the future probability of the event for two scenarios: one where the event occurred at a given time and one where the event does not occur. The package is consistent with the recent movement to depict meaningful and easy-to-interpret quantities of interest with the requisite measures of uncertainty. It is the first to make it easy for researchers to interpret short- and long-term effects of explanatory variables in binary autoregressive models, which can have important implications for the correct interpretation of these models.
This package provides a toolbox of fast, native and parallel implementations of various information-based importance criteria estimators and feature selection filters based on them, inspired by the overview by Brown, Pocock, Zhao and Lujan (2012) <https://www.jmlr.org/papers/v13/brown12a.html>. Contains, among other, minimum redundancy maximal relevancy ('mRMR
') method by Peng, Long and Ding (2005) <doi:10.1109/TPAMI.2005.159>; joint mutual information ('JMI') method by Yang and Moody (1999) <https://papers.nips.cc/paper/1779-data-visualization-and-feature-selection-new-algorithms-for-nongaussian-data>; double input symmetrical relevance ('DISR') method by Meyer and Bontempi (2006) <doi:10.1007/11732242_9> as well as joint mutual information maximisation ('JMIM') method by Bennasar, Hicks and Setchi (2015) <doi:10.1016/j.eswa.2015.07.007>.
cn.mops (Copy Number estimation by a Mixture Of PoissonS
) is a data processing pipeline for copy number variations and aberrations (CNVs and CNAs) from next generation sequencing (NGS) data. The package supplies functions to convert BAM files into read count matrices or genomic ranges objects, which are the input objects for cn.mops. cn.mops models the depths of coverage across samples at each genomic position. Therefore, it does not suffer from read count biases along chromosomes. Using a Bayesian approach, cn.mops decomposes read variations across samples into integer copy numbers and noise by its mixture components and Poisson distributions, respectively. cn.mops guarantees a low FDR because wrong detections are indicated by high noise and filtered out. cn.mops is very fast and written in C++.
In p >> n settings, full posterior sampling using existing Markov chain Monte Carlo (MCMC) algorithms is highly inefficient and often not feasible from a practical perspective. To overcome this problem, we propose a scalable stochastic search algorithm that is called the Simplified Shotgun Stochastic Search (S5) and aimed at rapidly explore interesting regions of model space and finding the maximum a posteriori(MAP) model. Also, the S5 provides an approximation of posterior probability of each model (including the marginal inclusion probabilities). This algorithm is a part of an article titled "Scalable Bayesian Variable Selection Using Nonlocal Prior Densities in Ultrahigh-dimensional Settings" (2018) by Minsuk Shin, Anirban Bhattacharya, and Valen E. Johnson and "Nonlocal Functional Priors for Nonparametric Hypothesis Testing and High-dimensional Model Selection" (2020+) by Minsuk Shin and Anirban Bhattacharya.
This package provides a scalable implementation of the highly adaptive lasso algorithm, including routines for constructing sparse matrices of basis functions of the observed data, as well as a custom implementation of Lasso regression tailored to enhance efficiency when the matrix of predictors is composed exclusively of indicator functions. For ease of use and increased flexibility, the Lasso fitting routines invoke code from the glmnet package by default. The highly adaptive lasso was first formulated and described by MJ van der Laan (2017) <doi:10.1515/ijb-2015-0097>, with practical demonstrations of its performance given by Benkeser and van der Laan (2016) <doi:10.1109/DSAA.2016.93>. This implementation of the highly adaptive lasso algorithm was described by Hejazi, Coyle, and van der Laan (2020) <doi:10.21105/joss.02526>.
This package provides analytic derivatives and information matrices for fitted linear mixed effects (lme) models and generalized least squares (gls) models estimated using lme()
(from package nlme') and gls()
(from package nlme'), respectively. The package includes functions for estimating the sampling variance-covariance of variance component parameters using the inverse Fisher information. The variance components include the parameters of the random effects structure (for lme models), the variance structure, and the correlation structure. The expected and average forms of the Fisher information matrix are used in the calculations, and models estimated by full maximum likelihood or restricted maximum likelihood are supported. The package also includes a function for estimating standardized mean difference effect sizes (Pustejovsky, Hedges, and Shadish (2014) <DOI:10.3102/1076998614547577>) based on fitted lme or gls models.
Researchers often have expectations about the relations between means of different groups or standardized regression coefficients; using informative hypothesis testing to incorporate these expectations into the analysis through order constraints increases statistical power Vanbrabant and Rosseel (2020) <doi:10.4324/9780429273872-14>. Another valuable tool, the Bayes factor, can evaluate evidence for multiple hypotheses without concerns about multiple testing, and can be used in Bayesian updating Hoijtink, Mulder, van Lissa & Gu (2019) <doi:10.1037/met0000201>. The bain R package enables informative hypothesis testing using the Bayes factor. The mmibain package provides shiny web applications based on bain'. The RepliCrisis()
function launches a shiny card game to simulate the evaluation of replication studies while the mmibain()
function launches a shiny application to fit Bayesian informative hypotheses evaluation models from bain'.