This package provides a series of functions implementing association of covariance for detecting differential co-expression (ACDC), a novel approach to detecting differential co-expression that simultaneously accommodates multiple phenotypes or exposures with binary, ordinal, or continuous data types. Users can rely on the default method, which identifies modules via Partition, or supply their own modules. Also included are functions to choose an information loss criterion (ILC) for Partition using OmicS-data-based Complex trait Analysis (OSCA) and Genome-wide Complex trait Analysis (GCTA). The manuscript describing these methods is: Queen K, Nguyen MN, Gilliland F, Chun S, Raby BA, Millstein J. "ACDC: a general approach for detecting phenotype or exposure associated co-expression" (2023) <doi:10.3389/fmed.2023.1118824>.
Perform sensitivity analysis in structural equation modeling using meta-heuristic optimization methods (e.g., ant colony optimization and others). The references for the proposed methods are: (1) Leite, W., Shen, Z., Marcoulides, K., Fish, C., & Harring, J. (2022) <doi:10.1080/10705511.2021.1881786>; (2) Harring, J. R., McNeish, D. M., & Hancock, G. R. (2017) <doi:10.1080/10705511.2018.1506925>; (3) Fisk, C., Harring, J., Shen, Z., Leite, W., Suen, K., & Marcoulides, K. (2022) <doi:10.1177/00131644211073121>; (4) Socha, K., & Dorigo, M. (2008) <doi:10.1016/j.ejor.2006.06.046>. We also thank Dr. Krzysztof Socha for sharing his research on the ant colony optimization algorithm for continuous domains and the associated R code, which provided the basis for the development of this package.
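A minimal sketch of how such a sensitivity analysis might be run, assuming the package's sa.aco() function, lavaan-style model syntax, and a toy data frame; the opt.fun, k and max.iter argument names are assumptions to be checked against the package documentation:

```r
library(SEMsens)

set.seed(1)
my_data <- data.frame(X = rnorm(200))
my_data$Y <- 0.5 * my_data$X + rnorm(200)

# Analytic model plus a sensitivity model with a phantom confounder Z
model <- 'Y ~ b1*X'
sens.model <- 'Y ~ b1*X + phantom1*Z
               Z ~~ 1*Z'

# Ant colony optimization search over the sensitivity parameters
out <- sa.aco(data = my_data, model = model, sens.model = sens.model,
              opt.fun = 1, k = 50, max.iter = 100)
```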
Data analysis package for estimating potential biological effects from chemical concentrations in environmental samples. Included are a set of functions to analyze, visualize, and organize measured concentration data as they relate to user-selected chemical-biological interaction benchmark data, such as water quality criteria. The intent of these analyses is to develop a better understanding of the potential biological relevance of environmental chemistry data. Results can be used to prioritize which chemicals at which sites may be of greatest concern. These methods are meant as a screening technique; predictions of potential biological influence ultimately need to be validated with direct biological assays. A description of the analysis can be found in Blackwell (2017) <doi:10.1021/acs.est.7b01613>.
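A hedged sketch of a typical workflow, assuming the create_toxEval()/get_chemical_summary() helpers and a spreadsheet ("my_samples.xlsx", hypothetical) in the package's expected template:

```r
library(toxEval)

tox_list <- create_toxEval("my_samples.xlsx")  # measured concentrations (template format)

# Benchmarks: ToxCast activity concentrations for the measured chemicals
ACC <- get_ACC(tox_list$chem_info$CAS)
filtered_ep <- filter_groups(clean_endPoint_info(end_point_info))

chem_summary <- get_chemical_summary(tox_list, ACC, filtered_ep)
plot_tox_boxplots(chem_summary, category = "Chemical")  # prioritize chemicals
```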
The t-Digest construction algorithm, by Dunning et al. (2019) <doi:10.48550/arXiv.1902.04023>, uses a variant of 1-dimensional k-means clustering to produce a very compact data structure that allows accurate estimation of quantiles. This t-Digest data structure can be used to estimate quantiles, compute other rank statistics, or even estimate related measures like trimmed means. The advantage of the t-Digest over previous digests for this purpose is that it handles data with full floating point resolution. Quantile estimates produced by t-Digests can be orders of magnitude more accurate than those produced by previous digest algorithms. Methods are provided to create and update t-Digests and retrieve quantiles from the accumulated distributions.
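For example, with the tdigest R package (assuming its tdigest() constructor and tquantile() accessor), quantiles of a large sample can be estimated from the compact digest instead of the raw data:

```r
library(tdigest)

set.seed(1)
x <- rnorm(1e6)

td <- tdigest(x, compression = 100)  # small, fixed-size summary of x
tquantile(td, c(0.01, 0.5, 0.99))    # approximate quantiles from the digest
quantile(x, c(0.01, 0.5, 0.99))      # exact sample quantiles, for comparison
```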
An automated graphical exploratory data analysis (EDA) tool that introduces: a.) wideplot graphics for exploring the structure of a dataset through a grid of variables and graphic types; b.) longplot graphics, which present the entire catalog of available graphics for representing a particular variable using a grid of graphic types and variations on these types; c.) the plotup function, which presents a particular graphic for a specific variable of a dataset. The plotup() function also makes it possible to obtain the code used to generate the graphic, so the user can adjust its properties as needed; d.) matrixplot graphics, a grid of a particular graphic showing the bivariate relationships between all pairs of variables of a certain type (or types) in a multivariate dataset.
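A short illustration of the entry points, assuming the wideplot()/longplot()/plotup() signatures sketched above (the output argument for retrieving the generating code is an assumption):

```r
library(brinton)

wideplot(esoph)                        # grid of variables x graphic types
longplot(esoph, "ncases")              # all available graphics for one variable
plotup(esoph, "ncases", "histogram")   # one chosen graphic for one variable
plotup(esoph, "ncases", "histogram",
       output = "console")             # assumed: also prints the generating code
```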
There are 6 novel robust tests for equal correlation, all based on logistic regression. In each of the 6 methods, the score statistic U is proportional to the difference of two correlations computed from different correlation types. ST1() is based on the Pearson correlation. ST2() improves on ST1() by using the median absolute deviation. ST3() uses the type M correlation and ST4() the Spearman correlation. ST5() and ST6() combine ST3() and ST4() in two different ways. We highly recommend ST5(), following the article "New Statistical Methods for Constructing Robust Differential Correlation Networks to characterize the interactions among microRNAs" published in Scientific Reports. Please see the reference: Yu et al. (2019) <doi:10.1038/s41598-019-40167-8>.
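The call pattern below is a guess at the interface (two paired samples, one per condition); the actual signatures of ST1() through ST6() should be taken from the package help pages:

```r
# Hypothetical usage: test whether cor(x, y) differs between two conditions.
set.seed(1)
x1 <- rnorm(100); y1 <- x1 + rnorm(100)  # condition 1: correlated pair
x2 <- rnorm(100); y2 <- rnorm(100)       # condition 2: uncorrelated pair

ST5(x1, y1, x2, y2)  # recommended robust test (assumed argument order)
```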
Assesses the quality of estimates made by complex sample designs, following the methodology developed by the National Institute of Statistics of Chile (Household Survey Standard 2020, <https://www.ine.cl/docs/default-source/institucionalidad/buenas-pr%C3%A1cticas/clasificaciones-y-estandares/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-publicaci%C3%B3n-27022020.pdf>; Economics Survey Standard 2024, <https://www.ine.gob.cl/docs/default-source/buenas-practicas/directrices-metodologicas/estandares/documentos/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-econ%C3%B3micas.pdf?sfvrsn=201fbeb9_2>) and by the Economic Commission for Latin America and the Caribbean (2020, <https://repositorio.cepal.org/bitstream/handle/11362/45681/1/S2000293_es.pdf>; 2024, <https://repositorio.cepal.org/server/api/core/bitstreams/f04569e6-4f38-42e7-a32b-e0b298e0ab9c/content>).
Works as an "add-on" to packages like 'shiny', 'future', and 'rlang', providing utility functions. Just like dipping sauce adds flavor to potato chips or pita bread, dipsaus adds handy functions and enhancements to popular packages for data analysis and visualization. The goal is to provide simple solutions to questions frequently asked online, such as how to synchronize shiny inputs without freezing the app, or how to get the memory size on a Linux or macOS system. The enhancements roughly fall into four categories: 1. shiny input widgets; 2. high-performance computing using the future package; 3. modifying R calls and converting among numbers, strings, and other objects; 4. utility functions for system information such as the CPU chipset, memory limit, etc.
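For instance, one of the shiny input enhancements (assuming the actionButtonStyled()/updateActionButtonStyled() pair; the disabled argument is an assumption):

```r
library(shiny)
library(dipsaus)

ui <- fluidPage(
  # styled drop-in replacement for shiny::actionButton()
  actionButtonStyled("go", "Run analysis", type = "primary")
)

server <- function(input, output, session) {
  observeEvent(input$go, {
    # assumed: disable the button while long-running work executes
    updateActionButtonStyled(session, "go", disabled = TRUE)
  })
}

shinyApp(ui, server)
```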
For each feature, a score is computed that can be useful for feature selection. Several random subsets are sampled from the input data and, for each random subset, various linear models are fitted using the lars method. A score is assigned to each feature based on the tendency of LASSO to include that feature in the models. Finally, the average score and the models are returned as the output. Features with relatively low scores are recommended to be ignored, because they can lead to overfitting of the model to the training data. Moreover, for each random subset, the best set of features in terms of global error is returned; these are useful for applying Bolasso, an alternative feature selection method that recommends the intersection of feature subsets.
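The scoring procedure itself can be sketched directly; this is an illustrative reimplementation of the described idea with the lars package, not the package's own API:

```r
library(lars)

# Score features by how often LASSO includes them across random subsets.
lasso_scores <- function(X, y, n_subsets = 50, subset_frac = 0.75) {
  scores <- numeric(ncol(X))
  for (i in seq_len(n_subsets)) {
    idx <- sample(nrow(X), floor(subset_frac * nrow(X)))
    fit <- lars(X[idx, ], y[idx], type = "lasso")
    beta <- coef(fit)                       # coefficients along the LASSO path
    scores <- scores + colMeans(beta != 0)  # inclusion tendency per feature
  }
  scores / n_subsets                        # average score; low => candidate to drop
}
```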
An implementation of estimating the Latent Unknown Clusters By Integrating Multi-omics Data (LUCID) model (Peng (2019) <doi:10.1093/bioinformatics/btz667>). LUCID conducts integrated clustering using exposures and omics information (with outcome information as an option). This package implements three different integration strategies for multi-omics data analysis within the LUCID framework: LUCID early integration (the original LUCID model), LUCID in parallel (intermediate integration), and LUCID in serial (late integration). Automated model selection is available for each LUCID model to obtain the optimal number of latent clusters, and an integrated imputation approach is implemented to handle sporadic and list-wise missingness in multi-omics data. Lasso-type regularization for exposure and omics features is also supported.
Implementations of stochastic, limited-memory quasi-Newton optimizers, similar in spirit to the LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno) algorithm, for smooth stochastic optimization. Implements the following methods: oLBFGS (online LBFGS) (Schraudolph, N.N., Yu, J. and Guenter, S., 2007 <http://proceedings.mlr.press/v2/schraudolph07a.html>), SQN (stochastic quasi-Newton) (Byrd, R.H., Hansen, S.L., Nocedal, J. and Singer, Y., 2016 <arXiv:1401.7020>), and adaQN (adaptive quasi-Newton) (Keskar, N.S., Berahas, A.S., 2016 <arXiv:1511.01169>). Provides functions for easily creating R objects with partial_fit/predict methods from some given objective/gradient/predict functions. Includes an example stochastic logistic regression using these optimizers. Provides header files and registered C routines for using it directly from C/C++.
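A sketch of the intended usage pattern; the adaQN() constructor and partial_fit() method names are assumed from the description above, and the objective/gradient pair is a plain logistic regression:

```r
library(stochQN)

set.seed(1)
X <- cbind(1, matrix(rnorm(500 * 3), ncol = 3))
y <- rbinom(500, 1, plogis(X %*% c(-1, 2, -2, 1)))

# Logistic-regression objective and gradient
obj_fun  <- function(w, X, y, ...) {
  p <- plogis(X %*% w)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}
grad_fun <- function(w, X, y, ...) crossprod(X, plogis(X %*% w) - y) / nrow(X)

# Assumed constructor/method names; check the reference manual.
opt <- adaQN(x0 = rep(0, ncol(X)), grad_fun = grad_fun, obj_fun = obj_fun)
for (i in seq(1, 500, by = 100))        # stream the data in mini-batches
  partial_fit(opt, X[i:(i + 99), ], y[i:(i + 99)])
```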
A collection of models for analyzing genome-scale codon data using a Bayesian framework. Provides visualization routines and checkpointing for model fittings. Currently published models for analyzing gene data for selection on codon usage based on Ribosome Overhead Cost (ROC) are: ROC (Gilchrist et al. (2015) <doi:10.1093/gbe/evv087>) and ROC with phi (Wallace & Drummond (2013) <doi:10.1093/molbev/mst051>). In addition, AnaCoDa contains three currently unpublished models. The FONSE (First order approximation On NonSense Error) model analyzes gene data for selection on codon usage against nonsense error rates. The PA (PAusing time) and PANSE (PAusing time + NonSense Error) models use ribosome footprinting data to estimate ribosome pausing times, with and without accounting for nonsense error rates.
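A hedged sketch of fitting the ROC model, assuming the initialize*Object()/runMCMC() workflow recalled from the AnaCoDa vignette; the file name and all argument names are assumptions:

```r
library(AnaCoDa)

genome    <- initializeGenomeObject(file = "genome.fasta")  # hypothetical FASTA
parameter <- initializeParameterObject(genome = genome, sphi = 1,
                                       num.mixtures = 1,
                                       gene.assignment = rep(1, length(genome)))
model     <- initializeModelObject(parameter = parameter, model = "ROC")
mcmc      <- initializeMCMCObject(samples = 1000, thinning = 10)

runMCMC(mcmc = mcmc, genome = genome, model = model)  # fit; checkpointing available
```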
Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <https://www.bnlearn.com/>.
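For example, learning a structure with hill-climbing, fitting its parameters by maximum likelihood, and answering a conditional probability query:

```r
library(bnlearn)

data(learning.test)                   # discrete toy data shipped with bnlearn
dag <- hc(learning.test)              # score-based (Hill-Climbing) structure learning
fitted <- bn.fit(dag, learning.test)  # maximum likelihood parameter estimates

# Approximate conditional probability query by sampling
cpquery(fitted, event = (B == "b"), evidence = (A == "a"))
```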
Computes 138 standard climate indices at monthly, seasonal and annual resolution. These indices were selected, based on their direct and significant impacts on target sectors, after a thorough review of the literature in the field of extreme weather events and natural hazards. Overall, the selected indices characterize different aspects of the frequency, intensity and duration of extreme events, and are derived from a broad set of climatic variables, including surface air temperature, precipitation, relative humidity, wind speed, cloudiness, solar radiation, and snow cover. The 138 indices are classified as follows: temperature-based indices (42), precipitation-based indices (22), bioclimatic indices (21), wind-based indices (5), aridity/continentality indices (10), snow-based indices (13), cloud/radiation-based indices (6), drought indices (8), fire indices (5), and tourism indices (5).
This package provides a generic, easy-to-use and intuitive pharmacokinetic/pharmacodynamic (PK/PD) simulation platform based on the R packages 'rxode2' and 'mrgsolve'. CAMPSIS provides an abstraction layer over the underlying processes of writing a PK/PD model, assembling a custom dataset and running a simulation. CAMPSIS depends strongly on the R package 'campsismod', which allows a model to be read from or written to files and adapted further on the fly in the R environment. The campsis package lets the user assemble a dataset in an intuitive manner. Once the user's dataset is ready, the package takes charge of preparing the simulation, calling rxode2 or mrgsolve (at the user's choice) and returning the results for the given model, dataset and desired simulation settings.
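A sketch of that workflow, assuming the Dataset()/Bolus()/Observations() builders and the simulate() dispatcher from the campsis documentation; the library model name is an assumption:

```r
library(campsis)

# Assemble a dataset: 25 subjects, a 1000-unit bolus, observations over 24 h
dataset <- Dataset(25) %>%
  add(Bolus(time = 0, amount = 1000, compartment = 1)) %>%
  add(Observations(times = seq(0, 24, by = 0.5)))

model <- model_suite$pk$`1cpt_fo`  # assumed: example model from the model library
results <- model %>% simulate(dataset = dataset, dest = "rxode2", seed = 1)
spaghettiPlot(results, "CONC")
```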
Estimation of multivariate normal (MVN) and Student-t data of arbitrary dimension where the pattern of missing data is monotone. See Pantaleo and Gramacy (2010) <doi:10.48550/arXiv.0907.2135>. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and Student-t errors (from Geweke) is also provided.
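For instance, with a toy matrix whose missingness is monotone (each column observed on a subset of the rows observed for the previous one):

```r
library(monomvn)

set.seed(1)
y <- matrix(rnorm(100 * 3), ncol = 3)
y[81:100, 2] <- NA   # column 2 missing on rows 81-100
y[61:100, 3] <- NA   # column 3 missing on rows 61-100 (superset: monotone)

out <- monomvn(y)    # ML estimation of the MVN mean and covariance
out$mu; out$S

# Standalone Bayesian lasso regression (Park & Casella)
reg <- blasso(X = y[1:60, 1:2], y = y[1:60, 3])
```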
Implementation of the Phoenix and Phoenix-8 Sepsis Criteria as described in "Development and Validation of the Phoenix Criteria for Pediatric Sepsis and Septic Shock" by Sanchez-Pinto, Bennett, DeWitt, Russell et al. (2024) <doi:10.1001/jama.2024.0196> (Drs. Sanchez-Pinto and Bennett contributed equally to this manuscript; Dr. DeWitt and Mr. Russell contributed equally to the manuscript), "International Consensus Criteria for Pediatric Sepsis and Septic Shock" by Schlapbach, Watson, Sorce, Argent, et al. (2024) <doi:10.1001/jama.2024.0179> (Drs. Schlapbach, Watson, Sorce, and Argent contributed equally), and the application note "phoenix: an R package and Python module for calculating the Phoenix pediatric sepsis score and criteria" by DeWitt, Russell, Rebull, Sanchez-Pinto, and Bennett (2024) <doi:10.1093/jamiaopen/ooae066>.
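A sketch of scoring a cohort with the package's phoenix() function; the example data set and all column/argument names below are assumptions to be checked against the package documentation:

```r
library(phoenix)

# Hypothetical columns in a cohort data frame `sepsis`; the scoring function
# is assumed to take one argument per organ-dysfunction input.
scores <- phoenix(
  pf_ratio    = pf_ratio,     # respiratory
  vasoactives = vasoactives,  # cardiovascular
  platelets   = platelets,    # coagulation
  gcs         = gcs_total,    # neurologic
  data        = sepsis
)
head(scores)
```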
Estimation of dark diversity and site-specific species pools using species co-occurrences. It includes implementations of probabilistic dark diversity based on the Hypergeometric distribution, as well as estimations based on the Beals index, which can be transformed to binary predictions using different thresholds, or transformed into a favorability index. All methods include the possibility of using a calibration dataset that is used to estimate the indication matrix between pairs of species, or to estimate dark diversity directly on a single dataset. See De Caceres and Legendre (2008) <doi:10.1007/s00442-008-1017-y>, Lewis et al. (2016) <doi:10.1111/2041-210X.12443>, Partel et al. (2011) <doi:10.1016/j.tree.2010.12.004>, Real et al. (2017) <doi:10.1093/sysbio/syw072> for further information.
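For example (assuming the package's main DarkDiv() function and its method argument):

```r
library(DarkDiv)

# x: sites-by-species presence/absence matrix (toy example)
set.seed(1)
x <- matrix(rbinom(20 * 10, 1, 0.4), nrow = 20,
            dimnames = list(NULL, paste0("sp", 1:10)))

dd <- DarkDiv(x, method = "Hypergeometric")  # probabilistic dark diversity
str(dd, max.level = 1)
```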
Simulation, estimation and inference for univariate and multivariate TV(s)-GARCH(p,q,r)-X models, where s indicates the number and shape of the transition functions, p is the ARCH order, q is the GARCH order, r is the asymmetry order, and X indicates that covariates can be included; see Campos-Martins and Sucarrat (2024) <doi:10.18637/jss.v108.i09>. In the multivariate case, variances are estimated equation by equation and dynamic conditional correlations are allowed. The TV long-term component of the variance as in the multiplicative TV-GARCH model of Amado and Terasvirta (2013) <doi:10.1016/j.jeconom.2013.03.006> introduces non-stationarity whereas the GARCH-X short-term component describes conditional heteroscedasticity. Maximisation by parts leads to consistent and asymptotically normal estimates.
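A sketch of simulation and estimation, assuming a tvgarchSim() simulator and a tvgarch() interface in which order.g sets the number of transition functions s and order.h encodes the GARCH/ARCH/asymmetry orders; both argument names are assumptions:

```r
library(tvgarch)

set.seed(123)
y <- tvgarchSim(n = 2000)           # simulate a TV-GARCH series
fit <- tvgarch(y = y, order.g = 1,  # one transition function
               order.h = c(1, 1, 0))  # assumed encoding: GARCH(1,1), no asymmetry
summary(fit)
```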
Additional options for making graphics in the context of analyzing high-throughput data are available here. This includes automatic segmenting of the current device (e.g. window) to accommodate multiple new plots, automatic checking for the optimal location of legends in plots, small histograms to insert as legends, histograms re-transforming axis labels to linear when plotting log2-transformed data, a violin-plot <doi:10.1080/00031305.1998.10480559> function for a wide variety of input formats, principal components analysis (PCA) <doi:10.1080/14786440109462720> with bag-plots <doi:10.1080/00031305.1999.10474494> to highlight and compare the center areas for groups of samples, generic MA-plots (differential- versus average-value plots) <doi:10.1093/nar/30.4.e15>, staggered count plots and generation of mouse-over interactive html pages.
This package provides methods for fitting the Mixture of Factor Analyzers (MFA) model automatically. The MFA model is a mixture model where each sub-population is assumed to follow the Factor Analysis model. The Factor Analysis (FA) model is a latent variable model which assumes that observations are normally distributed, but imposes constraints on their covariance matrix. The MFA model contains two hyperparameters: g (the number of components in the mixture) and q (the number of factors in each component Factor Analysis model). Usually, the Expectation-Maximisation algorithm would be used to fit the MFA model, but this requires g and q to be known. This package treats g and q as unknowns and provides several methods which infer these values with as little input from the user as possible.
Implementation of the Mode Jumping Markov Chain Monte Carlo algorithm from Hubin, A., Storvik, G. (2018) <doi:10.1016/j.csda.2018.05.020>, Genetically Modified Mode Jumping Markov Chain Monte Carlo from Hubin, A., Storvik, G., & Frommlet, F. (2020) <doi:10.1214/18-BA1141>, Hubin, A., Storvik, G., & Frommlet, F. (2021) <doi:10.1613/jair.1.13047>, and Hubin, A., Heinze, G., & De Bin, R. (2023) <doi:10.3390/fractalfract7090641>, and Reversible Genetically Modified Mode Jumping Markov Chain Monte Carlo from Hubin, A., Frommlet, F., & Storvik, G. (2021) <doi:10.48550/arXiv.2110.05316>, which allow for estimating posterior model probabilities and Bayesian model averaging across a wide set of Bayesian models including linear, generalized linear, generalized linear mixed, generalized nonlinear, generalized nonlinear mixed, and logic regression models.
Preregistrations, or more generally, registrations, enable explicit, timestamped, and (often but not necessarily publicly) frozen documentation of plans and expectations as well as decisions and justifications. In research, preregistrations are commonly used to clearly document plans and facilitate justifications of deviations from those plans, as well as to decrease the effects of publication bias by enabling identification of research that was conducted but not published. Like reporting guidelines, (pre)registration forms often have specific structures that facilitate systematic reporting of important items. The preregr package facilitates specifying (pre)registrations in R and exporting them to a human-readable format (using R Markdown partials or exporting to an HTML file) with embedded data (using 'JSON'), as well as importing such exported (pre)registration specifications from that embedded 'JSON'.
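For example (assuming preregr's prereg_initialize(), prereg_specify() and prereg_spec_to_html() functions; the form identifier and item names are assumptions):

```r
library(preregr)

spec <- prereg_initialize("cos_prereg")          # start from an included form
spec <- prereg_specify(spec,
                       title = "Example study",  # item names depend on the form
                       authors = "Doe, J.")
prereg_spec_to_html(spec, file = "prereg.html")  # human-readable, JSON embedded
```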
This package provides tools to simulate alphanumeric alleles, impute missing genetic data, and reconstruct non-recombinant haplotypes from pedigree databases in a deterministic way. Allelic simulations can be implemented taking many factors into account (such as the number of families, markers, alleles per marker, probability and proportion of missing genotypes, recombination rate, etc.). Genotype imputation can be used with simulated datasets or real databases (previously loaded in .ped format). Haplotype reconstruction can be carried out even with missing data, since the program first imputes each family's genotypes (without a reference panel) and then reconstructs the corresponding haplotypes for each family member. All of this relies on the fact that each individual (due to meiosis) should unequivocally have two alleles per marker, one inherited from each parent, so imputation and reconstruction results can be calculated deterministically.
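A sketch of the simulate-impute-reconstruct workflow, assuming alleHap's alleSimulator(), alleImputer() and alleHaplotyper() functions; argument names and the return structure are assumptions:

```r
library(alleHap)

# Simulate 10 families x 5 markers with 10% missing genotypes (assumed args)
sim <- alleSimulator(10, 5, missParProb = 0.1)
simData <- sim[[1]]            # assumed: first element holds the .ped-style data

imp <- alleImputer(simData)    # deterministic per-family genotype imputation
hap <- alleHaplotyper(simData) # haplotype reconstruction, even with missing data
```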