Perform sensitivity analysis in structural equation modeling using meta-heuristic optimization methods (e.g., ant colony optimization and others). The references for the proposed methods are: (1) Leite, W., Shen, Z., Marcoulides, K., Fish, C., & Harring, J. (2022) <doi:10.1080/10705511.2021.1881786>; (2) Harring, J. R., McNeish, D. M., & Hancock, G. R. (2017) <doi:10.1080/10705511.2018.1506925>; (3) Fisk, C., Harring, J., Shen, Z., Leite, W., Suen, K., & Marcoulides, K. (2022) <doi:10.1177/00131644211073121>; (4) Socha, K., & Dorigo, M. (2008) <doi:10.1016/j.ejor.2006.06.046>. We also thank Dr. Krzysztof Socha for sharing his research on the ant colony optimization algorithm for continuous domains and the associated R code, which provided the basis for the development of this package.
Data analysis package for estimating potential biological effects from chemical concentrations in environmental samples. Included is a set of functions to analyze, visualize, and organize measured concentration data as they relate to user-selected chemical-biological interaction benchmark data, such as water quality criteria. The intent of these analyses is to develop a better understanding of the potential biological relevance of environmental chemistry data. Results can be used to prioritize which chemicals at which sites may be of greatest concern. These methods are meant as a screening technique to predict the potential for biological influence from chemicals; findings ultimately need to be validated with direct biological assays. A description of the analysis can be found in Blackwell et al. (2017) <doi:10.1021/acs.est.7b01613>.
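A sketch of what such a screening workflow could look like, assuming this is the 'toxEval' package and using the function names from its vignettes (create_toxEval(), get_ACC(), clean_endPoint_info(), filter_groups(), get_chemical_summary(), plot_tox_boxplots()); the input file and all argument choices here are illustrative assumptions, not a definitive recipe:

    library(toxEval)
    # Load measured concentrations and chemical metadata from a prepared workbook
    tox_list <- create_toxEval("my_samples.xlsx")    # hypothetical input file
    ACC <- get_ACC(tox_list$chem_info$CAS)           # benchmark values by CAS number
    filtered_ep <- filter_groups(clean_endPoint_info(end_point_info))
    # Summarize concentration-to-benchmark ratios per chemical, site, and endpoint
    chem_sum <- get_chemical_summary(tox_list, ACC, filtered_ep)
    plot_tox_boxplots(chem_sum, category = "Chemical")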
An automated graphical exploratory data analysis (EDA) tool that introduces: (a) wideplot graphics for exploring the structure of a dataset through a grid of variables and graphic types; (b) longplot graphics, which present the entire catalog of available graphics for representing a particular variable using a grid of graphic types and variations on these types; (c) the plotup() function, which presents a particular graphic for a specific variable of a dataset and can also return the code used to generate the graphic, so that the user can adjust its properties as needed; (d) matrixplot graphics, a grid of a particular graphic showing bivariate relationships between all pairs of variables of a certain type (or types) in a multivariate dataset.
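A brief usage sketch, assuming the functions take a data frame plus (for longplot() and plotup()) a variable name and a diagram name as character strings; the exact signatures and the diagram labels are assumptions:

    # Overview grid: one row per variable, one column per graphic type
    wideplot(iris)
    # Full catalog of available graphics for a single variable
    longplot(iris, "Sepal.Width")
    # One specific graphic for one variable; the generating code can be retrieved
    plotup(iris, "Sepal.Width", "histogram")
    # Grid of one graphic type over all pairs of variables of a given class
    matrixplot(iris, dataclass = "numeric", diagram = "scatter plot")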
Assesses the quality of estimates from complex sample designs, following the methodology developed by the National Institute of Statistics of Chile (Household Survey Standard 2020, <https://www.ine.cl/docs/default-source/institucionalidad/buenas-pr%C3%A1cticas/clasificaciones-y-estandares/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-publicaci%C3%B3n-27022020.pdf>; Economics Survey Standard 2024, <https://www.ine.gob.cl/docs/default-source/buenas-practicas/directrices-metodologicas/estandares/documentos/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-econ%C3%B3micas.pdf?sfvrsn=201fbeb9_2>) and by the Economic Commission for Latin America and the Caribbean (2020, <https://repositorio.cepal.org/bitstream/handle/11362/45681/1/S2000293_es.pdf>; 2024, <https://repositorio.cepal.org/server/api/core/bitstreams/f04569e6-4f38-42e7-a32b-e0b298e0ab9c/content>).
Provides six novel robust tests for equal correlation, all based on logistic regressions. In each of the six methods, the score statistic U is proportional to the difference of two correlations computed from different correlation types. ST1() is based on the Pearson correlation. ST2() improves on ST1() by using the median absolute deviation. ST3() uses type M correlation and ST4() uses the Spearman correlation. ST5() and ST6() combine ST3() and ST4() in two different ways. We highly recommend ST5(), following the article 'New Statistical Methods for Constructing Robust Differential Correlation Networks to Characterize the Interactions among MicroRNAs' published in Scientific Reports; see Yu et al. (2019) <doi:10.1038/s41598-019-40167-8>.
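To illustrate in base R why a rank-based correlation (the ingredient of ST4()) resists outliers where the Pearson correlation does not — this is a conceptual illustration, not the package's internal score-test code:

    set.seed(1)
    x <- rnorm(50)
    y <- x + rnorm(50)
    x[1] <- 10; y[1] <- -10                  # inject one discordant outlier
    cor(x, y, method = "pearson")            # badly distorted by the outlier
    cor(x, y, method = "spearman")           # rank-based: much more stable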
This package implements a wide range of dose escalation designs. The focus is on model-based designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. Bayesian inference is performed via MCMC sampling in JAGS, and it is easy to set up a new design with custom JAGS code. However, it is also possible to implement 3+3 designs for comparison, or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models and escalation or stopping rules. Further details are presented in Sabanés Bové et al. (2019) <doi:10.18637/jss.v089.i10>.
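A minimal sketch of setting up such a design, adapted from the structure described in the JSS paper; the constructor names (Data(), LogisticLogNormal(), McmcOptions(), mcmc()) and their argument names follow that paper and may differ across package versions, so treat them as assumptions:

    library(crmPack)
    # Empty data object on a predefined dose grid
    emptydata <- Data(doseGrid = c(0.1, 0.5, 1.5, 3, 6, 10, 20, 40, 50, 80, 100))
    # Two-parameter logistic model with a bivariate log-normal prior
    model <- LogisticLogNormal(mean = c(-0.85, 1),
                               cov = matrix(c(1, -0.5, -0.5, 1), nrow = 2),
                               refDose = 56)
    options <- McmcOptions(burnin = 100, step = 2, samples = 2000)
    samples <- mcmc(emptydata, model, options)   # prior samples when data are empty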
Works as an "add-on" to packages like 'shiny', 'future', and 'rlang', and provides utility functions. Just like dipping sauce adds flavor to potato chips or pita bread, dipsaus adds handy functions and enhancements to popular packages for data analysis and visualization. The goal is to provide simple solutions to questions frequently asked online, such as how to synchronize shiny inputs without freezing the app, or how to get the memory size on a Linux or MacOS system. The enhancements fall roughly into four categories: 1. shiny input widgets; 2. high-performance computing using the future package; 3. modifying R calls and converting among numbers, strings, and other objects; 4. utility functions for system information such as the CPU chipset, memory limit, etc.
For each feature, a score is computed that can be useful for feature selection. Several random subsets are sampled from the input data and, for each random subset, various linear models are fitted using the lars method. A score is assigned to each feature based on the tendency of the LASSO to include that feature in the models. Finally, the average score and the models are returned as the output. Features with relatively low scores are recommended to be ignored, because they can lead to overfitting the model to the training data. Moreover, for each random subset, the best set of features in terms of global error is returned; these are useful for applying Bolasso, an alternative feature selection method that recommends the intersection of feature subsets.
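A minimal sketch of this subset-resampling scoring idea using the 'lars' package directly — an illustration of the scheme described above, not this package's own code; the subset fraction and the inclusion-frequency scoring rule are assumptions:

    library(lars)
    set.seed(42)
    n <- 100; p <- 10
    X <- matrix(rnorm(n * p), n, p)
    y <- X[, 1] - 2 * X[, 3] + rnorm(n)
    B <- 50                          # number of random subsets
    score <- numeric(p)
    for (b in seq_len(B)) {
      idx <- sample(n, size = floor(0.8 * n))
      fit <- lars(X[idx, ], y[idx], type = "lasso")
      beta <- coef(fit)              # coefficients along the whole LASSO path
      # credit a feature each time the path gives it a nonzero weight
      score <- score + colMeans(beta != 0)
    }
    score / B                        # higher = more consistently selected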
Implementations of stochastic, limited-memory quasi-Newton optimizers, similar in spirit to the LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno) algorithm, for smooth stochastic optimization. Implements the following methods: oLBFGS (online LBFGS) (Schraudolph, N.N., Yu, J. and Guenter, S., 2007 <http://proceedings.mlr.press/v2/schraudolph07a.html>), SQN (stochastic quasi-Newton) (Byrd, R.H., Hansen, S.L., Nocedal, J. and Singer, Y., 2016 <arXiv:1401.7020>), adaQN (adaptive quasi-Newton) (Keskar, N.S., Berahas, A.S., 2016, <arXiv:1511.01169>). Provides functions for easily creating R objects with partial_fit/predict methods from some given objective/gradient/predict functions. Includes an example stochastic logistic regression using these optimizers. Provides header files and registered C routines for using it directly from C/C++.
Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <https://www.bnlearn.com/>.
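For example, the canonical bnlearn workflow — structure learning, parameter fitting, and a conditional probability query — on the package's bundled discrete dataset:

    library(bnlearn)
    data(learning.test)                      # six discrete variables, A..F
    dag <- hc(learning.test)                 # score-based learning (hill-climbing)
    fitted <- bn.fit(dag, learning.test)     # maximum likelihood parameters
    # Approximate conditional probability query via logic sampling
    cpquery(fitted, event = (A == "a"), evidence = (B == "b"))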
Computes 138 standard climate indices at monthly, seasonal and annual resolution. These indices were selected, based on their direct and significant impacts on target sectors, after a thorough review of the literature on extreme weather events and natural hazards. Overall, the selected indices characterize different aspects of the frequency, intensity and duration of extreme events, and are derived from a broad set of climatic variables, including surface air temperature, precipitation, relative humidity, wind speed, cloudiness, solar radiation, and snow cover. The indices are classified as follows: temperature-based indices (42), precipitation-based indices (22), bioclimatic indices (21), wind-based indices (5), aridity/continentality indices (10), snow-based indices (13), cloud/radiation-based indices (6), drought indices (8), fire indices (5), and tourism indices (5).
This package provides functions for computing the density, distribution, and random generation of the Diffusion Decision Model (DDM), a widely used cognitive model for analysing choice and response time data. The package allows model specification, including the ability to fix, constrain, or vary parameters across experimental conditions. While it does not include a built-in optimiser, it supports likelihood evaluation and can be integrated with external tools for parameter estimation, as sketched below. Functions for simulating synthetic datasets are also provided. This package is intended for researchers modelling speeded decision-making in behavioural and cognitive experiments. For more information, see Voss, Rothermund, and Voss (2004) <doi:10.3758/BF03196893>, Voss and Voss (2007) <doi:10.3758/BF03192967>, and Ratcliff and McKoon (2008) <doi:10.1162/neco.2008.12-06-420>.
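Since the package's exported names are not given above, here is a hedged sketch of pairing a DDM density with a general-purpose optimizer; ddm_density() and its parameter names (drift v, boundary separation a, non-decision time t0), as well as my_rts/my_choices, are hypothetical placeholders for whatever the package and your data actually provide:

    # Hypothetical: ddm_density(rt, response, v, a, t0) returns the DDM density
    negloglik <- function(par, rt, response) {
      -sum(log(ddm_density(rt, response, v = par[1], a = par[2], t0 = par[3])))
    }
    # External maximum likelihood estimation via base R's optim();
    # starting values are illustrative
    fit <- optim(c(v = 1, a = 1.5, t0 = 0.3), negloglik,
                 rt = my_rts, response = my_choices, method = "Nelder-Mead")
    fit$par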
Estimation of multivariate normal (MVN) and Student-t data of arbitrary dimension where the pattern of missing data is monotone. See Pantaleo and Gramacy (2010) <doi:10.48550/arXiv.0907.2135>. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and Student-t errors (from Geweke) is also provided.
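For instance, the standalone Bayesian lasso interface can be exercised on simulated data — a minimal sketch in which the number of Gibbs iterations, the burn-in, and the assumption that draws are stored in the $beta component are mine:

    library(monomvn)
    set.seed(1)
    X <- matrix(rnorm(100 * 5), 100, 5)
    y <- drop(X %*% c(2, 0, 0, -1.5, 0)) + rnorm(100)
    fit <- blasso(X, y, T = 1000)            # Gibbs sampler for the Bayesian lasso
    colMeans(fit$beta[-(1:200), ])           # posterior means after burn-in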
Implementation of the Phoenix and Phoenix-8 Sepsis Criteria as described in "Development and Validation of the Phoenix Criteria for Pediatric Sepsis and Septic Shock" by Sanchez-Pinto, Bennett, DeWitt, Russell et al. (2024) <doi:10.1001/jama.2024.0196> (Drs. Sanchez-Pinto and Bennett contributed equally to this manuscript; Dr. DeWitt and Mr. Russell contributed equally to the manuscript), "International Consensus Criteria for Pediatric Sepsis and Septic Shock" by Schlapbach, Watson, Sorce, Argent, et al. (2024) <doi:10.1001/jama.2024.0179> (Drs Schlapbach, Watson, Sorce, and Argent contributed equally) and the application note "phoenix: an R package and Python module for calculating the Phoenix pediatric sepsis score and criteria" by DeWitt, Russell, Rebull, Sanchez-Pinto, and Bennett (2024) <doi:10.1093/jamiaopen/ooae066>.
Catch advice for data-limited vertebrate and invertebrate fisheries managed by harvest slot limits using the SlotLim harvest control rule. The package accompanies the manuscript "SlotLim: catch advice for data-limited vertebrate and invertebrate fisheries managed by harvest slot limits" (Pritchard et al., in prep). Minimum data requirements: at least two consecutive years of catch data, length-frequency distributions, and biomass or abundance indices (all from fishery-dependent sources); species-specific growth rate parameters (either von Bertalanffy, Gompertz, or Schnute); and either the natural mortality rate ('M') or the maximum observed age ('tmax'), from which M is estimated. The following functions have optional plotting capabilities that require 'ggplot2' to be installed: prop_target(), TBA(), SAM(), catch_advice(), catch_adjust(), and slotlim_once().
This package provides functions for computing and visualizing generalized canonical discriminant analyses and canonical correlation analysis for a multivariate linear model. Traditional canonical discriminant analysis is restricted to a one-way MANOVA design and is equivalent to canonical correlation analysis between a set of quantitative response variables and a set of dummy variables coded from the factor variable. The candisc package generalizes this to higher-way MANOVA designs for all factors in a multivariate linear model, computing canonical scores and vectors for each term. The graphic functions provide low-rank (1D, 2D, 3D) visualizations of terms in an mlm via the plot.candisc and heplot.candisc methods. Related plots are now provided for canonical correlation analysis when all predictors are quantitative. Methods for linear discriminant analysis are now included.
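The standard one-way example: fit a MANOVA as a multivariate linear model, then compute and plot the canonical discriminant analysis for the factor term:

    library(candisc)
    # Multivariate linear model for the iris data (one-way MANOVA design)
    iris.mod <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
                   ~ Species, data = iris)
    iris.can <- candisc(iris.mod, term = "Species")
    plot(iris.can)    # 2D canonical scores with variable vectors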
Estimation of dark diversity and site-specific species pools using species co-occurrences. It includes implementations of probabilistic dark diversity based on the Hypergeometric distribution, as well as estimations based on the Beals index, which can be transformed to binary predictions using different thresholds, or transformed into a favorability index. All methods include the possibility of using a calibration dataset that is used to estimate the indication matrix between pairs of species, or to estimate dark diversity directly on a single dataset. See De Caceres and Legendre (2008) <doi:10.1007/s00442-008-1017-y>, Lewis et al. (2016) <doi:10.1111/2041-210X.12443>, Partel et al. (2011) <doi:10.1016/j.tree.2010.12.004>, Real et al. (2017) <doi:10.1093/sysbio/syw072> for further information.
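A hedged sketch, assuming the main entry point is a DarkDiv() function taking a sites-by-species matrix and a method argument with the options named above; the function and argument names are assumptions:

    library(DarkDiv)
    set.seed(1)
    # Toy sites-by-species presence/absence matrix
    x <- matrix(rbinom(100, 1, 0.4), nrow = 10,
                dimnames = list(paste0("site", 1:10), paste0("sp", 1:10)))
    dd_hyp <- DarkDiv(x, method = "Hypergeometric")   # probabilistic dark diversity
    dd_fav <- DarkDiv(x, method = "Favorability")     # Beals index as favorability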
Simulation, estimation and inference for univariate and multivariate TV(s)-GARCH(p,q,r)-X models, where s indicates the number and shape of the transition functions, p is the ARCH order, q is the GARCH order, r is the asymmetry order, and X indicates that covariates can be included; see Campos-Martins and Sucarrat (2024) <doi:10.18637/jss.v108.i09>. In the multivariate case, variances are estimated equation by equation and dynamic conditional correlations are allowed. The TV long-term component of the variance as in the multiplicative TV-GARCH model of Amado and Terasvirta (2013) <doi:10.1016/j.jeconom.2013.03.006> introduces non-stationarity whereas the GARCH-X short-term component describes conditional heteroscedasticity. Maximisation by parts leads to consistent and asymptotically normal estimates.
Additional options for making graphics in the context of analyzing high-throughput data are available here. This includes automatic segmenting of the current device (e.g., window) to accommodate multiple new plots, automatic checking for the optimal location of legends in plots, small histograms to insert as legends, histograms re-transforming axis labels to linear when plotting log2-transformed data, a violin-plot <doi:10.1080/00031305.1998.10480559> function for a wide variety of input formats, principal components analysis (PCA) <doi:10.1080/14786440109462720> with bag-plots <doi:10.1080/00031305.1999.10474494> to highlight and compare the center areas for groups of samples, generic MA-plots (differential- versus average-value plots) <doi:10.1093/nar/30.4.e15>, staggered count plots, and generation of mouse-over interactive html pages.
Implementation of the Mode Jumping Markov Chain Monte Carlo algorithm from Hubin, A., Storvik, G. (2018) <doi:10.1016/j.csda.2018.05.020>, Genetically Modified Mode Jumping Markov Chain Monte Carlo from Hubin, A., Storvik, G., & Frommlet, F. (2020) <doi:10.1214/18-BA1141>, Hubin, A., Storvik, G., & Frommlet, F. (2021) <doi:10.1613/jair.1.13047>, and Hubin, A., Heinze, G., & De Bin, R. (2023) <doi:10.3390/fractalfract7090641>, and Reversible Genetically Modified Mode Jumping Markov Chain Monte Carlo from Hubin, A., Frommlet, F., & Storvik, G. (2021) <doi:10.48550/arXiv.2110.05316>, which allow for estimating posterior model probabilities and Bayesian model averaging across a wide set of Bayesian models including linear, generalized linear, generalized linear mixed, generalized nonlinear, generalized nonlinear mixed, and logic regression models.
Preregistrations, or more generally, registrations, enable explicit, timestamped, and (often but not necessarily publicly) frozen documentation of plans and expectations as well as decisions and justifications. In research, preregistrations are commonly used to clearly document plans and facilitate justifications of deviations from those plans, as well as to decrease the effects of publication bias by enabling identification of research that was conducted but not published. Like reporting guidelines, (pre)registration forms often have specific structures that facilitate systematic reporting of important items. The preregr package facilitates specifying (pre)registrations in R and exporting them to a human-readable format (using R Markdown partials or exporting to an HTML file) with human-readable embedded data (using 'JSON'), as well as importing such exported (pre)registration specifications from the embedded 'JSON'.
This package provides a comprehensive toolkit for extracting latent signals from panel data through multivariate time series analysis. Implements spectral decomposition methods including wavelet multiresolution analysis via maximal overlap discrete wavelet transform, Percival and Walden (2000) <doi:10.1017/CBO9780511841040>, empirical mode decomposition for non-stationary signals, Huang et al. (1998) <doi:10.1098/rspa.1998.0193>, and Bayesian trend extraction via the Grant-Chan embedded Hodrick-Prescott filter, Grant and Chan (2017) <doi:10.1016/j.jedc.2016.12.007>. Features Bayesian variable selection through regularized Horseshoe priors, Piironen and Vehtari (2017) <doi:10.1214/17-EJS1337SI>, for identifying structurally relevant predictors from high-dimensional candidate sets. Includes dynamic factor model estimation, principal component analysis with bootstrap significance testing, and automated technical interpretation of signal morphology and variance topology.
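As an illustration of the wavelet multiresolution step on a single series, using the widely available 'waveslim' implementation of the MODWT rather than this package's own wrapper; the filter choice and decomposition depth are assumptions:

    library(waveslim)
    set.seed(1)
    x <- cumsum(rnorm(256))                   # toy non-stationary series
    w <- modwt(x, wf = "la8", n.levels = 4)   # maximal overlap DWT coefficients
    decomp <- mra(x, wf = "la8", J = 4, method = "modwt")  # additive multiresolution
    str(decomp)                               # details D1..D4 plus smooth S4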
cn.mops (Copy Number estimation by a Mixture Of PoissonS) is a data processing pipeline for copy number variations and aberrations (CNVs and CNAs) from next generation sequencing (NGS) data. The package supplies functions to convert BAM files into read count matrices or genomic ranges objects, which are the input objects for cn.mops. cn.mops models the depths of coverage across samples at each genomic position. Therefore, it does not suffer from read count biases along chromosomes. Using a Bayesian approach, cn.mops decomposes read variations across samples into integer copy numbers and noise by its mixture components and Poisson distributions, respectively. cn.mops guarantees a low FDR because wrong detections are indicated by high noise and filtered out. cn.mops is very fast and written in C++.
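A typical invocation, sketched from the pipeline described above; the window length and file discovery are illustrative, so treat the argument details as assumptions:

    library(cn.mops)   # Bioconductor
    BAMFiles <- list.files(pattern = "\\.bam$", full.names = TRUE)
    # Count reads in non-overlapping genomic windows across all samples
    counts <- getReadCountsFromBAM(BAMFiles, WL = 25000)
    res <- cn.mops(counts)                 # mixture-of-Poissons decomposition
    res <- calcIntegerCopyNumbers(res)     # posteriors -> integer copy numbers
    cnvs(res)                              # detected CNV regions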
This package provides tools to simulate alphanumeric alleles, impute missing genetic data, and reconstruct non-recombinant haplotypes from pedigree databases in a deterministic way. Allelic simulations can take many factors into account (such as the number of families, markers, alleles per marker, probability and proportion of missing genotypes, recombination rate, etc.). Genotype imputation can be used with simulated datasets or real databases (previously loaded in .ped format). Haplotype reconstruction can be carried out even with missing data, since the program first imputes each family genotype (without a reference panel) and then reconstructs the corresponding haplotypes for each family member. All of this relies on the fact that each individual (due to meiosis) should unequivocally have two alleles per marker (one inherited from each parent), so imputation and reconstruction results can be calculated deterministically.