Generates artificial point patterns marked by their spatial and temporal signatures. The resulting point cloud may exhibit inherent interactions between both signatures. The simulation integrates microsimulation (Holm, E., (2017)<doi:10.1002/9781118786352.wbieg0320>) and agent-based models (Bonabeau, E., (2002)<doi:10.1073/pnas.082080899>), beginning with the configuration of movement characteristics for the specified agents (referred to as walkers') and their interactions within the simulation environment. These interactions (Quaglietta, L. and Porto, M., (2019)<doi:10.1186/s40462-019-0154-8>) result in specific spatiotemporal patterns that can be visualized, analyzed, and used for various analytical purposes. Given the growing scarcity of detailed spatiotemporal data across many domains, this package provides an alternative data source for applications in social and life sciences.
The t-Digest construction algorithm, by Dunning, (2019) <doi:10.48550/arXiv.1902.04023>, uses a variant of 1-dimensional k-means clustering to produce a very compact data structure that allows accurate estimation of quantiles. This t-Digest data structure can be used to estimate quantiles, compute other rank statistics or even to estimate related measures like trimmed means. The advantage of the t-Digest over previous digests for this purpose is that the t-Digest handles data with full floating point resolution. The accuracy of quantile estimates produced by t-Digests can be orders of magnitude more accurate than those produced by previous digest algorithms. Methods are provided to create and update t-Digests and retrieve quantiles from the accumulated distributions.
This package provides analytic derivatives and information matrices for fitted linear mixed effects (lme) models and generalized least squares (gls) models estimated using lme() (from package nlme) and gls() (from package nlme), respectively. The package includes functions for estimating the sampling variance-covariance of variance component parameters using the inverse Fisher information. The variance components include the parameters of the random effects structure (for lme models), the variance structure, and the correlation structure. The expected and average forms of the Fisher information matrix are used in the calculations, and models estimated by full maximum likelihood or restricted maximum likelihood are supported. The package also includes a function for estimating standardized mean difference effect sizes based on fitted lme or gls models.
This package implements a low dimensional visualization of a set of cytometry samples, in order to visually assess the distances between them. This, in turn, can greatly help the user to identify quality issues like batch effects or outlier samples, and/or check the presence of potential sample clusters that might align with the exeprimental design. The CytoMDS algorithm combines, on the one hand, the concept of Earth Mover's Distance (EMD), a.k.a. Wasserstein metric and, on the other hand, the Multi Dimensional Scaling (MDS) algorithm for the low dimensional projection. Also, the package provides some diagnostic tools for both checking the quality of the MDS projection, as well as tools to help with the interpretation of the axes of the projection.
Several methods may be found for selecting a subset of regressors from a set of k candidate variables in multiple linear regression. One possibility is to evaluate all possible regression models and comparing them using Mallows's Cp statistic (Cp) according to Gilmour original study. Full model is calculated, all possible combinations of regressors are generated, adjusted Cp for each submodel are computed, and the submodel with the minimum adjusted value Cp (ModelMin) is calculated. To identify the final model, the package applies a sequence of hypothesis tests on submodels nested within ModelMin, following the approach outlined in Gilmour's original paper. For more details see the help of the function final_model() and the original study (1996) <doi:10.2307/2348411>.
This package provides a comprehensive analysis tool for metabolomics data. It consists a variety of functional modules, including several new modules: a pre-processing module for normalization and imputation, an exploratory data analysis module for dimension reduction and source of variation analysis, a classification module with the new deep-learning method and other machine-learning methods, a prognosis module with cox-PH and neural-network based Cox-nnet methods, and pathway analysis module to visualize the pathway and interpret metabolite-pathway relationships. References: H. Paul Benton <http://www.metabolomics-forum.com/index.php?topic=281.0> Jeff Xia <https://github.com/cangfengzhe/Metabo/blob/master/MetaboAnalyst/website/name_match.R> Travers Ching, Xun Zhu, Lana X. Garmire (2018) <doi:10.1371/journal.pcbi.1006076>.
This package produces publication-ready statistical tables and figures formatted according to the 7th edition of the American Psychological Association (APA) style guidelines. Supports descriptive statistics, t-tests, z-tests, chi-square tests, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA), two-way ANOVA with simple effects, Multivariate Analysis of Variance (MANOVA), robust and cluster-robust regression using Heteroscedasticity-Consistent (HC) standard errors, post-hoc pairwise comparisons, homoskedasticity and heteroscedasticity diagnostics including the Non-Constant Variance (NCV) test, proportion tests, and multilevel mixed-effects models with intraclass correlation coefficients (ICC) and model-comparison tables. Output can be directed to the console, Microsoft Word (via officer and flextable'), or LaTeX. For APA style guidelines see American Psychological Association (2020, ISBN:978-1-4338-3216-1).
This package provides a general regression neural network (GRNN) is a variant of a Radial Basis Function Network characterized by a fast single-pass learning. tsfgrnn allows you to forecast time series using a GRNN model Francisco Martinez et al. (2019) <doi:10.1007/978-3-030-20521-8_17> and Francisco Martinez et al. (2022) <doi:10.1016/j.neucom.2021.12.028>. When the forecasting horizon is higher than 1, two multi-step ahead forecasting strategies can be used. The model built is autoregressive, that is, it is only based on the observations of the time series. You can consult and plot how the prediction was done. It is also possible to assess the forecasting accuracy of the model using rolling origin evaluation.
High-throughput cell imaging facilitates the analysis of cell migration across many wells treated under different biological conditions. These workflows generate considerable technical noise and biological variability, and therefore technical and biological replicates are necessary, leading to large, hierarchically structured datasets, i.e., cells are nested within technical replicates that are nested within biological replicates. Current statistical analyses of such data usually ignore the hierarchical structure of the data and fail to explicitly quantify uncertainty arising from technical or biological variability. To address this gap, we present cellmig, an R package implementing Bayesian hierarchical models for migration analysis. cellmig quantifies condition- specific velocity changes (e.g., drug effects) while modeling nested data structures and technical artifacts. It further enables synthetic data generation for experimental design optimization.
It makes the creation of networks from sequences of RNA, with this is done the abstraction of characteristics of these networks with a methodology of threshold for the purpose of making a classification between the classes of the sequences. There are four data present in the BASiNET package, "sequences", "sequences2", "sequences-predict" and "sequences2-predict" with 11, 10, 11 and 11 sequences respectively. These sequences were taken from the data set used in the article (LI, Aimin; ZHANG, Junying; ZHOU, Zhongyin, 2014) <doi:10.1186/1471-2105-15-311>, these sequences are used to run examples. The BASiNET was published on Nucleic Acids Research, (ITO, Eric; KATAHIRA, Isaque; VICENTE, Fábio; PEREIRA, Felipe; LOPES, Fabrà cio, 2018) <doi:10.1093/nar/gky462>.
This package provides a series of functions to implement association of covariance for detecting differential co-expression (ACDC), a novel approach for detection of differential co-expression that simultaneously accommodates multiple phenotypes or exposures with binary, ordinal, or continuous data types. Users can use the default method which identifies modules by Partition or may supply their own modules. Also included are functions to choose an information loss criterion (ILC) for Partition using OmicS-data-based Complex trait Analysis (OSCA) and Genome-wide Complex trait Analysis (GCTA). The manuscript describing these methods is as follows: Queen K, Nguyen MN, Gilliland F, Chun S, Raby BA, Millstein J. "ACDC: a general approach for detecting phenotype or exposure associated co-expression" (2023) <doi:10.3389/fmed.2023.1118824>.
Perform sensitivity analysis in structural equation modeling using meta-heuristic optimization methods (e.g., ant colony optimization and others). The references for the proposed methods are: (1) Leite, W., & Shen, Z., Marcoulides, K., Fish, C., & Harring, J. (2022). <doi:10.1080/10705511.2021.1881786> (2) Harring, J. R., McNeish, D. M., & Hancock, G. R. (2017) <doi:10.1080/10705511.2018.1506925>; (3) Fisk, C., Harring, J., Shen, Z., Leite, W., Suen, K., & Marcoulides, K. (2022). <doi:10.1177/00131644211073121>; (4) Socha, K., & Dorigo, M. (2008) <doi:10.1016/j.ejor.2006.06.046>. We also thank Dr. Krzysztof Socha for sharing his research on ant colony optimization algorithm with continuous domains and associated R code, which provided the base for the development of this package.
Data analysis package for estimating potential biological effects from chemical concentrations in environmental samples. Included are a set of functions to analyze, visualize, and organize measured concentration data as it relates to user-selected chemical-biological interaction benchmark data such as water quality criteria. The intent of these analyses is to develop a better understanding of the potential biological relevance of environmental chemistry data. Results can be used to prioritize which chemicals at which sites may be of greatest concern. These methods are meant to be used as a screening technique to predict potential for biological influence from chemicals that ultimately need to be validated with direct biological assays. A description of the analysis can be found in Blackwell (2017) <doi:10.1021/acs.est.7b01613>.
An automated graphical exploratory data analysis (EDA) tool that introduces: a.) wideplot graphics for exploring the structure of a dataset through a grid of variables and graphic types. b.) longplot graphics, which present the entire catalog of available graphics for representing a particular variable using a grid of graphic types and variations on these types. c.) plotup function, which presents a particular graphic for a specific variable of a dataset. The plotup() function also makes it possible to obtain the code used to generate the graphic, meaning that the user can adjust its properties as needed. d.) matrixplot graphics that is a grid of a particular graphic showing bivariate relationships between all pairs of variables of a certain(s) type(s) in a multivariate data set.
There are 6 novel robust tests for equal correlation. They are all based on logistic regressions. The score statistic U is proportion to difference of two correlations based on different types of correlation in 6 methods. The ST1() is based on Pearson correlation. ST2() improved ST1() by using median absolute deviation. ST3() utilized type M correlation and ST4() used Spearman correlation. ST5() and ST6() used two different ways to combine ST3() and ST4(). We highly recommend ST5() according to the article titled New Statistical Methods for Constructing Robust Differential Correlation Networks to characterize the interactions among microRNAs published in Scientific Reports. Please see the reference: Yu et al. (2019) <doi:10.1038/s41598-019-40167-8>.
This package implements a wide range of dose escalation designs. The focus is on model-based designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. Bayesian inference is performed via MCMC sampling in JAGS, and it is easy to setup a new design with custom JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules. Further details are presented in Sabanés Bové et al. (2019) <doi:10.18637/jss.v089.i10>.
Assesses the quality of estimates made by complex sample designs, following the methodology developed by the National Institute of Statistics Chile (Household Survey Standard 2020, <https://www.ine.cl/docs/default-source/institucionalidad/buenas-pr%C3%A1cticas/clasificaciones-y-estandares/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-publicaci%C3%B3n-27022020.pdf>), (Economics Survey Standard 2024, <https://www.ine.gob.cl/docs/default-source/buenas-practicas/directrices-metodologicas/estandares/documentos/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-econ%C3%B3micas.pdf?sfvrsn=201fbeb9_2>) and by Economic Commission for Latin America and Caribbean (2020, <https://repositorio.cepal.org/bitstream/handle/11362/45681/1/S2000293_es.pdf>), (2024, <https://repositorio.cepal.org/server/api/core/bitstreams/f04569e6-4f38-42e7-a32b-e0b298e0ab9c/content>).
Works as an "add-on" to packages like shiny', future', as well as rlang', and provides utility functions. Just like dipping sauce adding flavors to potato chips or pita bread, dipsaus for data analysis and visualizations adds handy functions and enhancements to popular packages. The goal is to provide simple solutions that are frequently asked for online, such as how to synchronize shiny inputs without freezing the app, or how to get memory size on Linux or MacOS system. The enhancements roughly fall into these four categories: 1. shiny input widgets; 2. high-performance computing using the future package; 3. modify R calls and convert among numbers, strings, and other objects. 4. utility functions to get system information such like CPU chip-set, memory limit, etc.
For each feature, a score is computed that can be useful for feature selection. Several random subsets are sampled from the input data and for each random subset, various linear models are fitted using lars method. A score is assigned to each feature based on the tendency of LASSO in including that feature in the models.Finally, the average score and the models are returned as the output. The features with relatively low scores are recommended to be ignored because they can lead to overfitting of the model to the training data. Moreover, for each random subset, the best set of features in terms of global error is returned. They are useful for applying Bolasso, the alternative feature selection method that recommends the intersection of features subsets.
Implementations of stochastic, limited-memory quasi-Newton optimizers, similar in spirit to the LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno) algorithm, for smooth stochastic optimization. Implements the following methods: oLBFGS (online LBFGS) (Schraudolph, N.N., Yu, J. and Guenter, S., 2007 <http://proceedings.mlr.press/v2/schraudolph07a.html>), SQN (stochastic quasi-Newton) (Byrd, R.H., Hansen, S.L., Nocedal, J. and Singer, Y., 2016 <arXiv:1401.7020>), adaQN (adaptive quasi-Newton) (Keskar, N.S., Berahas, A.S., 2016, <arXiv:1511.01169>). Provides functions for easily creating R objects with partial_fit/predict methods from some given objective/gradient/predict functions. Includes an example stochastic logistic regression using these optimizers. Provides header files and registered C routines for using it directly from C/C++.
Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <https://www.bnlearn.com/>.
Computes 138 standard climate indices at monthly, seasonal and annual resolution. These indices were selected, based on their direct and significant impacts on target sectors, after a thorough review of the literature in the field of extreme weather events and natural hazards. Overall, the selected indices characterize different aspects of the frequency, intensity and duration of extreme events, and are derived from a broad set of climatic variables, including surface air temperature, precipitation, relative humidity, wind speed, cloudiness, solar radiation, and snow cover. The 138 indices have been classified as follow: Temperature based indices (42), Precipitation based indices (22), Bioclimatic indices (21), Wind-based indices (5), Aridity/ continentality indices (10), Snow-based indices (13), Cloud/radiation based indices (6), Drought indices (8), Fire indices (5), Tourism indices (5).
This package provides functions for computing the density, distribution, and random generation of the Decision Diffusion model (DDM), a widely used cognitive model for analysing choice and response time data. The package allows model specification, including the ability to fix, constrain, or vary parameters across experimental conditions. While it does not include a built-in optimiser, it supports likelihood evaluation and can be integrated with external tools for parameter estimation. Functions for simulating synthetic datasets are also provided. This package is intended for researchers modelling speeded decision-making in behavioural and cognitive experiments. For more information, see Voss, Rothermund, and Voss (2004) <doi:10.3758/BF03196893>, Voss and Voss (2007) <doi:10.3758/BF03192967>, and Ratcliff and McKoon (2008) <doi:10.1162/neco.2008.12-06-420>.
Estimation of multivariate normal (MVN) and student-t data of arbitrary dimension where the pattern of missing data is monotone. See Pantaleo and Gramacy (2010) <doi:10.48550/arXiv.0907.2135>. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and student-t errors (from Geweke) is also provided.