Real-time quantitative polymerase chain reaction (qPCR) data sets by Lievens et al. (2012) <doi:10.1093/nar/gkr775>. Provides a single tidy tabular data set in long format, encompassing three dilution series targeted against the soybean Lectin endogene. Each dilution series was assayed under one of the following PCR-efficiency-modifying conditions: no PCR inhibition, inhibition by isopropanol, and inhibition by tannic acid. The inhibitors were co-diluted along with the dilution series. The dilution series consists of a five-point, five-fold serial dilution. For each concentration there are 18 replicates, and each amplification curve is 60 cycles long. The original raw data file is available in the Supplementary Data section of Nucleic Acids Research Online <doi:10.1093/nar/gkr775>.
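For illustration, a minimal sketch of working with such a long-format table is shown below; the object name (qpcr_data) and the column names (cycle, fluorescence, condition, dilution, replicate) are assumptions for the example, not the package's documented names.

    # Plot amplification curves per inhibition condition (ggplot2 required).
    library(ggplot2)

    str(qpcr_data)  # inspect the tidy long format first

    ggplot(qpcr_data,
           aes(x = cycle, y = fluorescence,
               group = interaction(dilution, replicate))) +
      geom_line(alpha = 0.4) +
      facet_wrap(~ condition) +
      labs(title = "Amplification curves by inhibition condition")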
An implementation of ranked sparsity methods, including penalized regression methods such as the sparsity-ranked lasso, its non-convex alternatives, and elastic net, as well as the sparsity-ranked Bayesian Information Criterion. As described in Peterson and Cavanaugh (2022) <doi:10.1007/s10182-021-00431-7>, ranked sparsity is a philosophy with methods primarily useful for variable selection under prior informational asymmetry, which arises when performing variable selection in the presence of interactions and/or polynomials. Ultimately, this package aims to make cumbersome interactions and polynomials manageable rather than avoiding them entirely. Models selected under ranked sparsity principles will typically also be more transparent, with fewer falsely selected interactions and polynomials than other methods.
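The core idea can be illustrated outside the package: interactions receive larger penalty weights than main effects, so they must "earn" their way into the model. The sketch below mimics this with glmnet's penalty.factor argument; it is a conceptual illustration only, not the package's own fitting interface, and the sqrt weighting is one possible choice.

    library(glmnet)

    set.seed(1)
    n <- 200
    d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
    y <- with(d, 1 + x1 - 0.5 * x2 + 0.8 * x1 * x2 + rnorm(n))

    # Design with all main effects and pairwise interactions.
    X <- model.matrix(~ (x1 + x2 + x3)^2 - 1, data = d)
    is_interaction <- grepl(":", colnames(X))

    # Rank the sparsity: penalize interactions more than main effects.
    w <- ifelse(is_interaction, sqrt(sum(is_interaction)), 1)
    fit <- cv.glmnet(X, y, penalty.factor = w)
    coef(fit, s = "lambda.min")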
Cell differentiation processes are achieved through a continuum of hierarchical intermediate cell states that might be captured by single-cell RNA-seq. Existing computational approaches for assessing cell-state hierarchies from single-cell data can be formalized under a general workflow composed of i) a metric to assess cell-to-cell similarities (combined or not with a dimensionality reduction step), and ii) a graph-building algorithm (optionally making use of a cell-clustering step). The Sincell R package implements a methodological toolbox allowing flexible workflows within this framework. Furthermore, Sincell contributes new algorithms to provide cell-state hierarchies with statistical support while accounting for stochastic factors in single-cell RNA-seq. Graphical representations and functional association tests are provided to interpret the hierarchies.
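The two-step workflow the package formalizes can be sketched in plain R; the snippet below uses base R and igraph rather than Sincell's own functions, purely to illustrate the metric-then-graph idea on toy data.

    library(igraph)

    expr <- matrix(rnorm(50 * 20), nrow = 50)   # 50 genes x 20 cells (toy data)
    D <- as.matrix(dist(t(expr)))               # step i: cell-to-cell distances

    g <- graph_from_adjacency_matrix(D, mode = "undirected", weighted = TRUE)
    hierarchy <- mst(g)                         # step ii: graph building (MST)
    plot(hierarchy)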
This package provides a collection of datasets from human-computer interaction (HCI) experiments. Each dataset is from an HCI paper, with all fields described and the original publication linked. The authors of all included papers have consented to the inclusion of their data in this package. The datasets cover a range of HCI studies, such as pointing tasks, user experience ratings, and steering tasks. Dataset sources: Bergström et al. (2022) <doi:10.1145/3490493>; Dalsgaard et al. (2021) <doi:10.1145/3489849.3489853>; Larsen et al. (2019) <doi:10.1145/3338286.3340115>; Lilija et al. (2019) <doi:10.1145/3290605.3300676>; Pohl and Murray-Smith (2013) <doi:10.1145/2470654.2481307>; Pohl and Mottelson (2022) <doi:10.3389/frvir.2022.719506>.
Facilitates spatial and general latent Gaussian modeling using integrated nested Laplace approximation via the INLA package (<https://www.r-inla.org>). Additionally, extends the GAM-like model class to more general nonlinear predictor expressions, and implements a log Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. Model components are specified with general inputs and mapping methods to the latent variables, and the predictors are specified via general R expressions, with separate expressions for each observation likelihood model in multi-likelihood models. A prediction method based on fast Monte Carlo sampling allows posterior prediction of general expressions of the latent variables. An ecology-focused introduction is given in Bachl, Lindgren, Borchers, and Illian (2019) <doi:10.1111/2041-210X.13168>.
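A schematic sketch of the component-plus-expression modeling style follows; the exact syntax (component declarations, the bru() call, and the predict() expression) is indicative of the documented style and may differ between package versions, so consult the package documentation before use.

    library(inlabru)   # requires the INLA package to be installed

    df <- data.frame(x = rnorm(100))
    df$y <- 1 + 2 * df$x + rnorm(100)

    # Components name the latent variables and map general inputs to them.
    cmp <- y ~ Intercept(1) + slope(x, model = "linear")

    fit <- bru(cmp, data = df, family = "gaussian")

    # Posterior prediction of a general expression of the latent variables,
    # computed by fast Monte Carlo sampling.
    pred <- predict(fit, df, ~ exp(Intercept + slope))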
This package provides a wrapper for the Non-Metric Space Library ('NMSLIB', <https://github.com/nmslib/nmslib>), which according to the authors "is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The goal of the NMSLIB Library is to create an effective and comprehensive toolkit for searching in generic non-metric spaces. Being comprehensive is important, because no single method is likely to be sufficient in all cases. Also note that exact solutions are hardly efficient in high dimensions and/or non-metric spaces. Hence, the main focus is on approximate methods". The wrapper also includes approximate kernel k-nearest-neighbor functions based on the NMSLIB Python library.
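A hedged usage sketch follows; it assumes the wrapper's R6 interface (NMSlib$new() and knn_Query_Batch(), names as I recall them from the package documentation) and requires the Python 'nmslib' library to be available, so verify the method names against the current docs.

    library(nmslibR)

    X <- matrix(rnorm(1000 * 10), nrow = 1000)

    # Build an approximate index (HNSW method, Euclidean space).
    index <- NMSlib$new(input_data = X, space = "l2", method = "hnsw")

    # Query approximate nearest neighbours for the first five rows.
    res <- index$knn_Query_Batch(X[1:5, , drop = FALSE], k = 3)
    str(res)   # indices and distances of the approximate neighbours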
Generates artificial point patterns marked by their spatial and temporal signatures. The resulting point cloud may exhibit inherent interactions between both signatures. The simulation integrates microsimulation (Holm, E. (2017) <doi:10.1002/9781118786352.wbieg0320>) and agent-based models (Bonabeau, E. (2002) <doi:10.1073/pnas.082080899>), beginning with the configuration of movement characteristics for the specified agents (referred to as 'walkers') and their interactions within the simulation environment. These interactions (Quaglietta, L. and Porto, M. (2019) <doi:10.1186/s40462-019-0154-8>) result in specific spatiotemporal patterns that can be visualized, analyzed, and used for various analytical purposes. Given the growing scarcity of detailed spatiotemporal data across many domains, this package provides an alternative data source for applications in the social and life sciences.
This package provides a comprehensive analysis tool for metabolomics data. It consists of a variety of functional modules, including several new ones: a pre-processing module for normalization and imputation, an exploratory data analysis module for dimension reduction and source-of-variation analysis, a classification module with a new deep-learning method alongside other machine-learning methods, a prognosis module with Cox-PH and the neural-network-based Cox-nnet method, and a pathway analysis module to visualize pathways and interpret metabolite-pathway relationships. References: H. Paul Benton <http://www.metabolomics-forum.com/index.php?topic=281.0>; Jeff Xia <https://github.com/cangfengzhe/Metabo/blob/master/MetaboAnalyst/website/name_match.R>; Travers Ching, Xun Zhu, Lana X. Garmire (2018) <doi:10.1371/journal.pcbi.1006076>.
This package forecasts time series using a general regression neural network (GRNN), a variant of a radial basis function network characterized by fast single-pass learning; see Francisco Martinez et al. (2019) <doi:10.1007/978-3-030-20521-8_17> and Francisco Martinez et al. (2022) <doi:10.1016/j.neucom.2021.12.028>. When the forecasting horizon is greater than one, two multi-step-ahead forecasting strategies can be used. The model built is autoregressive, that is, it is based only on the observations of the time series. You can inspect and plot how the prediction was computed. It is also possible to assess the forecasting accuracy of the model using rolling-origin evaluation.
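A small usage sketch: grnn_forecasting() is the package's main entry point, and rolling_origin() performs the accuracy assessment; the result field names shown follow my reading of the documentation and may vary by version.

    library(tsfgrnn)

    pred <- grnn_forecasting(USAccDeaths, h = 12)   # 12 steps ahead
    pred$prediction                                  # the point forecasts
    plot(pred)                                       # see how prediction was done

    ro <- rolling_origin(pred)                       # rolling-origin evaluation
    ro$global_accu                                   # overall accuracy measures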
This package implements a low-dimensional visualization of a set of cytometry samples, in order to visually assess the distances between them. This, in turn, can greatly help the user identify quality issues like batch effects or outlier samples, and/or check for potential sample clusters that might align with the experimental design. The CytoMDS algorithm combines, on the one hand, the concept of Earth Mover's Distance (EMD), a.k.a. the Wasserstein metric, and, on the other hand, the Multi-Dimensional Scaling (MDS) algorithm for the low-dimensional projection. The package also provides diagnostic tools for checking the quality of the MDS projection, as well as tools to help interpret the axes of the projection.
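The two ingredients can be illustrated in plain R; the snippet below is a conceptual sketch (1-D EMD via quantile matching, then classical MDS with cmdscale), not the CytoMDS API itself.

    set.seed(1)
    samples <- replicate(6, rnorm(500, mean = runif(1)), simplify = FALSE)

    # 1-D Earth Mover's Distance approximated by quantile matching.
    emd1d <- function(a, b, p = seq(0.01, 0.99, 0.01))
      mean(abs(quantile(a, p) - quantile(b, p)))

    D <- outer(seq_along(samples), seq_along(samples),
               Vectorize(function(i, j) emd1d(samples[[i]], samples[[j]])))

    coords <- cmdscale(as.dist(D), k = 2)   # low-dimensional projection
    plot(coords, pch = 19, main = "MDS of inter-sample EMD distances")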
Creates networks from RNA sequences and abstracts characteristics of these networks using a threshold methodology, in order to classify the sequences into their classes. Four data sets are included in the BASiNET package, "sequences", "sequences2", "sequences-predict" and "sequences2-predict", with 11, 10, 11 and 11 sequences respectively. These sequences were taken from the data set used in the article (LI, Aimin; ZHANG, Junying; ZHOU, Zhongyin, 2014) <doi:10.1186/1471-2105-15-311> and are used to run the examples. BASiNET was published in Nucleic Acids Research (ITO, Eric; KATAHIRA, Isaque; VICENTE, Fábio; PEREIRA, Felipe; LOPES, Fabrício, 2018) <doi:10.1093/nar/gky462>.
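A hedged usage sketch follows; the FASTA file names and the argument names of classification() are taken from the package description and my recollection of its examples, so verify them against the current documentation.

    library(BASiNET)

    # Example FASTA files shipped with the package (names assumed).
    mrna   <- system.file("extdata", "sequences.fasta",  package = "BASiNET")
    lncrna <- system.file("extdata", "sequences2.fasta", package = "BASiNET")

    # Build networks, extract threshold-based measures, and classify.
    classification(mRNA = mrna, lncRNA = lncrna)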
Streamlines Quarto workflows by providing tools for consistent project setup and documentation. Enables portability through reusable metadata, automated project structure creation, and standardized templates. Features include enhanced project initialization, pre-formatted Quarto documents, comprehensive data protection settings, custom styling, and structured documentation generation. Designed to improve efficiency and collaboration in R data science projects by reducing repetitive setup tasks while maintaining consistent formatting across multiple documents. The Posit team provides many valuable resources with in-depth explanations of customizing Quarto templates and theme styling: <https://quarto.org/docs/output-formats/html-themes.html#customizing-themes> and <https://quarto.org/docs/output-formats/html-themes-more.html>; see also the Bootstrap community's GitHub at <https://github.com/twbs/bootstrap/blob/main/scss/_variables.scss>.
This package provides a series of functions implementing association of covariance for detecting differential co-expression (ACDC), a novel approach for detecting differential co-expression that simultaneously accommodates multiple phenotypes or exposures with binary, ordinal, or continuous data types. Users can use the default method, which identifies modules with Partition, or may supply their own modules. Also included are functions to choose an information loss criterion (ILC) for Partition using OmicS-data-based Complex trait Analysis (OSCA) and Genome-wide Complex trait Analysis (GCTA). The manuscript describing these methods is: Queen K, Nguyen MN, Gilliland F, Chun S, Raby BA, Millstein J. "ACDC: a general approach for detecting phenotype or exposure associated co-expression" (2023) <doi:10.3389/fmed.2023.1118824>.
Perform sensitivity analysis in structural equation modeling using meta-heuristic optimization methods (e.g., ant colony optimization and others). The references for the proposed methods are: (1) Leite, W., Shen, Z., Marcoulides, K., Fish, C., & Harring, J. (2022) <doi:10.1080/10705511.2021.1881786>; (2) Harring, J. R., McNeish, D. M., & Hancock, G. R. (2017) <doi:10.1080/10705511.2018.1506925>; (3) Fisk, C., Harring, J., Shen, Z., Leite, W., Suen, K., & Marcoulides, K. (2022) <doi:10.1177/00131644211073121>; (4) Socha, K., & Dorigo, M. (2008) <doi:10.1016/j.ejor.2006.06.046>. We also thank Dr. Krzysztof Socha for sharing his research on the ant colony optimization algorithm with continuous domains and the associated R code, which provided the basis for the development of this package.
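A schematic call is sketched below. Here 'model' and 'sens_model' stand for lavaan model strings (the second augments the first with a phantom confounder whose paths the ants search over), and 'mydata' is a placeholder data frame; the argument names follow my reading of the sa.aco() interface and should be checked against the package examples.

    library(SEMsens)

    out <- sa.aco(data = mydata, model = model, sens.model = sens_model,
                  opt.fun = 1, k = 10)
    summary(out)   # or the package's tabulation helpers, per its documentation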
The t-Digest construction algorithm, by Dunning et al. (2019) <doi:10.48550/arXiv.1902.04023>, uses a variant of 1-dimensional k-means clustering to produce a very compact data structure that allows accurate estimation of quantiles. This t-Digest data structure can be used to estimate quantiles, compute other rank statistics, or even estimate related measures like trimmed means. The advantage of the t-Digest over previous digests for this purpose is that it handles data with full floating-point resolution. The accuracy of quantile estimates produced by t-Digests can be orders of magnitude better than that of previous digest algorithms. Methods are provided to create and update t-Digests and retrieve quantiles from the accumulated distributions.
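A short usage sketch of the R interface: tdigest() builds the digest and tquantile() queries it, with the compression parameter controlling the size/accuracy trade-off.

    library(tdigest)

    set.seed(1)
    x <- rlnorm(1e6)                      # a million skewed observations

    td <- tdigest(x, compression = 100)   # compact sketch of the distribution
    tquantile(td, c(0.01, 0.5, 0.99))     # estimates from the digest
    quantile(x, c(0.01, 0.5, 0.99))       # exact values, for comparison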
Data analysis package for estimating potential biological effects from chemical concentrations in environmental samples. Included are a set of functions to analyze, visualize, and organize measured concentration data as they relate to user-selected chemical-biological interaction benchmark data such as water quality criteria. The intent of these analyses is to develop a better understanding of the potential biological relevance of environmental chemistry data. Results can be used to prioritize which chemicals at which sites may be of greatest concern. These methods are meant to be used as a screening technique to predict the potential for biological influence from chemicals; predictions ultimately need to be validated with direct biological assays. A description of the analysis can be found in Blackwell (2017) <doi:10.1021/acs.est.7b01613>.
An automated graphical exploratory data analysis (EDA) tool that introduces: a.) wideplot graphics for exploring the structure of a dataset through a grid of variables and graphic types; b.) longplot graphics, which present the entire catalog of available graphics for representing a particular variable using a grid of graphic types and variations on these types; c.) the plotup function, which presents a particular graphic for a specific variable of a dataset, and also makes it possible to obtain the code used to generate the graphic, so the user can adjust its properties as needed; d.) matrixplot graphics, a grid of a particular graphic showing bivariate relationships between all pairs of variables of certain type(s) in a multivariate data set.
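A usage sketch of the first three entry points follows (matrixplot is analogous); the graphic-type string passed to plotup() comes from the package's own catalog, so "histogram" is indicative and argument patterns should be checked against the documentation.

    library(brinton)

    wideplot(iris)                            # grid: variables x graphic types
    longplot(iris, "Sepal.Width")             # full catalog for one variable
    plotup(iris, "Sepal.Width", "histogram")  # one graphic for one variable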
Assesses the quality of estimates made by complex sample designs, following the methodology developed by the National Institute of Statistics of Chile (Household Survey Standard 2020, <https://www.ine.cl/docs/default-source/institucionalidad/buenas-pr%C3%A1cticas/clasificaciones-y-estandares/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-publicaci%C3%B3n-27022020.pdf>), (Economics Survey Standard 2024, <https://www.ine.gob.cl/docs/default-source/buenas-practicas/directrices-metodologicas/estandares/documentos/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-econ%C3%B3micas.pdf?sfvrsn=201fbeb9_2>) and by the Economic Commission for Latin America and the Caribbean (2020, <https://repositorio.cepal.org/bitstream/handle/11362/45681/1/S2000293_es.pdf>), (2024, <https://repositorio.cepal.org/server/api/core/bitstreams/f04569e6-4f38-42e7-a32b-e0b298e0ab9c/content>).
There are six novel robust tests for equal correlation, all based on logistic regression. In each of the six methods, the score statistic U is proportional to the difference of two correlations based on different types of correlation. ST1() is based on Pearson correlation. ST2() improves ST1() by using the median absolute deviation. ST3() utilizes type M correlation and ST4() uses Spearman correlation. ST5() and ST6() use two different ways of combining ST3() and ST4(). We highly recommend ST5(), following the article titled "New Statistical Methods for Constructing Robust Differential Correlation Networks to characterize the interactions among microRNAs" published in Scientific Reports. Please see the reference: Yu et al. (2019) <doi:10.1038/s41598-019-40167-8>.
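A hedged sketch only: the input convention assumed here (two paired samples whose correlations are compared) and the shape of the returned value are assumptions about the interface, not documented facts, so check the package help for ST5() before use.

    set.seed(1)
    x1 <- rnorm(100); y1 <- 0.6 * x1 + rnorm(100)   # group 1: correlated pair
    x2 <- rnorm(100); y2 <- rnorm(100)              # group 2: uncorrelated pair

    res <- ST5(x1, y1, x2, y2)   # recommended test (signature assumed)
    str(res)                     # inspect the returned statistic / p-value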
Works as an "add-on" to packages like 'shiny', 'future', and 'rlang', and provides utility functions. Just like dipping sauce adding flavors to potato chips or pita bread, dipsaus adds handy functions and enhancements to popular packages for data analysis and visualization. The goal is to provide simple solutions to questions frequently asked online, such as how to synchronize shiny inputs without freezing the app, or how to get the memory size on a Linux or MacOS system. The enhancements roughly fall into four categories: 1. shiny input widgets; 2. high-performance computing using the future package; 3. modifying R calls and converting among numbers, strings, and other objects; 4. utility functions for system information such as the CPU chipset, memory limit, etc.
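Two of the utility helpers mentioned above, plus the assign-if-missing operator, are sketched below; the function names are as documented in dipsaus, though return formats may vary by platform.

    library(dipsaus)

    get_ram()   # total system memory, including on Linux and MacOS
    get_cpu()   # CPU vendor / chip-set information

    # `%?<-%` assigns only when the left-hand side is NULL or undefined.
    port <- NULL
    port %?<-% 8080L
    port %?<-% 9090L   # no-op: port is already set
    port               # 8080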
For each feature, a score is computed that can be useful for feature selection. Several random subsets are sampled from the input data and, for each random subset, various linear models are fitted using the lars method. A score is assigned to each feature based on the tendency of the LASSO to include that feature in the models. Finally, the average scores and the models are returned as the output. Features with relatively low scores are recommended to be ignored, because they can lead to overfitting of the model to the training data. Moreover, for each random subset, the best set of features in terms of global error is returned. These are useful for applying Bolasso, an alternative feature selection method that recommends the intersection of feature subsets.
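A hedged sketch on synthetic data: FeaLect() is the package's scoring routine, but the argument names shown (F for the feature matrix, L for binary labels, total.num.of.models for the number of sampled subsets) follow my reading of its interface and should be checked against the documentation.

    library(FeaLect)

    set.seed(1)
    F <- matrix(rnorm(60 * 20), nrow = 60,
                dimnames = list(NULL, paste0("f", 1:20)))
    L <- rbinom(60, 1, plogis(F[, 1] - F[, 2]))   # only f1 and f2 matter

    scores <- FeaLect(F = F, L = L, total.num.of.models = 50)
    str(scores)   # per-feature scores; low-scoring features can be dropped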
An implementation of estimating the Latent Unknown Clusters By Integrating Multi-omics Data (LUCID) model (Peng (2019) <doi:10.1093/bioinformatics/btz667>). LUCID conducts integrated clustering using exposures and omics information (with outcome information as an option). This package implements three different integration strategies for multi-omics data analysis within the LUCID framework: LUCID early integration (the original LUCID model), LUCID in parallel (intermediate integration), and LUCID in serial (late integration). Automated model selection for each LUCID model is available to obtain the optimal number of latent clusters, and an integrated imputation approach is implemented to handle sporadic and list-wise missingness in multi-omics data. Lasso-type regularization for exposure and omics features is also supported.
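A schematic call only: 'exposures', 'omics', and 'outcome' are placeholder objects, and the function name and arguments (a lucid()-style entry point with an integration-strategy argument) vary across package versions, so treat this as indicative and consult the package documentation.

    library(LUCIDus)

    fit <- lucid(G = exposures, Z = omics, Y = outcome, K = 2,
                 lucid_model = "early")   # or "parallel" / "serial"
    summary(fit)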
Implementations of stochastic, limited-memory quasi-Newton optimizers, similar in spirit to the LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno) algorithm, for smooth stochastic optimization. Implements the following methods: oLBFGS (online LBFGS) (Schraudolph, N.N., Yu, J. and Guenter, S., 2007 <http://proceedings.mlr.press/v2/schraudolph07a.html>), SQN (stochastic quasi-Newton) (Byrd, R.H., Hansen, S.L., Nocedal, J. and Singer, Y., 2016 <arXiv:1401.7020>), and adaQN (adaptive quasi-Newton) (Keskar, N.S., Berahas, A.S., 2016 <arXiv:1511.01169>). Provides functions for easily creating R objects with partial_fit/predict methods from some given objective/gradient/predict functions. Includes an example stochastic logistic regression using these optimizers. Provides header files and registered C routines for using it directly from C/C++.
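A hedged sketch of the object-based interface: an optimizer is built from user-supplied objective/gradient functions and then fed data in mini-batches via partial_fit(). The constructor and argument names follow my reading of the documentation and may differ by version.

    library(stochQN)

    # Toy least-squares problem: theta is the parameter vector,
    # X and y arrive in mini-batches.
    obj  <- function(theta, X, y, ...) mean((X %*% theta - y)^2)
    grad <- function(theta, X, y, ...)
      as.numeric(2 * crossprod(X, X %*% theta - y) / nrow(X))

    opt <- adaQN(x0 = rep(0, 5), grad_fun = grad, obj_fun = obj)

    set.seed(1)
    for (i in 1:50) {                         # stream mini-batches
      X <- matrix(rnorm(100 * 5), nrow = 100)
      y <- as.numeric(X %*% (1:5) + rnorm(100))
      partial_fit(opt, X, y)
    }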
A collection of models to analyze genome-scale codon data using a Bayesian framework. Provides visualization routines and checkpointing for model fittings. Currently published models to analyze gene data for selection on codon usage based on Ribosome Overhead Cost (ROC) are: ROC (Gilchrist et al. (2015) <doi:10.1093/gbe/evv087>) and ROC with phi (Wallace & Drummond (2013) <doi:10.1093/molbev/mst051>). In addition, AnaCoDa contains three currently unpublished models. The FONSE (First-order approximation On NonSense Error) model analyzes gene data for selection on codon usage against nonsense error rates. The PA (PAusing time) and PANSE (PAusing time + NonSense Error) models use ribosome footprinting data to estimate ribosome pausing times, with and without accounting for nonsense error rates.
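The overall workflow shape (genome, parameter, model, and MCMC objects, then runMCMC()) follows the package vignette, but the exact argument names below (sphi, num.mixtures, gene.assignment) are from memory and may differ by version; 'genome.fasta' is a placeholder path.

    library(AnaCoDa)

    genome    <- initializeGenomeObject(file = "genome.fasta")
    parameter <- initializeParameterObject(genome = genome, sphi = 1,
                                           num.mixtures = 1,
                                           gene.assignment = rep(1, length(genome)))
    model <- initializeModelObject(parameter = parameter, model = "ROC")
    mcmc  <- initializeMCMCObject(samples = 5000, thinning = 10,
                                  adaptive.width = 50)
    runMCMC(mcmc = mcmc, genome = genome, model = model)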