Performance metric provides different performance measures like mean squared error, root mean square error, mean absolute deviation, mean absolute percentage error etc. of a fitted model. These can provide a way for forecasters to quantitatively compare the performance of competing models. For method details see (i) Pankaj Das (2020) <http://krishi.icar.gov.in/jspui/handle/123456789/44138>.
An assortment of functions that could be useful in analyzing data from psychophysical experiments. It includes functions for calculating d from several different experimental designs, links for m-alternative forced-choice (mafc) data to be used with the binomial family in glm (and possibly other contexts) and self-Start functions for estimating gamma values for CRT screen calibrations.
This package provides functions for color-based visualization of multivariate data, i.e. colorgrams or heatmaps. Lower-level functions map numeric values to colors, display a matrix as an array of colors, and draw color keys. Higher-level plotting functions generate a bivariate histogram, a dendrogram aligned with a color-coded matrix, a triangular distance matrix, and more.
Generates and predicts a set of linearly stacked Random Forest models using bootstrap sampling. Individual datasets may be heterogeneous (not all samples have full sets of features). Contains support for parallelization but the user should register their cores before running. This is an extension of the method found in Matlock (2018) <doi:10.1186/s12859-018-2060-2>.
This package implements the TabNet model by Sercan O. Arik et al. (2019) <doi:10.48550/arXiv.1908.07442> with Coherent Hierarchical Multi-label Classification Networks by Giunchiglia et al. <doi:10.48550/arXiv.2010.10151> and provides a consistent interface for fitting and creating predictions. It's also fully compatible with the tidymodels ecosystem.
This package provides a comprehensive resource for data on Taylor Swift songs. Data is included for all officially released studio albums, extended plays (EPs), and individual singles are included. Data comes from Genius (lyrics) and SoundStat (song characteristics). Additional functions are included for easily creating data visualizations with color palettes inspired by Taylor Swift's album covers.
Unobserved components time series model using the linear innovations state space representation (single source of error) with choice of error distributions and option for dynamic variance. Methods for estimation using automatic differentiation, automatic model selection and ensembling, prediction, filtering, simulation and backtesting. Based on the model described in Hyndman et al (2012) <doi:10.1198/jasa.2011.tm09771>.
The BACON algorithms are methods for multivariate outlier nomination (detection) and robust linear regression by Billor, Hadi, and Velleman (2000) <doi:10.1016/S0167-9473(99)00101-2>. The extension to weighted problems is due to Beguin and Hulliger (2008) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X200800110616>; see also <doi:10.21105/joss.03238>.
This package provides a complete toolset for methylome-wide association studies (MWAS). It is specifically designed for data from enrichment based methylation assays, but can be applied to other data as well. The analysis pipeline includes seven steps: (1) scanning aligned reads from BAM files, (2) calculation of quality control measures, (3) creation of methylation score (coverage) matrix, (4) principal component analysis for capturing batch effects and detection of outliers, (5) association analysis with respect to phenotypes of interest while correcting for top PCs and known covariates, (6) annotation of significant findings, and (7) multi-marker analysis (methylation risk score) using elastic net. Additionally, RaMWAS include tools for joint analysis of methlyation and genotype data. This work is published in Bioinformatics, Shabalin et al. (2018) <doi:10.1093/bioinformatics/bty069>.
The Resource Description Framework, or RDF is a widely used data representation model that forms the cornerstone of the Semantic Web. RDF represents data as a graph rather than the familiar data table or rectangle of relational databases. The rdflib package provides a friendly and concise user interface for performing common tasks on RDF data, such as reading, writing and converting between the various serializations of RDF data, including rdfxml', turtle', nquads', ntriples', and json-ld'; creating new RDF graphs, and performing graph queries using SPARQL'. This package wraps the low level redland R package which provides direct bindings to the redland C library. Additionally, the package supports the newer and more developer friendly JSON-LD format through the jsonld package. The package interface takes inspiration from the Python rdflib library.
Data exploration and prediction with focus on high dimensional data and chemometrics. The package was initially designed about partial least squares regression and discrimination models and variants, in particular locally weighted PLS models (LWPLS). Then, it has been expanded to many other methods for analyzing high dimensional data. The name rchemo comes from the fact that the package is orientated to chemometrics, but most of the provided methods are fully generic to other domains. Functions such as transform(), predict(), coef() and summary() are available. Tuning the predictive models is facilitated by generic functions gridscore() (validation dataset) and gridcv() (cross-validation). Faster versions are also available for models based on latent variables (LVs) (gridscorelv() and gridcvlv()) and ridge regularization (gridscorelb() and gridcvlb()).
Allows the user to learn Bayesian networks from datasets containing thousands of variables. It focuses on score-based learning, mainly the BIC and the BDeu score functions. It provides state-of-the-art algorithms for the following tasks: (1) parent set identification - Mauro Scanagatta (2015) <http://papers.nips.cc/paper/5803-learning-bayesian-networks-with-thousands-of-variables>; (2) general structure optimization - Mauro Scanagatta (2018) <doi:10.1007/s10994-018-5701-9>, Mauro Scanagatta (2018) <http://proceedings.mlr.press/v73/scanagatta17a.html>; (3) bounded treewidth structure optimization - Mauro Scanagatta (2016) <http://papers.nips.cc/paper/6232-learning-treewidth-bounded-bayesian-networks-with-thousands-of-variables>; (4) structure learning on incomplete data sets - Mauro Scanagatta (2018) <doi:10.1016/j.ijar.2018.02.004>. Distributed under the LGPL-3 by IDSIA.
SCUDO (Signature-based Clustering for Diagnostic Purposes) is a rank-based method for the analysis of gene expression profiles for diagnostic and classification purposes. It is based on the identification of sample-specific gene signatures composed of the most up- and down-regulated genes for that sample. Starting from gene expression data, functions in this package identify sample-specific gene signatures and use them to build a graph of samples. In this graph samples are joined by edges if they have a similar expression profile, according to a pre-computed similarity matrix. The similarity between the expression profiles of two samples is computed using a method similar to GSEA. The graph of samples can then be used to perform community clustering or to perform supervised classification of samples in a testing set.
This is an R package to make it easier to import and store phylogenetic trees with associated data; and to link external data from different sources to phylogeny. It also supports exporting phylogenetic trees with heterogeneous associated data to a single tree file and can be served as a platform for merging tree with associated data and converting file formats.
This package is a flexible and comprehensive R toolbox for model-based optimization. It implements Efficient Global Optimization Algorithm for single- and multi-objective optimization. It supports mixed parameters. The machine learning toolbox mlr offers regression learners. It provides various infill criteria and features batch proposal, parallel execution, visualization, and logging. Its modular implementation allows easy customization by the user.
The R package ggplot2 is a plotting system based on the grammar of graphics. GGally extends ggplot2 by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.
MEDIPS was developed for analyzing data derived from methylated DNA immunoprecipitation (MeDIP) experiments followed by sequencing (MeDIP-seq). However, MEDIPS provides functionalities for the analysis of any kind of quantitative sequencing data (e.g. ChIP-seq, MBD-seq, CMS-seq and others) including calculation of differential coverage between groups of samples and saturation and correlation analysis.
Interact with Condor from R via SSH connection. Files are first uploaded from user machine to submitter machine, and the job is then submitted from the submitter machine to Condor'. Functions are provided to submit, list, and download Condor jobs from R. Condor is an open source high-throughput computing software framework for distributed parallelization of computationally intensive tasks.
This package contains tools for working with data during statistical analysis, promoting flexible, intuitive, and reproducible workflows. There are functions designated for specific statistical tasks such building a custom univariate descriptive table, computing pairwise association statistics, etc. These are built on a collection of data manipulation tools designed for general use that are motivated by the functional programming concept.
This package implements a kernel-based association test for copy number variation (CNV) aggregate analysis in a certain genomic region (e.g., gene set, chromosome, or genome) that is robust to the within-locus and across-locus etiological heterogeneity, and bypass the need to define a "locus" unit for CNVs. Brucker, A., et al. (2020) <doi:10.1101/666875>.
This package provides tools for modeling time-to-event data with a cure fraction under correlated destructive negative binomial cure rate models. The models assume multiple latent competing causes with possible dependence and allow for elimination (inactivation) of some initial causes. Estimation is performed via an Expectation-Maximization algorithm, and diagnostic tools based on Cox-Snell residuals are provided.
This package provides utility functions for standardizing economic entity (economy, aggregate, institution, etc.) name and id in economic datasets such as those published by the International Monetary Fund and World Bank. Aims to facilitate consistent data analysis, reporting, and joining across datasets. Used as a foundational building block in the EconDataverse family of packages (<https://www.econdataverse.org>).
This package provides a Shiny'-based toolkit for item/test analysis. It is designed for multiple-choice, true-false, and open-ended questions. The toolkit is usable with datasets in 1-0 or other formats. Key analyses include difficulty, discrimination, response-option analysis, reports. The classical test theory methods used are described in Ebel & Frisbie (1991, ISBN:978-0132892314).
This package contains Rcpp and RcppEigen implementations of matrix operations useful for Gaussian process models, such as the inversion of a symmetric Toeplitz matrix, sampling from multivariate normal distributions, evaluation of the log-density of a multivariate normal vector, and Bayesian inference for latent variable Gaussian process models with elliptical slice sampling (Murray, Adams, and MacKay 2010).