Allows to perform the dynamic mixture estimation with state-space components and normal regression components, and clustering with normal mixture. Quasi-Bayesian estimation, as well as, that based on the Kerridge inaccuracy approximation are implemented. Main references: Nagy and Suzdaleva (2013) <doi:10.1016/j.apm.2013.05.038>; Nagy et al. (2011) <doi:10.1002/acs.1239>.
Tool to print out the value of R objects/expressions while running an R script. Outputs can be made dependent on user-defined conditions/criteria. Debug messages only appear when a global option for debugging is set. This way, debugr code can even remain in the debugged code for later use without any negative effects during normal runtime.
This package provides utility functions for standardizing economic entity (economy, aggregate, institution, etc.) name and id in economic datasets such as those published by the International Monetary Fund and World Bank. Aims to facilitate consistent data analysis, reporting, and joining across datasets. Used as a foundational building block in the econdataverse family of packages (<https://www.econdataverse.org>).
This package performs test procedures for general hypothesis testing problems for four multivariate coefficients of variation (Ditzhaus and Smaga, 2023 <arXiv:2301.12009>). We can verify the global hypothesis about equality as well as the particular hypotheses defined by contrasts, e.g., we can conduct post hoc tests. We also provide the simultaneous confidence intervals for contrasts.
This package provides a suite of routines for the hyperdirichlet distribution and reified Bradley-Terry; supersedes the hyperdirichlet package; uses disordR discipline <doi:10.48550/ARXIV.2210.03856>. To cite in publications please use Hankin 2017 <doi:10.32614/rj-2017-061>, and for Generalized Plackett-Luce likelihoods use Hankin 2024 <doi:10.18637/jss.v109.i08>.
This package provides functions for genome-wide association studies (GWAS)/gene-environment-wide interaction studies (GEWIS) with longitudinal outcomes and exposures. He et al. (2017) "Set-Based Tests for Gene-Environment Interaction in Longitudinal Studies" and He et al. (2017) "Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA)".
Simulation, analysis and sampling of spatial biodiversity data (May, Gerstner, McGlinn, Xiao & Chase 2017) <doi:10.1111/2041-210x.12986>. In the simulation tools user define the numbers of species and individuals, the species abundance distribution and species aggregation. Functions for analysis include species rarefaction and accumulation curves, species-area relationships and the distance decay of similarity.
Fits community site occupancy models to environmental DNA metabarcoding data collected using spatially-replicated survey design. Model fitting results can be used to evaluate and compare the effectiveness of species detection to find an efficient survey design. Reference: Fukaya et al. (2022) <doi:10.1111/2041-210X.13732>, Fukaya and Hasebe (2025) <doi:10.1002/1438-390X.12219>.
Performance metric provides different performance measures like mean squared error, root mean square error, mean absolute deviation, mean absolute percentage error etc. of a fitted model. These can provide a way for forecasters to quantitatively compare the performance of competing models. For method details see (i) Pankaj Das (2020) <http://krishi.icar.gov.in/jspui/handle/123456789/44138>.
This package provides a ggplot2 front end to plot summary statistics on danish provinces, regions, municipalities, and zipcodes. The needed geoms of each of the four levels are inherent in the package, thus making these types of plots easy for the user. This is essentially an updated port of the previously available mapDK package by Sebastian Barfort.
An assortment of functions that could be useful in analyzing data from psychophysical experiments. It includes functions for calculating d from several different experimental designs, links for m-alternative forced-choice (mafc) data to be used with the binomial family in glm (and possibly other contexts) and self-Start functions for estimating gamma values for CRT screen calibrations.
Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et. al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as Spectronaut', MaxQuant and Proteome Discover can be easily used due to flexibility of functions.
This package provides functions for color-based visualization of multivariate data, i.e. colorgrams or heatmaps. Lower-level functions map numeric values to colors, display a matrix as an array of colors, and draw color keys. Higher-level plotting functions generate a bivariate histogram, a dendrogram aligned with a color-coded matrix, a triangular distance matrix, and more.
Generates and predicts a set of linearly stacked Random Forest models using bootstrap sampling. Individual datasets may be heterogeneous (not all samples have full sets of features). Contains support for parallelization but the user should register their cores before running. This is an extension of the method found in Matlock (2018) <doi:10.1186/s12859-018-2060-2>.
This package implements the TabNet model by Sercan O. Arik et al. (2019) <doi:10.48550/arXiv.1908.07442> with Coherent Hierarchical Multi-label Classification Networks by Giunchiglia et al. <doi:10.48550/arXiv.2010.10151> and provides a consistent interface for fitting and creating predictions. It's also fully compatible with the tidymodels ecosystem.
Unobserved components time series model using the linear innovations state space representation (single source of error) with choice of error distributions and option for dynamic variance. Methods for estimation using automatic differentiation, automatic model selection and ensembling, prediction, filtering, simulation and backtesting. Based on the model described in Hyndman et al (2012) <doi:10.1198/jasa.2011.tm09771>.
The BACON algorithms are methods for multivariate outlier nomination (detection) and robust linear regression by Billor, Hadi, and Velleman (2000) <doi:10.1016/S0167-9473(99)00101-2>. The extension to weighted problems is due to Beguin and Hulliger (2008) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X200800110616>; see also <doi:10.21105/joss.03238>.
This package provides a complete toolset for methylome-wide association studies (MWAS). It is specifically designed for data from enrichment based methylation assays, but can be applied to other data as well. The analysis pipeline includes seven steps: (1) scanning aligned reads from BAM files, (2) calculation of quality control measures, (3) creation of methylation score (coverage) matrix, (4) principal component analysis for capturing batch effects and detection of outliers, (5) association analysis with respect to phenotypes of interest while correcting for top PCs and known covariates, (6) annotation of significant findings, and (7) multi-marker analysis (methylation risk score) using elastic net. Additionally, RaMWAS include tools for joint analysis of methlyation and genotype data. This work is published in Bioinformatics, Shabalin et al. (2018) <doi:10.1093/bioinformatics/bty069>.
Data exploration and prediction with focus on high dimensional data and chemometrics. The package was initially designed about partial least squares regression and discrimination models and variants, in particular locally weighted PLS models (LWPLS). Then, it has been expanded to many other methods for analyzing high dimensional data. The name rchemo comes from the fact that the package is orientated to chemometrics, but most of the provided methods are fully generic to other domains. Functions such as transform(), predict(), coef() and summary() are available. Tuning the predictive models is facilitated by generic functions gridscore() (validation dataset) and gridcv() (cross-validation). Faster versions are also available for models based on latent variables (LVs) (gridscorelv() and gridcvlv()) and ridge regularization (gridscorelb() and gridcvlb()).
Allows the user to learn Bayesian networks from datasets containing thousands of variables. It focuses on score-based learning, mainly the BIC and the BDeu score functions. It provides state-of-the-art algorithms for the following tasks: (1) parent set identification - Mauro Scanagatta (2015) <http://papers.nips.cc/paper/5803-learning-bayesian-networks-with-thousands-of-variables>; (2) general structure optimization - Mauro Scanagatta (2018) <doi:10.1007/s10994-018-5701-9>, Mauro Scanagatta (2018) <http://proceedings.mlr.press/v73/scanagatta17a.html>; (3) bounded treewidth structure optimization - Mauro Scanagatta (2016) <http://papers.nips.cc/paper/6232-learning-treewidth-bounded-bayesian-networks-with-thousands-of-variables>; (4) structure learning on incomplete data sets - Mauro Scanagatta (2018) <doi:10.1016/j.ijar.2018.02.004>. Distributed under the LGPL-3 by IDSIA.
The Resource Description Framework, or RDF is a widely used data representation model that forms the cornerstone of the Semantic Web. RDF represents data as a graph rather than the familiar data table or rectangle of relational databases. The rdflib package provides a friendly and concise user interface for performing common tasks on RDF data, such as reading, writing and converting between the various serializations of RDF data, including rdfxml', turtle', nquads', ntriples', and json-ld'; creating new RDF graphs, and performing graph queries using SPARQL'. This package wraps the low level redland R package which provides direct bindings to the redland C library. Additionally, the package supports the newer and more developer friendly JSON-LD format through the jsonld package. The package interface takes inspiration from the Python rdflib library.
SCUDO (Signature-based Clustering for Diagnostic Purposes) is a rank-based method for the analysis of gene expression profiles for diagnostic and classification purposes. It is based on the identification of sample-specific gene signatures composed of the most up- and down-regulated genes for that sample. Starting from gene expression data, functions in this package identify sample-specific gene signatures and use them to build a graph of samples. In this graph samples are joined by edges if they have a similar expression profile, according to a pre-computed similarity matrix. The similarity between the expression profiles of two samples is computed using a method similar to GSEA. The graph of samples can then be used to perform community clustering or to perform supervised classification of samples in a testing set.
MEDIPS was developed for analyzing data derived from methylated DNA immunoprecipitation (MeDIP) experiments followed by sequencing (MeDIP-seq). However, MEDIPS provides functionalities for the analysis of any kind of quantitative sequencing data (e.g. ChIP-seq, MBD-seq, CMS-seq and others) including calculation of differential coverage between groups of samples and saturation and correlation analysis.
This is an R package to make it easier to import and store phylogenetic trees with associated data; and to link external data from different sources to phylogeny. It also supports exporting phylogenetic trees with heterogeneous associated data to a single tree file and can be served as a platform for merging tree with associated data and converting file formats.