mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data.
This package implements optimal matching with near-fine balance in large observational studies with the use of optimal calipers to get a sparse network. The caliper is optimal in the sense that it is as small as possible such that a matching exists. The main functions in the bigmatch package are optcal()
to find the optimal caliper, optconstant()
to find the optimal number of nearest neighbors, and nfmatch()
to find a near-fine balance match with a caliper and a restriction on the number of nearest neighbors. Yu, R., Silber, J. H., and Rosenbaum, P. R. (2020). <DOI:10.1214/19-sts699>.
Analyze data from next-generation sequencing experiments on genomic samples. CLONETv2 offers a set of functions to compute allele specific copy number and clonality from segmented data and SNPs position pileup. The package has also calculated the clonality of single nucleotide variants given read counts at mutated positions. The package has been developed at the laboratory of Computational and Functional Oncology, Department of CIBIO, University of Trento (Italy), under the supervision of prof Francesca Demichelis. References: Prandi et al. (2014) <doi:10.1186/s13059-014-0439-6>; Carreira et al. (2014) <doi:10.1126/scitranslmed.3009448>; Romanel et al. (2015) <doi:10.1126/scitranslmed.aac9511>.
This package provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated meta data and ancillary files. Individual data objects have associated system level meta data, and data files are linked together using the OAI-ORE standard resource map which describes the relationships between the files. The OAI- ORE standard is described at <https://www.openarchives.org/ore/>. Data packages can be serialized and transported as structured files that have been created following the BagIt
specification. The BagIt
specification is described at <https://tools.ietf.org/html/draft-kunze-bagit-08>.
This package implements choice models based on economic theory, including estimation using Markov chain Monte Carlo (MCMC), prediction, and more. Its usability is inspired by ideas from tidyverse'. Models include versions of the Hierarchical Multinomial Logit and Multiple Discrete-Continous (Volumetric) models with and without screening. The foundations of these models are described in Allenby, Hardt and Rossi (2019) <doi:10.1016/bs.hem.2019.04.002>. Models with conjunctive screening are described in Kim, Hardt, Kim and Allenby (2022) <doi:10.1016/j.ijresmar.2022.04.001>. Models with set-size variation are described in Hardt and Kurz (2020) <doi:10.2139/ssrn.3418383>.
This package provides a toolkit for the analysis and management of data for genes in the so-called "Human Leukocyte Antigen" (HLA) region. Functions extract reference data from the Anthony Nolan HLA Informatics Group/ImmunoGeneTics
HLA GitHub
repository (ANHIG/IMGTHLA) <https://github.com/ANHIG/IMGTHLA>, validate Genotype List (GL) Strings, convert between UNIFORMAT and GL String Code (GLSC) formats, translate HLA alleles and GLSCs across ImmunoPolymorphism
Database (IPD) IMGT/HLA Database release versions, identify differences between pairs of alleles at a locus, generate customized, multi-position sequence alignments, trim and convert allele-names across nomenclature epochs, and extend existing data-analysis methods.
This package provides functionality for working with raster-like quadtrees (also called â region quadtreesâ ), which allow for variable-sized cells. The package allows for flexibility in the quadtree creation process. Several functions defining how to split and aggregate cells are provided, and custom functions can be written for both of these processes. In addition, quadtrees can be created using other quadtrees as â templatesâ , so that the new quadtree's structure is identical to the template quadtree. The package also includes functionality for modifying quadtrees, querying values, saving quadtrees to a file, and calculating least-cost paths using the quadtree as a resistance surface.
RcppArmadillo
implementation for the Matlab code of the Variational Mode Decomposition and Two-Dimensional Variational Mode Decomposition'. For more information, see (i) Variational Mode Decomposition by K. Dragomiretskiy and D. Zosso in IEEE Transactions on Signal Processing, vol. 62, no. 3, pp. 531-544, Feb.1, 2014, <doi:10.1109/TSP.2013.2288675>; (ii) Two-Dimensional Variational Mode Decomposition by Dragomiretskiy, K., Zosso, D. (2015), In: Tai, XC., Bae, E., Chan, T.F., Lysaker, M. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2015. Lecture Notes in Computer Science, vol 8932. Springer, <doi:10.1007/978-3-319-14612-6_15>.
Annotates data from liquid chromatography coupled to mass spectrometry (LC/MS) metabolomics experiments. Based on a network algorithm (O.Senan, A. Aguilar- Mogas, M. Navarro, O. Yanes, R.Guimerà and M. Sales-Pardo, Bioinformatics, 35(20), 2019), CliqueMS
builds a weighted similarity network where nodes are features and edges are weighted according to the similarity of this features. Then it searches for the most plausible division of the similarity network into cliques (fully connected components). Finally it annotates metabolites within each clique, obtaining for each annotated metabolite the neutral mass and their features, corresponding to isotopes, ionization adducts and fragmentation adducts of that metabolite.
seqArchR
enables unsupervised discovery of _de novo_ clusters with characteristic sequence architectures characterized by position-specific motifs or composition of stretches of nucleotides, e.g., CG-richness. seqArchR
does _not_ require any specifications w.r.t. the number of clusters, the length of any individual motifs, or the distance between motifs if and when they occur in pairs/groups; it directly detects them from the data. seqArchR
uses non-negative matrix factorization (NMF) as its backbone, and employs a chunking-based iterative procedure that enables processing of large sequence collections efficiently. Wrapper functions are provided for visualizing cluster architectures as sequence logos.
Allows access to selected services that are part of the Google Adwords API <https://developers.google.com/adwords/api/docs/guides/start>. Google Adwords is an online advertising service by Google', that delivers Ads to users. This package offers a authentication process using OAUTH2'. Currently, there are two methods of data of accessing the API, depending on the type of request. One method uses SOAP requests which require building an XML structure and then sent to the API. These are used for the ManagedCustomerService
and the TargetingIdeaService
'. The second method is by building AWQL queries for the reporting side of the Google Adwords API.
Simplify bivariate and regression analyses by automating result generation, including summary tables, statistical tests, and customizable graphs. It supports tests for continuous and dichotomous data, as well as stepwise regression for linear, logistic, and Firth penalized logistic models. While not a substitute for tailored analysis, BiVariAn
accelerates workflows and is expanding features like multilingual interpretations of results.The methods for selecting significant statistical tests, as well as the predictor selection in prediction functions, can be referenced in the works of Marc Kery (2003) <doi:10.1890/0012-9623(2003)84[92:NORDIG]2.0.CO;2> and Rainer Puhr (2017) <doi:10.1002/sim.7273>.
Implementation of the double/debiased machine learning framework of Chernozhukov et al. (2018) <doi:10.1111/ectj.12097> for partially linear regression models, partially linear instrumental variable regression models, interactive regression models and interactive instrumental variable regression models. DoubleML
allows estimation of the nuisance parts in these models by machine learning methods and computation of the Neyman orthogonal score functions. DoubleML
is built on top of mlr3 and the mlr3 ecosystem. The object-oriented implementation of DoubleML
based on the R6 package is very flexible. More information available in the publication in the Journal of Statistical Software: <doi:10.18637/jss.v108.i03>.
There are many different formats dates are commonly represented with: the order of day, month, or year can differ, different separators ("-", "/", or whitespace) can be used, months can be numerical, names, or abbreviations and year given as two digits or four. datefixR
takes dates in all these different formats and converts them to R's built-in date class. If datefixR
cannot standardize a date, such as because it is too malformed, then the user is told which date cannot be standardized and the corresponding ID for the row. datefixR
also allows the imputation of missing days and months with user-controlled behavior.
We implement various classical tests for the composite hypothesis of testing the fit to the family of gamma distributions as the Kolmogorov-Smirnov test, the Cramer-von Mises test, the Anderson Darling test and the Watson test. For each test a parametric bootstrap procedure is implemented, as considered in Henze, Meintanis & Ebner (2012) <doi:10.1080/03610926.2010.542851>. The recent procedures presented in Henze, Meintanis & Ebner (2012) <doi:10.1080/03610926.2010.542851> and Betsch & Ebner (2019) <doi:10.1007/s00184-019-00708-7> are implemented. Estimation of parameters of the gamma law are implemented using the method of Bhattacharya (2001) <doi:10.1080/00949650108812100>.
Converts table-like objects to stand-alone PDF or PNG. Can be used to embed tables and arbitrary content in PDF or Word documents. Provides a low-level R interface for creating LaTeX
code, e.g. command()
and a high-level interface for creating PDF documents, e.g. as.pdf.data.frame()
. Extensive customization is available via mid-level functions, e.g. as.tabular()
. See also package?latexpdf'. Support for PNG is experimental; see as.png.data.frame'. Adapted from metrumrg <https://r-forge.r-project.org/R/?group_id=1215>. Requires a compatible installation of pdflatex', e.g. <https://miktex.org/>.
This function obtains a Random Number Generator (RNG) or collection of RNGs that replicate the required parameter(s) of a distribution for a time series of data. Consider the case of reproducing a time series data set of size 20 that uses an autoregressive (AR) model with phi = 0.8 and standard deviation equal to 1. When one checks the arima.sin()
function's estimated parameters, it's possible that after a single trial or a few more, one won't find the precise parameters. This enables one to look for the ideal RNG setting for a simulation that will accurately duplicate the desired parameters.
DEsingle is an R package for differential expression (DE) analysis of single-cell RNA-seq (scRNA-seq
) data. It defines and detects 3 types of differentially expressed genes between two groups of single cells, with regard to different expression status (DEs), differential expression abundance (DEa), and general differential expression (DEg). DEsingle employs Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect the 3 types of DE genes. Results showed that DEsingle outperforms existing methods for scRNA-seq
DE analysis, and can reveal different types of DE genes that are enriched in different biological functions.
This package is Cytometry dATa anALYSis Tools (CATALYST). Mass cytometry like Cytometry by time of flight (CyTOF) uses heavy metal isotopes rather than fluorescent tags as reporters to label antibodies, thereby substantially decreasing spectral overlap and allowing for examination of over 50 parameters at the single cell level. While spectral overlap is significantly less pronounced in CyTOF than flow cytometry, spillover due to detection sensitivity, isotopic impurities, and oxide formation can impede data interpretability. CATALYST
was designed to provide a pipeline for preprocessing of cytometry data, including:
normalization using bead standards;
single-cell deconvolution;
bead-based compensation.
This package provides a Swiss Army knife of utility functions for users of the Analytic Hierarchy Process (AHP) which will help you to assess the consistency of a PCM as well as to improve its consistency ratio, to compute the sensitivity of a PCM, create a logical (as distinct from a random PCM) from preferences provided for the alternatives, and a function that helps evaluate the actual consistency of a PCM based on objective, fair bench marking. The various functions in the toolkit additionally provide the flexibility to users to specify only the upper triangular comparison ratios of the PCM in order to performs its assigned task.
This package provides tools for building reinforcement learning (RL) models specifically tailored for Two-Alternative Forced Choice (TAFC) tasks, commonly employed in psychological research. These models build upon the foundational principles of model-free reinforcement learning detailed in Sutton and Barto (1998) <ISBN:0262039249>. The package allows for the intuitive definition of RL models using simple if-else statements. Our approach to constructing and evaluating these computational models is informed by the guidelines proposed in Wilson & Collins (2019) <doi:10.7554/eLife.49547>
. Example datasets included with the package are sourced from the work of Mason et al. (2024) <doi:10.3758/s13423-023-02415-x>.
This package provides functions and a workflow to easily and powerfully calculating specificity, sensitivity and ROC curves of biomarkers combinations. Allows to rank and select multi-markers signatures as well as to find the best performing sub-signatures, now also from single-cell RNA-seq datasets. The method used was first published as a Shiny app and described in Mazzara et al. (2017) <doi:10.1038/srep45477> and further described in Bombaci & Rossi (2019) <doi:10.1007/978-1-4939-9164-8_16>, and widely expanded as a package as presented in the bioRxiv
pre print Ferrari et al. <doi:10.1101/2022.01.17.476603>.
Easily automate the following tasks to describe data frames: Summarise the distributions, and labelled missings of variables graphically and using descriptive statistics. For surveys, compute and summarise reliabilities (internal consistencies, retest, multilevel) for psychological scales. Combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on rmarkdown partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.) and in JSON-LD, so that search engines can find your data and index the metadata. The metadata are also available at your fingertips via RStudio Addins.
Intended to analyse recordings from multiple microphones (e.g., backpack microphones in captive setting). It allows users to align recordings even if there is non-linear drift of several minutes between them. A call detection and assignment pipeline can be used to find vocalisations and assign them to the vocalising individuals (even if the vocalisation is picked up on multiple microphones). The tracing and measurement functions allow for detailed analysis of the vocalisations and filtering of noise. Finally, the package includes a function to run spectrographic cross correlation, which can be used to compare vocalisations. It also includes multiple other functions related to analysis of vocal behaviour.