This package provides a comprehensive suite of tools for analyzing omics data. It includes functionality for alpha diversity analysis, beta diversity analysis, differential abundance analysis, community assembly analysis, phylogenetic tree visualization, and functional enrichment analysis. The package offers a range of analysis methods, applied progressively, to explore and understand complex communities. It is designed to support researchers and practitioners in conducting in-depth omics data analysis.
Calculates, via simulation, power and appropriate stopping alpha boundaries (and/or futility bounds) for sequential analyses (i.e., group sequential design) as well as for multiple hypotheses (multiple tests included in an analysis), given any specified global error rate. This enables the sequential use of practically any significance test, as long as the underlying data can be simulated in advance to a reasonable approximation. Lukács (2022) <doi:10.21105/joss.04643>.
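The simulation idea described above can be sketched in a few lines. This is an illustrative sketch only, not the package's interface: the function names, the one-sample z-test with known unit variance, the two equally spaced looks, and the use of one common local alpha at every look are all assumptions made for the example.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_null_p_values(n_sims, n_per_look, n_looks, rng):
    """Simulate data under H0 (mean 0, sd 1) and return the p-value at each look."""
    results = []
    for _ in range(n_sims):
        total, n, p_values = 0.0, 0, []
        for _ in range(n_looks):
            for _ in range(n_per_look):
                total += rng.gauss(0.0, 1.0)
            n += n_per_look
            z = total / math.sqrt(n)  # z-test with known unit variance
            p_values.append(two_sided_p(z))
        results.append(p_values)
    return results


def global_error(null_p_values, local_alpha):
    """Share of null simulations that reject at any look."""
    hits = sum(1 for p in null_p_values if min(p) < local_alpha)
    return hits / len(null_p_values)


def calibrate_local_alpha(null_p_values, global_alpha):
    """Common local alpha whose simulated global error matches global_alpha."""
    min_p = sorted(min(p) for p in null_p_values)
    k = round(global_alpha * len(min_p))
    return min_p[k]  # exactly k simulations have a smaller minimum p-value


rng = random.Random(1)
null_p = simulate_null_p_values(n_sims=20000, n_per_look=20, n_looks=2, rng=rng)
local_alpha = calibrate_local_alpha(null_p, global_alpha=0.05)
print(local_alpha, global_error(null_p, local_alpha))
```

Power can be estimated the same way by simulating under an assumed effect and counting how often any look falls below the calibrated local alpha.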
Implementation of the SING algorithm to extract joint and individual non-Gaussian components from two datasets. SING uses an objective function that maximizes the skewness and kurtosis of latent components with a penalty to enhance the similarity between subject scores. Unlike other existing methods, SING does not use PCA for dimension reduction, but rather uses non-Gaussianity, which can improve feature extraction. Benjamin B. Risk, Irina Gaynanova (2021) <doi:10.1214/21-AOAS1466>.
An automatic cell type detection and assignment algorithm for single cell RNA-Seq and CyTOF/FACS data. SCINA is capable of assigning cell type identities to a pool of cells profiled by scRNA-Seq or CyTOF/FACS data with prior knowledge of markers, such as genes and protein symbols that are highly or lowly expressed in each category. See Zhang Z, et al. (2019) <doi:10.3390/genes10070531> for more details.
Email Finder R Client Library. Email search is based on a website: you give a domain name and it returns all the email addresses found on the internet for that domain. Email Finder generates or retrieves the most likely email address from a domain name, a first name and a last name. Email verify checks the deliverability of a given email address, verifies whether it has been found in our database, and returns its sources.
The BADER package is intended for the analysis of RNA sequencing data. The algorithm fits a Bayesian hierarchical model for RNA sequencing count data. BADER returns the posterior probability of differential expression for each gene between two groups A and B. The joint posterior distribution of the variables in the model can be returned in the form of posterior samples, which can be used for further down-stream analyses such as gene set enrichment.
This package is an R program for the subset-based analysis of heterogeneous traits and disease subtypes. ASSET allows the user to search through all possible subsets of z-scores to identify the subset of traits giving the best meta-analyzed z-score. Further, it returns a p-value adjusted for the multiple testing involved in the search. It also allows for searching for the best combination of disease subtypes associated with each variant.
The atena package quantifies expression of TEs (transposable elements) from RNA-seq data through different methods, including ERVmap, TEtranscripts and Telescope. A common interface is provided to use each of these methods, which consists of building a parameter object, calling the quantification function with this object and getting a SummarizedExperiment object as an output container of the quantified expression profiles. The implementation allows quantifying TEs and gene transcripts in an integrated manner.
This Rcpp-based package implements a highly efficient data structure and algorithm for performing alignment of short reads from CRISPR or shRNA screens to a reference barcode library. Sequencing errors are considered and matching qualities are evaluated based on Phred scores. A Bayes classifier is employed to predict the originating barcode of a read. The package supports provision of user-defined probability models for evaluating matching qualities. The package also supports multi-threading.
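As a rough illustration of how Phred scores can feed a Bayes classifier over barcodes, the sketch below converts each Phred score Q to an error probability 10^(-Q/10) and computes a posterior over candidate barcodes. It is not the package's implementation: the function names, the uniform prior, the assumption that an erroneous base is equally likely to be any of the other three, and the equal read and barcode lengths are all assumptions chosen for the example.

```python
import math


def phred_to_error_prob(q):
    """Standard Phred definition: P(base call is wrong) = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def read_log_likelihood(read, quals, barcode):
    """Log-likelihood of a read given a barcode (assumes Q > 0 at every base)."""
    ll = 0.0
    for base, q, ref in zip(read, quals, barcode):
        e = phred_to_error_prob(q)
        # Assumed model: a wrong call is equally likely to be any other base.
        ll += math.log(1.0 - e) if base == ref else math.log(e / 3.0)
    return ll


def barcode_posterior(read, quals, barcodes):
    """Posterior over barcodes under a uniform prior (assumption)."""
    log_liks = [read_log_likelihood(read, quals, b) for b in barcodes]
    m = max(log_liks)  # subtract the maximum for numerical stability
    weights = [math.exp(l - m) for l in log_liks]
    total = sum(weights)
    return {b: w / total for b, w in zip(barcodes, weights)}


# Hypothetical three-barcode library and one read with a low-quality fifth base.
library = ["ACGTAC", "ACGTTC", "TTGCAA"]
read = "ACGTAC"
quals = [30, 30, 30, 30, 10, 30]
posterior = barcode_posterior(read, quals, library)
best = max(posterior, key=posterior.get)
print(best, posterior[best])
```

Because the fifth base has a low quality score, the barcode that differs only at that position keeps a non-negligible share of the posterior, which is the kind of uncertainty a quality-aware matcher is meant to expose.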
PhILR is short for Phylogenetic Isometric Log-Ratio Transform. This package provides functions for the analysis of compositional data (e.g., data representing proportions of different variables/parts). Specifically, this package allows analysis of compositional data where the parts can be related through a phylogenetic tree (as is common in microbiota survey data) and makes available the Isometric Log Ratio transform built from the phylogenetic tree and utilizing a weighted reference measure.
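For orientation, an isometric log-ratio coordinate ("balance") at one internal node of a tree contrasts the geometric means of the parts below its two children. The sketch below computes the standard unweighted balance only; it omits the weighted reference measure mentioned above, the function names are hypothetical, and it assumes all parts are strictly positive (zeros would need to be replaced beforehand).

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ilr_balance(left_parts, right_parts):
    """Unweighted ILR balance between two groups of parts.

    balance = sqrt(r*s / (r+s)) * ln(gm(left) / gm(right)),
    where r and s are the number of parts in each group.
    """
    r, s = len(left_parts), len(right_parts)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(left_parts) / geometric_mean(right_parts))


# One node whose left child has two taxa and whose right child has one.
print(ilr_balance([0.2, 0.4], [0.1]))
```

The balance depends only on ratios between parts, so rescaling the whole composition (for example, counts versus proportions) leaves it unchanged.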
This package provides a unified and straightforward interface for performing a variety of meta-analysis methods directly from user data. Users can input a data frame, specify key parameters, and effortlessly execute and compare multiple common meta-analytic models. Designed for immediate usability, the package facilitates transparent, reproducible research without manual implementation of each analytical method. Ideal for researchers aiming for efficiency and reproducibility, it streamlines workflows from data preparation to results interpretation.
The Global Biodiversity Information Facility ('GBIF', <https://www.gbif.org>) sources data from an international network of data providers, known as 'nodes'. Several of these nodes - the "living atlases" (<https://living-atlases.gbif.org>) - maintain their own web services using software originally developed by the Atlas of Living Australia ('ALA', <https://www.ala.org.au>). galah enables the R community to directly access data and resources hosted by GBIF and its partner nodes.
This package provides a set of functions to estimate the haziness of an image based on its RGB bands. It returns a haze factor, varying from 0 to 1, which serves as a metric for fogginess and cloudiness. The package also provides additional functions to estimate brightness, darkness and contrast rasters of the RGB image. This package can be used for several applications, such as inferring weather quality data and conducting environmental studies based on the interpretation of digital images.
Advanced methods for a valuable quantitative environmental risk assessment using Bayesian inference of survival and reproduction data. Among others, it facilitates Bayesian inference of the general unified threshold model of survival (GUTS). See our companion paper Baudrot and Charles (2021) <doi:10.21105/joss.03200>, as well as complementary details in Baudrot et al. (2018) <doi:10.1021/acs.est.7b05464> and Delignette-Muller et al. (2017) <doi:10.1021/acs.est.6b05326>.
This is an EM-algorithm-based method for imputation of missing values in multivariate normal time series. The imputation algorithm accounts for both spatial and temporal correlation structures. Temporal patterns can be modeled using an ARIMA(p,d,q), optionally with seasonal components, a non-parametric cubic spline or generalized additive models with exogenous covariates. This algorithm is specifically tailored for climate data with missing measurements from several monitors along a given region.
This package implements the procedure from G. J. Ross (2021) - "Nonparametric Detection of Multiple Location-Scale Change Points via Wild Binary Segmentation" <arXiv:2107.01742>. This uses a version of Wild Binary Segmentation to detect multiple location-scale (i.e. mean and/or variance) change points in a sequence of univariate observations, with a strict control on the probability of incorrectly detecting a change point in a sequence which does not contain any.
An estimation procedure for the nonparametric proportional hazards model (e.g. h(t) = h0(t)exp(b(t)'Z)), providing estimation of b(t) and its pointwise standard errors, and for the semiparametric proportional hazards model (e.g. h(t) = h0(t)exp(b(t)'Z1 + c*Z2)), providing estimation of b(t), c and their standard errors. More details can be found in Lu Tian et al. (2005) <doi:10.1198/016214504000000845>.
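Typeset, the two example model forms given above read as follows, with h_0(t) taken to be the baseline hazard, b(t) the time-varying coefficients, and c the time-constant coefficient (the conditioning notation on the left-hand side is added here for clarity):

```latex
\begin{align}
h(t \mid Z) &= h_0(t)\,\exp\{\, b(t)^{\top} Z \,\}, \\
h(t \mid Z_1, Z_2) &= h_0(t)\,\exp\{\, b(t)^{\top} Z_1 + c\, Z_2 \,\}.
\end{align}
```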
Offers a streamlined programmatic interface to Ordnance Survey's British National Grid (BNG) index system, enabling efficient spatial indexing and analysis based on grid references. It supports a range of geospatial applications, including statistical aggregation, data visualisation, and interoperability across datasets. Designed for developers and analysts working with geospatial data in Great Britain, osbng simplifies integration with geospatial workflows and provides intuitive tools for exploring the structure and logic of the BNG system.
Implementation of a likelihood ratio test of differential onset of senescence between two groups. Given two groups with measures of age and of an individual trait likely to be subject to senescence (e.g. body mass), OnAge provides an asymptotic p-value for the null hypothesis that senescence starts at the same age in both groups. The package implements the procedure used in Douhard et al. (2017) <doi:10.1111/oik.04421>.
Estimate the size of a networked population based on respondent-driven sampling data. The package is part of the "RDS Analyst" suite of packages for the analysis of respondent-driven sampling data. See Handcock, Gile and Mar (2014) <doi:10.1214/14-EJS923>, Handcock, Gile and Mar (2015) <doi:10.1111/biom.12255>, Kim and Handcock (2021) <doi:10.1093/jssam/smz055>, and McLaughlin et al. (2023) <doi:10.1214/23-AOAS1807>.
This package provides functions to combine data.frames in ways that require additional effort in base R, and to add metadata (id, title, ...) that can be used for printing and xlsx export. The Tatoo_report class is provided as a convenient helper to write several such tables to a workbook, one table per worksheet. Tatoo is built on top of 'openxlsx', but intimate knowledge of that package is not required to use tatoo.
The Time-Delay Correlation algorithm (TDCor) reconstructs the topology of a gene regulatory network (GRN) from time-series transcriptomic data. The algorithm is described in detail in Lavenus et al., Plant Cell, 2015. It was initially developed to infer the topology of the GRN controlling lateral root formation in Arabidopsis thaliana. The time-series transcriptomic dataset used in that study is included in the package to illustrate how to use the algorithm.
artMS provides a set of tools for the analysis of proteomics label-free datasets. It takes as input the MaxQuant search result output (evidence.txt file) and performs quality control, relative quantification using MSstats, downstream analysis and integration. artMS also provides a set of functions to re-format the data and make it compatible with other analytical tools, including SAINTq, SAINTexpress, Phosfate, and PHOTON. Check <http://artms.org> for details.
The purpose of the package is to identify prognostic biomarkers and an optimal numeric cutoff for each biomarker that can be used to stratify a group of test subjects (samples) into two sub-groups with significantly different survival (better vs. worse). The package was developed for the analysis of gene expression data, such as RNA-seq. However, it can be used with any quantitative variable that has a sufficiently large proportion of unique values.