This Rcpp-based package implements a highly efficient data structure and algorithm for performing alignment of short reads from CRISPR or shRNA screens to reference barcode library. Sequencing error are considered and matching qualities are evaluated based on Phred scores. A Bayes classifier is employed to predict the originating barcode of a read. The package supports provision of user-defined probability models for evaluating matching qualities. The package also supports multi-threading.
PhILR is short for Phylogenetic Isometric Log-Ratio Transform. This package provides functions for the analysis of compositional data (e.g., data representing proportions of different variables/parts). Specifically this package allows analysis of compositional data where the parts can be related through a phylogenetic tree (as is common in microbiota survey data) and makes available the Isometric Log Ratio transform built from the phylogenetic tree and utilizing a weighted reference measure.
This package provides a unified and straightforward interface for performing a variety of meta-analysis methods directly from user data. Users can input a data frame, specify key parameters, and effortlessly execute and compare multiple common meta-analytic models. Designed for immediate usability, the package facilitates transparent, reproducible research without manual implementation of each analytical method. Ideal for researchers aiming for efficiency and reproducibility, it streamlines workflows from data preparation to results interpretation.
The Global Biodiversity Information Facility ('GBIF', <https://www.gbif.org>) sources data from an international network of data providers, known as nodes'. Several of these nodes - the "living atlases" (<https://living-atlases.gbif.org>) - maintain their own web services using software originally developed by the Atlas of Living Australia ('ALA', <https://www.ala.org.au>). galah enables the R community to directly access data and resources hosted by GBIF and its partner nodes.
This package provides a set of functions to estimate haziness of an image based on RGB bands. It returns a haze factor, varying from 0 to 1, a metric for fogginess and cloudiness. The package also presents additional functions to estimate brightness, darkness and contrast rasters of the RGB image. This package can be used for several applications such as inference of weather quality data and performing environmental studies from interpreting digital images.
This is an EM algorithm based method for imputation of missing values in multivariate normal time series. The imputation algorithm accounts for both spatial and temporal correlation structures. Temporal patterns can be modeled using an ARIMA(p,d,q), optionally with seasonal components, a non-parametric cubic spline or generalized additive models with exogenous covariates. This algorithm is specially tailored for climate data with missing measurements from several monitors along a given region.
Advanced methods for a valuable quantitative environmental risk assessment using Bayesian inference of survival and reproduction Data. Among others, it facilitates Bayesian inference of the general unified threshold model of survival (GUTS). See our companion paper Baudrot and Charles (2021) <doi:10.21105/joss.03200>, as well as complementary details in Baudrot et al. (2018) <doi:10.1021/acs.est.7b05464> and Delignette-Muller et al. (2017) <doi:10.1021/acs.est.6b05326>.
This package implements the procedure from G. J. Ross (2021) - "Nonparametric Detection of Multiple Location-Scale Change Points via Wild Binary Segmentation" <arxiv:2107.01742>. This uses a version of Wild Binary Segmentation to detect multiple location-scale (i.e. mean and/or variance) change points in a sequence of univariate observations, with a strict control on the probability of incorrectly detecting a change point in a sequence which does not contain any.
An estimation procedure for the analysis of nonparametric proportional hazards model (e.g. h(t) = h0(t)exp(b(t)'Z)), providing estimation of b(t) and its pointwise standard errors, and semiparametric proportional hazards model (e.g. h(t) = h0(t)exp(b(t)'Z1 + c*Z2)), providing estimation of b(t), c and their standard errors. More details can be found in Lu Tian et al. (2005) <doi:10.1198/016214504000000845>.
Offers a streamlined programmatic interface to Ordnance Survey's British National Grid (BNG) index system, enabling efficient spatial indexing and analysis based on grid references. It supports a range of geospatial applications, including statistical aggregation, data visualisation, and interoperability across datasets. Designed for developers and analysts working with geospatial data in Great Britain, osbng simplifies integration with geospatial workflows and provides intuitive tools for exploring the structure and logic of the BNG system.
Implementation of a likelihood ratio test of differential onset of senescence between two groups. Given two groups with measures of age and of an individual trait likely to be subjected to senescence (e.g. body mass), OnAge provides an asymptotic p-value for the null hypothesis that senescence starts at the same age in both groups. The package implements the procedure used in Douhard et al. (2017) <doi:10.1111/oik.04421>.
Estimate the size of a networked population based on respondent-driven sampling data. The package is part of the "RDS Analyst" suite of packages for the analysis of respondent-driven sampling data. See Handcock, Gile and Mar (2014) <doi:10.1214/14-EJS923>, Handcock, Gile and Mar (2015) <doi:10.1111/biom.12255>, Kim and Handcock (2021) <doi:10.1093/jssam/smz055>, and McLaughlin, et. al. (2023) <doi:10.1214/23-AOAS1807>.
This package provides functions to combine data.frames in ways that require additional effort in base R, and to add metadata (id, title, ...) that can be used for printing and xlsx export. The Tatoo_report class is provided as a convenient helper to write several such tables to a workbook, one table per worksheet. Tatoo is built on top of openxlsx', but intimate knowledge of that package is not required to use tatoo.
The Time-Delay Correlation algorithm (TDCor) reconstructs the topology of a gene regulatory network (GRN) from time-series transcriptomic data. The algorithm is described in details in Lavenus et al., Plant Cell, 2015. It was initially developed to infer the topology of the GRN controlling lateral root formation in Arabidopsis thaliana. The time-series transcriptomic dataset which was used in this study is included in the package to illustrate how to use it.
artMS provides a set of tools for the analysis of proteomics label-free datasets. It takes as input the MaxQuant search result output (evidence.txt file) and performs quality control, relative quantification using MSstats, downstream analysis and integration. artMS also provides a set of functions to re-format and make it compatible with other analytical tools, including, SAINTq, SAINTexpress, Phosfate, and PHOTON. Check [http://artms.org](http://artms.org) for details.
The purpose of the package is to identify prognostic biomarkers and an optimal numeric cutoff for each biomarker that can be used to stratify a group of test subjects (samples) into two sub-groups with significantly different survival (better vs. worse). The package was developed for the analysis of gene expression data, such as RNA-seq. However, it can be used with any quantitative variable that has a sufficiently large proportion of unique values.
It is an easy-to-use GUI using disease information for detecting tumor/normal sample discriminating gene sets from differentially expressed genes. Our approach is based on an iterative algorithm filtering genes with disease ontology enrichment analysis and wilk and wilks lambda criterion connected to SVM classification model construction. Along with gene set extraction, SVMDO also provides individual prognostic marker detection. The algorithm is designed for FPKM and RPKM normalized RNA-Seq transcriptome datasets.
Analyzes autocorrelation and partial autocorrelation using surrogate methods and bootstrapping, and computes the acceleration constants for the vectorized moving block bootstrap provided by this package. It generates percentile, bias-corrected, and accelerated intervals and estimates partial autocorrelations using Durbin-Levinson. This package calculates the autocorrelation power spectrum, computes cross-correlations between two time series, computes bandwidth for any time series, and performs autocorrelation frequency analysis. It also calculates the periodicity of a time series.
Provide a tool to easily build customized data flows to pre-process large volumes of information from different sources. To this end, bdpar allows to (i) easily use and create new functionalities and (ii) develop new data source extractors according to the user needs. Additionally, the package provides by default a predefined data flow to extract and pre-process the most relevant information (tokens, dates, ... ) from some textual sources (SMS, Email, YouTube comments).
An implementation of the k-means-- algorithm proposed by Chawla and Gionis, 2013 in their paper, "k-means-- : A unified approach to clustering and outlier detection. SIAM International Conference on Data Mining (SDM13)", <doi:10.1137/1.9781611972832.21> and using ordering described by Howe, 2013 in the thesis, Clustering and anomaly detection in tropical cyclones". Useful for creating (potentially) tighter clusters than standard k-means and simultaneously finding outliers inexpensively in multidimensional space.
Algorithms for solving various Maximum Weight Connected Subgraph Problems, including variants with budget constraints, cardinality constraints, weighted edges and signals. The package represents an R interface to high-efficient solvers based on relax-and-cut approach (Ã lvarez-Miranda E., Sinnl M. (2017) <doi:10.1016/j.cor.2017.05.015>) mixed-integer programming (Loboda A., Artyomov M., and Sergushichev A. (2016) <doi:10.1007/978-3-319-43681-4_17>) and simulated annealing.
Convert Markdown ('.md') or R Markdown ('.Rmd') texts, R scripts, directory structures, and other hierarchical structured documents into mind map widgets or Freemind codes or Mermaid mind map codes, and vice versa. Freemind mind map ('.mm') files can be opened by or imported to common mind map software such as Freemind (<https://freemind.sourceforge.io/wiki/index.php/Main_Page>). Mermaid mind map codes (<https://mermaid.js.org/>) can be directly embedded in documents.
Functions, data sets and examples for the book: Yves Croissant (2025) "Microeconometrics with R", Chapman and Hall/CRC The R Series <doi:10.1201/9781003100263>. The package includes a set of estimators for models used in microeconometrics, especially for count data and limited dependent variables. Test functions include score test, Hausman test, Vuong test, Sargan test and conditional moment test. A small subset of the data set used in the book is also included.
An adaptation of Non-dominated Sorting Genetic Algorithm III for multi objective feature selection tasks. Non-dominated Sorting Genetic Algorithm III is a genetic algorithm that solves multiple optimization problems simultaneously by applying a non-dominated sorting technique. It uses a reference points based selection operator to explore solution space and preserve diversity. See the original paper by K. Deb and H. Jain (2014) <DOI:10.1109/TEVC.2013.2281534> for a detailed description.