Polytect is an advanced computational tool designed for the analysis of multi-color digital PCR data. It provides automatic clustering and labeling of partitions into distinct groups based on clusters first identified by the flowPeaks algorithm. Polytect is particularly useful for researchers in molecular biology and bioinformatics, enabling them to gain deeper insights into their experimental results through precise partition classification and data visualization.
Dalli is a high performance pure Ruby client for accessing memcached servers. Dalli supports:
Simple and complex memcached configurations
Fail-over between memcached instances
Fine-grained control of data serialization and compression
Thread-safe operation
SSL/TLS connections to memcached
SASL authentication.
The name is a variant of Salvador Dali for his famous painting The Persistence of Memory.
Efficient C++ optimized functions for numerical and symbolic calculus. It includes basic symbolic arithmetic, tensor calculus, Einstein summing convention, fast computation of the Levi-Civita symbol and generalized Kronecker delta, Taylor series expansion, multivariate Hermite polynomials, accurate high-order derivatives, differential operators (Gradient, Jacobian, Hessian, Divergence, Curl, Laplacian) and numerical integration in arbitrary orthogonal coordinate systems: cartesian, polar, spherical, cylindrical, parabolic or user defined by custom scale factors.
Enrich your ggplots with group-wise comparisons. This package provides an easy way to indicate if two groups are significantly different. Commonly this is shown by a bracket on top connecting the groups of interest which itself is annotated with the level of significance. The package provides a single layer that takes the groups for comparison and the test as arguments and adds the annotation to the plot.
This package provides a fast, flexible, and comprehensive framework for quantitative text analysis in R. It provides functionality for corpus management, creating and manipulating tokens and ngrams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
This library lets you write interactive programs without callbacks or side-effects. Functional Reactive Programming (FRP) uses composable events and time-varying values to describe interactive systems as pure functions. Just like other pure functional code, functional reactive code is easier to get right on the first try, maintain, and reuse. Reflex is a fully-deterministic, higher-order FRP interface and an engine that efficiently implements that interface.
SPsimSeq uses a specially designed exponential family for density estimation to constructs the distribution of gene expression levels from a given real RNA sequencing data (single-cell or bulk), and subsequently simulates a new dataset from the estimated marginal distributions using Gaussian-copulas to retain the dependence between genes. It allows simulation of multiple groups and batches with any required sample size and library size.
This package offers functions to process multiple ChIP-seq BAM files and detect allele-specific events. It computes allele counts at individual variants (SNPs/SNVs), implements extensive QC (quality control) steps to remove problematic variants, and utilizes a Bayesian framework to identify statistically significant allele-specific events. BaalChIP is able to account for copy number differences between the two alleles, a known phenotypical feature of cancer samples.
The first day of any MMWR week is Sunday. MMWR week numbering is sequential beginning with 1 and incrementing with each week to a maximum of 52 or 53. MMWR week #1 of an MMWR year is the first week of the year that has at least four days in the calendar year. This package provides functionality to convert dates to MMWR day, week, and year and the reverse.
Phylogenetic clustering (phyloclustering) is an evolutionary continuous time Markov Chain model-based approach to identify population structure from molecular data without assuming linkage equilibrium. The package phyclust provides a convenient implementation of phyloclustering for DNA and SNP data, capable of clustering individuals into subpopulations and identifying molecular sequences representative of those subpopulations. It is designed in C for performance and interfaced with R for visualization.
squallms is a Bioconductor R package that implements a "semi-labeled" approach to untargeted mass spectrometry data. It pulls in raw data from mass-spec files to calculate several metrics that are then used to label MS features in bulk as high or low quality. These metrics of peak quality are then passed to a simple logistic model that produces a fully-labeled dataset suitable for downstream analysis.
MsImpute is a package for imputation of peptide intensity in proteomics experiments. It additionally contains tools for MAR/MNAR diagnosis and assessment of distortions to the probability distribution of the data post imputation. The missing values are imputed by low-rank approximation of the underlying data matrix if they are MAR (method = "v2"), by Barycenter approach if missingness is MNAR ("v2-mnar"), or by Peptide Identity Propagation (PIP).
This package provides next-generation sequencing (NGS) and mass spectrometry (MS) sample data, code snippets and replication material used for developing NestLink. The NestLink approach is a protein binder selection and identification technology able to biophysically characterize thousands of library members at once without handling individual clones at any stage of the process. Data were acquired on NGS and MS platforms at the Functional Genomics Center Zurich.
srnadiff is a package that finds differently expressed regions from RNA-seq data at base-resolution level without relying on existing annotation. To do so, the package implements the identify-then-annotate methodology that builds on the idea of combining two pipelines approachs differential expressed regions detection and differential expression quantification. It reads BAM files as input, and outputs a list differentially regions, together with the adjusted p-values.
This package implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representations of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need for pre-normalize the data.
This is an R package for spell checking common document formats including LaTeX, markdown, manual pages, and DESCRIPTION files. It includes utilities to automate checking of documentation and vignettes as a unit test during R CMD check. Both British and American English are supported out of the box and other languages can be added. In addition, packages may define a wordlist to allow custom terminology without having to abuse punctuation.
This package provides p-values in type I, II or III anova and summary tables for lmer model fits via Satterthwaite's degrees of freedom method. A Kenward-Roger method is also available via the pbkrtest package. Model selection methods include step, drop1 and anova-like tables for random effects (ranova). Methods for Least-Square means (LS-means) and tests of linear contrasts of fixed effects are also available.
This is a set of tools for dendrograms and tree plots using ggplot2. The ggplot2 philosophy is to clearly separate data from the presentation. Unfortunately the plot method for dendrograms plots directly to a plot device with out exposing the data. The ggdendro package resolves this by making available functions that extract the dendrogram plot data. The package provides implementations for tree, rpart, as well as diana and agnes cluster diagrams.
This package implements a simple key-value style database where character string keys are associated with data values that are stored on the disk. A simple interface is provided for inserting, retrieving, and deleting data from the database. Utilities are provided that allow filehash databases to be treated much like environments and lists are already used in R. These utilities are provided to encourage interactive and exploratory analysis on large datasets.
Identification of aberrant gene expression in RNA-seq data. Read count expectations are modeled by an autoencoder to control for confounders in the data. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. Furthermore, OUTRIDER provides useful plotting functions to analyze and visualize the results.
Starting from one SBML file, it extracts information from each listOfCompartments, listOfSpecies and listOfReactions element by saving them into data frames. Each table provides one row for each entity (i.e. either compartment, species, reaction or speciesReference) and one set of columns for the attributes, one column for the content of the notes subelement and one set of columns for the content of the annotation subelement.
Finding an optimal Bayesian experimental design involves maximizing an objective function given by the expectation of some appropriately chosen utility function with respect to the joint distribution of unknown quantities (including responses). This objective function is usually not available in closed form and the design space can be continuous and of high dimensionality. This package uses Approximate Coordinate Exchange (ACE) to maximise an approximation to the expectation of the utility function.
Monocle 3 performs clustering, differential expression and trajectory analysis for single-cell expression experiments. It orders individual cells according to progress through a biological process, without knowing ahead of time which genes define progress through that process. Monocle 3 also performs differential expression analysis, clustering, visualization, and other useful tasks on single-cell expression data. It is designed to work with RNA-Seq data, but could be used with other types as well.
Tree based algorithms can be improved by introducing boosting frameworks. LightGBM is one such framework, based on Ke, Guolin et al. (2017). This package offers an R interface to work with it. It is designed to be distributed and efficient with the following goals:
Faster training speed and higher efficiency;
lower memory usage;
better accuracy;
parallel learning supported; and
capable of handling large-scale data.