Enter a query in the form above. You can search for a specific version of a package by using the @ symbol, for example: gcc@10.
API method:
GET /api/packages?search=hello&page=1&limit=20
where search is your query, page is the page number, and limit is the number of items on a single page. Pagination information (such as the number of pages) is returned in the response headers.
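A minimal Python sketch of building and sending this request with only the standard library. The base URL `https://example.org` is a placeholder assumption (substitute the address of this site). The response body format and the exact names of the pagination headers are not described here, so the sketch returns the body as text and all response headers as a dictionary.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder host: an assumption, replace with the actual address of this site.
BASE_URL = "https://example.org"


def packages_url(search: str, page: int = 1, limit: int = 20) -> str:
    """Build the GET /api/packages URL for a query, page number and page size."""
    query = urlencode({"search": search, "page": page, "limit": limit})
    return f"{BASE_URL}/api/packages?{query}"


def fetch_packages(search: str, page: int = 1, limit: int = 20):
    """Send the request and return (body, headers); pagination info is in the headers."""
    with urlopen(packages_url(search, page, limit)) as response:
        body = response.read().decode("utf-8")
        headers = dict(response.headers.items())
    return body, headers


if __name__ == "__main__":
    print(packages_url("hello"))
```

Note that `urlencode` percent-encodes the @ symbol, so a versioned query such as gcc@10 is sent as `gcc%4010`.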
If you'd like to join our channel webring, send a patch to ~whereiseveryone/toys@lists.sr.ht adding your channel as an entry in channels.scm.
Isolator analyzes RNA-Seq experiments. Isolator has a particular focus on producing stable, consistent estimates. It implements a full hierarchical Bayesian model of an entire RNA-Seq experiment. It saves all the samples generated by the sampler, which can be processed to compute posterior probabilities for arbitrarily complex questions, far beyond the confines of pairwise tests. It aggressively corrects for technical effects, such as random priming bias, GC-bias, 3' bias, and fragmentation effects. Compared to other MCMC approaches, it is exceedingly efficient, though generally slower than modern maximum likelihood approaches.
ChIPKernels is an R package for building different string kernels used for DNA sequence analysis. A dictionary for the desired kernel must be built first; this dictionary can then be used to compute kernels for DNA sequences.
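To illustrate the dictionary-then-kernel workflow described above, here is a minimal Python sketch of one common string kernel, the k-mer spectrum kernel. It is not ChIPKernels' R API; the choice of kernel and the function names are assumptions for illustration, and sequences are assumed to be uppercase.

```python
from itertools import product
from typing import Dict, List


def build_dictionary(k: int) -> Dict[str, int]:
    """Map every DNA k-mer over ACGT to a feature index (the kernel 'dictionary')."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def spectrum_features(seq: str, dictionary: Dict[str, int], k: int) -> List[int]:
    """Count occurrences of each dictionary k-mer in seq; k-mers with other letters are skipped."""
    counts = [0] * len(dictionary)
    for i in range(len(seq) - k + 1):
        idx = dictionary.get(seq[i:i + k])
        if idx is not None:
            counts[idx] += 1
    return counts


def spectrum_kernel(a: str, b: str, dictionary: Dict[str, int], k: int) -> int:
    """Kernel value: inner product of the two k-mer count vectors."""
    fa = spectrum_features(a, dictionary, k)
    fb = spectrum_features(b, dictionary, k)
    return sum(x * y for x, y in zip(fa, fb))
```

The dictionary is built once per k and reused for every pair of sequences, which is the same split between dictionary construction and kernel evaluation that the description mentions.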
This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays.
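One widely used way to estimate the predominant fragment length in these assays is strand cross-correlation: shift the minus-strand tag counts relative to the plus-strand counts and take the shift with the highest correlation. A minimal pure-Python sketch for a single chromosome, assuming this approach for illustration; it uses Pearson correlation and omits the enrichment and quality measures the package also reports.

```python
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def strand_cross_correlation(plus: Sequence[int], minus: Sequence[int], max_shift: int) -> Dict[int, float]:
    """Correlate plus-strand counts with minus-strand counts shifted left by each d.

    plus, minus: per-base tag-start counts along one chromosome (equal lengths).
    """
    n = len(plus)
    return {d: pearson(plus[:n - d], minus[d:]) for d in range(max_shift + 1)}


def estimate_fragment_length(plus: Sequence[int], minus: Sequence[int], max_shift: int) -> int:
    """Return the shift with the highest cross-correlation."""
    scores = strand_cross_correlation(plus, minus, max_shift)
    return max(scores, key=scores.get)
```

The shift works because the two ends of a fragment are sequenced on opposite strands, so minus-strand tags sit roughly one fragment length downstream of their plus-strand partners.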
CPAT is a method to distinguish coding and noncoding RNA by using a logistic regression model based on four pure sequence-based, linguistic features: ORF size, ORF coverage, Fickett TESTCODE, and hexamer usage bias. A method based on linguistic features does not require other genomes or protein databases to perform alignment and is more robust. Because it is alignment-free, it runs much faster and is also easier to use.
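Two of the four features, ORF size and ORF coverage, can be computed directly from the transcript sequence. A minimal Python sketch, assuming an ORF is the longest ATG-to-stop stretch in the three forward frames and coverage is ORF length divided by transcript length; CPAT's exact definitions may differ, and the Fickett TESTCODE and hexamer features, as well as the logistic regression itself, are omitted.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG...stop ORF (stop codon included) in the three forward frames."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG after this stop
    return best


def orf_features(seq: str) -> Tuple[int, float]:
    """Return (ORF size, ORF coverage) for a transcript sequence."""
    orf = longest_orf(seq)
    coverage = len(orf) / len(seq) if seq else 0.0
    return len(orf), coverage
```

In a full classifier, these two values would be combined with the other two features as inputs to the logistic regression.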
The SCDE package implements a set of statistical methods for analyzing single-cell RNA-seq data. SCDE fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The SCDE package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify aspects of transcriptional heterogeneity among single cells.
Circus is an R package for annotation, analysis and visualization of circRNA data. Users can annotate their circRNA candidates with host genes and the gene features they are spliced from, and can discriminate between known and previously unknown splice junctions. Circular-to-linear ratios of circRNAs can be calculated, and a number of descriptive plots can be easily generated.
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It parses both FASTA and FASTQ files, which can optionally be gzip-compressed.
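A minimal Python sketch of reading FASTA or FASTQ records from a file that may or may not be gzip-compressed, detecting compression from the gzip magic bytes. It is not seqtk's C implementation; it assumes multi-line FASTA and strictly four-line FASTQ records, and a truncated FASTQ file will raise an error.

```python
import gzip
from typing import Iterator, Optional, Tuple


def open_maybe_gzip(path: str):
    """Open a text file, transparently decompressing it if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_records(path: str) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (name line, sequence, quality) tuples; quality is None for FASTA records."""
    with open_maybe_gzip(path) as fh:
        lines = (line.rstrip("\r\n") for line in fh)
        lines = (line for line in lines if line)  # skip blank lines
        first = next(lines, None)
        if first is None:
            return  # empty file
        if first.startswith(">"):
            # FASTA: sequence may span several lines until the next '>' header
            name, seq = first[1:], []
            for line in lines:
                if line.startswith(">"):
                    yield name, "".join(seq), None
                    name, seq = line[1:], []
                else:
                    seq.append(line)
            yield name, "".join(seq), None
        elif first.startswith("@"):
            # FASTQ: header, sequence, '+' separator, quality
            header = first
            while header is not None:
                seq_line = next(lines)
                next(lines)  # the '+' separator line
                qual = next(lines)
                yield header[1:], seq_line, qual
                header = next(lines, None)
        else:
            raise ValueError("not a FASTA or FASTQ file")
```

Checking the magic bytes rather than the file extension means a compressed file is handled correctly even if it is not named `.gz`.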
SlamDunk is a fully automated tool for robust, scalable and reproducible SLAMseq data analysis. Diagnostic plotting features and a MultiQC plugin make your SLAMseq data ready for immediate QA and interpretation.
HTSJDK is an implementation of a unified Java library for accessing common file formats, such as SAM and VCF, used for high-throughput sequencing (HTS) data. There are also a number of useful utilities for manipulating HTS data.
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
This package implements parallel block gzip. For many formats, in particular genomics data formats, data are compressed in fixed-length blocks such that they can be easily indexed based on a (genomic) coordinate order, since typically each block is sorted according to this order. This allows for each block to be individually compressed (deflated), or more importantly, decompressed (inflated), with the latter enabling random retrieval of data in large files (gigabytes to terabytes). pbgzip is not limited to any particular format, but certain features are tailored to genomics data formats when enabled. Parallel decompression is somewhat faster, but the true speedup comes during compression.
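The core idea described above, fixed-length blocks that are compressed independently plus an index of where each compressed block starts, can be sketched in Python with `zlib` and a thread pool. This is not the BGZF container format (there are no per-block gzip headers) and not pbgzip's code; the block size and helper names are assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

BLOCK_SIZE = 64 * 1024  # assumed uncompressed block size for this sketch


def _compress(block: bytes) -> bytes:
    return zlib.compress(block, 6)


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE, workers: int = 4) -> Tuple[bytes, List[int]]:
    """Split data into fixed-length blocks, deflate them in parallel, and build an offset index."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        compressed = list(pool.map(_compress, blocks))  # map preserves block order
    offsets, pos = [], 0
    for c in compressed:
        offsets.append(pos)
        pos += len(c)
    return b"".join(compressed), offsets


def read_block(archive: bytes, offsets: List[int], index: int) -> bytes:
    """Inflate a single block without touching the others (random access)."""
    end = offsets[index + 1] if index + 1 < len(offsets) else len(archive)
    return zlib.decompress(archive[offsets[index]:end])
```

Because no block depends on another, compression parallelizes trivially, and reading block i only requires inflating the bytes between offsets[i] and offsets[i + 1].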
twobitreader is a Python library for reading .2bit files as used by the UCSC genome browser.
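In the .2bit format, each base is packed into two bits (T=00, C=01, A=10, G=11), four bases per byte, with the first base in the most significant bits; runs of N and soft-masked regions are stored separately. A minimal Python sketch of packing and unpacking the DNA only; it is not twobitreader's API, ignores the file header, sequence index, N blocks and mask blocks, and accepts only uppercase ACGT.

```python
BASES = "TCAG"  # position = 2-bit code used by .2bit: T=0, C=1, A=2, G=3


def pack(seq: str) -> bytes:
    """Pack an ACGT string 4 bases per byte, first base in the most significant bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for j in range(4):
            code = BASES.index(chunk[j]) if j < len(chunk) else 0  # pad a short last chunk with T
            byte = (byte << 2) | code
        out.append(byte)
    return bytes(out)


def unpack(packed: bytes, length: int) -> str:
    """Decode `length` bases from packed 2-bit DNA."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])
```

The sequence length must be stored alongside the packed bytes, since padding in the last byte is indistinguishable from real T bases.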
This package provides an automated pipeline for spatial mapping of unique transcripts.
This package provides a tool for identifying and removing doublets in single-cell RNA-seq data.
Mudskipper is a tool for projecting genomic alignments to transcriptomic coordinates.
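Projecting a genomic coordinate onto a transcript amounts to walking the transcript's exons in transcript order and accumulating their lengths. A minimal Python sketch for a single position, assuming 0-based half-open exon intervals sorted by genomic start; it is not Mudskipper's implementation, which projects whole alignments rather than single positions.

```python
from typing import List, Optional, Tuple


def genome_to_transcript(pos: int, exons: List[Tuple[int, int]], strand: str) -> Optional[int]:
    """Map a 0-based genomic position to a 0-based transcript offset, or None if not exonic.

    exons: (start, end) half-open genomic intervals, sorted by start.
    strand: "+" or "-"; on the minus strand the transcript starts at the last exon's end.
    """
    ordered = exons if strand == "+" else list(reversed(exons))
    offset = 0  # transcript length covered by the exons already walked
    for start, end in ordered:
        if start <= pos < end:
            if strand == "+":
                return offset + (pos - start)
            return offset + (end - 1 - pos)
        offset += end - start
    return None
```

A position that falls in an intron or outside the transcript has no transcriptomic coordinate, which is why the function returns None in that case.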
This package lets you process UMI-4C data from scratch and produce nice plots.
Smithlab CPP is a C++ library that includes functions used in many of the Smith lab bioinformatics projects, such as a wrapper around Samtools data structures, classes for genomic regions, mapped sequencing reads, etc.
Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools.
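The idea behind pseudoalignment can be sketched as: index which targets contain each k-mer, then intersect the target sets of a read's k-mers to obtain the set of targets the read is compatible with. A simplified Python sketch under stated assumptions: kallisto itself builds a colored de Bruijn graph, handles reverse complements and skips redundant k-mer lookups, none of which is done here, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Optional, Set


def build_index(targets: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of target names that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in targets.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read: str, index: Dict[str, Set[str]], k: int) -> FrozenSet[str]:
    """Intersect the target sets of the read's k-mers; k-mers absent from the index are ignored."""
    compatible: Optional[Set[str]] = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return frozenset(compatible or ())
```

No base-level alignment is computed; the read is only assigned the set of targets it could have come from, which is the information quantification needs.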
Entrez Direct (EDirect) is a method for accessing the National Center for Biotechnology Information's (NCBI) set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a terminal. Functions take search terms from command-line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
EDirect also provides an argument-driven function that simplifies the extraction of data from document summaries or other results that are returned in structured XML format. This can eliminate the need for writing custom software to answer ad hoc questions.
t-distributed Stochastic Neighbor Embedding (t-SNE) is a method for dimensionality reduction and visualization of high-dimensional datasets. A popular implementation of t-SNE uses the Barnes-Hut algorithm to approximate the gradient at each iteration of gradient descent. This implementation differs in these ways:
Instead of approximating the N-body simulation using Barnes-Hut, we interpolate onto an equispaced grid and use FFT to perform the convolution.
Instead of computing nearest neighbors using vantage-point trees, we approximate nearest neighbors using the Annoy library. The neighbor lookups are multithreaded to take advantage of machines with multiple cores.
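The grid-plus-FFT idea in the first point can be illustrated in one dimension: once values have been placed on an equispaced grid, summing the t-SNE kernel 1/(1 + d²) over all pairs of grid nodes is a discrete convolution, which the FFT computes in O(n log n) instead of O(n²). A minimal NumPy sketch that assumes points already sit on grid nodes (the interpolation step is skipped) and checks the FFT result against the direct sum; the grid spacing and charges are arbitrary assumptions.

```python
import numpy as np


def direct_sum(charges: np.ndarray, h: float) -> np.ndarray:
    """O(n^2) reference: phi[i] = sum_j K(x_i - x_j) * q_j on a grid with spacing h."""
    x = np.arange(len(charges)) * h
    kernel = 1.0 / (1.0 + (x[:, None] - x[None, :]) ** 2)
    return kernel @ charges


def fft_sum(charges: np.ndarray, h: float) -> np.ndarray:
    """Same sum via FFT: the kernel depends only on i - j, so zero-pad and convolve."""
    n = len(charges)
    lags = np.arange(-(n - 1), n) * h               # all offsets i - j, length 2n - 1
    kernel = 1.0 / (1.0 + lags ** 2)
    size = 1 << int(np.ceil(np.log2(3 * n - 2)))    # >= full linear convolution length
    conv = np.fft.irfft(np.fft.rfft(kernel, size) * np.fft.rfft(charges, size), size)
    return conv[n - 1:2 * n - 1]                    # lag 0 sits at index n - 1
```

Zero-padding to at least 3n - 2 points turns the FFT's circular convolution into the linear one the direct sum computes.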
Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically produced by minimap) as input and outputs an assembly graph in the GFA format. Unlike mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to that of the raw input reads.
This package provides Python bindings to the UCSC Big Binary (bigWig/bigBed) file library. This provides read access to local and remote bigWig and bigBed files, but no write capabilities. The main feature is fast retrieval of range queries into numpy arrays.
RSeQC provides a number of modules that can comprehensively evaluate high throughput sequence data, especially RNA-seq data. Some basic modules inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while RNA-seq specific modules evaluate sequencing saturation, mapped reads distribution, coverage uniformity, strand specificity, etc.
ParDRe is a parallel tool to remove duplicate genetic sequence reads. Duplicate reads can be identical or nearly identical sequences with a few mismatches. This tool lets users avoid analyzing unnecessary reads, reducing the time of subsequent procedures on the dataset (e.g. assemblies, mappings, etc.). The tool is implemented with MPI in order to exploit the parallel capabilities of multicore clusters. It is faster than multithreaded counterparts (as of the end of 2015) for the same number of cores and, thanks to the message-passing technology, it can be executed on clusters.
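The notion of duplicates used above (identical reads, or reads differing by a few mismatches) can be sketched sequentially in Python: keep a read unless it is within a mismatch threshold of an already kept read of the same length that shares its first few bases. The prefix bucketing, the threshold and the helper names are assumptions for illustration; ParDRe's actual algorithm and its MPI distribution are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def remove_duplicates(reads: List[str], max_mismatches: int = 1, prefix_len: int = 4) -> List[str]:
    """Return reads with near-duplicates removed, keeping the first read seen in each group.

    Reads are compared only within buckets sharing the same length and prefix, so
    mismatches inside the prefix are never detected (a speed/accuracy trade-off).
    """
    kept: List[str] = []
    buckets: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for read in reads:
        key = (len(read), read[:prefix_len])
        if any(hamming(read, other) <= max_mismatches for other in buckets[key]):
            continue  # near-duplicate of a read we already kept
        buckets[key].append(read)
        kept.append(read)
    return kept
```

Bucketing by prefix is what makes the work divisible: different buckets can be processed independently, which is the kind of partitioning a parallel or distributed version would exploit.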