Enter the query into the form above. You can look for specific version of a package by using @ symbol like this: gcc@10.
API method:
GET /api/packages?search=hello&page=1&limit=20
where search is your query, page is a page number and limit is a number of items on a single page. Pagination information (such as a number of pages and etc) is returned
in response headers.
If you'd like to join our channel webring send a patch to ~whereiseveryone/toys@lists.sr.ht adding your channel as an entry in channels.scm.
The store package provides a number of data store types that are useful for bioinformatic analysis.
PyEGA3 is a tool for viewing and downloading files from authorized EGA datasets. It uses the EGA data API and has several key features:
Files are transferred over secure https connections and received unencrypted, so no need for decryption after download.
Downloads resume from where they left off in the event that the connection is interrupted.
Supports file segmenting and parallelized download of segments, improving overall performance.
After download completes, file integrity is verified using checksums.
Implements the GA4GH-compliant htsget protocol for download of genomic ranges for data files with accompanying index files.
This package provides a a transcriptomic-based framework to dissect cell communication in a global manner. It integrates an original expert-curated database of ligand-receptor interactions taking into account multiple subunits expression. Based on transcriptomic profiles (gene expression), this package computes communication scores between cells and provides several visualization modes that can be helpful to dig into cell-cell interaction mechanism and extend biological knowledge.
Pypairix is a Python module for fast querying on a pairix-indexed bgzipped text file that contains a pair of genomic coordinates per line.
This package provides Python bindings for lib2bit to access 2bit files with Python.
Segemehl is software to map short sequencer reads to reference genomes. Segemehl implements a matching strategy based on enhanced suffix arrays (ESA). It accepts fasta and fastq queries (gzip'ed and bgzip'ed). In addition to the alignment of reads from standard DNA- and RNA-seq protocols, it also allows the mapping of bisulfite converted reads (Lister and Cokus) and implements a split read mapping strategy. The output of segemehl is a SAM or BAM formatted alignment file.
This package implements bindings for h5 files that are compatible with Bioconductor S4 data structures, namely the DataFrame and DelayedArray. This allows HDF5-backed data to be easily used as data frames with arbitrary sets of columns.
Tombo is a suite of tools primarily for the identification of modified nucleotides from nanopore sequencing data. Tombo also provides tools for the analysis and visualization of raw nanopore signal.
SeqGL is a group lasso based algorithm to extract transcription factor sequence signals from ChIP, DNase and ATAC-seq profiles. This package presents a method which uses group lasso to discriminate between bound and non bound genomic regions to accurately identify transcription factors bound at the specific regions.
A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences. In order to use the program, the user submits a sequence in FASTA format. The output consists of two files: a repeat table file and an alignment file. Submitted sequences may be of arbitrary length. Repeats with pattern size in the range from 1 to 2000 bases are detected.
MOFA is a factor analysis model that provides a general framework for the integration of multi-omic data sets in an unsupervised fashion. Intuitively, MOFA can be viewed as a versatile and statistically rigorous generalization of principal component analysis to multi-omics data. Given several data matrices with measurements of multiple -omics data types on the same or on overlapping sets of samples, MOFA infers an interpretable low-dimensional representation in terms of a few latent factors. These learnt factors represent the driving sources of variation across data modalities, thus facilitating the identification of cellular states or disease subgroups.
Centrifuge is a microbial classification engine that enables rapid, accurate and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.7 GB for all complete bacterial and viral genomes plus the human genome) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes.
The Spliced Transcripts Alignment to a Reference (STAR) software is based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences.
WebLogo is a web based application designed to make the generation of sequence logos as easy and painless as possible.
WebLogo can create output in several common graphics' formats, including the bitmap formats GIF and PNG, suitable for on-screen display, and the vector formats EPS and PDF, more suitable for printing, publication, and further editing. Additional graphics options include bitmap resolution, titles, optional axis, and axis labels, antialiasing, error bars, and alternative symbol formats.
A sequence logo is a graphical representation of an amino acid or nucleic acid multiple sequence alignment. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position. The width of the stack is proportional to the fraction of valid symbols in that position.
CIRI-long is a package for circular RNA identification using long-read sequencing data.
This package provides a companion annotation file to the IlluminaHumanMethylationEPICmanifest package based on the same annotation 1.0B5.
An interval tree can be used to efficiently find a set of numeric intervals overlapping or containing another interval. This library provides a basic implementation of an interval tree using C++ templates, allowing the insertion of arbitrary types into the tree.
This package provides extra utility functions to perform common tasks in the analysis of omics data, leveraging and enhancing features provided by Bioconductor packages.
This package provides a deconvolution based on Single Nucleotide Position (SNP) for multiplexed scRNA-seq data. The name vireo stand for Variational Inference for Reconstructing Ensemble Origin by expressed SNPs in multiplexed scRNA-seq data and follows the clone identification from single-cell data named cardelino.
Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF.
Trinity assembles transcript sequences from Illumina RNA-Seq data. Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
Telomerecat is a tool for estimating the average telomere length (TL) for a paired end, whole genome sequencing (WGS) sample.
Telomerecat is adaptable, accurate and fast. The algorithm accounts for sequencing amplification artifacts, anneouploidy (common in cancer samples) and noise generated by WGS. For a high coverage WGS BAM file of around 100GB telomerecat can produce an estimate in ~1 hour.
The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
Next-Generation sequencing machines usually produce FASTA or FASTQ files, containing multiple short-reads sequences. The main processing of such FASTA/FASTQ files is mapping the sequences to reference genomes. However, it is sometimes more productive to preprocess the files before mapping the sequences to the genome---manipulating the sequences to produce better mapping results. The FASTX-Toolkit tools perform some of these preprocessing tasks.
This package aims to simplify working with genomic region / interval data by providing a common interface that lets you access a wide selection of file types and formats for handling genomic region data---all using the same syntax.