Various definitions for a high-dimensional median exist and this Python package provides a number of fast implementations of these definitions. Medians are extremely useful due to their high breakdown point (up to 50% contamination) and have a number of nice applications in machine learning, computer vision, and high-dimensional statistics.
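As a rough illustration of the robustness claim above, the following sketch approximates one common definition, the geometric median, using Weiszfeld's iteration and compares it with the mean on data containing one gross outlier. It uses only the standard library and is not this package's API; the function names and sample points are assumptions made up for the example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def geometric_median(points, iterations=200, eps=1e-9):
    """Approximate the geometric median with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iterations):
        # Weight each point by the inverse of its distance to the estimate;
        # tiny distances are clamped to avoid division by zero.
        weights = [1.0 / max(euclidean(p, current), eps) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[i] for w, p in zip(weights, points)) / total
            for i in range(dim)
        ]
        if euclidean(updated, current) < eps:
            return updated
        current = updated
    return current


# Four inliers on the unit square and one gross outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
mean = [sum(p[i] for p in points) / len(points) for i in range(2)]
median = geometric_median(points)
```

Here the mean is dragged to (20.4, 20.4) by the single outlier, while the geometric median stays inside the unit square, near (0.79, 0.79).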
Logomaker is a Python package for generating publication-quality sequence logos. Logomaker can generate both standard and highly customized logos illustrating the properties of DNA, RNA, or protein sequences. Logos are rendered as vector graphics embedded within native matplotlib Axes objects, making them easy to style and incorporate into multi-panel figures.
The metacells package implements the improved metacell algorithm for single-cell RNA sequencing (scRNA-seq) data analysis within the scipy framework, as well as a projection algorithm based on it. The original metacell algorithm was implemented in R. The Python package contains various algorithmic improvements and scales to larger data sets (millions of cells).
This package is a port of Python's built-in functools.lru_cache function for asyncio. To better handle async behaviour, it also ensures that multiple concurrent calls result in only one call to the wrapped function, with all awaits receiving the result of that call when it completes.
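The call-coalescing behaviour described above can be sketched with the standard library alone. This is a minimal illustration of the idea, not this package's implementation: the decorator name `coalescing_cache` is hypothetical, and the sketch omits LRU eviction, keyword arguments, and error handling.

```python
import asyncio


def coalescing_cache(func):
    """Cache results of an async function and share in-flight calls."""
    tasks = {}  # maps positional arguments to the Task computing the result

    async def wrapper(*args):
        if args not in tasks:
            # The first caller starts the work; later callers await the same Task.
            tasks[args] = asyncio.ensure_future(func(*args))
        return await tasks[args]

    return wrapper


calls = []  # records every real invocation of the wrapped function


@coalescing_cache
async def slow_square(x):
    calls.append(x)
    await asyncio.sleep(0.01)  # stand-in for slow I/O
    return x * x


async def main():
    # Three concurrent awaits on the same argument.
    return await asyncio.gather(slow_square(4), slow_square(4), slow_square(4))


results = asyncio.run(main())
```

All three awaits receive 16, while `calls` records a single invocation of the wrapped function.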
MQTT and MQTT-SN are lightweight publish/subscribe messaging transports for TCP/IP and connection-less protocols (such as UDP). The Eclipse Paho project provides client side implementations of MQTT and MQTT-SN in a variety of programming languages. This package is for the Python implementation of an MQTT version client class.
aiostream provides a collection of stream operators that can be combined to create asynchronous pipelines of operations. It can be seen as an asynchronous version of itertools, although some aspects are slightly different. All the provided operators return a unified interface called a stream. A stream is an enhanced asynchronous iterable.
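To give a feel for what an asynchronous pipeline of operators looks like, here is a sketch built from plain async generators in the standard library. It does not use aiostream's own operators or its stream interface; the helper names `count`, `amap`, and `afilter` are assumptions made up for this example.

```python
import asyncio


async def count(n):
    """Asynchronous source yielding 0, 1, ..., n - 1."""
    for i in range(n):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield i


async def amap(func, source):
    """Apply func to every item of an asynchronous iterable."""
    async for item in source:
        yield func(item)


async def afilter(predicate, source):
    """Keep only the items for which predicate is true."""
    async for item in source:
        if predicate(item):
            yield item


async def main():
    # Compose the operators: source -> keep odd numbers -> square them.
    pipeline = amap(lambda x: x * x, afilter(lambda x: x % 2 == 1, count(6)))
    return [item async for item in pipeline]


squares = asyncio.run(main())  # [1, 9, 25]
```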
The nbconvert tool, jupyter nbconvert, converts notebooks to various other formats via Jinja templates. It allows you to convert an .ipynb notebook file into various static formats including:
HTML
LaTeX
PDF
Reveal JS
Markdown (md)
ReStructured Text (rst)
executable script
The asttokens module annotates Python abstract syntax trees (ASTs) with the positions of tokens and text in the source code that generated them. It makes it possible for tools that work with logical AST nodes to find the particular text that resulted in those nodes, for example for automated refactoring or highlighting.
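The underlying idea, mapping an AST node back to the text that produced it, can be sketched with the standard library's `ast` module (Python 3.8 or later). This is only an illustration of the concept and does not use the asttokens API, which exposes richer token-level information; the sample source string is made up.

```python
import ast

source = "result = compute(a, b)\n"
tree = ast.parse(source)

# Locate the function-call node in the tree.
call = next(node for node in ast.walk(tree) if isinstance(node, ast.Call))

# Recover the exact source text that produced this node.
segment = ast.get_source_segment(source, call)

# Nodes also carry a 1-based line number and a 0-based column offset.
line, column = call.lineno, call.col_offset
```

Here `segment` is the string `compute(a, b)`, which a refactoring or highlighting tool could then rewrite or mark up in place.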
pycountry provides the ISO databases for the standards:
639-3 (Languages)
3166 (Countries)
3166-3 (Deleted Countries)
3166-2 (Subdivisions of countries)
4217 (Currencies)
15924 (Scripts)
It includes a copy from Debian’s pkg-isocodes and makes the data accessible through a Python API.
SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine. It can be used to format SQL or translate between 31 different dialects like DuckDB, Presto / Trino, Spark / Databricks, Snowflake, and BigQuery. It aims to read a wide variety of SQL inputs and output syntactically and semantically correct SQL in the targeted dialects.
Soup Sieve is a CSS selector library designed to be used with Beautiful Soup 4. It aims to provide selecting, matching, and filtering using modern CSS selectors. Soup Sieve currently provides selectors from the CSS level 1 specifications up through the latest CSS level 4 drafts and beyond (though some are not yet implemented).
Astrolib PySynphot (hereafter referred to only as pysynphot) is an object-oriented replacement for the STSDAS SYNPHOT synthetic photometry package in IRAF. pysynphot simulates photometric data and spectra as they are observed with the Hubble Space Telescope (HST). Passbands for standard photometric systems are available, and users can incorporate their own filters, spectra, and data.
SnapTools can perform the following types of operations on snap files:
index the reference genome before alignment;
align reads to the corresponding reference genome;
pre-process by converting paired-end reads into fragments and checking the mapping quality score, alignment, and filtration;
create the cell-by-bin matrix.
This package provides both simple, atomic constructs (such as integers of various sizes) and composite ones that allow you to form hierarchical and sequential structures of increasing complexity. It features bit and byte granularity, easy debugging and testing, an easy-to-extend subclass system, and lots of primitive constructs to make your work easier.
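The notion of combining atomic fields into a sequential structure can be sketched with the standard library's `struct` module. This is not this package's declarative API; the message layout and the helper names `build_message` and `parse_message` are assumptions chosen for the example.

```python
import struct

# Header layout: unsigned 8-bit version, then unsigned 16-bit big-endian length.
HEADER = struct.Struct(">BH")


def build_message(version, payload):
    """Serialize a header followed by the raw payload bytes."""
    return HEADER.pack(version, len(payload)) + payload


def parse_message(data):
    """Parse the header, then slice out the payload it describes."""
    version, length = HEADER.unpack_from(data, 0)
    payload = data[HEADER.size:HEADER.size + length]
    return {"version": version, "payload": payload}


raw = build_message(1, b"hello")
parsed = parse_message(raw)
```

A declarative library lets the same layout be described once and used for both parsing and building, instead of keeping two hand-written functions in sync as this sketch does.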
FontTools/TTX is a library to manipulate font files from Python. It supports reading and writing of TrueType/OpenType fonts, reading and writing of AFM files, reading (and partially writing) of PS Type 1 fonts. The package also contains a tool called “TTX” which converts TrueType/OpenType fonts to and from an XML-based format.
The main purpose of this package is to provide more complex arithmetic operations on dates/times. Heavy use is made of the relativedelta type from the dateutil library. Much of this package is just a light wrapper on top of this with some added features such as range generation and business day calculation.
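As a sketch of what range generation and business day calculation involve, the following standard-library example steps over dates and skips weekends. It does not use this package or dateutil; the function names are hypothetical, and public holidays are ignored.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Return the date `days` business days after `start` (Monday to Friday)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def date_range(start, stop, step=timedelta(days=1)):
    """Yield dates from start (inclusive) to stop (exclusive)."""
    current = start
    while current < stop:
        yield current
        current += step


friday = date(2024, 1, 5)
next_business_day = add_business_days(friday, 1)  # Monday, 2024-01-08
first_week = list(date_range(date(2024, 1, 1), date(2024, 1, 8)))
```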
The xmlschema library is an implementation of XML Schema for Python. It has full support for the XSD 1.0 and 1.1 standards, provides an XPath-based API for finding a schema's elements and attributes, and can encode and decode XML data to JSON and other formats.
This package includes simulation models for an induction motor, a synchronous reluctance motor, and a permanent-magnet synchronous motor. The motor models are simulated in the continuous-time domain while the control algorithms run in discrete time. The default solver is the explicit Runge-Kutta method of order 5(4) from scipy.integrate.solve_ivp.
The goal of GeoPandas is to make working with geospatial data in Python easier. It combines the capabilities of Pandas and Shapely, providing geospatial operations in Pandas and a high-level interface to multiple geometries to Shapely. GeoPandas enables you to easily do operations in Python that would otherwise require a spatial database such as PostGIS.
specutils is a Python package for representing, loading, manipulating, and analyzing astronomical spectroscopic data. The generic data containers and accompanying modules provide a toolbox that the astronomical community can use to build more domain-specific packages. For more details about the underlying principles, see APE13.
geosketch is a Python package that implements the geometric sketching algorithm described by Brian Hie, Hyunghoon Cho, Benjamin DeMeo, Bryan Bryson, and Bonnie Berger in "Geometric sketching compactly summarizes the single-cell transcriptomic landscape", Cell Systems (2019). This package provides an example implementation of the algorithm as well as scripts necessary for reproducing the experiments in the paper.
HTML to Text is a Python library for extracting text from HTML. Unlike other solutions such as LXML or Beautiful Soup, the text extracted with html_text does not contain elements such as JavaScript or inline styles that are not normally visible to users. It also normalizes white space characters in a smarter, more visually pleasing style.
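The two ideas above, dropping script and style content and normalizing white space, can be sketched naively with the standard library's `html.parser`. This is not html_text's API or algorithm, and it handles far fewer cases (for example, it inserts no separators between block elements); the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Collapse every run of white space into a single space.
    return " ".join("".join(parser.chunks).split())


page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Hello,   <b>world</b>!</p><script>var x = 1;</script></body></html>"
)
text = visible_text(page)  # "Hello, world!"
```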
Hotspot is a tool for identifying informative genes (and gene modules) in a single-cell dataset. Importantly, "informative" is decided based on how well a gene's variation agrees with some cell metric, that is, some similarity mapping between cells. Informative genes are those whose expression varies in a similar way among cells that are nearby in the given metric.