Enter the query into the form above. You can look for a specific version of a package by using the @ symbol, like this: gcc@10.
API method:
GET /api/packages?search=hello&page=1&limit=20
where search is your query, page is the page number, and limit is the number of items per page. Pagination information (such as the total number of pages) is returned
in response headers.
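For example, a request can be issued from R (the language of most packages indexed here). The sketch below uses the httr and jsonlite packages; the host name is a placeholder, and no particular pagination header names are assumed, so inspect headers(resp) yourself:

library(httr)
library(jsonlite)

# Query the package search API (replace example.org with this site's host).
resp <- GET("https://example.org/api/packages",
            query = list(search = "hello", page = 1, limit = 20))

packages <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
headers(resp)  # pagination details (e.g. number of pages) are returned here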
If you'd like to join our channel webring, send a patch to ~whereiseveryone/toys@lists.sr.ht adding your channel as an entry in channels.scm.
Color schemes ready for each type of data (qualitative, diverging or sequential), with colors that are distinct for all people, including color-blind readers. This package provides an implementation of Paul Tol (2018) and Fabio Crameri (2018) <doi:10.5194/gmd-11-2541-2018> color schemes for use with graphics or 'ggplot2'. It provides tools to simulate color-blindness and to test how well the colors of any palette are identifiable. Several scientific thematic schemes (geologic timescale, land cover, FAO soils, etc.) are also implemented.
Many data science problems reduce to operations on very tall, skinny matrices. However, sometimes these matrices can be so tall that they are difficult to work with, or do not even fit into main memory. One strategy to deal with such objects is to distribute their rows across several processors. To this end, we offer an S4 class for tall, skinny, distributed matrices, called the 'shaq'. We also provide many useful numerical methods and statistics operations for operating on these distributed objects. The naming is a bit "tongue-in-cheek", with the class a play on the fact that Shaquille O'Neal ('Shaq') is very tall, and he starred in the film 'Kazaam'.
This package provides a comprehensive set of geostatistical, visual, and analytical methods, in conjunction with an expanded version of J. E. Klovan's acclaimed mining dataset. This makes 'klovan' an excellent learning resource for Principal Component Analysis (PCA), Factor Analysis (FA), kriging, and other geostatistical techniques. Originally published in the 1976 book 'Geological Factor Analysis', the included mining dataset was assembled by Professor J. E. Klovan of the University of Calgary. As one of the first applications of FA in the geosciences, this well-regarded, published dataset has significant historical importance and is a valuable, illustrative resource for demonstrating the capabilities of PCA, FA, kriging, and other geostatistical techniques. Note that some methods require the RGeostats package. Please refer to the README or Additional_repositories for installation instructions. This material is based upon research in the Materials Data Science for Stockpile Stewardship Center of Excellence (MDS3-COE), and supported by the Department of Energy's National Nuclear Security Administration under Award Number DE-NA0004104.
Implementation of the k-means clustering algorithm and a supervised KNN (k-nearest neighbors) learning method. It allows users to perform unsupervised clustering and supervised classification on their datasets. Additional features include data normalization, imputation of missing values, and a choice of distance metric. The package also provides functions to determine the optimal number of clusters for k-means and the best k-value for KNN: knn_Function(), find_Knn_best_k(), KMEANS_FUNCTION(), and find_Kmeans_best_k().
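As a hedged illustration of how these exported helpers might be combined (the argument names below are assumptions, not the package's documented signatures; consult the help pages):

# Hypothetical workflow: pick k for k-means, then cluster (args assumed).
data <- iris[, 1:4]
best_k <- find_Kmeans_best_k(data)
clusters <- KMEANS_FUNCTION(data, best_k)
# Similarly for KNN: find_Knn_best_k() followed by knn_Function().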
Smoothing techniques and bandwidth selectors for computing the nth derivative of a probability density from one-dimensional data (described in Arsalane Chouaib Guidoum (2020) <arXiv:2012.06102> [stat.CO]).
This package provides a unified software package, implemented simultaneously in 'Python', 'R', and 'Matlab', providing a uniform and internally consistent way of calculating stoichiometric equilibrium constants in modern and palaeo seawater as a function of temperature, salinity, pressure, and the concentrations of magnesium, calcium, sulphate, and fluorine.
Clustering typically assigns data points into discrete groups, but the clusters can sometimes be indistinct. Cluster sharpening adjusts an existing clustering to create contrast between groups. This package provides a general interface for cluster sharpening along with several implementations based on different excision criteria.
Software for k-means clustering of partially observed data from Chi, Chi, and Baraniuk (2016) <doi:10.1080/00031305.2015.1086685>.
This package provides functions for analysing eye-tracking data, including event detection, visualizations, and area-of-interest (AOI) based analyses. The package includes implementations of the I-VT, I-DT, adaptive velocity threshold, and identification by two-means clustering (I2MC) algorithms. See the separate documentation for each function. The principles underlying the I-VT and I-DT algorithms are described in Salvucci & Goldberg (2000, <doi:10.1145/355017.355028>). Two-means clustering is described in Hessels et al. (2017, <doi:10.3758/s13428-016-0822-1>). The adaptive velocity threshold algorithm is described in Nyström & Holmqvist (2010, <doi:10.3758/BRM.42.1.188>). See the URL for a demonstration.
This package implements the Lilliefors-corrected Kolmogorov-Smirnov test for use in goodness-of-fit tests, suitable when population parameters are unknown and must be estimated by sample statistics. P-values are estimated by simulation. The test can be used with a variety of continuous distributions, including normal, lognormal, univariate mixtures of normals, uniform, loguniform, exponential, gamma, and Weibull distributions. Functions to generate random numbers and calculate density, distribution, and quantile functions are provided for use with the log uniform and mixture distributions.
It uses species accumulation curves and diverse estimators to simultaneously assess survey coverage levels across multiple geographic cells of user-defined size, or across polygons. It also enables the geographical depiction of observed species richness, survey effort, and completeness values, including a background with administrative areas.
Analysis of DNA copy number in single cells using custom genome-wide targeted DNA sequencing panels for the Mission Bio Tapestri platform. Users can easily parse, manipulate, and visualize datasets produced from the automated 'Tapestri Pipeline', with support for normalization, clustering, and copy number calling. Functions are also available to deconvolute multiplexed samples by genotype and to parse barcoded reads from exogenous lentiviral constructs.
Selection of k in k-means clustering, based on the method proposed in the paper by Pham et al., "Selection of k in k-means clustering".
This package provides a self-guided, weakly supervised learning algorithm for feature extraction from noisy and high-dimensional data. It facilitates the identification of patterns that reflect underlying group structures across all samples in a dataset. The method incorporates a novel strategy to integrate spatial information, improving the interpretability of results in spatially resolved data.
Knowledge space theory by Doignon and Falmagne (1999) <doi:10.1007/978-3-642-58625-5> is a set- and order-theoretical framework, which proposes mathematical formalisms to operationalize knowledge structures in a particular domain. The kstMatrix package provides basic functionalities to generate, handle, and manipulate knowledge structures and knowledge spaces. As opposed to the kst package, kstMatrix uses matrix representations for knowledge structures. Furthermore, kstMatrix contains several knowledge spaces developed by the research group around Cornelia Dowling through querying experts.
This package provides a shiny app to visualize the knowledge networks for the code concepts. Using co-occurrence matrices of EHR codes from Veterans Affairs (VA) and Massachusetts General Brigham (MGB), the knowledge extraction via sparse embedding regression (KESER) algorithm was used to construct knowledge networks for the code concepts. Background and details about the method can be found in Hong et al. (2021) <doi:10.1038/s41746-021-00519-z>.
Online, semi-online, and offline K-medians algorithms are given. For all methods, the algorithms can be initialized randomly or with the help of a robust hierarchical clustering. The number of clusters can be selected with the help of a penalized criterion. We provide functions for robust clustering. The function gen_K() generates a sample of data following a contaminated Gaussian mixture. The functions Kmedians() and Kmeans() implement K-medians and K-means algorithms, while Kplot() produces graphs for both methods. Cardot, H., Cenac, P. and Zitt, P-A. (2013). "Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm". Bernoulli, 19, 18-43. <doi:10.3150/11-BEJ390>. Cardot, H. and Godichon-Baggioni, A. (2017). "Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis". Test, 26(3), 461-480 <doi:10.1007/s11749-016-0519-x>. Godichon-Baggioni, A. and Surendran, S. "A penalized criterion for selecting the number of clusters for K-medians" <arXiv:2209.03597>. Vardi, Y. and Zhang, C.-H. (2000). "The multivariate L1-median and associated data depth". Proc. Natl. Acad. Sci. USA, 97(4):1423-1426. <doi:10.1073/pnas.97.4.1423>.
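A minimal sketch of the intended workflow, assuming default arguments (the exact signatures of gen_K(), Kmedians(), and Kplot() are assumptions; see the package documentation):

X <- gen_K()        # sample from a contaminated Gaussian mixture (default args assumed)
res <- Kmedians(X)  # robust K-medians clustering; K chosen by the penalized criterion
Kplot(res)          # graphs for the fitted clustering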
Miscellaneous functions and data used in psychological research and teaching. Keng currently has a built-in dataset depress, and can (1) scale a vector; (2) compute the cut-off values of Pearson's r with known sample size; (3) test the significance and compute the post-hoc power for Pearson's r with known sample size; (4) conduct a priori power analysis and plan the sample size for Pearson's r; (5) compare lm()'s fitted outputs using R-squared, f_squared, post-hoc power, and PRE (Proportional Reduction in Error, also called partial R-squared or partial Eta-squared); (6) calculate PRE from partial correlation, Cohen's f, or f_squared; (7) conduct a priori power analysis and plan the sample size for one or a set of predictors in regression analysis; (8) conduct post-hoc power analysis for one or a set of predictors in regression analysis with known sample size; (9) randomly pick numbers for Chinese Super Lotto and Double Color Balls; (10) assess course objective achievement in Outcome-Based Education.
Implementations of two empirical versions of the kernel partial correlation (KPC) coefficient and the associated variable selection algorithms. KPC is a measure of the strength of conditional association between Y and Z given X, with X, Y, Z being random variables taking values in general topological spaces. As the name suggests, KPC is defined in terms of kernels on reproducing kernel Hilbert spaces (RKHSs). The population KPC is a deterministic number between 0 and 1; it is 0 if and only if Y is conditionally independent of Z given X, and it is 1 if and only if Y is a measurable function of Z and X. One empirical KPC estimator is based on geometric graphs, such as K-nearest neighbor graphs and minimum spanning trees, and is consistent under very weak conditions. The other empirical estimator, defined using conditional mean embeddings (CMEs) as used in the RKHS literature, is also consistent under suitable conditions. Using KPC, a stepwise forward variable selection algorithm KFOCI (using the graph based estimator of KPC) is provided, as well as a similar stepwise forward selection algorithm based on the RKHS based estimator. For more details on KPC, its empirical estimators and its application on variable selection, see Huang, Z., N. Deb, and B. Sen (2022), "Kernel partial correlation coefficient - a measure of conditional dependence" (URL listed below). When X is empty, KPC measures the unconditional dependence between Y and Z, which has been described in Deb, N., P. Ghosal, and B. Sen (2020), "Measuring association on topological spaces using kernels and geometric graphs" <arXiv:2010.01768>, and it is implemented in the functions KMAc() and Klin() in this package. The latter can be computed in near linear time.
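A hedged sketch of variable selection with KFOCI and the unconditional measure Klin(); default kernel arguments are assumed, and only the function names come from the description above:

set.seed(1)
n <- 200
X <- matrix(rnorm(n * 4), n, 4)
Y <- X[, 1] + rnorm(n)      # only the first predictor is relevant
KFOCI(Y, X)                 # stepwise forward selection via the graph-based KPC estimator
Klin(Y, as.matrix(X[, 1]))  # unconditional dependence measure, near-linear time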
Aids in identifying the Koeppen-Geiger (KG) climatic zone for a given location. The Koeppen-Geiger climate zones were first published in 1884 as a system to classify regions of the earth by their relative heat and humidity through the year, for the benefit of human health, plants, agriculture, and other human activity [1]. This climate zone classification system, applicable to all of the earth's surface, has continued to be developed by scientists up to the present day. Recently, one of us (FZ) published updated, higher-accuracy KG climate zone definitions [2]. In this package we use these updated high-resolution maps as the data source [3]. We provide functions that return the KG climate zone for a given longitude and latitude, or for a given United States zip code. In addition, the CZUncertainty() function checks nearby climate zones to determine whether the given location is near a climate zone boundary. An interactive shiny app is also provided to determine the KG climate zone for a given longitude and latitude, or United States zip code. Digital data, as well as animated maps showing the shift of the climate zones, are provided on the following website <http://koeppen-geiger.vu-wien.ac.at>. This work was supported by the DOE-EERE SunShot award DE-EE-0007140. [1] W. Koeppen, (2011) <doi:10.1127/0941-2948/2011/105>. [2] F. Rubel and M. Kottek, (2010) <doi:10.1127/0941-2948/2010/0430>. [3] F. Rubel, K. Brugger, K. Haslinger, and I. Auer, (2016) <doi:10.1127/metz/2016/0816>.
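A sketch of a coordinate lookup; LookupCZ() and RoundCoordinates() are assumed helper names (only CZUncertainty() is named above), so verify them against the package index before use:

# Hypothetical lookup of the KG zone for one site (helper names assumed).
site <- data.frame(Site = "site1",
                   rndCoord.lon = RoundCoordinates(-105.27),
                   rndCoord.lat = RoundCoordinates(40.01))
site$ClimateZ <- LookupCZ(site)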
Convert keys and other values to memorable phrases. Includes some methods to build lists of words.
Quality of life functions for interactive programming. Shortcuts for common combinations of functions or different default arguments. Not to be used in production-level scripts, but useful for exploring and quickly manipulating data for easy analysis. Also imports a variety of packages to facilitate the installation of those imported packages on the host machine.
Identification of putative causal variants in genome-wide association studies using hybrid analysis of both the trio and population designs. The package implements the method in the paper: Yang, Y., Wang, Q., Wang, C., Buxbaum, J., & Ionita-Laza, I. (2024). KnockoffHybrid: A knockoff framework for hybrid analysis of trio and population designs in genome-wide association studies. The American Journal of Human Genetics, in press.
Distance metrics for mixed-type data consisting of continuous, nominal, and ordinal variables. This methodology uses additive and product kernels to calculate similarity functions and metrics, and selects variables relevant to the underlying distance through bandwidth selection via maximum similarity cross-validation. These methods can be used in any distance-based algorithm, such as distance-based clustering. For further details, we refer the reader to Ghashti and Thompson (2024) <doi:10.1007/s00357-024-09493-z> for dkps() methodology, and Ghashti (2024) <doi:10.14288/1.0443975> for dkss() methodology.
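A hedged sketch of plugging either metric into a distance-based algorithm (the argument name df is an assumption; see the package documentation for the real signatures):

# Mixed-type toy data: one continuous and one nominal variable.
dat <- data.frame(x = rnorm(50),
                  g = factor(sample(c("a", "b"), 50, replace = TRUE)))
d <- dkps(df = dat)       # pairwise distances via the Ghashti & Thompson (2024) method
hc <- hclust(as.dist(d))  # any distance-based method can consume the result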