Enter the query into the form above. You can look for a specific version of a package by using the @ symbol, like this: gcc@10.
API method:
GET /api/packages?search=hello&page=1&limit=20
where search is your query, page is the page number, and limit is the number of items per page. Pagination information (such as the total number of pages) is returned in the response headers.
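A minimal sketch of calling the endpoint from Python with the requests library. The base URL is a placeholder (substitute this site's address), and since the pagination header names are not documented here, the sketch simply dumps all response headers:

```python
import requests

BASE_URL = "https://example.org"  # placeholder: use this site's address

resp = requests.get(
    f"{BASE_URL}/api/packages",
    params={"search": "hello", "page": 1, "limit": 20},
)
resp.raise_for_status()
print(resp.headers)  # pagination info (e.g. number of pages) lives here
print(resp.json())   # the matching packages for this page
```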
If you'd like to join our channel webring, send a patch to ~whereiseveryone/toys@lists.sr.ht adding your channel as an entry in channels.scm.
Efficient sampling from high-dimensional truncated Gaussian distributions, also known as the multivariate truncated normal (MTN). Techniques include zigzag Hamiltonian Monte Carlo, as in Akihiko Nishimura, Zhenyu Zhang and Marc A. Suchard (2024) <doi:10.1080/01621459.2024.2395587>, and harmonic Hamiltonian Monte Carlo, as in Ari Pakman and Liam Paninski (2014) <doi:10.1080/10618600.2013.788448>.
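The zigzag and harmonic HMC samplers are beyond a short sketch, but a naive rejection sampler makes the problem concrete and shows why they are needed: its acceptance rate collapses as the dimension grows or the truncation region shrinks. A hedged numpy sketch, not the package's algorithm:

```python
import numpy as np

def rejection_mtn(mean, cov, lower, upper, n, seed=0):
    """Draw n samples from N(mean, cov) truncated to [lower, upper]
    by brute-force rejection. Usable only in low dimensions."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        x = rng.multivariate_normal(mean, cov)
        if np.all(x >= lower) and np.all(x <= upper):
            out.append(x)
    return np.array(out)

# 2-D standard normal truncated to the positive orthant (~25% acceptance)
samples = rejection_mtn(np.zeros(2), np.eye(2), [0, 0], [np.inf, np.inf], 1000)
```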
Testing homogeneity of k multivariate distributions is a classical and challenging problem in statistics, and it becomes even more challenging when the dimension of the data exceeds the sample size. We construct tests for this purpose that are exact level (size) alpha tests based on clustering. These tests are easy to implement and distribution-free in finite sample situations. Under appropriate regularity conditions, these tests are consistent in the HDLSS (high dimension, low sample size) asymptotic regime, where the dimension of the data grows to infinity while the sample size remains fixed. We also consider a multiscale approach, where the results for different numbers of partitions are aggregated judiciously. Details are in Biplab Paul, Shyamal K De and Anil K Ghosh (2021) <doi:10.1016/j.jmva.2021.104897>; Soham Sarkar and Anil K Ghosh (2019) <doi:10.1109/TPAMI.2019.2912599>; William M Rand (1971) <doi:10.1080/01621459.1971.10482356>; Cyrus R Mehta and Nitin R Patel (1983) <doi:10.2307/2288652>; Joseph C Dunn (1973) <doi:10.1080/01969727308546046>; Sture Holm (1979) <doi:10.2307/4615733>; Yoav Benjamini and Yosef Hochberg (1995) <doi:10.2307/2346101>.
This package provides a collection of datasets of human-computer interaction (HCI) experiments. Each dataset is from an HCI paper, with all fields described and the original publication linked. All paper authors of included data have consented to the inclusion of their data in this package. The datasets include data from a range of HCI studies, such as pointing tasks, user experience ratings, and steering tasks. Dataset sources: Bergström et al. (2022) <doi:10.1145/3490493>; Dalsgaard et al. (2021) <doi:10.1145/3489849.3489853>; Larsen et al. (2019) <doi:10.1145/3338286.3340115>; Lilija et al. (2019) <doi:10.1145/3290605.3300676>; Pohl and Murray-Smith (2013) <doi:10.1145/2470654.2481307>; Pohl and Mottelson (2022) <doi:10.3389/frvir.2022.719506>.
Read, plot, manipulate and process hydro-meteorological data from Argentina and Chile.
An implementation of several Hierarchical Ensemble Methods (HEMs) for Directed Acyclic Graphs (DAGs). The HEMDAG package: 1) reconciles flat predictions with the topology of the ontology; 2) can enhance the predictions of virtually any flat learning method by taking into account the hierarchical relationships between ontology classes; 3) provides biologically meaningful predictions that always obey the true-path rule, the biological and logical rule that governs the internal coherence of biomedical ontologies; 4) is specifically designed for exploiting the hierarchical relationships of DAG-structured taxonomies, such as the Human Phenotype Ontology (HPO) or the Gene Ontology (GO), but can be safely applied to tree-structured taxonomies as well (such as FunCat), since trees are DAGs; 5) scales nicely both in the complexity of the taxonomy and in the cardinality of the examples; 6) provides several utility functions to process and analyze graphs; 7) provides several performance metrics to evaluate HEM algorithms. (Marco Notaro, Max Schubach, Peter N. Robinson and Giorgio Valentini (2017) <doi:10.1186/s12859-017-1854-y>.)
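To make the true-path rule concrete: after any hierarchical correction, a child term's score may never exceed a parent's. A minimal Python sketch of a top-down pass in that spirit (illustrative only, not HEMDAG's API; graphlib is the standard-library topological sorter, Python 3.9+):

```python
from graphlib import TopologicalSorter

def true_path_rule(scores, parents):
    """scores: {term: flat score}; parents: {term: set of parent terms}.
    Cap each term's score at the minimum of its parents' corrected scores."""
    order = TopologicalSorter(parents).static_order()  # parents come first
    fixed = dict(scores)
    for term in order:
        for p in parents.get(term, ()):
            fixed[term] = min(fixed[term], fixed[p])
    return fixed

flat = {"root": 0.9, "a": 0.95, "b": 0.4, "leaf": 0.7}
dag = {"a": {"root"}, "b": {"root"}, "leaf": {"a", "b"}}
print(true_path_rule(flat, dag))  # leaf capped at min(parent scores) = 0.4
```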
When performing multiple imputation, 5-10 imputations are sufficient for obtaining point estimates, but a larger number of imputations is needed for proper standard error estimates. This package allows you to calculate how many imputations are needed, following the work of von Hippel (2020) <doi:10.1177/0049124117747303>.
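As a rough illustration of the idea, von Hippel's quadratic rule ties the number of imputations m to the fraction of missing information (FMI) and the sampling variability you will tolerate in the standard error. A hedged sketch, assuming the point-estimate form m ≈ 1 + ½(FMI/cv)² and omitting the paper's confidence adjustment for the estimated FMI:

```python
import math

def how_many_imputations(fmi, cv=0.05):
    """Quadratic rule (point-estimate form, an assumption of this sketch):
    fmi = fraction of missing information, cv = target coefficient of
    variation of the standard error estimate."""
    return math.ceil(1 + 0.5 * (fmi / cv) ** 2)

print(how_many_imputations(fmi=0.3))  # FMI of 30% -> 19 imputations
```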
This package provides a Haar-Fisz algorithm for Poisson intensity estimation. It denoises Poisson-distributed sequences whose underlying intensity is not constant, using the multiscale variance-stabilization method called the Haar-Fisz transform. It contains functions to carry out the forward and inverse Haar-Fisz transform and to denoise near-Gaussian sequences, and it can also carry out cycle-spinning. Main reference: Fryzlewicz, P. and Nason, G.P. (2004) "A Haar-Fisz algorithm for Poisson intensity estimation." Journal of Computational and Graphical Statistics, 13, 621-638. <doi:10.1198/106186004X2697>.
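A compact numpy sketch of the forward transform, following the construction in the reference (Haar decomposition, then each detail coefficient divided by the square root of its smooth coefficient) rather than the package's own API:

```python
import numpy as np

def haar_fisz(x):
    """Forward Haar-Fisz transform of a count vector of length 2^J."""
    s = np.asarray(x, dtype=float)
    details = []
    while s.size > 1:
        s1, s2 = s[0::2], s[1::2]
        sm = (s1 + s2) / 2.0                 # Haar smooth coefficients
        d = (s1 - s2) / 2.0                  # Haar detail coefficients
        # Fisz step: variance-stabilize, guarding against zero smooths
        f = np.divide(d, np.sqrt(sm), out=np.zeros_like(d), where=sm > 0)
        details.append(f)
        s = sm
    # invert the Haar transform with the stabilized details
    for f in reversed(details):
        up = np.empty(2 * s.size)
        up[0::2], up[1::2] = s + f, s - f
        s = up
    return s  # approximately Gaussian with near-constant variance
```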
Graphical model is an informative and powerful tool to explore the conditional dependence relationships among variables. The traditional Gaussian graphical model and its extensions either have a Gaussian assumption on the data distribution or assume the data are homogeneous. However, there are data with complex distributions violating these two assumptions. For example, the air pollutant concentration records are non-negative and, hence, non-Gaussian. Moreover, due to climate changes, distributions of these concentration records in different months of a year can be far different, which means it is uncertain whether datasets from different months are homogeneous. Methods with a Gaussian or homogeneous assumption may incorrectly model the conditional dependence relationships among variables. Therefore, we propose a heterogeneous graphical model for non-negative data (HGMND) to simultaneously cluster multiple datasets and estimate the conditional dependence matrix of variables from a non-Gaussian and non-negative exponential family in each cluster.
The classical Markowitz mean-variance portfolio formulation ignores heavy tails and skewness. High-order portfolios use higher-order moments to better characterize the return distribution. Different formulations and fast algorithms are proposed for high-order portfolios based on the mean, variance, skewness, and kurtosis. The package is based on the papers: R. Zhou and D. P. Palomar (2021). "Solving High-Order Portfolios via Successive Convex Approximation Algorithms." <arXiv:2008.00863>. X. Wang, R. Zhou, J. Ying, and D. P. Palomar (2022). "Efficient and Scalable High-Order Portfolios Design via Parametric Skew-t Distribution." <arXiv:2206.02412>.
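The quantities such a formulation trades off are just the first four moments of the portfolio return. A small numpy sketch of estimating them from a sample of returns (the package's contribution is the fast optimization over the weights, not this bookkeeping):

```python
import numpy as np

def portfolio_moments(w, returns):
    """Sample mean, variance, skewness, and kurtosis of the portfolio
    return, given weights w (summing to 1) and a T x N returns matrix."""
    r = returns @ w
    mu = r.mean()
    var = r.var()
    skew = ((r - mu) ** 3).mean() / var ** 1.5
    kurt = ((r - mu) ** 4).mean() / var ** 2
    return mu, var, skew, kurt
```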
This package provides data on commercial domestic flights that departed Houston (IAH and HOU) in 2011.
Computation of the generalized hypergeometric function with tunable high precision in a vectorized manner, using floating-point datatypes from the mpfr or gmp libraries. The computation is limited to real numbers.
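For readers coming from Python, mpmath offers an analogous arbitrary-precision pFq; a minimal sketch (mpmath uses its own arbitrary-precision floats rather than mpfr/gmp bindings):

```python
from mpmath import mp, hyper

mp.dps = 50  # work with 50 significant digits
# evaluate 2F1(1, 2; 3; 0.5)
print(hyper([1, 2], [3], 0.5))
```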
Heterogeneous multi-task feature learning is a data integration method for joint feature selection across multiple related data sets with different distributions. The algorithm can combine different types of learning tasks, including linear regression, Huber regression, adaptive Huber, and logistic regression. A modified version of the Bayesian Information Criterion (BIC) is provided to measure model performance. The package is based on Yuan Zhong, Wei Xu, and Xin Gao (2022) <https://www.fields.utoronto.ca/talk-media/1/53/65/slides.pdf>.
Estimation procedures and goodness-of-fit test for several Markov regime switching models and mixtures of bivariate copula models. The goodness-of-fit test is based on a Cramer-von Mises statistic and uses Rosenblatt's transform and parametric bootstrap to estimate the p-value. The proposed methodologies are described in Nasri, Remillard and Thioub (2020) <doi:10.1002/cjs.11534>.
This package implements hierarchically regularized entropy balancing proposed by Xu and Yang (2022) <doi:10.1017/pan.2022.12>. The method adjusts the covariate distributions of the control group to match those of the treatment group. hbal automatically expands the covariate space to include higher order terms and uses cross-validation to select variable penalties for the balancing conditions.
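For intuition, plain (non-hierarchical) entropy balancing reduces to a smooth convex dual problem: find control weights proportional to exp(X·λ) whose weighted covariate means hit the treated means. A hedged scipy sketch of that core step, without hbal's covariate expansion or cross-validated penalties:

```python
import numpy as np
from scipy.optimize import minimize

def entropy_balance(X_control, target_means):
    """Reweight control rows so their weighted covariate means match
    target_means, keeping weights as close to uniform as possible.
    Solves the unpenalized dual in lambda."""
    Xc = X_control - target_means  # center covariates at the target
    def dual(lam):
        return np.log(np.exp(Xc @ lam).sum())
    res = minimize(dual, np.zeros(Xc.shape[1]), method="BFGS")
    w = np.exp(Xc @ res.x)
    return w / w.sum()  # weighted mean of X_control now equals target_means
```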
Quantifying similarity between high-dimensional single-cell samples is challenging and usually requires some simplifying hypotheses. By transforming the high-dimensional space into a high-dimensional grid, the number of cells in each sub-space of the grid becomes characteristic of a given sample. Using a Hilbert curve, each sample can be visualized as a simple density plot, and the distance between samples can be calculated from the distribution of cells using the Jensen-Shannon distance. Bins that correspond to significant differences between samples can be identified using a simple bootstrap procedure.
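The comparison step is straightforward once cells are assigned to grid bins; the Hilbert curve mainly provides a one-dimensional ordering of those bins for plotting. A hedged numpy/scipy sketch, assuming coordinates already scaled to [0, 1] and a dimensionality low enough for an explicit grid:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def sample_distance(cells_a, cells_b, bins=16):
    """Jensen-Shannon distance between two cell samples binned on a
    common grid. cells_a, cells_b: (n_cells, n_dims) arrays in [0, 1]."""
    edges = [np.linspace(0, 1, bins + 1)] * cells_a.shape[1]
    h_a, _ = np.histogramdd(cells_a, bins=edges)
    h_b, _ = np.histogramdd(cells_b, bins=edges)
    p = h_a.ravel() / h_a.sum()
    q = h_b.ravel() / h_b.sum()
    return jensenshannon(p, q, base=2)
```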
Estimates treatment effects using covariate adjustment methods in Randomized Clinical Trials (RCT) motivated by higher-order influence functions (HOIF). Provides point estimates, oracle bias, variance, and approximate variance for HOIF-adjusted estimators. For methodology details, see Zhao et al. (2024) <doi:10.48550/arXiv.2411.08491>.
This package contains data for software hotspot analysis, along with a function performing the analysis itself.
Fast, model-agnostic implementation of different H-statistics introduced by Jerome H. Friedman and Bogdan E. Popescu (2008) <doi:10.1214/07-AOAS148>. These statistics quantify interaction strength per feature, feature pair, and feature triple. The package supports multi-output predictions and can account for case weights. In addition, several variants of the original statistics are provided. The shape of the interactions can be explored through partial dependence plots or individual conditional expectation plots. DALEX explainers, meta-learners ('mlr3', 'tidymodels', 'caret'), and most other models work out of the box.
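At its core, the pairwise H-statistic compares the joint partial dependence of two features with the sum of their individual partial dependences. A brute-force numpy sketch of that estimator (illustrative only; the package's implementation is far faster):

```python
import numpy as np

def h2_pairwise(predict, X, j, k):
    """Friedman-Popescu pairwise H^2 for feature columns j and k of the
    numpy array X. predict maps an (n, p) array to n predictions."""
    n = len(X)
    def pd(cols):
        # partial dependence evaluated at each observed value, centered
        out = np.empty(n)
        for i in range(n):
            Xi = X.copy()
            Xi[:, cols] = X[i, cols]   # pin chosen features to row i's values
            out[i] = predict(Xi).mean()
        return out - out.mean()
    pd_j, pd_k, pd_jk = pd([j]), pd([k]), pd([j, k])
    # share of the joint PD's variability not explained additively
    return ((pd_jk - pd_j - pd_k) ** 2).sum() / (pd_jk ** 2).sum()
```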
By analyzing time series, it is possible to observe significant changes in the behavior of the observations, and such changes frequently characterize events. Events present themselves as anomalies, change points, or motifs. The literature offers several methods for detecting events, but finding a suitable method is a complex task, especially since the nature of events is often unknown. This work presents Harbinger, a framework for integrating and analyzing event detection methods. Harbinger contains several state-of-the-art methods described in Salles et al. (2020) <doi:10.5753/sbbd.2020.13626>.
Estimates heterogeneous treatment effects with tree-based machine learning algorithms and visualizes the estimated results in flexible and presentation-ready ways. For more information, see Brand, Xu, Koch, and Geraldo (2021) <doi:10.1177/0081175021993503>. This package started as a fork of the causalTree package on GitHub, and we are grateful to its authors for their extremely useful and free package.
Code syntax highlighting made easy, for code snippets or complete files, whether you're documenting your data analysis or creating interactive shiny apps.
This package provides a dependency-free interface to the H3 geospatial indexing system, utilizing the Rust library h3o <https://github.com/HydroniumLabs/h3o> via the extendr library <https://github.com/extendr/extendr>.
High-dimensional matrix factor models have drawn much attention, since observations are often well structured as arrays, for example in macroeconomics and finance. Such data often exhibit heavy tails, so it is also important to develop robust procedures. We address this issue by replacing the least-squares loss with the Huber loss function, and we propose two algorithms for robust factor analysis based on it. One minimizes the Huber loss of the idiosyncratic error's Frobenius norm, which leads to a weighted iterative projection approach to compute the parameters, and is therefore named Robust Matrix Factor Analysis (RMFA); see the details in He et al. (2023) <doi:10.1080/07350015.2023.2191676>. The other minimizes the element-wise Huber loss, which can be solved by an iterative Huber regression algorithm (IHR); see the details in He et al. (2023) <arXiv:2306.03317>. The package also provides the algorithm for alpha-PCA by Chen & Fan (2021) <doi:10.1080/01621459.2021.1970569> and the projected estimation (PE) method by Yu et al. (2022) <doi:10.1016/j.jeconom.2021.04.001>. In addition, methods for determining the pair of factor numbers are given.
Generates high-entropy integer synthetic populations from marginal and (optionally) seed data using quasirandom sampling, in arbitrary dimensionality (Smith, Lovelace and Birkin (2017) <doi:10.18564/jasss.3550>). The package also provides an implementation of the Iterative Proportional Fitting (IPF) algorithm (Zaloznik (2011) <doi:10.13140/2.1.2480.9923>).
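For reference, the IPF step itself fits in a few lines: alternately rescale a seed matrix so its row and column sums match the target marginals (the package's contribution is producing integer populations from such fractional solutions via quasirandom sampling). A minimal numpy sketch of the two-dimensional case:

```python
import numpy as np

def ipf(seed, row_marginals, col_marginals, iters=100, tol=1e-9):
    """Iterative proportional fitting: scale a seed matrix until its
    row and column sums match the target marginals."""
    m = seed.astype(float).copy()
    for _ in range(iters):
        m *= (row_marginals / m.sum(axis=1))[:, None]  # fix row sums
        m *= (col_marginals / m.sum(axis=0))[None, :]  # fix column sums
        if np.allclose(m.sum(axis=1), row_marginals, atol=tol):
            break
    return m

seed = np.ones((2, 3))
print(ipf(seed, np.array([40.0, 60.0]), np.array([20.0, 30.0, 50.0])))
```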