Automated cell type annotation for single-cell RNA sequencing data using consensus predictions from multiple large language models (LLMs). LLMs are artificial intelligence models trained on vast text corpora to understand and generate human-like text. This package integrates with Seurat objects and provides uncertainty quantification for annotations. Supports various LLM providers including 'OpenAI', 'Anthropic', and 'Google'. The package leverages these models through their respective APIs (Application Programming Interfaces) <https://platform.openai.com/docs>, <https://docs.anthropic.com/>, and <https://ai.google.dev/gemini-api/docs>. For details see Yang et al. (2025) <doi:10.1101/2025.04.10.647852>.
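A minimal sketch of how a consensus with an uncertainty measure could be formed from several model predictions for one cluster. It assumes a simple majority vote, an agreement proportion, and the Shannon entropy of the votes; the function name and the example labels are hypothetical, and the package's actual procedure may differ.

```python
from collections import Counter
from math import log2


def consensus_label(predictions):
    """Return (label, agreement, entropy) for one cluster's predictions.

    agreement is the share of models voting for the winning label;
    entropy is the Shannon entropy (in bits) of the vote distribution,
    so 0 means all models agree.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    label, top = counts.most_common(1)[0]
    n = len(predictions)
    agreement = top / n
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return label, agreement, entropy


# Hypothetical votes from three models for a single cluster.
votes = ["T cell", "T cell", "NK cell"]
print(consensus_label(votes))
```

A low agreement or a high entropy flags clusters whose annotation deserves manual review.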
It provides functions that calculate Mahalanobis distance, Euclidean distance, Manhattan distance, Chebyshev distance, Hamming distance, Canberra distance, Minkowski dissimilarity (distance defined for p >= 1), Cosine dissimilarity, Bhattacharyya dissimilarity, Jaccard distance, Hellinger distance, Bray-Curtis dissimilarity, and Sorensen-Dice dissimilarity between each pair of species in a list of data frames. These statistics are fundamental in various fields, such as cluster analysis, classification, and other applications of machine learning and data mining, where assessing similarity or dissimilarity between data is crucial. The package is designed to be flexible and easily integrated into data analysis workflows, providing reliable tools for evaluating distances in multidimensional contexts.
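For orientation, a few of the listed measures can be sketched with their textbook definitions. This is an illustration on plain numeric vectors, not the package's interface (which works on lists of data frames); all function names are hypothetical.

```python
from math import sqrt


def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p):
    """Minkowski distance of order p; a metric only for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def bray_curtis(x, y):
    """Summed absolute differences divided by the summed totals."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def cosine_dissimilarity(x, y):
    """One minus the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return 1 - dot / (norm_x * norm_y)
```

Minkowski distance with p = 1 reduces to Manhattan distance and with p = 2 to Euclidean distance, which is a quick sanity check for any implementation.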
Automated and robust framework for analyzing R-R interval (RRi) signals using advanced nonlinear modeling and preprocessing techniques. The package implements a dual-logistic model to capture the rapid drop and subsequent recovery of RRi during exercise, as described by Castillo-Aguilar et al. (2025) <doi:10.1038/s41598-025-93654-6>. In addition, CardioCurveR includes tools for filtering RRi signals using zero-phase Butterworth low-pass filtering and for cleaning ectopic beats via adaptive outlier replacement using local regression and robust statistics. These integrated methods preserve the dynamic features of RRi signals and facilitate accurate cardiovascular monitoring and clinical research.
Routines for nonlinear time series analysis based on Threshold Autoregressive Moving Average (TARMA) models. It provides functions and methods for: TARMA model fitting and forecasting, including robust estimators, see Goracci et al. JBES (2025) <doi:10.1080/07350015.2024.2412011>; tests for threshold effects, see Giannerini et al. JoE (2024) <doi:10.1016/j.jeconom.2023.01.004>, Goracci et al. Statistica Sinica (2023) <doi:10.5705/ss.202021.0120>, Angelini et al. (2024) <doi:10.48550/arXiv.2308.00444>; unit-root tests based on TARMA models, see Chan et al. Statistica Sinica (2024) <doi:10.5705/ss.202022.0125>.
This package provides a database of Chinese surnames and given names (1930-2008). This database contains nationwide frequency statistics of 1,806 Chinese surnames and 2,614 Chinese characters used in given names, covering about 1.2 billion Han Chinese population (96.8 percent of the Han Chinese household-registered population born from 1930 to 2008 and still alive in 2008). This package also contains a function for computing multiple indices of Chinese surnames and given names for social science research (e.g., name uniqueness, name gender, name valence, and name warmth/competence). Details are provided at <https://psychbruce.github.io/ChineseNames/>.
Calculate clinical scores for hidradenitis suppurativa (HS), a dermatologic disease. The scores are typically used for evaluation of efficacy in clinical trials. The scores are not commonly used in clinical practice. The specific scores implemented are Hidradenitis Suppurativa Clinical Response (HiSCR) (Kimball, et al. (2015) <doi:10.1111/jdv.13216>), Hidradenitis Suppurativa Area and Severity Index Revised (HASI-R) (Goldfarb, et al. (2020) <doi:10.1111/bjd.19565>), hidradenitis suppurativa Physician Global Assessment (HS PGA) (Marzano, et al. (2020) <doi:10.1111/jdv.16328>), and the International Hidradenitis Suppurativa Severity Score System (IHS4) (Zouboulis, et al. (2017) <doi:10.1111/bjd.15748>).
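As an illustration of two of these scores, the sketch below follows the commonly published definitions: IHS4 weights nodules by 1, abscesses by 2, and draining tunnels by 4, with scores up to 3 read as mild, 4 to 10 as moderate, and 11 or more as severe; HiSCR requires at least a 50% reduction in the combined abscess and inflammatory nodule count with no increase in abscesses or draining fistulas from baseline. Function and argument names are hypothetical and do not reflect the package's interface.

```python
def ihs4(nodules, abscesses, draining_tunnels):
    """IHS4: weighted lesion count (weights 1, 2, and 4)."""
    return nodules + 2 * abscesses + 4 * draining_tunnels


def ihs4_severity(score):
    """Map an IHS4 score to its severity band."""
    if score <= 3:
        return "mild"
    if score <= 10:
        return "moderate"
    return "severe"


def hiscr(base_abscesses, base_nodules, base_fistulas,
          abscesses, nodules, fistulas):
    """Return True if the follow-up counts meet HiSCR relative to baseline."""
    base_an = base_abscesses + base_nodules
    if base_an == 0:
        raise ValueError("HiSCR is undefined when the baseline AN count is zero")
    an = abscesses + nodules
    return (
        an <= 0.5 * base_an              # at least 50% reduction in AN count
        and abscesses <= base_abscesses  # no increase in abscesses
        and fistulas <= base_fistulas    # no increase in draining fistulas
    )
```

For example, a patient with 2 nodules, 1 abscess, and 1 draining tunnel has an IHS4 of 2 + 2 + 4 = 8, which falls in the moderate band.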
As a sequel to 'iNEXT', the iNEXT.beta3D package provides functions to compute standardized taxonomic, phylogenetic, and functional diversity (3D) estimates with a common sample size (for alpha and gamma diversity) or sample coverage (for alpha, beta, gamma diversity as well as dissimilarity or turnover indices). Hill numbers and their generalizations are used to quantify 3D and to make multiplicative decomposition (gamma = alpha x beta). The package also features size- and coverage-based rarefaction and extrapolation sampling curves to facilitate rigorous comparison of beta diversity across datasets. See Chao et al. (2023) <doi:10.1002/ecm.1588> for more details.
The function install_load checks the local R library(ies) to see if the required package(s) is/are installed or not. If the package(s) is/are not installed, then the package(s) will be installed along with the required dependency(ies). This function pulls source or binary packages from the Posit/RStudio-sponsored CRAN mirror. Lastly, the chosen package(s) is/are loaded. The function load_package simply loads the provided package(s). If this package does not fit your needs, then you may want to consider these other R packages: 'needs', 'easypackages', 'pacman', 'pak', 'anyLib', and/or 'librarian'.
This package provides a collection of tools for detecting influential cases in generalized mixed effects models. It analyses models that were estimated using 'lme4'. The basic rationale behind identifying influential data is that when single units are omitted from the data, models based on these data should not produce substantially different estimates. To standardize the assessment of how influential a (single group of) observation(s) is, several measures of influence are common practice, such as Cook's Distance. In addition, we provide a measure of percentage change of the fixed point estimates and a simple procedure to detect changing levels of significance.
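The taxonomic Hill number of order q underlying these estimates can be sketched as follows. This is the plain empirical (plug-in) value computed from observed abundances, not the standardized size- or coverage-based estimator the package provides; the function name is hypothetical.

```python
from math import exp, log


def hill_number(abundances, q):
    """Empirical Hill number of order q from raw species abundances.

    q = 0 gives species richness, q = 1 the exponential of Shannon
    entropy (taken as the limit), and q = 2 the inverse Simpson index.
    """
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if abs(q - 1) < 1e-12:
        return exp(-sum(pi * log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))
```

Under the multiplicative decomposition, beta diversity of order q is the ratio of the gamma to the alpha Hill number of the same order.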
Taxonomic dictionaries, formative element lists, and functions related to the maintenance, development and application of U.S. Soil Taxonomy. Data and functionality are based on official U.S. Department of Agriculture sources including the latest edition of the Keys to Soil Taxonomy. Descriptions and metadata are obtained from the National Soil Information System or Soil Survey Geographic databases. Other sources are referenced in the data documentation. Provides tools for understanding and interacting with concepts in the U.S. Soil Taxonomic System. Most of the current utilities are for working with taxonomic concepts at the "higher" taxonomic levels: Order, Suborder, Great Group, and Subgroup.
adverSCarial is an R package designed for generating and analyzing the vulnerability of scRNA-seq classifiers to adversarial attacks. The package is versatile and provides a format for integrating any type of classifier. It offers functions for studying and generating two types of attacks: the single-gene attack and the max-change attack. The single-gene attack involves making a small modification to the input to alter the classification. The max-change attack involves making a large modification to the input without changing its classification. The package provides a comprehensive solution for evaluating the robustness of scRNA-seq classifiers against adversarial attacks.
IsoCorrectoR performs the correction of mass spectrometry data from stable isotope labeling/tracing metabolomics experiments with regard to natural isotope abundance and tracer impurity. Data from both MS and MS/MS measurements can be corrected (with any tracer isotope: 13C, 15N, 18O...), as well as ultra-high resolution MS data from multiple-tracer experiments (e.g. 13C and 15N used simultaneously). See the Bioconductor package IsoCorrectoRGUI for a graphical user interface to IsoCorrectoR. NOTE: With R version 4.0.0, writing correction results to Excel files may currently not work on Windows. However, writing results to csv works as before.
Easy function for text-mining the PubMed repository based on defined sets of terms. The relationship between fix-terms (related to your research topic) and pub-terms (terms which pivot around your research focus) is calculated using the pointwise mutual information algorithm ('PMI'); see Church, Kenneth Ward and Hanks, Patrick (1990) <https://www.aclweb.org/anthology/J90-1003/>. A text file is generated with the 'PMI'-scores for each fix-term. Then, for each collocation pair (a fix-term + a pub-term), a text file is generated with related article titles and publishing years. Additional Author section will follow in the next version updates.
In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The crew.cluster package extends the 'mirai'-powered crew package with worker launcher plugins for traditional high-performance computing systems. Inspiration also comes from packages mirai by Gao (2023) <https://github.com/r-lib/mirai>, future by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, rrq by FitzJohn and Ashton (2023) <https://github.com/mrc-ide/rrq>, clustermq by Schubert (2019) <doi:10.1093/bioinformatics/btz284>, and batchtools by Lang, Bischl, and Surmann (2017) <doi:10.21105/joss.00135>.
This is an add-on package to the monobin package that simplifies its use. It provides a shiny-based user interface (UI) that is especially handy for less experienced R users as well as for those who intend to perform quick scanning of numeric risk factors when building credit rating models. The additional functions implemented in monobinShiny that do not exist in the monobin package are: descriptive statistics, special case and outliers imputation. The descriptive statistics function is exported and can be used in R sessions independently from the user interface, while the special case and outlier imputation functions are written to be used with the shiny UI.
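The pointwise mutual information between a fix-term and a pub-term can be sketched from document counts as below. The counting scheme (number of articles mentioning each term and both terms together) and the function name are assumptions for illustration; the package's exact counting may differ.

```python
from math import log2


def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information (base 2) from co-occurrence counts.

    n_xy: articles mentioning both terms; n_x, n_y: articles mentioning
    each term; n_total: articles in the corpus.
    PMI = log2( p(x, y) / (p(x) * p(y)) ).
    """
    if min(n_xy, n_x, n_y, n_total) <= 0:
        raise ValueError("all counts must be positive")
    return log2(n_xy * n_total / (n_x * n_y))
```

A score of 0 means the two terms co-occur exactly as often as independence would predict; positive scores indicate association.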
This package provides tools for multivariate analyses of morphological data, wrapped in one package, to make the workflow convenient and fast. Statistical and graphical tools provide a comprehensive framework for checking and manipulating input data, statistical analyses, and visualization of results. Several methods are provided for the analysis of raw data, to make the dataset ready for downstream analyses. Integrated statistical methods include hierarchical classification, principal component analysis, principal coordinates analysis, non-metric multidimensional scaling, and multiple discriminant analyses: canonical, stepwise, and classificatory (linear, quadratic, and the non-parametric k nearest neighbours). The philosophy of the package is described in Šlenker et al. 2022.
This is a sparklyr extension integrating VariantSpark and R. VariantSpark is a framework based on scala and spark to analyze genome datasets, see <https://bioinformatics.csiro.au/>. It was tested on datasets with 3000 samples, each containing 80 million features, in both unsupervised clustering approaches and supervised applications such as classification and regression. Genome datasets are usually written in VCF, a specific text file format used in bioinformatics for storing gene sequence variations. VariantSpark is therefore well suited to genome research, because it is able to read VCF files, run analyses and return the output in a spark data frame.
This package implements Bayesian Distribution Regression methods. This package contains functions for three estimators (non-asymptotic, semi-asymptotic and asymptotic) and related routines for Bayesian Distribution Regression in Huang and Tsyawo (2018) <doi:10.2139/ssrn.3048658> which is also the recommended reference to cite for this package. The functions can be grouped into three (3) categories. The first computes the logit likelihood function and posterior densities under uniform and normal priors. The second contains Independence and Random Walk Metropolis-Hastings Markov Chain Monte Carlo (MCMC) algorithms as functions and the third category of functions are useful for semi-asymptotic and asymptotic Bayesian distribution regression inference.
Maximum likelihood estimation of nonlinear mixed effects models of epidemic growth using Template Model Builder ('TMB'). Enables joint estimation for collections of disease incidence time series, including time series that describe multiple epidemic waves. Supports a set of widely used phenomenological models: exponential, logistic, Richards (generalized logistic), subexponential, and Gompertz. Provides methods for interrogating model objects and several auxiliary functions, including one for computing basic reproduction numbers from fitted values of the initial exponential growth rate. Preliminary versions of this software were applied in Ma et al. (2014) <doi:10.1007/s11538-013-9918-2> and in Earn et al. (2020) <doi:10.1073/pnas.2004904117>.
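Two of the named growth curves can be sketched in their textbook forms for expected cumulative incidence. The parameter names (`c0`, `r`, `K`, `t_infl`) are assumptions chosen for illustration, and the package's own parameterization may differ.

```python
from math import exp, log


def exponential_curve(t, c0, r):
    """Exponential growth: c(t) = c0 * exp(r * t)."""
    return c0 * exp(r * t)


def logistic_curve(t, K, r, t_infl):
    """Logistic growth with final size K, rate r, inflection time t_infl."""
    return K / (1 + exp(-r * (t - t_infl)))


def doubling_time(r):
    """Time for exponentially growing incidence to double: log(2) / r."""
    if r <= 0:
        raise ValueError("r must be positive")
    return log(2) / r
```

Early in a logistic epidemic, well before the inflection time, the curve grows approximately exponentially at rate r, which is why the initial growth rate is a natural summary of a fitted model.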
This package implements an algorithm that increases the number of simultaneously measurable markers and thereby helps with the study of immune responses. The algorithm, named CytoBackBone, combines phenotypic information of cells from different cytometric profiles obtained from different cytometry panels. This computational approach is based on the principle that each cell has its own phenotypic and functional characteristics that can be used as an identification card. CytoBackBone uses a set of predefined markers, called the backbone, to define this identification card. The phenotypic information of cells with similar identification cards in the different cytometric profiles is then merged.
PCG is a family of simple, fast, space-efficient, statistically good algorithms for random number generation. Unlike many general-purpose RNGs, they are also hard to predict. This library implements bindings to the standard C implementation. This includes the standard, unique, fast and single variants in the pcg family. There is a pure implementation that can be used as a generator with the random package as well as a faster primitive api that includes functions for generating common types. The generators in this module are suitable for use in parallel, but make sure threads don't share the same generator or things will go horribly wrong.
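To make the generator concrete, here is a sketch of the widely documented 32-bit PCG variant (64-bit linear congruential state with an xorshift and random-rotation output step), written in Python for illustration rather than through this library's bindings. The class and method names are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
MULTIPLIER = 6364136223846793005  # LCG multiplier of the reference pcg32


class PCG32:
    """Minimal PCG XSH-RR generator: 64-bit state, 32-bit output."""

    def __init__(self, seed, stream):
        self.state = 0
        # The increment must be odd; it selects the stream.
        self.inc = ((stream << 1) | 1) & MASK64
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self):
        old = self.state
        # Advance the underlying linear congruential generator.
        self.state = (old * MULTIPLIER + self.inc) & MASK64
        # Output function: xorshift the high bits, then rotate right.
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32


# One generator per worker, each on its own stream, never shared.
workers = [PCG32(seed=42, stream=i) for i in range(4)]
```

Giving every thread its own generator object, ideally on a distinct stream, avoids the shared-state problem the description warns about.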
One of the strengths of R is its vast package ecosystem. Indeed, R packages extend from visualization to Bayesian inference and from spatial analyses to pharmacokinetics (<https://cran.r-project.org/web/views/>). There is probably not an area of quantitative research that isn't represented by at least one R package. At the time of this writing, there are more than 10,000 active CRAN packages. Because of this massive ecosystem, it is important to have tools to search and learn about packages related to your personal R needs. For this reason, we developed an RStudio addin capable of searching available CRAN packages directly within RStudio.
The aim of the package is to provide some basic functions for doing statistics with trapezoidal fuzzy numbers. In particular, the package contains several functions for simulating trapezoidal fuzzy numbers, as well as for calculating some central tendency measures (mean and two types of median), some scale measures (variance, ADD, MDD, Sn, Qn, Tn and some M-estimators) and one diversity index and one inequality index. Moreover, functions for calculating the 1-norm distance, the mid/spr distance and the (phi,theta)-wabl/ldev/rdev distance between fuzzy numbers are included, and a function to calculate the value phi-wabl given a sample of trapezoidal fuzzy numbers.
This package provides tools to decompose differences in cohort health expectancy (HE) by age and cause using longitudinal data. The package implements a novel longitudinal attribution method based on a semiparametric additive hazards model with time-dependent covariates, specifically designed to address interval censoring and semi-competing risks via a copula framework. The resulting age-cause-specific contributions to disability prevalence and death probability can be used to quantify and decompose differences in cohort HE between groups. The package supports stepwise replacement decomposition algorithms and is applicable to cohort-based health disparity research across diverse populations. Related methods include Sun et al. (2023) <doi:10.1177/09622802221133552>.