Simulates stochastic hybrid models for transmission of infectious diseases in dynamic networks. It is a metapopulation model in which each node in the network is a sub-population and disease spreads within nodes and among them, combining two approaches: a stochastic simulation algorithm (<doi:10.1146/annurev.physchem.58.032806.104637>) and an individual-based approach, respectively. The equations that model spread within nodes are customizable, and there are two link types among nodes: migration and influence (commuting). More information is available in Fernando S. Marques, Jose H. H. Grisi-Filho, Marcos Amaku et al. (2020) <doi:10.18637/jss.v094.i06>.
Frequentist and Bayesian linear regression for large data sets. Useful when the data does not fit into memory (for both frequentist and Bayesian regression), to make running time manageable (mainly for Bayesian regression), and to reduce the total running time because of reduced or less severe memory-spillover into the virtual memory. This is an implementation of Merge & Reduce for linear regression as described in Geppert, L.N., Ickstadt, K., Munteanu, A., & Sohler, C. (2020). 'Streaming statistical models via Merge & Reduce'. International Journal of Data Science and Analytics, 1-17, <doi:10.1007/s41060-020-00226-0>.
This package provides basic functions that support an implementation of multi-profile case (Case 3) best-worst scaling (BWS). Case 3 BWS is a question-based survey method to elicit people's preferences for attribute levels. Case 3 BWS constructs various combinations of attribute levels (profiles) and then asks respondents to select the best and worst profiles in each choice set. A main function creates a dataset for the analysis from the choice sets and the responses to the questions. For details on Case 3 BWS, refer to Louviere et al. (2015) <doi:10.1017/CBO9781107337855>.
Various estimators of causal effects based on inverse probability weighting, doubly robust estimation, and double machine learning. Specifically, the package includes methods for estimating average treatment effects, direct and indirect effects in causal mediation analysis, and dynamic treatment effects. The models refer to studies of Froelich (2007) <doi:10.1016/j.jeconom.2006.06.004>, Huber (2012) <doi:10.3102/1076998611411917>, Huber (2014) <doi:10.1080/07474938.2013.806197>, Huber (2014) <doi:10.1002/jae.2341>, Froelich and Huber (2017) <doi:10.1111/rssb.12232>, Hsu, Huber, Lee, and Lettry (2020) <doi:10.1002/jae.2765>, and others.
This package contains functions to query and visualize the neuroimaging features associated with genetically regulated gene expression (GReX). The primary utility, neuroimaGene(), takes a list of user-defined genes and returns a table of neuroimaging features (NIDPs) associated with each gene. This resource is designed to assist in the interpretation of genome-wide and transcriptome-wide association studies that evaluate brain-related traits; see Bledsoe (2024) <doi:10.1016/j.ajhg.2024.06.002>. In addition, there are several visualization functions that generate summary plots and 2-dimensional visualizations of regional brain measures; see Mowinckel (2020).
This package provides a roclet for roxygen2 that identifies and processes code blocks in your documentation marked with `@longtests`. These blocks should contain tests that take a long time to run and thus cannot be included in the regular test suite of the package. When you run `roxygen2::roxygenise` with the `longtests_roclet`, it will extract these long tests from your documentation and save them in a separate directory. This allows you to run these long tests separately from the rest of your tests, for example, on a continuous integration server that is set up to run long tests.
glmSparseNet is an R package that generalizes sparse regression models when the features (e.g. genes) have a graph structure (e.g. protein-protein interactions), by including network-based regularizers. glmSparseNet builds on the glmnet R package and uses centrality measures of the network as penalty weights in the regularization. The current version implements regularization based on node degree, i.e. the strength and/or number of a node's associated edges, promoting either hubs or orphan genes in the solution. All the glmnet distribution families are supported, namely "gaussian", "poisson", "binomial", "multinomial", "cox", and "mgaussian".
Bayesian power/type I error calculation and model fitting using the power prior and the normalized power prior for proportional hazards models with piecewise constant hazard. The methodology and examples of applying the package are detailed in <doi:10.48550/arXiv.2404.05118>. The Bayesian clinical trial design methodology is described in Chen et al. (2011) <doi:10.1111/j.1541-0420.2011.01561.x>, and Psioda and Ibrahim (2019) <doi:10.1093/biostatistics/kxy009>. The proportional hazards model with piecewise constant hazard is detailed in Ibrahim et al. (2001) <doi:10.1007/978-1-4757-3447-8>.
Implementation of the empirical method to derive a log2 counts per million (CPM) cutoff for filtering out lowly expressed genes using ERCC spike-ins, as described in Goll and Bosinger et al. (2022) <doi:10.1101/2022.06.23.497396>. This package utilizes the synthetic mRNA control pairs developed by the External RNA Controls Consortium (ERCC) (ERCC 1 / ERCC 2) that are spiked into sample pairs at known ratios at various absolute abundances. The relationship between the observed and expected fold changes is then used to empirically determine an optimal log2 CPM cutoff for filtering out lowly expressed genes.
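As a rough illustration of the filtering step in the ERCC-based description above, the Python sketch below converts raw counts to log2 CPM and drops genes that fall under a given cutoff. The function names, the pseudocount of 1, and the example cutoff are assumptions made for illustration only; the package derives its cutoff empirically from the ERCC fold-change relationship, which is not reproduced here.

```python
import math

def log2_cpm(counts, pseudocount=1.0):
    """Convert raw counts for one sample to log2 counts per million.

    The pseudocount is an assumption of this sketch; it avoids log2(0).
    """
    library_size = sum(counts.values())
    return {
        gene: math.log2(count / library_size * 1e6 + pseudocount)
        for gene, count in counts.items()
    }

def filter_low_expression(counts, cutoff):
    """Return the genes whose log2 CPM is at or above the cutoff."""
    values = log2_cpm(counts)
    return [gene for gene, value in values.items() if value >= cutoff]

# Hypothetical counts for a single sample.
counts = {"geneA": 10, "geneB": 990, "geneC": 0}
kept = filter_low_expression(counts, cutoff=1.0)
print(kept)
```

In practice the cutoff would be the value determined from the spike-in controls rather than a hand-picked number.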
The Ljung-Box test is one of the most important tests for time series diagnostics and model selection. The Hassani SACF (Sum of the Sample Autocorrelation Function) Theorem, however, indicates that the sum of the sample autocorrelation function is always fixed for any stationary time series of arbitrary length. This package demonstrates the sensitivity of the Ljung-Box test to the number of lags involved in the test, which means the test should be used with extra caution. The Hassani SACF Theorem is described in Hassani and Yeganegi (2019) <doi:10.1016/j.physa.2018.12.028>.
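The fixed-sum property mentioned in the Ljung-Box description above can be checked numerically. With the usual sample autocorrelation estimator (deviations taken from the sample mean, same denominator at every lag), the sum over lags 1 to n-1 equals -1/2, because the deviations from the mean sum to zero. The sketch below is a plain Python check under that assumption; the function names are illustrative and are not the package's API.

```python
import random

def sample_acf(x, lag):
    """Sample autocorrelation at one lag, using deviations from the sample mean."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom

def sacf_sum(x):
    """Sum of the sample autocorrelations over all lags 1..n-1."""
    return sum(sample_acf(x, h) for h in range(1, len(x)))

random.seed(0)
white_noise = [random.gauss(0, 1) for _ in range(200)]
trend = [0.1 * t for t in range(50)]  # the identity is algebraic, so any non-constant series works

print(round(sacf_sum(white_noise), 6))
print(round(sacf_sum(trend), 6))
```

Because the total over all lags is pinned, including more lags in a portmanteau statistic cannot add independent information indefinitely, which is one way to read the caution about lag choice.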
Robust multi-criteria land-allocation optimization that explicitly accounts for the uncertainty of the indicators in the objective function. Solves the problem of allocating scarce land to various land-use options with regard to multiple, coequal indicators. The method aims to find the land allocation that represents the indicator composition with the best possible trade-off under uncertainty. optimLanduse includes the actual optimization procedure as described by Knoke et al. (2016) <doi:10.1038/ncomms11877> and the post-hoc calculation of the portfolio performance as presented by Gosling et al. (2020) <doi:10.1016/j.jenvman.2020.110248>.
Compiled and cleaned county-level estimates of fertilizer, nitrogen and phosphorus, from 1945 to 2012 in the United States of America (USA). The commercial fertilizer data were originally generated by USGS based on the sales data of commercial fertilizer. The manure data were estimated based on county-level population data of livestock, poultry, and other animals. See the user manual for detailed data sources and cleaning methods. usfertilizer uses the tidyverse to clean the original data and provide a user-friendly data frame. Please note that USGS does not endorse this package. Also, data from 1986 are not currently available.
This is an ExperimentHub package that provides access to the data generated and analyzed in the [smoking-nicotine-mouse](https://github.com/LieberInstitute/smoking-nicotine-mouse/) LIBD project. The datasets contain the expression data of mouse genes, transcripts, exons, and exon-exon junctions across 208 samples from pup and adult mouse brain, and adult blood, that were exposed to nicotine, cigarette smoke, or controls. They also contain relevant metadata of these samples and gene expression features, such as QC metrics, whether they were retained after the filtering steps, and whether the features were differentially expressed in the different experiments.
Tool for quantitative research in scientometrics and bibliometrics. It implements the comprehensive workflow for science mapping analysis proposed in Aria M. and Cuccurullo C. (2017) <doi:10.1016/j.joi.2017.08.007>. bibliometrix provides various routines for importing bibliographic data from 'SCOPUS', Clarivate Analytics Web of Science (<https://www.webofknowledge.com/>), Digital Science Dimensions (<https://www.dimensions.ai/>), OpenAlex (<https://openalex.org/>), Cochrane Library (<https://www.cochranelibrary.com/>), Lens (<https://lens.org>), and PubMed (<https://pubmed.ncbi.nlm.nih.gov/>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.
Automated cell type annotation for single-cell RNA sequencing data using consensus predictions from multiple large language models (LLMs). LLMs are artificial intelligence models trained on vast text corpora to understand and generate human-like text. This package integrates with Seurat objects and provides uncertainty quantification for annotations. Supports various LLM providers including 'OpenAI', 'Anthropic', and 'Google'. The package leverages these models through their respective APIs (Application Programming Interfaces) <https://platform.openai.com/docs>, <https://docs.anthropic.com/>, and <https://ai.google.dev/gemini-api/docs>. For details see Yang et al. (2025) <doi:10.1101/2025.04.10.647852>.
Simulates the results of completed randomized controlled trials, as if they had been conducted as adaptive Multi-Arm Bandit (MAB) trials instead. Augmented inverse probability weighted estimation (AIPW), outlined by Hadad et al. (2021) <doi:10.1073/pnas.2014602118>, is used to robustly estimate the probability of success for each treatment arm under the adaptive design. Provides customization options to simulate perfect/imperfect information, stationary/non-stationary bandits, blocked treatment assignments, along with control augmentation, and other hybrid strategies for assigning treatment arms. The methods used in simulation were inspired by Offer-Westort et al. (2021) <doi:10.1111/ajps.12597>.
Provides functions that calculate Mahalanobis distance, Euclidean distance, Manhattan distance, Chebyshev distance, Hamming distance, Canberra distance, Minkowski dissimilarity (distance defined for p >= 1), Cosine dissimilarity, Bhattacharyya dissimilarity, Jaccard distance, Hellinger distance, Bray-Curtis dissimilarity, and Sorensen-Dice dissimilarity between each pair of species in a list of data frames. These statistics are fundamental in various fields, such as cluster analysis, classification, and other applications of machine learning and data mining, where assessing similarity or dissimilarity between data is crucial. The package is designed to be flexible and easily integrated into data analysis workflows, providing reliable tools for evaluating distances in multidimensional contexts.
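For readers unfamiliar with the measures listed in the distance package description above, the following Python sketch implements a few of them from their textbook definitions. It is illustrative only: the function names and the example vectors are assumptions, and it operates on two plain numeric vectors rather than on the list of data frames the package works with.

```python
import math

def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    """Minkowski distance of order p; a true distance only for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity for non-negative vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))

u = [1.0, 2.0, 3.0]
v = [4.0, 6.0, 3.0]
print(euclidean(u, v), manhattan(u, v), chebyshev(u, v))
print(minkowski(u, v, 1), bray_curtis(u, v))
```

Minkowski with p = 1 reduces to Manhattan and with p = 2 to Euclidean, which is a convenient sanity check when comparing implementations.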
Automated and robust framework for analyzing R-R interval (RRi) signals using advanced nonlinear modeling and preprocessing techniques. The package implements a dual-logistic model to capture the rapid drop and subsequent recovery of RRi during exercise, as described by Castillo-Aguilar et al. (2025) <doi:10.1038/s41598-025-93654-6>. In addition, CardioCurveR includes tools for filtering RRi signals using zero-phase Butterworth low-pass filtering and for cleaning ectopic beats via adaptive outlier replacement using local regression and robust statistics. These integrated methods preserve the dynamic features of RRi signals and facilitate accurate cardiovascular monitoring and clinical research.
Routines for nonlinear time series analysis based on Threshold Autoregressive Moving Average (TARMA) models. It provides functions and methods for: TARMA model fitting and forecasting, including robust estimators, see Goracci et al. JBES (2025) <doi:10.1080/07350015.2024.2412011>; tests for threshold effects, see Giannerini et al. JoE (2024) <doi:10.1016/j.jeconom.2023.01.004>, Goracci et al. Statistica Sinica (2023) <doi:10.5705/ss.202021.0120>, Angelini et al. (2024) <doi:10.48550/arXiv.2308.00444>; unit-root tests based on TARMA models, see Chan et al. Statistica Sinica (2024) <doi:10.5705/ss.202022.0125>.
IsoCorrectoR performs the correction of mass spectrometry data from stable isotope labeling/tracing metabolomics experiments with regard to natural isotope abundance and tracer impurity. Data from both MS and MS/MS measurements can be corrected (with any tracer isotope: 13C, 15N, 18O...), as well as ultra-high resolution MS data from multiple-tracer experiments (e.g. 13C and 15N used simultaneously). See the Bioconductor package IsoCorrectoRGUI for a graphical user interface to IsoCorrectoR. NOTE: With R version 4.0.0, writing correction results to Excel files may currently not work on Windows. However, writing results to csv works as before.
This package provides a database of Chinese surnames and given names (1930-2008). This database contains nationwide frequency statistics of 1,806 Chinese surnames and 2,614 Chinese characters used in given names, covering a Han Chinese population of about 1.2 billion (96.8 percent of the Han Chinese household-registered population born from 1930 to 2008 and still alive in 2008). This package also contains a function for computing multiple indices of Chinese surnames and given names for social science research (e.g., name uniqueness, name gender, name valence, and name warmth/competence). Details are provided at <https://psychbruce.github.io/ChineseNames/>.
Calculates clinical scores for hidradenitis suppurativa (HS), a dermatologic disease. The scores are typically used to evaluate efficacy in clinical trials and are not commonly used in clinical practice. The specific scores implemented are Hidradenitis Suppurativa Clinical Response (HiSCR) (Kimball, et al. (2015) <doi:10.1111/jdv.13216>), Hidradenitis Suppurativa Area and Severity Index Revised (HASI-R) (Goldfarb, et al. (2020) <doi:10.1111/bjd.19565>), Hidradenitis Suppurativa Physician Global Assessment (HS PGA) (Marzano, et al. (2020) <doi:10.1111/jdv.16328>), and the International Hidradenitis Suppurativa Severity Score System (IHS4) (Zouboulis, et al. (2017) <doi:10.1111/bjd.15748>).
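To make two of the hidradenitis suppurativa scores above concrete, here is a minimal Python sketch of IHS4 and HiSCR as they are commonly stated in the cited papers: IHS4 weights nodules by 1, abscesses by 2 and draining tunnels by 4, and HiSCR requires at least a 50 percent reduction in the combined abscess and inflammatory nodule (AN) count with no increase in abscesses or draining fistulas relative to baseline. The function names, the dictionary layout, and the handling of a zero baseline AN count are assumptions of this sketch, not the package's interface.

```python
def ihs4(nodules, abscesses, draining_tunnels):
    """IHS4 score: lesion counts weighted 1, 2 and 4 respectively."""
    return nodules * 1 + abscesses * 2 + draining_tunnels * 4

def ihs4_severity(score):
    """Severity bands: 3 or less is mild, 4 to 10 moderate, 11 or more severe."""
    if score <= 3:
        return "mild"
    if score <= 10:
        return "moderate"
    return "severe"

def hiscr(baseline, followup):
    """Return True if the follow-up visit meets the HiSCR response criteria.

    Each argument is a dict with keys 'abscesses', 'nodules' and
    'draining_fistulas' (hypothetical layout chosen for this sketch).
    """
    an_base = baseline["abscesses"] + baseline["nodules"]
    an_follow = followup["abscesses"] + followup["nodules"]
    if an_base == 0:
        # Assumption: a 50 percent reduction is undefined without baseline lesions.
        return False
    return (
        an_follow <= 0.5 * an_base
        and followup["abscesses"] <= baseline["abscesses"]
        and followup["draining_fistulas"] <= baseline["draining_fistulas"]
    )

baseline = {"abscesses": 2, "nodules": 6, "draining_fistulas": 1}
followup = {"abscesses": 1, "nodules": 3, "draining_fistulas": 1}
print(ihs4(6, 2, 1), ihs4_severity(ihs4(6, 2, 1)))
print(hiscr(baseline, followup))
```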
As a sequel to 'iNEXT', the iNEXT.beta3D package provides functions to compute standardized taxonomic, phylogenetic, and functional diversity (3D) estimates with a common sample size (for alpha and gamma diversity) or sample coverage (for alpha, beta, gamma diversity as well as dissimilarity or turnover indices). Hill numbers and their generalizations are used to quantify 3D and to make multiplicative decomposition (gamma = alpha x beta). The package also features size- and coverage-based rarefaction and extrapolation sampling curves to facilitate rigorous comparison of beta diversity across datasets. See Chao et al. (2023) <doi:10.1002/ecm.1588> for more details.
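The Hill numbers referred to in the iNEXT.beta3D description above have a compact closed form for taxonomic diversity, sketched here in Python. This computes the plain empirical Hill number from observed abundances; it does not perform the size- or coverage-based standardization the package provides, and the function name and example data are assumptions for illustration.

```python
import math

def hill_number(abundances, q):
    """Empirical Hill number (effective number of species) of order q."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if abs(q - 1.0) < 1e-12:
        # Order 1 is defined as the limit: the exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

even = [25, 25, 25, 25]    # four equally common species
uneven = [97, 1, 1, 1]     # one dominant species

for q in (0, 1, 2):
    print(q, hill_number(even, q), hill_number(uneven, q))
```

Order 0 counts species regardless of abundance, while higher orders weight common species more heavily, so the uneven assemblage drops well below 4 for q = 1 and q = 2. Under the multiplicative decomposition named in the description, beta diversity is then gamma divided by alpha.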
This package provides a collection of tools for detecting influential cases in generalized mixed effects models. It analyses models that were estimated using 'lme4'. The basic rationale behind identifying influential data is that when single units are omitted from the data, models based on these data should not produce substantially different estimates. To standardize the assessment of how influential a (single group of) observation(s) is, several measures of influence are common practice, such as Cook's Distance. In addition, we provide a measure of percentage change of the fixed point estimates and a simple procedure to detect changing levels of significance.