Compiled and cleaned county-level estimates of fertilizer (nitrogen and phosphorus) from 1945 to 2012 in the United States of America (USA). The commercial fertilizer data were originally generated by USGS based on commercial fertilizer sales data. The manure data were estimated based on county-level population data of livestock, poultry, and other animals. See the user manual for detailed data sources and cleaning methods. usfertilizer uses the tidyverse to clean the original data and provide a user-friendly data frame. Please note that USGS does not endorse this package. Data from 1986 are not currently available.
This is an ExperimentHub package that provides access to the data generated and analyzed in the [smoking-nicotine-mouse](https://github.com/LieberInstitute/smoking-nicotine-mouse/) LIBD project. The datasets contain the expression data of mouse genes, transcripts, exons, and exon-exon junctions across 208 samples from pup and adult mouse brain, and adult blood, that were exposed to nicotine, cigarette smoke, or controls. They also contain relevant metadata for these samples and gene expression features, such as QC metrics, whether they were used after filtering steps, and whether the features were differentially expressed in the different experiments.
Tool for quantitative research in scientometrics and bibliometrics. It implements the comprehensive workflow for science mapping analysis proposed in Aria M. and Cuccurullo C. (2017) <doi:10.1016/j.joi.2017.08.007>. bibliometrix provides various routines for importing bibliographic data from SCOPUS, Clarivate Analytics Web of Science (<https://www.webofknowledge.com/>), Digital Science Dimensions (<https://www.dimensions.ai/>), OpenAlex (<https://openalex.org/>), Cochrane Library (<https://www.cochranelibrary.com/>), Lens (<https://lens.org>), and PubMed (<https://pubmed.ncbi.nlm.nih.gov/>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.
Routines for nonlinear time series analysis based on Threshold Autoregressive Moving Average (TARMA) models. It provides functions and methods for: TARMA model fitting and forecasting, including robust estimators, see Goracci et al. JBES (2025) <doi:10.1080/07350015.2024.2412011>; tests for threshold effects, see Giannerini et al. JoE (2024) <doi:10.1016/j.jeconom.2023.01.004>, Goracci et al. Statistica Sinica (2023) <doi:10.5705/ss.202021.0120>, Angelini et al. OBES (2024) <doi:10.1111/obes.12647>; unit-root tests based on TARMA models, see Chan et al. Statistica Sinica (2024) <doi:10.5705/ss.202022.0125>.
Simulates the results of completed randomized controlled trials, as if they had been conducted as adaptive Multi-Arm Bandit (MAB) trials instead. Augmented inverse probability weighted estimation (AIPW), outlined by Hadad et al. (2021) <doi:10.1073/pnas.2014602118>, is used to robustly estimate the probability of success for each treatment arm under the adaptive design. Provides customization options to simulate perfect/imperfect information, stationary/non-stationary bandits, blocked treatment assignments, along with control augmentation, and other hybrid strategies for assigning treatment arms. The methods used in simulation were inspired by Offer-Westort et al. (2021) <doi:10.1111/ajps.12597>.
Automated and robust framework for analyzing R-R interval (RRi) signals using advanced nonlinear modeling and preprocessing techniques. The package implements a dual-logistic model to capture the rapid drop and subsequent recovery of RRi during exercise, as described by Castillo-Aguilar et al. (2025) <doi:10.1038/s41598-025-93654-6>. In addition, CardioCurveR includes tools for filtering RRi signals using zero-phase Butterworth low-pass filtering and for cleaning ectopic beats via adaptive outlier replacement using local regression and robust statistics. These integrated methods preserve the dynamic features of RRi signals and facilitate accurate cardiovascular monitoring and clinical research.
Provides functions that calculate Mahalanobis distance, Euclidean distance, Manhattan distance, Chebyshev distance, Hamming distance, Canberra distance, Minkowski dissimilarity (a distance for p >= 1), Cosine dissimilarity, Bhattacharyya dissimilarity, Jaccard distance, Hellinger distance, Bray-Curtis dissimilarity, and Sorensen-Dice dissimilarity between each pair of species in a list of data frames. These statistics are fundamental in various fields, such as cluster analysis, classification, and other applications of machine learning and data mining, where assessing similarity or dissimilarity between data is crucial. The package is designed to be flexible and easily integrated into data analysis workflows, providing reliable tools for evaluating distances in multidimensional contexts.
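As a rough illustration of a few of the measures named above, the following Python sketch computes Euclidean, Manhattan, Chebyshev, Minkowski, and Bray-Curtis values for two numeric vectors. It is not the package's API: the function names and the plain-list inputs are assumptions made for this example, and the package itself works on lists of data frames in R.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p):
    """Minkowski dissimilarity of order p; a true distance only for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a distance")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity for non-negative vectors (e.g. abundances)."""
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("Bray-Curtis is undefined when both vectors are all zero")
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


if __name__ == "__main__":
    u, v = [0.0, 0.0], [3.0, 4.0]
    print(euclidean(u, v), manhattan(u, v), chebyshev(u, v))
```

Minkowski with p = 1 reduces to the Manhattan distance and with p = 2 to the Euclidean distance, which gives a quick consistency check between the functions.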
IsoCorrectoR performs the correction of mass spectrometry data from stable isotope labeling/tracing metabolomics experiments with regard to natural isotope abundance and tracer impurity. Data from both MS and MS/MS measurements can be corrected (with any tracer isotope: 13C, 15N, 18O...), as well as ultra-high resolution MS data from multiple-tracer experiments (e.g. 13C and 15N used simultaneously). See the Bioconductor package IsoCorrectoRGUI for a graphical user interface to IsoCorrectoR. NOTE: With R version 4.0.0, writing correction results to Excel files may currently not work on Windows. However, writing results to csv works as before.
This package provides a database of Chinese surnames and given names (1930-2008). This database contains nationwide frequency statistics of 1,806 Chinese surnames and 2,614 Chinese characters used in given names, covering about 1.2 billion Han Chinese people (96.8 percent of the Han Chinese household-registered population born from 1930 to 2008 and still alive in 2008). This package also contains a function for computing multiple indices of Chinese surnames and given names for social science research (e.g., name uniqueness, name gender, name valence, and name warmth/competence). Details are provided at <https://psychbruce.github.io/ChineseNames/>.
Calculate clinical scores for hidradenitis suppurativa (HS), a dermatologic disease. The scores are typically used for evaluation of efficacy in clinical trials. The scores are not commonly used in clinical practice. The specific scores implemented are Hidradenitis Suppurativa Clinical Response (HiSCR) (Kimball, et al. (2015) <doi:10.1111/jdv.13216>), Hidradenitis Suppurativa Area and Severity Index Revised (HASI-R) (Goldfarb, et al. (2020) <doi:10.1111/bjd.19565>), hidradenitis suppurativa Physician Global Assessment (HS PGA) (Marzano, et al. (2020) <doi:10.1111/jdv.16328>), and the International Hidradenitis Suppurativa Severity Score System (IHS4) (Zouboulis, et al. (2017) <doi:10.1111/bjd.15748>).
This package provides a collection of tools for detecting influential cases in generalized mixed effects models. It analyses models that were estimated using lme4. The basic rationale behind identifying influential data is that when single units are omitted from the data, models based on these data should not produce substantially different estimates. To standardize the assessment of how influential a (single group of) observation(s) is, several measures of influence are common practice, such as Cook's Distance. In addition, we provide a measure of percentage change of the fixed point estimates and a simple procedure to detect changing levels of significance.
As a sequel to iNEXT, the iNEXT.beta3D package provides functions to compute standardized taxonomic, phylogenetic, and functional diversity (3D) estimates with a common sample size (for alpha and gamma diversity) or sample coverage (for alpha, beta, gamma diversity as well as dissimilarity or turnover indices). Hill numbers and their generalizations are used to quantify 3D and to make multiplicative decomposition (gamma = alpha x beta). The package also features size- and coverage-based rarefaction and extrapolation sampling curves to facilitate rigorous comparison of beta diversity across datasets. See Chao et al. (2023) <doi:10.1002/ecm.1588> for more details.
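The multiplicative decomposition gamma = alpha x beta can be sketched for plain taxonomic diversity as follows. This Python example is only an illustration on raw observed counts: it assumes one abundance-based definition in which gamma is the Hill number of the pooled assemblage and alpha is the Hill number of the joint assemblage-by-species weights divided by the number of assemblages. It does not perform the sample-size or coverage standardization that the package provides, and the function names are hypothetical.

```python
import math


def hill_number(weights, q):
    """Hill number of order q for non-negative weights that sum to 1."""
    positive = [w for w in weights if w > 0]
    if abs(q - 1.0) < 1e-12:
        # Order q = 1 is defined as the limit: the exponential of Shannon entropy.
        return math.exp(-sum(w * math.log(w) for w in positive))
    return sum(w ** q for w in positive) ** (1.0 / (1.0 - q))


def decompose(counts, q):
    """Return (alpha, gamma, beta) for a list of per-assemblage count vectors.

    Assumed definitions (not taken from the package documentation):
    gamma uses pooled relative abundances, alpha uses every cell divided by
    the grand total and is then divided by the number of assemblages.
    """
    n_assemblages = len(counts)
    grand_total = sum(sum(row) for row in counts)
    pooled = [sum(column) / grand_total for column in zip(*counts)]
    joint = [cell / grand_total for row in counts for cell in row]
    gamma = hill_number(pooled, q)
    alpha = hill_number(joint, q) / n_assemblages
    return alpha, gamma, gamma / alpha


if __name__ == "__main__":
    disjoint = [[10, 10, 0, 0], [0, 0, 10, 10]]
    for order in (0, 1, 2):
        print(order, decompose(disjoint, order))
```

Under these assumed definitions, beta ranges from 1 (identical assemblages) to the number of assemblages (no shared species), which is why two fully disjoint assemblages give a beta of 2 at every order.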
The function install_load checks the local R library(ies) to see if the required package(s) is/are installed or not. If the package(s) is/are not installed, then the package(s) will be installed along with the required dependency(ies). This function pulls source or binary packages from the Posit/RStudio-sponsored CRAN mirror. Lastly, the chosen package(s) is/are loaded. The function load_package simply loads the provided package(s). If this package does not fit your needs, then you may want to consider these other R packages: needs, easypackages, pacman, pak, anyLib, and/or librarian.
Companion package to quallmer providing an interactive shiny application for manual coding, reviewing large language model (LLM) generated annotations, and computing inter-rater reliability metrics. Supports three modes: blind manual coding, LLM output validation, and agreement calculation. Computes standard reliability metrics including Krippendorff's alpha (Krippendorff 2019 <doi:10.4135/9781071878781>), Cohen's kappa, Fleiss kappa (Fleiss 1971 <doi:10.1037/h0031619>), intraclass correlation coefficient (ICC), and percent agreement for nominal, ordinal, interval, and ratio data. Also computes gold-standard validation metrics including accuracy, precision, recall, and F1 scores following Sokolova and Lapalme (2009 <doi:10.1016/j.ipm.2009.03.002>).
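To make one of the listed reliability metrics concrete, here is a minimal Python sketch of percent agreement and Cohen's kappa for two raters assigning nominal labels. It uses the standard textbook formula, kappa = (p_o - p_e) / (1 - p_e), and is not the package's implementation; the function names and the example labels are made up for illustration.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal label proportions.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    human = ["a", "a", "b", "b"]  # hypothetical manual codes
    model = ["a", "a", "b", "a"]  # hypothetical LLM annotations
    print(percent_agreement(human, model), cohens_kappa(human, model))
```

Kappa is lower than raw percent agreement whenever some agreement would be expected by chance, which is the reason both figures are usually reported together.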
Taxonomic dictionaries, formative element lists, and functions related to the maintenance, development and application of U.S. Soil Taxonomy. Data and functionality are based on official U.S. Department of Agriculture sources including the latest edition of the Keys to Soil Taxonomy. Descriptions and metadata are obtained from the National Soil Information System or Soil Survey Geographic databases. Other sources are referenced in the data documentation. Provides tools for understanding and interacting with concepts in the U.S. Soil Taxonomic System. Most of the current utilities are for working with taxonomic concepts at the "higher" taxonomic levels: Order, Suborder, Great Group, and Subgroup.
Easy function for text-mining the PubMed repository based on defined sets of terms. The relationship between fix-terms (related to your research topic) and pub-terms (terms which pivot around your research focus) is calculated using the pointwise mutual information algorithm ('PMI'); see Church, Kenneth Ward and Hanks, Patrick (1990) <https://www.aclweb.org/anthology/J90-1003/>. A text file is generated with the PMI scores for each fix-term. Then, for each collocation pair (a fix-term + a pub-term), a text file is generated with related article titles and publishing years. An additional Author section will follow in upcoming version updates.
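The PMI score for a fix-term and a pub-term can be sketched from simple occurrence counts. The Python example below uses the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) with probabilities estimated as document frequencies. It is only an illustration: the function names, the substring matching, and the toy titles are assumptions, and the package's own counting rules over PubMed results may differ.

```python
import math


def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information (base 2) from occurrence counts.

    n_xy: documents containing both terms; n_x, n_y: documents containing
    each term; n_total: all documents considered.
    """
    if min(n_xy, n_x, n_y, n_total) <= 0:
        raise ValueError("PMI is undefined when any count is zero")
    return math.log2((n_xy * n_total) / (n_x * n_y))


def pmi_from_titles(titles, fix_term, pub_term):
    """Estimate PMI by case-insensitive substring matching over titles."""
    lowered = [title.lower() for title in titles]
    fix, pub = fix_term.lower(), pub_term.lower()
    n_x = sum(fix in title for title in lowered)
    n_y = sum(pub in title for title in lowered)
    n_xy = sum(fix in title and pub in title for title in lowered)
    return pmi(n_xy, n_x, n_y, len(lowered))


if __name__ == "__main__":
    # Hypothetical titles, invented for this example only.
    toy_titles = [
        "Asthma and smoking",
        "Asthma in children",
        "Smoking cessation",
        "Diet study",
    ]
    print(pmi_from_titles(toy_titles, "asthma", "smoking"))
```

A positive score means the two terms co-occur more often than independence would predict, a score near zero means roughly chance-level co-occurrence, and a negative score means they co-occur less often than expected.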
In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The crew.cluster package extends the mirai-powered crew package with worker launcher plugins for traditional high-performance computing systems. Inspiration also comes from packages mirai by Gao (2023) <https://github.com/r-lib/mirai>, future by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, rrq by FitzJohn and Ashton (2023) <https://github.com/mrc-ide/rrq>, clustermq by Schubert (2019) <doi:10.1093/bioinformatics/btz284>, and batchtools by Lang, Bischl, and Surmann (2017) <doi:10.21105/joss.00135>.
This is an add-on package to the monobin package that simplifies its use. It provides a shiny-based user interface (UI) that is especially handy for less experienced R users as well as for those who intend to perform quick scanning of numeric risk factors when building credit rating models. The additional functions implemented in monobinShiny that do not exist in the monobin package are: descriptive statistics, special case imputation, and outlier imputation. The descriptive statistics function is exported and can be used in R sessions independently of the user interface, while the special case and outlier imputation functions are written to be used with the shiny UI.
This package provides tools for multivariate analyses of morphological data, wrapped in one package, to make the workflow convenient and fast. Statistical and graphical tools provide a comprehensive framework for checking and manipulating input data, statistical analyses, and visualization of results. Several methods are provided for the analysis of raw data, to make the dataset ready for downstream analyses. Integrated statistical methods include hierarchical classification, principal component analysis, principal coordinates analysis, non-metric multidimensional scaling, and multiple discriminant analyses: canonical, stepwise, and classificatory (linear, quadratic, and the non-parametric k nearest neighbours). The philosophy of the package is described in Šlenker et al. 2022.
This is a sparklyr extension integrating VariantSpark and R. VariantSpark is a framework based on Scala and Spark for analyzing genome datasets, see <https://bioinformatics.csiro.au/>. It was tested on datasets with 3000 samples, each containing 80 million features, in both unsupervised clustering approaches and supervised applications such as classification and regression. Genome datasets are usually written in VCF, a specific text file format used in bioinformatics for storing gene sequence variations. So, VariantSpark is a great tool for genome research, because it is able to read VCF files, run analyses and return the output in a Spark data frame.
This package implements an algorithm that increases the number of simultaneously measurable markers and in this way helps with the study of immune responses. The algorithm, named CytoBackBone, allows combining phenotypic information of cells from different cytometric profiles obtained from different cytometry panels. This computational approach is based on the principle that each cell has its own phenotypic and functional characteristics that can be used as an identification card. CytoBackBone uses a set of predefined markers, which we call the backbone, to define this identification card. The phenotypic information of cells with similar identification cards in the different cytometric profiles is then merged.
PCG is a family of simple, fast, space-efficient, statistically good algorithms for random number generation. Unlike many general-purpose RNGs, they are also hard to predict. This library implements bindings to the standard C implementation. This includes the standard, unique, fast and single variants in the pcg family. There is a pure implementation that can be used as a generator with the random package as well as a faster primitive API that includes functions for generating common types. The generators in this module are suitable for use in parallel, but make sure threads don't share the same generator or things will go horribly wrong.
Maximum likelihood estimation of nonlinear mixed effects models of epidemic growth using Template Model Builder ('TMB'). Enables joint estimation for collections of disease incidence time series, including time series that describe multiple epidemic waves. Supports a set of widely used phenomenological models: exponential, logistic, Richards (generalized logistic), subexponential, and Gompertz. Provides methods for interrogating model objects and several auxiliary functions, including one for computing basic reproduction numbers from fitted values of the initial exponential growth rate. Preliminary versions of this software were applied in Ma et al. (2014) <doi:10.1007/s11538-013-9918-2> and in Earn et al. (2020) <doi:10.1073/pnas.2004904117>.
The package provides methods for combining graph structure learning and generalized least squares regression to improve the regression estimation. The main function sparsenetgls() provides solutions for multivariate regression with Gaussian distributed dependent variables and explanatory variables, utilizing multiple well-known graph structure learning approaches to estimate the precision matrix, and uses a penalized variance-covariance matrix with a distance tuning parameter of the graph structure in deriving the sandwich estimators in generalized least squares (gls) regression. This package also provides functions for assessing a Gaussian graphical model which uses the penalized approach. It uses the Receiver Operating Characteristic (ROC) curve as a visualization tool in the assessment.