Many modern C/C++ development tools in the clang toolchain, such as clang-tidy or clangd, rely on the presence of a compilation database in JSON format <https://clang.llvm.org/docs/JSONCompilationDatabase.html>. This package temporarily injects additional build flags into the R build process to generate such a compilation database.
Estimation, prediction, and simulation of nonstationary Gaussian processes with modular covariate-based covariance functions. Sources of nonstationarity, such as spatial mean, variance, geometric anisotropy, smoothness, and nugget, can be considered based on spatial characteristics. An induced compactly supported nonstationary covariance function is provided, enabling fast and memory-efficient computations when handling densely sampled domains.
Populate data from an R environment into .doc and .docx templates. Create a template document in a program such as Word, and add strings encased in guillemet characters to create flags («example»). Use getDictionary() to create a dictionary of flags and replacement values, then call docket() to generate a populated document.
Draws stylized choropleth maps -- hexagonal maps and triangular multiclass hex maps -- for New Zealand District Health Boards and Regional Council areas. These allow faceted, coloured displays of quantitative information for comparison across District Health Boards or Regional Councils. The preprint Lumley (2019) <arXiv:1912.04435> is based on the methods in this package.
The Explainable Ensemble Trees (e2tree) approach has been proposed by Aria et al. (2024) <doi:10.1007/s00180-022-01312-6>. It aims to explain and interpret decision tree ensemble models using a single tree-like structure. e2tree is a new way of explaining an ensemble tree trained with the randomForest or xgboost packages.
This package provides tools for simulating draws from continuous-time processes with well-defined exponential family random graph (ERGM) equilibria, i.e. ERGM generating processes (EGPs). A number of EGPs are supported, including the families identified in Butts (2023) <doi:10.1080/0022250X.2023.2180001>, as are functions for hazard calculation and timing calibration.
Estimate gender from names in Spanish and Portuguese. Works with vectors and dataframes. The estimation works not only for first names but also for full names. The package relies on a compilation of common names with their most frequently associated gender in both languages, which is used as a lookup table for gender inference.
This package provides an interface to HDFql <https://www.hdfql.com/> and helper functions for reading data from and writing data to HDF5 files. HDFql provides a high-level language for managing HDF5 data that is platform independent. For more information, see the reference manual <https://www.hdfql.com/resources/HDFqlReferenceManual.pdf>.
Calculates the intraclass correlation coefficient (ICC) for assessing reproducibility of interval-censored data with two repeated measurements (Kovacic and Varnai (2014) <doi:10.1097/EDE.0000000000000139>). The ICC is estimated by maximum likelihood from a model with one fixed and one random effect (both intercepts). Tools for model checking (normality of subject means and residuals) are provided.
This package contains functions for fitting a joinpoint proportional hazards model to relative survival or cause-specific survival data, including estimates of joinpoint years at which survival trends have changed and trend measures on the hazard and cumulative survival scales. See Yu et al. (2009) <doi:10.1111/j.1467-985X.2009.00580.x>.
This package provides the tables from the Sean Lahman Baseball Database as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2024, as recorded in the 2025 version of the database. Documentation examples show how many baseball questions can be investigated.
Simulation and estimation of univariate and multivariate log-GARCH models. The main functions of the package are: lgarchSim(), mlgarchSim(), lgarch() and mlgarch(). The first two functions simulate from a univariate and a multivariate log-GARCH model, respectively, whereas the latter two estimate a univariate and multivariate log-GARCH model, respectively.
This package provides tools for analyzing metabolic pathway completeness, abundance, and transcripts using KEGG Orthology (KO) data from (meta)genomic and (meta)transcriptomic studies. Supports both completeness (presence/absence) and abundance-weighted analyses. Includes built-in KEGG reference datasets. For more details see Li et al. (2023) <doi:10.1038/s41467-023-42193-7>.
Computes efficient data distributions from highly inconsistent datasets with many missing values using multi-set intersections. Based upon hash functions, mulset can quickly identify intersections from very large matrices of input vectors across columns and rows and thus provides a scalable solution for dealing with missing values. See Tomic et al. (2019) <doi:10.1101/545186>.
This package implements an interface to the legacy Fortran code from O'Connell and Dobson (1984) <DOI:10.2307/2531148>. Implements Fortran 77 code for the methods developed by Schouten (1982) <DOI:10.1111/j.1467-9574.1982.tb00774.x>. Includes estimates of average agreement for each observer and average agreement for each subject.
Procedures to fit species distribution models from occurrence records and environmental variables, using glmnet for model fitting. Model structure is the same as for the Maxent Java package, version 3.4.0, with the same feature types and regularization options. See the Maxent website <http://biodiversityinformatics.amnh.org/open_source/maxent> for more details.
Supports visual interpretation of hierarchical composite endpoints (HCEs). HCEs are complex constructs used as primary endpoints in clinical trials, combining outcomes of different types into ordinal endpoints, in which each patient contributes the most clinically important event (one and only one) to the analysis. See Karpefors M et al. (2022) <doi:10.1177/17407745221134949>.
An implementation of the National Information Platforms for Nutrition (NiPN) analytic methods for assessing the quality of anthropometric datasets that include measurements of weight, height or length, mid-upper arm circumference, sex and age. The focus is on anthropometric status, but many of the presented methods could be applied to other variables.
This package provides a set of techniques that can be used to develop, validate, and implement automated classifiers. A powerful tool for transforming raw data into meaningful information, ncodeR (Shaffer, D. W. (2017) Quantitative Ethnography. ISBN: 0578191687) is designed specifically for working with big data: large document collections, logfiles, and other text data.
A small package designed for interpreting continuous and categorical latent variables. You provide a data set with a latent variable you want to understand and some other explanatory variables. It provides a description of the latent variable based on the explanatory variables. It also provides a name for the latent variable.
Collection of pivotal algorithms for: relabelling the MCMC chains in order to undo the label switching problem in Bayesian mixture models; fitting sparse finite mixtures; initializing the centers of the classical k-means algorithm in order to obtain a better clustering solution. For further details see Egidi, Pappadà, Pauli and Torelli (2018b) <ISBN:9788891910233>.
Downloads current and historical METAR weather reports, extracts and parses basic parameters, and presents the main weather information. Current reports are downloaded from the Aviation Weather Center <https://aviationweather.gov/data/metar/> and historical reports from the Iowa Environmental Mesonet web page of Iowa State University, ASOS-AWOS-METAR <http://mesonet.agron.iastate.edu/AWOS/>.
This package implements spatial and spatiotemporal GLMMs (Generalized Linear Mixed Effect Models) using TMB, fmesher, and the SPDE (Stochastic Partial Differential Equation) Gaussian Markov random field approximation to Gaussian random fields. One common application is for spatially explicit species distribution models (SDMs). See Anderson et al. (2024) <doi:10.1101/2022.03.24.485545>.
Ordinary and modified statistics for symmetrical linear regression models with small samples. The supported ordinary statistics include Wald, score, likelihood ratio and gradient. The modified statistics include score, likelihood ratio and gradient. Diagnostic tools associated with the fitted model are implemented. For more details see Medeiros and Ferrari (2017) <DOI:10.1111/stan.12107>.