Computation of adherence to medications from Electronic Healthcare Data and visualization of individual medication histories and adherence patterns. The package implements a set of S3 classes and functions consistent with current adherence guidelines and definitions. It allows the computation of different measures of adherence (as defined in the literature, but also several original ones), their publication-quality plotting, the estimation of event duration and time to initiation, the interactive exploration of patient medication history and the real-time estimation of adherence given various parameter settings. It scales from very small datasets stored in flat CSV files to very large databases and from single-thread processing on mid-range consumer laptops to parallel processing on large heterogeneous computing clusters. It exposes a standardized interface allowing it to be used from other programming languages and platforms, such as Python.
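As a rough illustration of what an adherence measure computes, the sketch below derives a simple "proportion of days covered" from dispensing events. It is not the package's interface and does not reproduce any of its adherence definitions; the function name, the event layout (dispensing date, days supplied) and the exclusive window end are assumptions made for the example, and carry-over of oversupply is deliberately ignored.

```python
from datetime import date, timedelta


def proportion_of_days_covered(events, window_start, window_end):
    """Fraction of days in [window_start, window_end) covered by any supply.

    events: list of (dispensing_date, days_supplied) tuples (hypothetical layout).
    Overlapping supplies are counted once; no carry-over is modelled.
    """
    window_days = (window_end - window_start).days
    if window_days <= 0:
        raise ValueError("window_end must be after window_start")
    covered = set()
    for start, duration in events:
        for offset in range(duration):
            day = start + timedelta(days=offset)
            if window_start <= day < window_end:
                covered.add(day)
    return len(covered) / window_days


# Two 30-day supplies inside a 91-day observation window: 60 of 91 days covered.
events = [(date(2024, 1, 1), 30), (date(2024, 2, 15), 30)]
pdc = proportion_of_days_covered(events, date(2024, 1, 1), date(2024, 4, 1))
```

Counting each calendar day at most once is what distinguishes this kind of measure from a plain ratio of days supplied to window length, which can exceed 1 when supplies overlap.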
Statistical methods for ROC surface analysis in three-class classification problems for clustered data and in presence of covariates. In particular, the package provides covariate-specific point and interval estimates for: (i) true class fractions (TCFs) at fixed pairs of thresholds; (ii) the ROC surface; (iii) the volume under ROC surface (VUS); (iv) the optimal pairs of thresholds. Methods considered in points (i), (ii) and (iv) are proposed and discussed in To et al. (2022) <doi:10.1177/09622802221089029>. Referring to point (iv), three different selection criteria are implemented: Generalized Youden Index (GYI), Closest to Perfection (CtP) and Maximum Volume (MV). Methods considered in point (iii) are proposed and discussed in Xiong et al. (2018) <doi:10.1177/0962280217742539>. Visualization tools are also provided. We refer readers to the articles cited above for all details.
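To make the notions of true class fractions and the Generalized Youden Index concrete, here is a minimal empirical sketch. It assumes three samples ordered so that class 1 tends to have the lowest marker values and class 3 the highest, takes the GYI as the sum of the three TCFs, and searches all pairs of observed values. It ignores clustering and covariates entirely, so it is not the estimator of the cited papers; all function names are hypothetical.

```python
from itertools import combinations


def true_class_fractions(x1, x2, x3, t1, t2):
    """Empirical TCFs at thresholds t1 < t2 for classes ordered 1 < 2 < 3."""
    tcf1 = sum(v <= t1 for v in x1) / len(x1)
    tcf2 = sum(t1 < v <= t2 for v in x2) / len(x2)
    tcf3 = sum(v > t2 for v in x3) / len(x3)
    return tcf1, tcf2, tcf3


def best_thresholds_gyi(x1, x2, x3):
    """Return (score, t1, t2) maximizing TCF1 + TCF2 + TCF3 over observed values."""
    candidates = sorted(set(x1) | set(x2) | set(x3))
    best = None
    # combinations() of a sorted list yields pairs with t1 < t2.
    for t1, t2 in combinations(candidates, 2):
        score = sum(true_class_fractions(x1, x2, x3, t1, t2))
        if best is None or score > best[0]:
            best = (score, t1, t2)
    return best


# Perfectly separated toy samples: the best pair splits the classes exactly.
best = best_thresholds_gyi([1, 2, 3], [4, 5, 6], [7, 8, 9])
```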
Fits a geographically weighted regression model using zero inflated probability distributions. Has the zero inflated negative binomial distribution (zinb) as default, but also accepts the zero inflated Poisson (zip), negative binomial (negbin) and Poisson distributions. Can also fit the global versions of each regression model. Da Silva, A. R. & De Sousa, M. D. R. (2023). "Geographically weighted zero-inflated negative binomial regression: A general case for count data", Spatial Statistics <doi:10.1016/j.spasta.2023.100790>. Brunsdon, C., Fotheringham, A. S., & Charlton, M. E. (1996). "Geographically weighted regression: a method for exploring spatial nonstationarity", Geographical Analysis, <doi:10.1111/j.1538-4632.1996.tb00936.x>. Yau, K. K. W., Wang, K., & Lee, A. H. (2003). "Zero-inflated negative binomial mixed regression modeling of over-dispersed count data with extra zeros", Biometrical Journal, <doi:10.1002/bimj.200390024>.
This package provides functions and example datasets for phytosociological analysis, forest inventory, biomass and carbon estimation, and visualization of vegetation data. Includes functions to compute structural parameters [phytoparam(), summary.param(), stats()], estimate above-ground biomass and carbon [AGB()], stratify wood volume by diameter at breast height (DBH) classes [stratvol()], generate collector and rarefaction curves [collector.curve(), rarefaction()], and visualize basal areas on quadrat maps [BAplot(), including rectangular plots and individual coordinates]. Several example datasets are provided to demonstrate the functionality of these tools. For more details see FAO (1981, ISBN:92-5-101132-X) "Manual of forest inventory", IBGE (2012, ISBN:9788524042720) "Manual técnico da vegetação brasileira" and Heringer et al. (2020) "Phytosociology in R: A routine to estimate phytosociological parameters" <doi:10.22533/at.ed.3552009033>.
Processes amino acid alignments produced by the IPD-IMGT/HLA (Immuno Polymorphism-ImMunoGeneTics/Human Leukocyte Antigen) Database to identify user-defined amino acid residue motifs shared across HLA alleles, HLA alleles, or HLA haplotypes, and calculates frequencies based on HLA allele frequency data. SSHAARP (Searching Shared HLA Amino Acid Residue Prevalence) uses Generic Mapping Tools (GMT) software and the GMT R package to generate global frequency heat maps that illustrate the distribution of each user-defined map around the globe. SSHAARP analyzes the allele frequency data described by Solberg et al. (2008) <doi:10.1016/j.humimm.2008.05.001>, a global set of 497 population samples from 185 published datasets, representing 66,800 individuals total. Users may also specify their own datasets, but file conventions must follow the prebundled Solberg dataset, or the mock haplotype dataset.
DNA methylation signatures are usually based on multivariate approaches that require hundreds of sites for predictions. CimpleG is a method for the detection of small CpG methylation signatures used for cell-type classification and deconvolution. CimpleG is time efficient and performs as well as top performing methods for cell-type classification of blood cells and other somatic cells, while basing its prediction on a single DNA methylation site per cell type (but users can also select more sites if they so wish). Users can train cell-type classifiers (CimpleG-based and others) and directly apply them to the deconvolution of cell mixtures. Altogether, CimpleG provides a complete computational framework for the delineation of DNAm signatures and cellular deconvolution. For more details see Maié et al. (2023) <doi:10.1186/s13059-023-03000-0>.
Estimation of a density from grouped (tabulated) summary statistics evaluated in each of the big bins (or classes) partitioning the support of the variable. These statistics include class frequencies and central moments of order one up to four. The log-density is modelled using a linear combination of penalised B-splines. The multinomial log-likelihood involving the frequencies adds up to a roughness penalty based on the differences in the coefficients of neighbouring B-splines and the log of a root-n approximation of the sampling density of the observed vector of central moments in each class. The so-obtained penalized log-likelihood is maximized using the EM algorithm to get an estimate of the spline parameters and, consequently, of the variable density and related quantities such as quantiles, see Lambert, P. (2021) <arXiv:2107.03883> for details.
Turns one-off iterative R procedures (such as for loops, lapply() or pmap() from purrr) into production-grade workflows by wrapping them with orthogonal, composable execution layers. Two layers are always active: structured logging with real traceback and per-case timing; and reproducibility capture, which records the R version, loaded package versions, execution environment, the exact iteration mask, and a stat-based fingerprint of every input file referenced in the mask (with a diff_inputs() helper to detect silent drift between runs). Parallel execution (built on the future framework, Bengtsson (2021) <doi:10.32614/RJ-2021-048>), non-blocking background jobs, and opt-in progress reporting (via progressr) are implemented as optional, composable layers. Further layers (error replay, content-hash input fingerprinting, content-based case identifiers) are planned and will remain composable with the default layers.
This package provides functions for (1) soil water retention (SWC) and unsaturated hydraulic conductivity (Ku) (van Genuchten-Mualem (vGM or vG) [1, 2], Peters-Durner-Iden (PDI) [3, 4, 5], Brooks and Corey (bc) [8]), (2) fitting of parameters for SWC and/or Ku using Shuffled Complex Evolution (SCE) optimisation and (3) calculation of soil hydraulic properties (Ku and soil water contents) based on the simplified evaporation method (SEM) [6, 7]. Main references: [1] van Genuchten (1980) <doi:10.2136/sssaj1980.03615995004400050002x>, [2] Mualem (1976) <doi:10.1029/WR012i003p00513>, [3] Peters (2013) <doi:10.1002/wrcr.20548>, [4] Iden and Durner (2013) <doi:10.1002/2014WR015937>, [5] Peters (2014) <doi:10.1002/2014WR015937>, [6] Wind G. P. (1966), [7] Peters and Durner (2008) <doi:10.1016/j.jhydrol.2008.04.016> and [8] Brooks and Corey (1964).
The model estimates air pollution removal by dry deposition on trees. It also estimates or uses hourly values for aerodynamic resistance, boundary layer resistance, canopy resistance, stomatal resistance, cuticular resistance, mesophyll resistance, soil resistance, friction velocity and deposition velocity. It also allows plotting graphical results for a specific time period. The pollutants are nitrogen dioxide, ozone, sulphur dioxide, carbon monoxide and particulate matter. Baldocchi D (1994) <doi:10.1093/treephys/14.7-8-9.1069>. Farquhar GD, von Caemmerer S, Berry JA (1980) Planta 149: 78-90. Hirabayashi S, Kroll CN, Nowak DJ (2015) i-Tree Eco Dry Deposition Model. Nowak DJ, Crane DE, Stevens JC (2006) <doi:10.1016/j.ufug.2006.01.007>. US EPA (1999) PCRAMMET User's Guide. EPA-454/B-96-001. Weiss A, Norman JM (1985) Agricultural and Forest Meteorology 34: 205-213.
Perform state and parameter inference, and forecasting, in stochastic state-space systems using the ctsmTMB class. This class, built with the R6 package, provides a user-friendly interface for defining and handling state-space models. Inference is based on maximum likelihood estimation, with derivatives efficiently computed through automatic differentiation enabled by the TMB/RTMB packages (Kristensen et al., 2016) <doi:10.18637/jss.v070.i05>. The available inference methods include Kalman filters, in addition to a Laplace approximation-based smoothing method. For further details of these methods refer to the documentation of the CTSMR package <https://ctsm.info/ctsmr-reference.pdf> and Thygesen (2025) <doi:10.48550/arXiv.2503.21358>. Forecasting capabilities include moment predictions and stochastic path simulations, both implemented in C++ using Rcpp (Eddelbuettel et al., 2018) <doi:10.1080/00031305.2017.1375990> for computational efficiency.
Variable selection methods have been extensively developed for analyzing high-dimensional omics data within both the frequentist and Bayesian frameworks. This package provides implementations of the spike-and-slab quantile (group) LASSO, which has been developed along the lines of Bayesian hierarchical models but is deeply rooted in frequentist regularization methods through its use of the Expectation-Maximization (EM) algorithm. The spike-and-slab quantile LASSO can handle data irregularity in terms of skewness and outliers in response variables, compared to its non-robust alternative, the spike-and-slab LASSO, which has also been implemented in the package. In addition, procedures for fitting the spike-and-slab quantile group LASSO and its non-robust counterpart have been implemented in the form of quantile/least-square varying coefficient mixed effect models for high-dimensional longitudinal data. The core module of this package is developed in C++.
Density estimation for possibly large data sets and conditional/unconditional random number generation or bootstrapping with distribution element trees. The function det.construct translates a dataset into a distribution element tree. To evaluate the probability density based on a previously computed tree at arbitrary query points, the function det.query is available. The functions det1 and det2 provide density estimation and plotting for one- and two-dimensional datasets. Conditional/unconditional smooth bootstrapping from an available distribution element tree can be performed by det.rnd. For more details on distribution element trees, see: Meyer, D.W. (2016) <arXiv:1610.00345> or Meyer, D.W., Statistics and Computing (2017) <doi:10.1007/s11222-017-9751-9> and Meyer, D.W. (2017) <arXiv:1711.04632> or Meyer, D.W., Journal of Computational and Graphical Statistics (2018) <doi:10.1080/10618600.2018.1482768>.
Collection of data sets from various assessments that can be used to evaluate psychometric models. These data sets have been analyzed in the following papers that introduced new methodology as part of the application section: Jimenez, A., Balamuta, J. J., & Culpepper, S. A. (2023) <doi:10.1111/bmsp.12307>, Culpepper, S. A., & Balamuta, J. J. (2021) <doi:10.1080/00273171.2021.1985949>, Yinghan Chen et al. (2021) <doi:10.1007/s11336-021-09750-9>, Yinyin Chen et al. (2020) <doi:10.1007/s11336-019-09693-2>, Culpepper, S. A. (2019a) <doi:10.1007/s11336-019-09683-4>, Culpepper, S. A. (2019b) <doi:10.1007/s11336-018-9643-8>, Culpepper, S. A., & Chen, Y. (2019) <doi:10.3102/1076998618791306>, Culpepper, S. A., & Balamuta, J. J. (2017) <doi:10.1007/s11336-015-9484-7>, and Culpepper, S. A. (2015) <doi:10.3102/1076998615595403>.
It is a hybrid spatial model that combines the variable selection capabilities of stepwise regression methods with the predictive power of the Geographically Weighted Regression (GWR) model. The hybrid model follows a two-step approach: the stepwise variable selection method is applied first to identify the subset of predictors that have the most significant impact on the response variable, and then a GWR model is fitted using those selected variables for spatial prediction at test or unknown locations. For method details, see Leung, Y., Mei, C. L. and Zhang, W. X. (2000) <doi:10.1068/a3162>. This hybrid spatial model aims to improve the accuracy and interpretability of GWR predictions by selecting a subset of relevant variables through a stepwise selection process. The approach is particularly useful for modeling spatially varying relationships.
The conditional graphical lasso estimator is an extension of the graphical lasso proposed to estimate the conditional dependence structure of a set of p response variables given q predictors. This package provides suitable extensions developed to study datasets with censored and/or missing values. Standard conditional graphical lasso is available as a special case. Furthermore, the package provides an integrated set of core routines for visualization, analysis, and simulation of datasets with censored and/or missing values drawn from a Gaussian graphical model. Details about the implemented models can be found in Augugliaro et al. (2023) <doi:10.18637/jss.v105.i01>, Augugliaro et al. (2020b) <doi:10.1007/s11222-020-09945-7>, Augugliaro et al. (2020a) <doi:10.1093/biostatistics/kxy043>, Yin et al. (2001) <doi:10.1214/11-AOAS494> and Stadler et al. (2012) <doi:10.1007/s11222-010-9219-7>.
Partitioning clustering algorithms divide data sets into k subsets or partitions, the so-called clusters. They require initialization procedures before they can start. Initialization of cluster prototypes is one such procedure for most partitioning algorithms. Cluster prototypes are the centers of clusters, i.e. centroids or medoids, representing the clusters in a data set. To initialize cluster prototypes, the package inaparc contains a set of functions implementing several linear time-complexity and loglinear time-complexity methods in addition to some novel techniques. Initialization of fuzzy membership degrees matrices is another important task for starting the probabilistic and possibilistic partitioning algorithms. To initialize the membership degrees matrices required by these algorithms, a number of functions based on traditional and novel initialization techniques are also available in the package inaparc.
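For readers unfamiliar with prototype initialization, the sketch below shows one widely used scheme, k-means++ seeding, in which each new prototype is drawn with probability proportional to its squared distance from the nearest prototype chosen so far. It is offered only as an example of the general idea; it is not taken from the package, and the function names and the tuple-based point layout are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, seed=0):
    """Pick k initial prototypes from points using k-means++ style seeding."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0.0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
centers = kmeanspp_init(points, k=2, seed=1)
```

Because already chosen points have zero weight, the scheme avoids duplicate prototypes whenever the data contain at least k distinct points.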
This package implements a parametric decision-theoretic framework for optimal diagnostic cutoff selection under the family of scale mixtures of skew-normal (SMSN) distributions, including the skew-normal (SN) and skew-t (ST) models as special cases. The optimal cutoff is defined by minimising a weighted misclassification risk that incorporates disease prevalence and asymmetric costs, leading to a likelihood-ratio equation that generalises the Youden criterion. Under a monotone likelihood ratio condition, existence, uniqueness, and global optimality of the cutoff are established. Asymptotic normality and a closed-form plug-in variance estimator are provided via the implicit function theorem and the multivariate delta method. Tools for model fitting, cutoff estimation, confidence intervals, the local identifiability diagnostic, and Monte Carlo simulation are included. The methodology is described in de Paula, Mouriño, and Dias Domingues (2026) <doi:10.48550/arXiv.2605.07829>.
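The likelihood-ratio characterisation of the optimal cutoff can be illustrated with a deliberately simplified stand-in: two normal distributions with a common standard deviation instead of SMSN models. Minimising the risk (1 - p) * c_FP * P(X0 > c) + p * c_FN * P(X1 <= c) leads to f1(c) / f0(c) = (1 - p) * c_FP / (p * c_FN), which the sketch solves by bisection. The risk parameterisation, function names and arguments are assumptions made for this example and are not the package's interface.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log-density of a normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def optimal_cutoff(mean0, mean1, sd, prevalence, cost_fp=1.0, cost_fn=1.0, tol=1e-10):
    """Cutoff solving f1(c)/f0(c) = (1-p)*cost_fp / (p*cost_fn) by bisection.

    Assumes diseased values (mean1) tend to be larger than healthy ones (mean0)
    and a shared standard deviation, so the log likelihood ratio is increasing.
    """
    if mean1 <= mean0:
        raise ValueError("mean1 must exceed mean0")
    log_k = math.log((1.0 - prevalence) * cost_fp) - math.log(prevalence * cost_fn)

    def g(c):
        return normal_logpdf(c, mean1, sd) - normal_logpdf(c, mean0, sd) - log_k

    lo, hi = mean0 - 10.0 * sd, mean1 + 10.0 * sd
    while g(lo) > 0.0:  # widen the bracket until it contains the root
        lo -= 10.0 * sd
    while g(hi) < 0.0:
        hi += 10.0 * sd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Equal costs and 50% prevalence: the cutoff is the midpoint of the two means.
balanced = optimal_cutoff(0.0, 2.0, 1.0, prevalence=0.5)
# Lower prevalence moves the cutoff upward, towards the diseased distribution.
rare = optimal_cutoff(0.0, 2.0, 1.0, prevalence=0.2)
```

With equal costs and prevalence 0.5 the right-hand side equals 1, so the solution is the point where the two densities cross, which is the cutoff the Youden criterion selects in this setting.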
Many Fitbit users, and R-friendly Fitbit users especially, have found themselves curious about their Fitbit data. Fitbit aggregates a large amount of personal data, much of which is interesting for personal research and to satisfy curiosity, and is even potentially useful in medical settings. The goal of fitbitr is to make interfacing with the Fitbit API as streamlined as possible, to make it simple for R users of all backgrounds and comfort levels to analyze their Fitbit data and do whatever they want with it! Currently, fitbitr includes methods for pulling data on activity, sleep, and heart rate, but this list is likely to grow in the future as the package gains more traction and more requests for new methods to be implemented come in. You can find details on the Fitbit API at <https://dev.fitbit.com/build/reference/web-api/>.
Implementation of the Interval Testing Procedure for functional data in different frameworks (i.e., one or two-population frameworks, functional linear models) by means of different basis expansions (i.e., B-spline, Fourier, and phase-amplitude Fourier). The current version of the package requires functional data evaluated on a uniform grid; it automatically projects each function on a chosen functional basis; it performs the entire family of multivariate tests; and, finally, it provides the matrix of the p-values of the previous tests and the vector of the corrected p-values. The functional basis, the coupled or uncoupled scenario, and the kind of test can be chosen by the user. The package also provides a plotting function that creates a graphical output of the procedure: the p-value heat-map, the plot of the corrected p-values, and the plot of the functional data.
This package provides core computational operations in C++ via RcppArmadillo, enabling faster performance than pure R, improved numerical stability, and parallel execution with OpenMP where available. On systems without OpenMP support, the package automatically falls back to single-threaded execution with no user configuration required. For efficient model selection, it integrates with CVST to provide sequential-testing cross-validation that identifies competitive hyperparameters without exhaustive grid search. The package offers a unified interface for exact kernel ridge regression (KRR) and three scalable approximations (Nyström, Pivoted Cholesky, and Random Fourier Features), allowing analyses with substantially larger sample sizes than are feasible with exact KRR. It also integrates with the tidymodels ecosystem via the parsnip model specification krr_reg and the S3 method tunable.krr_reg(). To understand the theoretical background, one can refer to Wainwright (2019) <doi:10.1017/9781108627771>.
This small collection of functions provides what we call elemental graphics for display of analysis of variance results, David C. Hoaglin, Frederick Mosteller and John W. Tukey (1991, ISBN:978-0-471-52735-0), Paul R. Rosenbaum (1989) <doi:10.2307/2684513>, Robert M. Pruzek and James E. Helmreich <https://jse.amstat.org/v17n1/helmreich.html>. The term elemental derives from the fact that each function is aimed at construction of graphical displays that afford direct visualizations of data with respect to the fundamental questions that drive the particular analysis of variance methods. These functions can be particularly helpful for students and non-statistician analysts. But these methods should be quite generally helpful for work-a-day applications of all kinds, as they can help to identify outliers, clusters or patterns, as well as highlight the role of non-linear transformations of data.
The theoretical covariance between pairs of markers is calculated from either paternal haplotypes and maternal linkage disequilibrium (LD) or vice versa. A genetic map is required. Grouping of markers is based on the correlation matrix and a representative marker is suggested for each group. Employing the correlation matrix, optimal sample size can be derived for association studies based on a SNP-BLUP approach. The implementation relies on paternal half-sib families and biallelic markers. If maternal half-sib families are used, the roles of sire/dam are swapped. Multiple families can be considered. Wittenburg, Bonk, Doschoris, Reyer (2020) "Design of Experiments for Fine-Mapping Quantitative Trait Loci in Livestock Populations" <doi:10.1186/s12863-020-00871-1>. Carlson, Eberle, Rieder, Yi, Kruglyak, Nickerson (2004) "Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium" <doi:10.1086/381000>.
This package provides a flexible general-purpose toolbox for implementing Rescorla-Wagner models in multi-armed bandit tasks. As the successor and functional extension of the binaryRL package, multiRL modularizes the Markov Decision Process (MDP) into six core components. This framework enables users to construct custom models via intuitive if-else syntax and define latent learning rules for agents. For parameter estimation, it provides both likelihood-based inference (MLE and MAP) and simulation-based inference (ABC and RNN), with full support for parallel processing across subjects. The workflow is highly standardized, featuring four main functions that strictly follow the four-step protocol (and ten rules) proposed by Wilson & Collins (2019) <doi:10.7554/eLife.49547>. Beyond the three built-in models (TD, RSTD, and Utility), users can easily derive new variants by declaring which variables are treated as free parameters.
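For orientation, the core of a Rescorla-Wagner learner in a bandit task is the delta rule V <- V + alpha * (reward - V). The sketch below pairs that rule with an epsilon-greedy choice policy on a Bernoulli bandit. It is a generic illustration, not the package's API or any of its built-in models; the function names, the choice policy and all parameter defaults are assumptions.

```python
import random


def rescorla_wagner_update(value, reward, alpha):
    """Delta rule: move the value estimate towards the reward by a fraction alpha."""
    return value + alpha * (reward - value)


def simulate_bandit(reward_probs, alpha=0.1, epsilon=0.1, n_trials=500, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    reward_probs: probability of a reward of 1.0 for each arm.
    Returns the learned value estimate of each arm.
    """
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)
    for _ in range(n_trials):
        if rng.random() < epsilon:
            arm = rng.randrange(len(values))  # explore
        else:
            arm = max(range(len(values)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        values[arm] = rescorla_wagner_update(values[arm], reward, alpha)
    return values


learned = simulate_bandit([0.2, 0.8])
```

In a fitting workflow, alpha and the choice-policy parameter would be the free parameters estimated from observed choices, for example by maximising the likelihood of those choices under the model.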