Investigating and visualising Bayesian Additive Regression Tree (BART) (Chipman, H. A., George, E. I., & McCulloch, R. E. 2010) <doi:10.1214/09-AOAS285> model fits. We construct conventional plots to analyze a model's performance and stability, as well as new tree-based plots to analyze variable importance, interaction, and tree structure. We employ Value Suppressing Uncertainty Palettes (VSUP) to construct heatmaps that display variable importance and interactions jointly, using a colour scale to represent posterior uncertainty. Our visualisations are designed to work with the most popular BART R packages available, namely BART (Rodney Sparapani, Charles Spanbauer, and Robert McCulloch 2021) <doi:10.18637/jss.v097.i01>, dbarts (Vincent Dorie 2023) <https://CRAN.R-project.org/package=dbarts>, and bartMachine (Adam Kapelner and Justin Bleich 2016) <doi:10.18637/jss.v070.i04>.
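As an illustration of the kind of model object such visualisations consume, here is a minimal sketch that fits a BART model with dbarts (the bart() call and its keeptrees argument belong to dbarts; the subsequent tree-based plotting is left to the package's own functions):

    library(dbarts)
    set.seed(1)
    x <- matrix(runif(100 * 5), ncol = 5)                   # five covariates
    y <- x[, 1] + sin(pi * x[, 2]) + rnorm(100, sd = 0.1)   # noisy response
    fit <- bart(x.train = x, y.train = y, keeptrees = TRUE) # keep trees for inspection
    # 'fit' is the kind of object the tree-based visualisations take as input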
Various methods for the identification of trend and seasonal components in time series (TS) are provided. Among them is a data-driven locally weighted regression approach with automatically selected bandwidth for equidistant short-memory time series. The approach is a combination / extension of the algorithms by Feng (2013) <doi:10.1080/02664763.2012.740626> and Feng, Y., Gries, T., and Fritz, M. (2020) <doi:10.1080/10485252.2020.1759598>, and a brief description of this new method is provided in the package documentation. Furthermore, the package allows its users to apply the base model of the Berlin procedure, version 4.1, as described in Speth (2004) <https://www.destatis.de/DE/Methoden/Saisonbereinigung/BV41-methodenbericht-Heft3_2004.pdf?__blob=publicationFile>. Permission to include this procedure was kindly provided by the Federal Statistical Office of Germany.
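A minimal sketch, assuming the package's main entry point is a deseats() function applied to a ts object (consult the package documentation for the exact interface and options):

    library(deseats)
    set.seed(1)
    y <- ts(cumsum(rnorm(120)) + rep(sin(2 * pi * (1:12) / 12), 10),
            frequency = 12)   # simulated monthly series with trend and seasonality
    est <- deseats(y)         # assumed call for the data-driven decomposition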
Transferring a code base from Matlab to R is often a repetitive and inefficient use of time. This package provides a translator for Matlab / Octave code into R code. It makes some syntax changes, but most of the heavy lifting is in the function changes, since the languages are so similar. Options for different data structures and the functions that can be changed are given. The Matlab code should mostly adhere to the standard style guide, but some effort has been made to accommodate different numbers of spaces and other small syntax issues. The translation will not make the code more R friendly, and the result may not even run afterwards. However, the rudimentary syntax, base function, and data structure conversion is done quickly, so that the maintainer can focus on changes to the design structure.
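A minimal sketch, assuming mat2r() accepts a character vector of Matlab code and returns the converted lines (check ?mat2r for the exact signature and return value):

    library(matconv)
    matCode <- c("x = linspace(0, 1, 100);",
                 "y = zeros(1, 100);")
    out <- mat2r(matCode)   # syntax and base-function conversion
    out$rCode               # component name assumed; the converted R code lines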
Utilize an orthogonality constrained optimization algorithm of Wen & Yin (2013) <DOI:10.1007/s10107-012-0584-1> to solve a variety of dimension reduction problems in the semiparametric framework, such as Ma & Zhu (2012) <DOI:10.1080/01621459.2011.646925>, Ma & Zhu (2013) <DOI:10.1214/12-AOS1072>, Sun, Zhu, Wang & Zeng (2019) <DOI:10.1093/biomet/asy064> and Zhou, Zhu & Zeng (2021) <DOI:10.1093/biomet/asaa087>. The package also implements some existing dimension reduction methods, such as hMave by Xia, Zhang, & Xu (2010) <DOI:10.1198/jasa.2009.tm09372> and partial SAVE by Feng, Wen & Zhu (2013) <DOI:10.1080/01621459.2012.746065>. It also serves as a general purpose optimization solver for problems with orthogonality constraints, i.e., on the Stiefel manifold. Parallel computing for approximating the gradient is enabled through OpenMP.
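A minimal sketch of the general-purpose solver, assuming the ortho_optim(B, fn, ...) interface; here a quadratic trace objective is maximised over the Stiefel manifold:

    library(orthoDr)
    set.seed(1)
    A  <- crossprod(matrix(rnorm(25), 5, 5))        # symmetric PSD matrix
    fn <- function(B) -sum(diag(t(B) %*% A %*% B))  # minimise the negative trace
    B0 <- qr.Q(qr(matrix(rnorm(10), 5, 2)))         # orthonormal start on Stiefel(5, 2)
    res <- ortho_optim(B = B0, fn = fn)             # gradient approximated numerically
    res$B                                           # component name assumed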
Calculates a comprehensive list of features from profile hidden Markov models (HMMs) of proteins. Adapts and ports features for use with HMMs instead of Position Specific Scoring Matrices, in order to take advantage of the more accurate multiple sequence alignment produced by programs such as HHblits (Remmert et al. 2012) <DOI:10.1038/nmeth.1818> and HMMER (Eddy 2011) <DOI:10.1371/journal.pcbi.1002195>. Features calculated by this package can be used for protein fold classification, protein structural class prediction, sub-cellular localization, and protein-protein interaction, among other tasks. Some examples of features extracted are found in Song et al. (2018) <DOI:10.3390/app8010089>, Jin & Zhu (2021) <DOI:10.1155/2021/8629776>, Lyons et al. (2015) <DOI:10.1109/tnb.2015.2457906> and Saini et al. (2015) <DOI:10.1016/j.jtbi.2015.05.030>.
When single cases are compared to control populations whose parameters are unknown, researchers and clinicians must estimate these parameters from a control sample. This is often done when testing a case's abnormality on some variable, or when testing the abnormality of the discrepancy between two variables. Appropriate frequentist and Bayesian methods for doing this are implemented here, including tests allowing for the inclusion of covariates. These have been developed first and foremost by John Crawford and Paul Garthwaite, e.g. in Crawford and Howell (1998) <doi:10.1076/clin.12.4.482.7241>, Crawford and Garthwaite (2005) <doi:10.1037/0894-4105.19.3.318>, Crawford and Garthwaite (2007) <doi:10.1080/02643290701290146> and Crawford, Garthwaite and Ryan (2011) <doi:10.1016/j.cortex.2011.02.017>. The package is also equipped with power calculators for each method.
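A minimal sketch of the frequentist test of deficit; the TD() call follows the package's documented interface, and the numbers are illustrative:

    library(singcar)
    set.seed(1)
    controls <- rnorm(20, mean = 100, sd = 15)  # control sample on some variable
    TD(case = 70, controls = controls)          # Crawford & Howell (1998) t-test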
This package provides methods for building self-organizing maps (SOMs) with a number of distinguishing features, such as automatic centroid detection and cluster visualization using starbursts. For more details see the paper "Improved Interpretability of the Unified Distance Matrix with Connected Components" by Hamel and Brown (2011) in <ISBN:1-60132-168-6>. The package provides user-friendly access to two models we construct: (a) a SOM model and (b) a centroid-based clustering model. The package also exposes a number of quality metrics for the quantitative evaluation of the map, Hamel (2016) <doi:10.1007/978-3-319-28518-4_4>. Finally, we reintroduce our fast, vectorized training algorithm for SOMs with substantial improvements; it is about an order of magnitude faster than the canonical, stochastic C implementation <doi:10.1007/978-3-030-01057-7_60>.
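A minimal sketch, with the map.build()/map.starburst()/map.summary() function names assumed from the package manual:

    library(popsom)
    m <- map.build(iris[, 1:4], labels = iris[, 5],
                   xdim = 15, ydim = 10)  # train a SOM on the iris measurements
    map.starburst(m)                      # starburst cluster visualization
    map.summary(m)                        # quality metrics for the map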
English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies and introduces some new, useful features. Thanks to ICU (International Components for Unicode) and stringi, they are fast, reliable, and portable across different platforms.
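A minimal sketch; the stringi calls are certain, while the masking behaviour of the replacement toupper() is assumed to follow the package's drop-in design:

    library(stringx)
    toupper("groß")    # full Unicode case mapping gives "GROSS" (base R keeps the ß)
    # the same mappings via the underlying stringi engine:
    stringi::stri_trans_toupper("groß")
    stringi::stri_trans_tolower("I", locale = "tr_TR")  # dotless ı in a Turkish locale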
Inference procedures accommodate a flexible range of hazard ratio patterns with a two-sample semi-parametric model. This model contains the proportional hazards model and the proportional odds model as sub-models, and accommodates non-proportional hazards situations to the extreme of having crossing hazards and crossing survivor functions. Overall, this package has four major functions: 1) estimation of the model parameters, namely the short-term and long-term hazard ratio parameters; 2) 95 percent and 90 percent point-wise confidence intervals and simultaneous confidence bands for the hazard ratio function; 3) the p-value of the adaptive weighted log-rank test; 4) the p-values of two lack-of-fit tests for the model. See the included "read_me_first.pdf" for brief instructions. In this version (1.1), there is no need to sort the data before applying this package.
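A minimal sketch, assuming the package's single-entry interface YPmodel(data = ...) where the data set holds the survival time, the censoring indicator, and group membership (mydata is a hypothetical placeholder):

    library(YPmodel)
    # mydata: columns = survival time, censoring indicator (0/1), group (0/1)
    res <- YPmodel(data = mydata)  # estimation, confidence bands, and tests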
Computation of adherence to medications from electronic healthcare data and visualization of individual medication histories and adherence patterns. The package implements a set of S3 classes and functions consistent with current adherence guidelines and definitions. It allows the computation of different measures of adherence (as defined in the literature, but also several original ones), their publication-quality plotting, the estimation of event duration and time to initiation, the interactive exploration of patient medication history, and the real-time estimation of adherence given various parameter settings. It scales from very small datasets stored in flat CSV files to very large databases, and from single-thread processing on mid-range consumer laptops to parallel processing on large heterogeneous computing clusters. It exposes a standardized interface allowing it to be used from other programming languages and platforms, such as Python.
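A minimal sketch following the pattern in the package vignette, using the bundled med.events example data:

    library(AdhereR)
    cma <- CMA1(data = med.events,
                ID.colname = "PATIENT_ID",
                event.date.colname = "DATE",
                event.duration.colname = "DURATION",
                followup.window.start = 0,
                observation.window.start = 0,
                observation.window.duration = 365,
                date.format = "%m/%d/%Y")  # a simple adherence estimate
    plot(cma, patients.to.plot = "1")      # one patient's medication history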
Statistical methods for ROC surface analysis in three-class classification problems for clustered data and in presence of covariates. In particular, the package allows to obtain covariate-specific point and interval estimation for: (i) true class fractions (TCFs) at fixed pairs of thresholds; (ii) the ROC surface; (iii) the volume under ROC surface (VUS); (iv) the optimal pairs of thresholds. Methods considered in points (i), (ii) and (iv) are proposed and discussed in To et al. (2022) <doi:10.1177/09622802221089029>. Referring to point (iv), three different selection criteria are implemented: Generalized Youden Index (GYI), Closest to Perfection (CtP) and Maximum Volume (MV). Methods considered in point (iii) are proposed and discussed in Xiong et al. (2018) <doi:10.1177/0962280217742539>. Visualization tools are also provided. We refer readers to the articles cited above for all details.
Fits a geographically weighted regression model using zero-inflated probability distributions. The zero-inflated negative binomial (zinb) distribution is the default, but the zero-inflated Poisson (zip), negative binomial (negbin) and Poisson distributions are also accepted. The global version of each regression model can also be fitted. Da Silva, A. R. & De Sousa, M. D. R. (2023). "Geographically weighted zero-inflated negative binomial regression: A general case for count data", Spatial Statistics <doi:10.1016/j.spasta.2023.100790>. Brunsdon, C., Fotheringham, A. S., & Charlton, M. E. (1996). "Geographically weighted regression: a method for exploring spatial nonstationarity", Geographical Analysis <doi:10.1111/j.1538-4632.1996.tb00936.x>. Yau, K. K. W., Wang, K., & Lee, A. H. (2003). "Zero-inflated negative binomial mixed regression modeling of over-dispersed count data with extra zeros", Biometrical Journal <doi:10.1002/bimj.200390024>.
Processes amino acid alignments produced by the IPD-IMGT/HLA (Immuno Polymorphism-ImMunoGeneTics/Human Leukocyte Antigen) Database to identify user-defined amino acid residue motifs shared across HLA alleles or HLA haplotypes, and calculates frequencies based on HLA allele frequency data. SSHAARP (Searching Shared HLA Amino Acid Residue Prevalence) uses Generic Mapping Tools (GMT) software and the GMT R package to generate global frequency heat maps that illustrate the distribution of each user-defined motif around the globe. SSHAARP analyzes the allele frequency data described by Solberg et al. (2008) <doi:10.1016/j.humimm.2008.05.001>, a global set of 497 population samples from 185 published datasets, representing 66,800 individuals in total. Users may also specify their own datasets, but the file conventions must follow the prebundled Solberg dataset or the mock haplotype dataset.
Estimation of a density from grouped (tabulated) summary statistics evaluated in each of the big bins (or classes) partitioning the support of the variable. These statistics include class frequencies and central moments of order one up to four. The log-density is modelled using a linear combination of penalised B-splines. The multinomial log-likelihood involving the frequencies is combined with a roughness penalty based on the differences in the coefficients of neighbouring B-splines, and with the log of a root-n approximation of the sampling density of the observed vector of central moments in each class. The so-obtained penalized log-likelihood is maximized using the EM algorithm to get an estimate of the spline parameters and, consequently, of the variable density and related quantities such as quantiles; see Lambert, P. (2021) <arXiv:2107.03883> for details.
This package provides functions for (1) soil water retention (SWC) and unsaturated hydraulic conductivity (Ku) (van Genuchten-Mualem (vGM or vG) [1, 2], Peters-Durner-Iden (PDI) [3, 4, 5], Brooks and Corey (bc) [8]), (2) fitting of parameters for SWC and/or Ku using Shuffled Complex Evolution (SCE) optimisation, and (3) calculation of soil hydraulic properties (Ku and soil water contents) based on the simplified evaporation method (SEM) [6, 7]. Main references: [1] van Genuchten (1980) <doi:10.2136/sssaj1980.03615995004400050002x>, [2] Mualem (1976) <doi:10.1029/WR012i003p00513>, [3] Peters (2013) <doi:10.1002/wrcr.20548>, [4] Iden and Durner (2013) <doi:10.1002/2014WR015937>, [5] Peters (2014) <doi:10.1002/2014WR015937>, [6] Wind G. P. (1966), [7] Peters and Durner (2008) <doi:10.1016/j.jhydrol.2008.04.016> and [8] Brooks and Corey (1964).
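A minimal sketch of evaluating a van Genuchten retention curve, with the SWC() argument names assumed from the package manual (see ?SWC):

    library(SoilHyP)
    suc <- seq(1, 1000, by = 10)                # suction (unit assumed to be hPa)
    theta <- SWC(suc = suc,
                 par.shp = c(ths = 0.4, thr = 0.05, alfa = 0.02, n = 1.5),
                 FUN.shp = "vG")                # van Genuchten water retention
    plot(suc, theta, log = "x", type = "l")     # the retention curve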
The model estimates air pollution removal by dry deposition on trees. It also estimates or uses hourly values for aerodynamic resistance, boundary layer resistance, canopy resistance, stomatal resistance, cuticular resistance, mesophyll resistance, soil resistance, friction velocity and deposition velocity. It also allows plotting graphical results for a specific time period. The pollutants are nitrogen dioxide, ozone, sulphur dioxide, carbon monoxide and particulate matter. Baldocchi D (1994) <doi:10.1093/treephys/14.7-8-9.1069>. Farquhar GD, von Caemmerer S, Berry JA (1980) Planta 149: 78-90. Hirabayashi S, Kroll CN, Nowak DJ (2015) i-Tree Eco Dry Deposition Model. Nowak DJ, Crane DE, Stevens JC (2006) <doi:10.1016/j.ufug.2006.01.007>. US EPA (1999) PCRAMMET User's Guide. EPA-454/B-96-001. Weiss A, Norman JM (1985) Agricultural and Forest Meteorology 34: 205-213.
This package provides an arsenal of R functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in R and RStudio and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby, a Table-1-like summary of multiple variable types by the levels of one or more categorical variables; paired, a Table-1-like summary of multiple variable types paired across two time points; modelsum, which performs simple model fits on one or more endpoints for many variables (univariate or adjusted for covariates); freqlist, a powerful frequency table across many categorical variables; comparedf, a function for comparing data.frames; and write2, a function to output tables to a document.
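A minimal sketch following the package's documented pattern, using the bundled mockstudy data:

    library(arsenal)
    tab <- tableby(arm ~ sex + age, data = mockstudy)  # Table-1 by treatment arm
    summary(tab, text = TRUE)                          # plain-text rendering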
Perform state and parameter inference, and forecasting, in stochastic state-space systems using the ctsmTMB class. This class, built with the R6 package, provides a user-friendly interface for defining and handling state-space models. Inference is based on maximum likelihood estimation, with derivatives efficiently computed through automatic differentiation enabled by the TMB/RTMB packages (Kristensen et al., 2016) <doi:10.18637/jss.v070.i05>. The available inference methods include Kalman filters, in addition to a Laplace approximation-based smoothing method. For further details of these methods, refer to the documentation of the CTSMR package <https://ctsm.info/ctsmr-reference.pdf> and Thygesen (2025) <doi:10.48550/arXiv.2503.21358>. Forecasting capabilities include moment predictions and stochastic path simulations, both implemented in C++ using Rcpp (Eddelbuettel et al., 2018) <doi:10.1080/00031305.2017.1375990> for computational efficiency.
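A minimal sketch of defining an Ornstein-Uhlenbeck state-space model; the R6 method names are assumed from the package README, and the data object is a hypothetical placeholder:

    library(ctsmTMB)
    model <- ctsmTMB$new()
    model$addSystem(dx ~ theta * (mu - x) * dt + sigma_x * dw)  # state SDE
    model$addObs(y ~ x)                                         # observation equation
    model$setVariance(y ~ sigma_y^2)                            # observation noise
    model$setParameter(theta   = c(initial = 1, lower = 0, upper = 20),
                       mu      = c(initial = 0, lower = -10, upper = 10),
                       sigma_x = c(initial = 1, lower = 1e-5, upper = 10),
                       sigma_y = c(initial = 0.1, lower = 1e-5, upper = 10))
    # fit <- model$estimate(data)  # 'data': hypothetical data.frame with columns t and y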
Variable selection methods have been extensively developed for analyzing high-dimensional omics data within both the frequentist and Bayesian frameworks. This package provides implementations of the spike-and-slab quantile (group) LASSO, which has been developed along the lines of Bayesian hierarchical models but is deeply rooted in frequentist regularization methods through its use of the Expectation-Maximization (EM) algorithm. The spike-and-slab quantile LASSO can handle data irregularity in terms of skewness and outliers in response variables, compared to its non-robust alternative, the spike-and-slab LASSO, which has also been implemented in the package. In addition, procedures for fitting the spike-and-slab quantile group LASSO and its non-robust counterpart have been implemented in the form of quantile/least-square varying coefficient mixed effect models for high-dimensional longitudinal data. The core module of this package is developed in C++.
Density estimation for possibly large data sets and conditional/unconditional random number generation or bootstrapping with distribution element trees. The function det.construct translates a dataset into a distribution element tree. To evaluate the probability density based on a previously computed tree at arbitrary query points, the function det.query is available. The functions det1 and det2 provide density estimation and plotting for one- and two-dimensional datasets. Conditional/unconditional smooth bootstrapping from an available distribution element tree can be performed by det.rnd. For more details on distribution element trees, see: Meyer, D.W. (2016) <arXiv:1610.00345> or Meyer, D.W., Statistics and Computing (2017) <doi:10.1007/s11222-017-9751-9>, and Meyer, D.W. (2017) <arXiv:1711.04632> or Meyer, D.W., Journal of Computational and Graphical Statistics (2018) <doi:10.1080/10618600.2018.1482768>.
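A minimal sketch, assuming det.construct() expects one column per observation (d rows, n columns); see ?det.construct for the exact data orientation:

    library(detpack)
    set.seed(1)
    x <- rbind(rnorm(1e4), rnorm(1e4))   # 2 x n sample matrix
    det <- det.construct(x)              # build the distribution element tree
    det.query(det, x = matrix(0, 2, 1))  # density estimate at the origin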
Collection of data sets from various assessments that can be used to evaluate psychometric models. These data sets have been analyzed in the following papers that introduced new methodology as part of the application section: Jimenez, A., Balamuta, J. J., & Culpepper, S. A. (2023) <doi:10.1111/bmsp.12307>, Culpepper, S. A., & Balamuta, J. J. (2021) <doi:10.1080/00273171.2021.1985949>, Yinghan Chen et al. (2021) <doi:10.1007/s11336-021-09750-9>, Yinyin Chen et al. (2020) <doi:10.1007/s11336-019-09693-2>, Culpepper, S. A. (2019a) <doi:10.1007/s11336-019-09683-4>, Culpepper, S. A. (2019b) <doi:10.1007/s11336-018-9643-8>, Culpepper, S. A., & Chen, Y. (2019) <doi:10.3102/1076998618791306>, Culpepper, S. A., & Balamuta, J. J. (2017) <doi:10.1007/s11336-015-9484-7>, and Culpepper, S. A. (2015) <doi:10.3102/1076998615595403>.
This is a hybrid spatial model that combines the variable selection capabilities of stepwise regression methods with the predictive power of the Geographically Weighted Regression (GWR) model. The hybrid model follows a two-step approach: the stepwise variable selection method is applied first to identify the subset of predictors that have the most significant impact on the response variable, and then a GWR model is fitted using those selected variables for spatial prediction at test or unknown locations. For method details, see Leung, Y., Mei, C. L. and Zhang, W. X. (2000) <DOI:10.1068/a3162>. This hybrid spatial model aims to improve the accuracy and interpretability of GWR predictions by selecting a subset of relevant variables through a stepwise selection process. The approach is particularly useful for modeling spatially varying relationships and improving the accuracy of spatial predictions.
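The package's own function names are not stated above, so the following is only a conceptual sketch of the two-step approach, with base step() for selection and the spgwr package standing in for the GWR fit (dat, y, x1..x4, lon, lat are hypothetical):

    library(spgwr)
    # step 1: stepwise selection on a global (non-spatial) model
    global <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
    sel <- step(global, direction = "both", trace = 0)
    # step 2: GWR with the selected predictors at the observed coordinates
    xy <- cbind(dat$lon, dat$lat)
    bw <- gwr.sel(formula(sel), data = dat, coords = xy)
    fit <- gwr(formula(sel), data = dat, coords = xy, bandwidth = bw)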
Conditional graphical lasso estimator is an extension of the graphical lasso proposed to estimate the conditional dependence structure of a set of p response variables given q predictors. This package provides suitable extensions developed to study datasets with censored and/or missing values. Standard conditional graphical lasso is available as a special case. Furthermore, the package provides an integrated set of core routines for visualization, analysis, and simulation of datasets with censored and/or missing values drawn from a Gaussian graphical model. Details about the implemented models can be found in Augugliaro et al. (2023) <doi:10.18637/jss.v105.i01>, Augugliaro et al. (2020b) <doi:10.1007/s11222-020-09945-7>, Augugliaro et al. (2020a) <doi:10.1093/biostatistics/kxy043>, Yin et al. (2001) <doi:10.1214/11-AOAS494> and Stadler et al. (2012) <doi:10.1007/s11222-010-9219-7>.
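A minimal sketch following the pattern of the JSS paper, with the datacggm()/cglasso() signatures assumed (see the package documentation):

    library(cglasso)
    set.seed(1)
    Y <- matrix(rnorm(50 * 5), 50, 5)
    Y[Y < -1.5] <- -1.5              # left-censor the responses at -1.5
    Z <- datacggm(Y = Y, lo = -1.5)  # data object for the censored model
    out <- cglasso(. ~ ., data = Z)  # conditional graphical lasso path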
Partitioning clustering algorithms divide data sets into k subsets or partitions, so-called clusters. They require some initialization procedures to start the algorithms. Initialization of cluster prototypes is one such procedure for most of the partitioning algorithms. Cluster prototypes are the centers of clusters, i.e. centroids or medoids, representing the clusters in a data set. In order to initialize cluster prototypes, the package inaparc contains a set of functions implementing several linear time-complexity and loglinear time-complexity methods, in addition to some novel techniques. Initialization of fuzzy membership degree matrices is another important task for starting the probabilistic and possibilistic partitioning algorithms. In order to initialize the membership degree matrices required by these algorithms, a number of functions based on some traditional and novel initialization techniques are also available in the package inaparc.
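A minimal sketch, with the kmpp()/imembrand() names taken from the package manual and the returned component names assumed:

    library(inaparc)
    v0 <- kmpp(x = as.matrix(iris[, 1:4]), k = 3)  # K-means++ prototype seeding
    u0 <- imembrand(n = nrow(iris), k = 3)         # random fuzzy membership matrix
    res <- kmeans(iris[, 1:4], centers = v0$v)     # seed base k-means ('v' assumed)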