Multiple comparison techniques are typically applied following an F test from an ANOVA to decide which means are significantly different from one another. As an alternative to traditional methods, cluster analysis can be performed to group the means of different treatments into non-overlapping clusters. Treatments in different groups are considered statistically different. Several approaches have been proposed, with varying clustering methods and cut-off criteria. This package implements cluster-based multiple comparisons tests and also provides a visual representation in the form of a dendrogram. Di Rienzo, J. A., Guzman, A. W., & Casanoves, F. (2002) <jstor.org/stable/1400690>. Bautista, M. G., Smith, D. W., & Steiner, R. L. (1997) <doi:10.2307/1400402>.
This package implements a flexible, versatile, and computationally tractable model for density regression based on a single-weights dependent Dirichlet process mixture of normal distributions model for univariate continuous responses. The model assumes an additive structure for the mean of each mixture component and the effects of continuous covariates are captured through smooth nonlinear functions. The key components of our modelling approach are penalised B-splines and their bivariate tensor product extension. The proposed method can also easily deal with parametric effects of categorical covariates, linear effects of continuous covariates, interactions between categorical and/or continuous covariates, varying coefficient terms, and random effects. Please see Rodriguez-Alvarez, Inacio et al. (2025) for more details.
Graphical tools and goodness-of-fit tests for right-censored data: 1. Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests, which use the empirical distribution function for complete data and are extended for right-censored data. 2. Generalized chi-squared-type test, which is based on the squared differences between observed and expected counts using random cells with right-censored data. 3. A series of graphical tools such as probability or cumulative hazard plots to guide the decision about the most suitable parametric model for the data. These functions share several features as they can handle both complete and right-censored data, and they provide parameter estimates for the distributions under study.
Generating, evaluating, and selecting initialization strategies for Gaussian Mixture Models (GMMs), along with functions to run the Expectation-Maximization (EM) algorithm. Initialization methods are compared using log-likelihood, and the best-fitting model can be selected using BIC. Methods build on initialization strategies for finite mixture models described in Michael and Melnykov (2016) <doi:10.1007/s11634-016-0264-8> and Biernacki et al. (2003) <doi:10.1016/S0167-9473(02)00163-9>, and on the EM algorithm of Dempster et al. (1977) <doi:10.1111/j.2517-6161.1977.tb01600.x>. Background on model-based clustering includes Fraley and Raftery (2002) <doi:10.1198/016214502760047131> and McLachlan and Peel (2000, ISBN:9780471006268).
Nonparametric unfolding item response theory (IRT) model for dichotomous data (see W.H. Van Schuur (1984). Structure in Political Beliefs: A New Model for Stochastic Unfolding with Application to European Party Activists, and W.J.Post (1992). Nonparametric Unfolding Models: A Latent Structure Approach). The package implements MUDFOLD (Multiple UniDimensional unFOLDing), an iterative item selection algorithm that constructs unfolding scales from dichotomous preferential-choice data without explicitly assuming a parametric form of the item response functions. Scale diagnostics from Post(1992) and estimates for the person locations proposed by Johnson(2006) and Van Schuur(1984) are also available. This model can be seen as the unfolding variant of Mokken(1971) scaling method.
An easy-to-use tool for implementing Neural Ordinary Differential Equations (NODEs) in pharmacometric software such as Monolix', NONMEM', and nlmixr2', see Bräm et al. (2024) <doi:10.1007/s10928-023-09886-4> and Bräm et al. (2025) <doi:10.1002/psp4.13265>. The main functionality is to automatically generate structural model code describing computations within a neural network. Additionally, parameters and software settings can be initialized automatically. For using these additional functionalities with Monolix', pmxNODE interfaces with MonolixSuite via the lixoftConnectors package. The lixoftConnectors package is distributed with MonolixSuite (<https://monolixsuite.slp-software.com/r-functions/2024R1/package-lixoftconnectors>) and is not available from public repositories.
This package provides SHAP explanations of machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in field of the Interpretable Machine Learning, there are more and more new ideas for explaining black-box models. One of the best known method for local explanations is SHapley Additive exPlanations (SHAP) introduced by Lundberg, S., et al., (2016) <arXiv:1705.07874> The SHAP method is used to calculate influences of variables on the particular observation. This method is based on Shapley values, a technique used in game theory. The R package shapper is a port of the Python library shap'.
Unlock the power of large-scale geospatial analysis, quickly generate high-resolution kernel density visualizations, supporting advanced analysis tasks such as bandwidth-tuning and spatiotemporal analysis. Regardless of the size of your dataset, our library delivers efficient and accurate results. Tsz Nam Chan, Leong Hou U, Byron Choi, Jianliang Xu, Reynold Cheng (2023) <doi:10.1145/3555041.3589401>. Tsz Nam Chan, Rui Zang, Pak Lon Ip, Leong Hou U, Jianliang Xu (2023) <doi:10.1145/3555041.3589711>. Tsz Nam Chan, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.1145/3514221.3517823>. Tsz Nam Chan, Pak Lon Ip, Kaiyan Zhao, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3554821.3554855>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3503585.3503591>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3494124.3494135>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Weng Hou Tong, Shivansh Mittal, Ye Li, Reynold Cheng (2021) <doi:10.14778/3476311.3476312>. Tsz Nam Chan, Zhe Li, Leong Hou U, Jianliang Xu, Reynold Cheng (2021) <doi:10.14778/3461535.3461540>. Tsz Nam Chan, Reynold Cheng, Man Lung Yiu (2020) <doi:10.1145/3318464.3380561>. Tsz Nam Chan, Leong Hou U, Reynold Cheng, Man Lung Yiu, Shivansh Mittal (2020) <doi:10.1109/TKDE.2020.3018376>. Tsz Nam Chan, Man Lung Yiu, Leong Hou U (2019) <doi:10.1109/ICDE.2019.00055>.
This package provides methods for detecting and visualizing cladogenic shifts in multivariate trait data on phylogenies. Implements penalized-likelihood multivariate generalized least squares models, enabling analyses of high-dimensional trait datasets and large trees via searchOptimalConfiguration(). Includes a greedy step-wise shift-search algorithm following approaches developed in Smith et al. (2023) <doi:10.1111/nph.19099> and Berv et al. (2024) <doi:10.1126/sciadv.adp0114>. Methods build on multivariate GLS approaches described in Clavel et al. (2019) <doi:10.1093/sysbio/syy045> and implemented in the mvgls() function from the mvMORPH package. Documentation and vignettes are available at <https://jakeberv.com/bifrost/>, including worked examples for the jaw-shape dataset.
Analyses government debt sustainability using the standard debt dynamics framework from Blanchard (1990) <doi:10.1787/budget-v2-art12-en> and the IMF Debt Sustainability Analysis methodology (IMF, 2013) and the Sovereign Risk and Debt Sustainability Framework (IMF, 2022). Projects debt-to-GDP paths, decomposes historical debt changes into interest, growth, and primary balance contributions, and estimates fiscal reaction functions following Bohn (1998) <doi:10.1162/003355398555793>. Produces stochastic fan charts via Monte Carlo simulation, standardised stress tests, and IMF- style heat map risk assessments. Computes S1/S2 sustainability gap indicators used by the European Commission. All methods are pure computation with no external dependencies beyond base R; works with fiscal data from any source.
This package implements a Bayesian Optimal Phase II design (DTE-BOP2) for trials with delayed treatment effects, particularly relevant to immunotherapy studies where treatment benefits may emerge after a delay. The method builds upon the BOP2 framework and incorporates uncertainty in the delay timepoint through a truncated gamma prior, informed by expert knowledge or default settings. Supports two-arm trial designs with functionality for sample size determination, interim and final analyses, and comprehensive simulation under various delay and design scenarios. Ensures rigorous type I and II error control while improving trial efficiency and power when the delay effect is present. A manuscript describing the methodology is under development and will be formally referenced upon publication.
This package implements species distribution modeling and ecological niche modeling, including: bias correction, spatial cross-validation, model evaluation, raster interpolation, biotic "velocity" (speed and direction of movement of a "mass" represented by a raster), interpolating across a time series of rasters, and use of spatially imprecise records. The heart of the package is a set of "training" functions which automatically optimize model complexity based number of available occurrences. These algorithms include MaxEnt, MaxNet, boosted regression trees/gradient boosting machines, generalized additive models, generalized linear models, natural splines, and random forests. To enhance interoperability with other modeling packages, no new classes are created. The package works with PROJ6 geodetic objects and coordinate reference systems.
This package provides ability to create color palettes from image files. It offers control over the type of color palette to derive from an image (qualitative, sequential or divergent) and other palette properties. Quantiles of an image color distribution can be trimmed. Near-black or near-white colors can be trimmed in RGB color space independent of trimming brightness or saturation distributions in HSV color space. Creating sequential palettes also offers control over the order of HSV color dimensions to sort by. This package differs from other related packages like RImagePalette in approaches to quantizing and extracting colors in images to assemble color palettes and the level of user control over palettes construction.
Import, create and assemble data needed to fit spatial-statistical stream-network models using the SSN2 package for R'. Streams, observations, and prediction locations are represented as simple features and specific tools provided to define topological relationships between features; calculate the hydrologic distances (with flow-direction preserved) and the spatial additive function used to weight converging stream segments; and export the topological, spatial, and attribute information to an `SSN` (spatial stream network) object, which can be efficiently stored, accessed and analysed in R'. A detailed description of methods used to calculate and format the spatial data can be found in Peterson, E.E. and Ver Hoef, J.M., (2014) <doi:10.18637/jss.v056.i02>.
This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner "Titanic", summarized according to economic status (class), sex, age and survival. Whereas the base R Titanic data found by calling data("Titanic") is an array resulting from cross-tabulating 2201 observations, these data sets are the individual non-aggregated observations and formatted in a machine learning context with a training sample, a testing sample, and two additional data sets that can be used for deeper machine learning analysis. These data sets are also the data sets downloaded from the Kaggle competition and thus lowers the barrier to entry for users new to R or machine learing.
treekoR is a novel framework that aims to utilise the hierarchical nature of single cell cytometry data to find robust and interpretable associations between cell subsets and patient clinical end points. These associations are aimed to recapitulate the nested proportions prevalent in workflows inovlving manual gating, which are often overlooked in workflows using automatic clustering to identify cell populations. We developed treekoR to: Derive a hierarchical tree structure of cell clusters; quantify a cell types as a proportion relative to all cells in a sample (%total), and, as the proportion relative to a parent population (%parent); perform significance testing using the calculated proportions; and provide an interactive html visualisation to help highlight key results.
Fit Bayesian multivariate GARCH models using Stan for full Bayesian inference. Generate (weighted) forecasts for means, variances (volatility) and correlations. Currently DCC(P,Q), CCC(P,Q), pdBEKK(P,Q), and BEKK(P,Q) parameterizations are implemented, based either on a multivariate gaussian normal or student-t distribution. DCC and CCC models are based on Engle (2002) <doi:10.1198/073500102288618487> and Bollerslev (1990). The BEKK parameterization follows Engle and Kroner (1995) <doi:10.1017/S0266466600009063> while the pdBEKK as well as the estimation approach for this package is described in Rast et al. (2020) <doi:10.31234/osf.io/j57pk>. The fitted models contain rstan objects and can be examined with rstan functions.
This package provides Bayesian probabilistic methods for record linkage and entity resolution across multiple datasets using the Comprehensive Hit Or Miss Probabilistic Entity Resolution (CHOMPER) model. The package implements three main inference approaches: (1) Evolutionary Variational Inference for record Linkage (EVIL), (2) Coordinate Ascent Variational Inference (CAVI), and (3) Markov Chain Monte Carlo (MCMC) with split and merge process. The model supports both discrete and continuous fields, and it performs locally-varying hit mechanism for the attributes with multiple truths. It also provides tools for performance evaluation based on either approximated variational factors or posterior samples. The package is designed to support parallel computing with multi-threading support for EVIL to estimate the linkage structure faster.
Diagnostic and prognostic models are typically evaluated with measures of accuracy that do not address clinical consequences. Decision-analytic techniques allow assessment of clinical outcomes, but often require collection of additional information may be cumbersome to apply to models that yield a continuous result. Decision curve analysis is a method for evaluating and comparing prediction models that incorporates clinical consequences, requires only the data set on which the models are tested, and can be applied to models that have either continuous or dichotomous results. See the following references for details on the methods: Vickers (2006) <doi:10.1177/0272989X06295361>, Vickers (2008) <doi:10.1186/1472-6947-8-53>, and Pfeiffer (2020) <doi:10.1002/bimj.201800240>.
Open source data allows for reproducible research and helps advance our knowledge. The purpose of this package is to collate open source ophthalmic data sets curated for direct use. This is real life data of people with intravitreal injections with anti-vascular endothelial growth factor (anti-VEGF), due to age-related macular degeneration or diabetic macular edema. Associated publications of the data sets: Fu et al. (2020) <doi:10.1001/jamaophthalmol.2020.5044>, Moraes et al (2020) <doi:10.1016/j.ophtha.2020.09.025>, Fasler et al. (2019) <doi:10.1136/bmjopen-2018-027441>, Arpa et al. (2020) <doi:10.1136/bjophthalmol-2020-317161>, Kern et al. 2020, <doi:10.1038/s41433-020-1048-0>.
Genotyping assays for bi-allelic markers (e.g. SNPs) produce signal intensities for the two alleles. fitPoly assigns genotypes (allele dosages) to a collection of polyploid samples based on these signal intensities. fitPoly replaces the older package fitTetra that was limited (a.o.) to only tetraploid populations whereas fitPoly accepts any ploidy level. Reference: Voorrips RE, Gort G, Vosman B (2011) <doi:10.1186/1471-2105-12-172>. New functions added on conversion of data from SNP array software formats, drawing of XY-scatterplots with or without genotype colors, checking against expected F1 segregation patterns, comparing results from two different assays (probes) for the same SNP, recovery from a saveMarkerModels() crash.
Network meta-analysis for survival outcome data often involves several studies only involve dichotomized outcomes (e.g., the numbers of event and sample sizes of individual arms). To combine these different outcome data, Woods et al. (2010) <doi:10.1186/1471-2288-10-54> proposed a Bayesian approach using complicated hierarchical models. Besides, frequentist approaches have been alternative standard methods for the statistical analyses of network meta-analysis, and the methodology has been well established. We proposed an easy-to-implement method for the network meta-analysis based on the frequentist framework in Noma and Maruo (2025) <doi:10.1101/2025.01.23.25321051>. This package involves some convenient functions to implement the simple synthesis method.
Set of functions that improves the graphical presentations of the functions: wave.correlation and spin.correlation (waveslim package, Whitcher 2012) and the wave.multiple.correlation and wave.multiple.cross.correlation (wavemulcor package, Fernandez-Macho 2012b). The plot outputs (heatmaps) can be displayed in the screen or can be saved as PNG or JPG images or as PDF or EPS formats. The W2CWM2C package also helps to handle the (input data) multivariate time series easily as a list of N elements (times series) and provides a multivariate data set (dataexample) to exemplify its use. A description of the package was published in a scientific paper: Polanco-Martinez and Fernandez-Macho (2014), <doi:10.1109/MCSE.2014.96>.
This package provides functions are designed to facilitate access to and utility with large scale, publicly available environmental data in R. The package contains functions for downloading raw data files from web URLs (download_data()), processing the raw data files into clean spatial objects (process_covariates()), and extracting values from the spatial data objects at point and polygon locations (calculate_covariates()). These functions call a series of source-specific functions which are tailored to each data sources/datasets particular URL structure, data format, and spatial/temporal resolution. The functions are tested, versioned, and open source and open access. For sum_edc() method details, see Messier, Akita, and Serre (2012) <doi:10.1021/es203152a>.