Fits time trend models for routine disease surveillance tasks and returns probability distributions for a variety of quantities of interest, including age-standardized rates, period and cumulative percent change, and measures of health inequality. The models are appropriate for count data such as disease incidence and mortality data, employing a Poisson or binomial likelihood and the first-difference (random-walk) prior for unknown risk. Optionally add a covariance matrix for multiple, correlated time series models. Inference is completed using Markov chain Monte Carlo via the Stan modeling language. References: Donegan, Hughes, and Lee (2022) <doi:10.2196/34589>; Stan Development Team (2021) <https://mc-stan.org>; Theil (1972, ISBN:0-444-10378-3).
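A minimal usage sketch, not taken from the package manual: it assumes the model-fitting function is stan_rw() and that the input data frame holds case counts and populations indexed by a time variable; the object names are illustrative.

    library(surveil)
    # my_counts: data frame of yearly case counts and populations for one group
    fit <- stan_rw(my_counts, time = Year)  # Poisson likelihood, random-walk prior, MCMC via Stan
    print(fit)                              # posterior summaries of risk by period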
This package implements a recent method proposed by Yi and Chen (2023) <doi:10.1177/09622802221146308> for estimating average treatment effects from noisy data containing both measurement error and spurious variables. The package AteMeVs contains a set of functions that provide a step-by-step estimation procedure, including correction of the measurement error effects, variable selection for building the model used to estimate the propensity scores, and estimation of the average treatment effects. The functions offer multiple options for users, including different ways to correct for the measurement error effects, distinct choices of penalty functions for variable selection, and various regression models to characterize the propensity scores.
Assigns standardized diagnoses using the Banff Classification (Category 1 to 6 diagnoses, including Acute and Chronic active T-cell mediated rejection as well as Active, Chronic active, and Chronic antibody mediated rejection). The main function takes a minimal dataset containing biopsy information in a specific format (described by a data dictionary), verifies its content and format (based on the data dictionary), assigns diagnoses, and creates a summary report. The package is based on the reference guide to the Banff classification of renal allograft pathology: Roufosse C, Simmonds N, Clahsen-van Groningen M, et al. (2018) <doi:10.1097/TP.0000000000002366>. The full description of the Banff classification is available at <https://banfffoundation.org/>.
Multiple comparison techniques are typically applied following an F test from an ANOVA to decide which means are significantly different from one another. As an alternative to traditional methods, cluster analysis can be performed to group the means of different treatments into non-overlapping clusters. Treatments in different groups are considered statistically different. Several approaches have been proposed, with varying clustering methods and cut-off criteria. This package implements cluster-based multiple comparisons tests and also provides a visual representation in the form of a dendrogram. Di Rienzo, J. A., Guzman, A. W., & Casanoves, F. (2002) <jstor.org/stable/1400690>. Bautista, M. G., Smith, D. W., & Steiner, R. L. (1997) <doi:10.2307/1400402>.
This package implements a flexible, versatile, and computationally tractable model for density regression based on a single-weights dependent Dirichlet process mixture of normal distributions for univariate continuous responses. The model assumes an additive structure for the mean of each mixture component, and the effects of continuous covariates are captured through smooth nonlinear functions. The key components of our modelling approach are penalised B-splines and their bivariate tensor product extension. The proposed method can also easily deal with parametric effects of categorical covariates, linear effects of continuous covariates, interactions between categorical and/or continuous covariates, varying coefficient terms, and random effects. Please see Rodriguez-Alvarez, Inacio et al. (2025) for more details.
Graphical tools and goodness-of-fit tests for right-censored data: 1. Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests, which use the empirical distribution function for complete data and are extended for right-censored data. 2. Generalized chi-squared-type test, which is based on the squared differences between observed and expected counts using random cells with right-censored data. 3. A series of graphical tools such as probability or cumulative hazard plots to guide the decision about the most suitable parametric model for the data. These functions share several features: they handle both complete and right-censored data, and they provide parameter estimates for the distributions under study.
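A hedged sketch of how these tests and plots might be called; the function names KScens() and cumhazPlot() are assumptions about this package's interface rather than quotes from its manual, and the aml data come from the survival package.

    library(survival)
    times  <- aml$time
    status <- aml$status                       # 1 = event, 0 = right-censored
    KScens(times, status, distr = "weibull")   # Kolmogorov-Smirnov test, assumed interface
    cumhazPlot(times, status)                  # cumulative hazard plots for candidate models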
This package implements multiple variants of the Information Bottleneck ('IB') method for clustering datasets containing mixed-type variables (nominal, ordinal, and continuous). The package provides deterministic, agglomerative, generalized, and standard IB clustering algorithms that preserve relevant information while forming interpretable clusters. The Deterministic Information Bottleneck is described in Costa et al. (2024) <doi:10.48550/arXiv.2407.03389>. The standard IB method originates from Tishby et al. (2000) <doi:10.48550/arXiv.physics/0004057>, the agglomerative variant from Slonim and Tishby (1999) <https://papers.nips.cc/paper/1651-agglomerative-information-bottleneck>, and the generalized IB for Gaussian variables from Chechik et al. (2005) <https://www.jmlr.org/papers/volume6/chechik05a/chechik05a.pdf>.
Nonparametric unfolding item response theory (IRT) model for dichotomous data (see W.H. Van Schuur (1984), Structure in Political Beliefs: A New Model for Stochastic Unfolding with Application to European Party Activists, and W.J. Post (1992), Nonparametric Unfolding Models: A Latent Structure Approach). The package implements MUDFOLD (Multiple UniDimensional unFOLDing), an iterative item selection algorithm that constructs unfolding scales from dichotomous preferential-choice data without explicitly assuming a parametric form of the item response functions. Scale diagnostics from Post (1992) and estimates of the person locations proposed by Johnson (2006) and Van Schuur (1984) are also available. This model can be seen as the unfolding variant of the Mokken (1971) scaling method.
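A minimal sketch, assuming the scale-construction function is mudfold() and that it accepts a data frame of dichotomous (0/1) preferential-choice responses; the data object is illustrative.

    library(mudfold)
    # item_responses: rows = respondents, columns = dichotomous preference items
    fit <- mudfold(item_responses)
    summary(fit)   # MUDFOLD scale and diagnostics (Post, 1992)
    plot(fit)      # estimated item response functions / item ordering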
This package provides SHAP explanations of machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of interpretable machine learning, there are more and more new ideas for explaining black-box models. One of the best known methods for local explanations is SHapley Additive exPlanations (SHAP), introduced by Lundberg et al. (2017) <arXiv:1705.07874>. The SHAP method is used to calculate the influence of variables on a particular observation. This method is based on Shapley values, a technique used in game theory. The R package shapper is a port of the Python library 'shap'.
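A hedged sketch of a typical workflow, assuming shapper is used together with a DALEX explainer and that the attribution function is individual_variable_effect(); the model and data are illustrative, and the Python shap library must be installed.

    library(shapper)
    library(DALEX)
    library(randomForest)
    model <- randomForest(mpg ~ ., data = mtcars)
    expl  <- DALEX::explain(model, data = mtcars[, -1], y = mtcars$mpg)
    # SHAP attributions for one observation (assumed function name)
    ive <- individual_variable_effect(expl, new_observation = mtcars[1, -1])
    plot(ive)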
Unlock the power of large-scale geospatial analysis: quickly generate high-resolution kernel density visualizations and support advanced analysis tasks such as bandwidth tuning and spatiotemporal analysis. Regardless of the size of your dataset, our library delivers efficient and accurate results. Tsz Nam Chan, Leong Hou U, Byron Choi, Jianliang Xu, Reynold Cheng (2023) <doi:10.1145/3555041.3589401>. Tsz Nam Chan, Rui Zang, Pak Lon Ip, Leong Hou U, Jianliang Xu (2023) <doi:10.1145/3555041.3589711>. Tsz Nam Chan, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.1145/3514221.3517823>. Tsz Nam Chan, Pak Lon Ip, Kaiyan Zhao, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3554821.3554855>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3503585.3503591>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3494124.3494135>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Weng Hou Tong, Shivansh Mittal, Ye Li, Reynold Cheng (2021) <doi:10.14778/3476311.3476312>. Tsz Nam Chan, Zhe Li, Leong Hou U, Jianliang Xu, Reynold Cheng (2021) <doi:10.14778/3461535.3461540>. Tsz Nam Chan, Reynold Cheng, Man Lung Yiu (2020) <doi:10.1145/3318464.3380561>. Tsz Nam Chan, Leong Hou U, Reynold Cheng, Man Lung Yiu, Shivansh Mittal (2020) <doi:10.1109/TKDE.2020.3018376>. Tsz Nam Chan, Man Lung Yiu, Leong Hou U (2019) <doi:10.1109/ICDE.2019.00055>.
This package implements a Bayesian Optimal Phase II design (DTE-BOP2) for trials with delayed treatment effects, particularly relevant to immunotherapy studies where treatment benefits may emerge after a delay. The method builds upon the BOP2 framework and incorporates uncertainty in the delay timepoint through a truncated gamma prior, informed by expert knowledge or default settings. Supports two-arm trial designs with functionality for sample size determination, interim and final analyses, and comprehensive simulation under various delay and design scenarios. Ensures rigorous type I and II error control while improving trial efficiency and power when the delay effect is present. A manuscript describing the methodology is under development and will be formally referenced upon publication.
This package implements species distribution modeling and ecological niche modeling, including: bias correction, spatial cross-validation, model evaluation, raster interpolation, biotic "velocity" (speed and direction of movement of a "mass" represented by a raster), interpolating across a time series of rasters, and use of spatially imprecise records. The heart of the package is a set of "training" functions which automatically optimize model complexity based on the number of available occurrences. These algorithms include MaxEnt, MaxNet, boosted regression trees/gradient boosting machines, generalized additive models, generalized linear models, natural splines, and random forests. To enhance interoperability with other modeling packages, no new classes are created. The package works with PROJ6 geodetic objects and coordinate reference systems.
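A hedged sketch of the "training" step; the function name trainGLM() and its arguments are assumptions based on the description above, and the data objects are illustrative.

    library(enmSdmX)
    # occ_env: data frame with a binary presence/background column plus predictors
    fit <- trainGLM(data = occ_env, resp = "presence",
                    preds = c("bio1", "bio12"))  # complexity tuned to the number of occurrences
    summary(fit)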
This package provides the ability to create color palettes from image files. It offers control over the type of color palette to derive from an image (qualitative, sequential, or divergent) and other palette properties. Quantiles of an image color distribution can be trimmed. Near-black or near-white colors can be trimmed in RGB color space independently of trimming brightness or saturation distributions in HSV color space. Creating sequential palettes also offers control over the order of HSV color dimensions to sort by. This package differs from other related packages like RImagePalette in its approach to quantizing and extracting colors from images to assemble color palettes and in the level of user control over palette construction.
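A hedged sketch, assuming the palette-building function is image_pal() with arguments for palette size and type; the file path is illustrative.

    library(imgpalr)
    # derive a 9-color sequential palette from an image file (assumed interface)
    pal <- image_pal("photo.jpg", n = 9, type = "seq")
    pal  # vector of hex colors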
Import, create, and assemble data needed to fit spatial-statistical stream-network models using the SSN2 package for R. Streams, observations, and prediction locations are represented as simple features, and specific tools are provided to define topological relationships between features; calculate the hydrologic distances (with flow-direction preserved) and the spatial additive function used to weight converging stream segments; and export the topological, spatial, and attribute information to an `SSN` (spatial stream network) object, which can be efficiently stored, accessed, and analysed in R. A detailed description of methods used to calculate and format the spatial data can be found in Peterson, E.E. and Ver Hoef, J.M. (2014) <doi:10.18637/jss.v056.i02>.
This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner "Titanic", summarized according to economic status (class), sex, age, and survival. Whereas the base R Titanic data found by calling data("Titanic") is an array resulting from cross-tabulating 2201 observations, these data sets contain the individual non-aggregated observations, formatted in a machine learning context with a training sample, a testing sample, and two additional data sets that can be used for deeper machine learning analysis. These are also the data sets downloaded from the Kaggle competition, which lowers the barrier to entry for users new to R or machine learning.
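A brief usage sketch; the data set names titanic_train and titanic_test follow the Kaggle-style split described above, and the model is only a baseline illustration.

    library(titanic)
    str(titanic_train)   # labeled training sample (includes Survived)
    str(titanic_test)    # test sample without the outcome column
    # simple baseline classifier on the training split
    fit <- glm(Survived ~ Pclass + Sex + Age, data = titanic_train, family = binomial)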
Compute the coordinates to produce a tendril plot. In the tendril plot, each tendril (branch) represents a type of event, and the direction of the tendril is dictated by the treatment arm on which the event occurs. If an event occurs on the first of the two specified treatment arms, the tendril bends in a clockwise direction. If an event occurs on the second of the treatment arms, the tendril bends in an anti-clockwise direction. Ref: Karpefors, M. and Weatherall, J., "The Tendril Plot - a novel visual summary of the incidence, significance and temporal aspects of adverse events in clinical trials", JAMIA 2018; 25(8): 1069-1073 <doi:10.1093/jamia/ocy016>.
Fit Bayesian multivariate GARCH models using Stan for full Bayesian inference. Generate (weighted) forecasts for means, variances (volatility), and correlations. Currently DCC(P,Q), CCC(P,Q), pdBEKK(P,Q), and BEKK(P,Q) parameterizations are implemented, based either on a multivariate Gaussian or Student-t distribution. DCC and CCC models are based on Engle (2002) <doi:10.1198/073500102288618487> and Bollerslev (1990). The BEKK parameterization follows Engle and Kroner (1995) <doi:10.1017/S0266466600009063>, while the pdBEKK parameterization, as well as the estimation approach used in this package, is described in Rast et al. (2020) <doi:10.31234/osf.io/j57pk>. The fitted models contain rstan objects and can be examined with rstan functions.
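A hedged sketch of fitting and forecasting; argument names such as parameterization and ahead are assumptions based on the description, and returns_df is an illustrative multivariate return series.

    library(bmgarch)
    fit <- bmgarch(returns_df,
                   parameterization = "DCC",  # also "CCC", "BEKK", "pdBEKK"
                   P = 1, Q = 1,
                   distribution = "Student_t")
    fc <- forecast(fit, ahead = 5)            # forecasts of means, variances, correlations
    summary(fit)                              # posterior summaries from the underlying rstan fit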
Diagnostic and prognostic models are typically evaluated with measures of accuracy that do not address clinical consequences. Decision-analytic techniques allow assessment of clinical outcomes, but often require collection of additional information and may be cumbersome to apply to models that yield a continuous result. Decision curve analysis is a method for evaluating and comparing prediction models that incorporates clinical consequences, requires only the data set on which the models are tested, and can be applied to models that have either continuous or dichotomous results. See the following references for details on the methods: Vickers (2006) <doi:10.1177/0272989X06295361>, Vickers (2008) <doi:10.1186/1472-6947-8-53>, and Pfeiffer (2020) <doi:10.1002/bimj.201800240>.
Open source data allows for reproducible research and helps advance our knowledge. The purpose of this package is to collate open source ophthalmic data sets curated for direct use. These are real-life data from people receiving intravitreal injections of anti-vascular endothelial growth factor (anti-VEGF) due to age-related macular degeneration or diabetic macular edema. Associated publications of the data sets: Fu et al. (2020) <doi:10.1001/jamaophthalmol.2020.5044>, Moraes et al. (2020) <doi:10.1016/j.ophtha.2020.09.025>, Fasler et al. (2019) <doi:10.1136/bmjopen-2018-027441>, Arpa et al. (2020) <doi:10.1136/bjophthalmol-2020-317161>, Kern et al. (2020) <doi:10.1038/s41433-020-1048-0>.
Genotyping assays for bi-allelic markers (e.g. SNPs) produce signal intensities for the two alleles. fitPoly assigns genotypes (allele dosages) to a collection of polyploid samples based on these signal intensities. fitPoly replaces the older package fitTetra, which was limited (among other things) to tetraploid populations, whereas fitPoly accepts any ploidy level. Reference: Voorrips RE, Gort G, Vosman B (2011) <doi:10.1186/1471-2105-12-172>. New functions have been added for conversion of data from SNP array software formats, drawing of XY scatterplots with or without genotype colors, checking against expected F1 segregation patterns, comparing results from two different assays (probes) for the same SNP, and recovery from a saveMarkerModels() crash.
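A hedged sketch of a typical call; saveMarkerModels() is named in the description, but the arguments shown here (ploidy, the signal data frame, and an output prefix) are assumptions rather than the documented signature.

    library(fitPoly)
    # signal_df: per-sample, per-marker allele signal ratios (layout assumed)
    saveMarkerModels(ploidy = 4, data = signal_df,
                     filePrefix = "fit_out")  # writes genotype calls and fitted mixture models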
Network meta-analysis for survival outcome data often involves several studies that only report dichotomized outcomes (e.g., the numbers of events and sample sizes of individual arms). To combine these different outcome data, Woods et al. (2010) <doi:10.1186/1471-2288-10-54> proposed a Bayesian approach using complicated hierarchical models. In addition, frequentist approaches have become standard alternatives for the statistical analysis of network meta-analysis, and their methodology is well established. We proposed an easy-to-implement method for network meta-analysis based on the frequentist framework in Noma and Maruo (2025) <doi:10.1101/2025.01.23.25321051>. This package provides convenient functions to implement this simple synthesis method.
Set of functions that improve the graphical presentation of the functions wave.correlation and spin.correlation (waveslim package, Whitcher 2012) and wave.multiple.correlation and wave.multiple.cross.correlation (wavemulcor package, Fernandez-Macho 2012b). The plot outputs (heatmaps) can be displayed on the screen or saved as PNG or JPG images, or in PDF or EPS format. The W2CWM2C package also helps to handle the multivariate time series input easily, as a list of N elements (time series), and provides a multivariate data set (dataexample) to exemplify its use. A description of the package was published in a scientific paper: Polanco-Martinez and Fernandez-Macho (2014) <doi:10.1109/MCSE.2014.96>.
This package provides functions designed to facilitate access to and use of large-scale, publicly available environmental data in R. The package contains functions for downloading raw data files from web URLs (download_data()), processing the raw data files into clean spatial objects (process_covariates()), and extracting values from the spatial data objects at point and polygon locations (calculate_covariates()). These functions call a series of source-specific functions which are tailored to each data source's or dataset's particular URL structure, data format, and spatial/temporal resolution. The functions are tested, versioned, and open source and open access. For sum_edc() method details, see Messier, Akita, and Serre (2012) <doi:10.1021/es203152a>.
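A hedged sketch of the three-step workflow named above; only the three function names come from the description, while the dataset identifier and argument names are assumptions, and sites is an illustrative sf object of point locations.

    library(amadeus)
    # 1. download raw files for one data source (identifier assumed)
    download_data(dataset_name = "narr", year = 2020, directory_to_save = "raw/")
    # 2. process the raw files into a clean spatial object
    cov <- process_covariates(covariate = "narr", path = "raw/")
    # 3. extract values at point locations
    vals <- calculate_covariates(covariate = "narr", from = cov, locs = sites)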
Chemical analysis of proteins based on their amino acid compositions. Amino acid compositions can be read from FASTA files and used to calculate chemical metrics including carbon oxidation state and stoichiometric hydration state, as described in Dick et al. (2020) <doi:10.5194/bg-17-6145-2020>. Other properties that can be calculated include protein length, grand average of hydropathy (GRAVY), isoelectric point (pI), molecular weight (MW), standard molal volume (V0), and metabolic costs (Akashi and Gojobori, 2002 <doi:10.1073/pnas.062526999>; Wagner, 2005 <doi:10.1093/molbev/msi126>; Zhang et al., 2018 <doi:10.1038/s41467-018-06461-1>). A database of amino acid compositions of human proteins derived from UniProt is provided.
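A hedged sketch; read_fasta() and the metric functions Zc() and nH2O() are assumed names for reading amino acid compositions and computing carbon oxidation state and stoichiometric hydration state, and the file path is illustrative.

    library(canprot)
    aa <- read_fasta("proteins.fasta")  # amino acid compositions from a FASTA file
    Zc(aa)                              # carbon oxidation state of each protein
    nH2O(aa)                            # stoichiometric hydration state of each protein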