An easy-to-use tool for implementing Neural Ordinary Differential Equations (NODEs) in pharmacometric software such as 'Monolix', 'NONMEM', and 'nlmixr2'; see Bräm et al. (2024) <doi:10.1007/s10928-023-09886-4> and Bräm et al. (2025) <doi:10.1002/psp4.13265>. The main functionality is to automatically generate structural model code describing computations within a neural network. Additionally, parameters and software settings can be initialized automatically. To use these additional functionalities with 'Monolix', pmxNODE interfaces with MonolixSuite via the lixoftConnectors package. The lixoftConnectors package is distributed with MonolixSuite (<https://monolixsuite.slp-software.com/r-functions/2024R1/package-lixoftconnectors>) and is not available from public repositories.
This package provides SHAP explanations of machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of Interpretable Machine Learning, there are more and more new ideas for explaining black-box models. One of the best-known methods for local explanations is SHapley Additive exPlanations (SHAP), introduced by Lundberg, S., et al. (2017) <arXiv:1705.07874>. The SHAP method is used to calculate the influence of each variable on a particular observation. This method is based on Shapley values, a technique used in game theory. The R package shapper is a port of the Python library 'shap'.
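To make the game-theoretic idea concrete, here is a minimal sketch of exact Shapley values for a single observation. It is not the 'shap' or shapper API. The function names, the toy model, and the choice to represent "absent" features by a fixed baseline value are all assumptions made for illustration. Practical SHAP implementations use approximations, because the exact sum grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation x.

    Assumption: a feature that is not in the coalition is replaced by
    its baseline value (other SHAP variants average over background data).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(v):
    # Hypothetical model: two linear terms plus one interaction.
    return 2.0 * v[0] + 3.0 * v[1] + v[0] * v[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that gives SHAP its name.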
Unlock the power of large-scale geospatial analysis and quickly generate high-resolution kernel density visualizations, with support for advanced analysis tasks such as bandwidth tuning and spatiotemporal analysis. Regardless of the size of your dataset, our library delivers efficient and accurate results. Tsz Nam Chan, Leong Hou U, Byron Choi, Jianliang Xu, Reynold Cheng (2023) <doi:10.1145/3555041.3589401>. Tsz Nam Chan, Rui Zang, Pak Lon Ip, Leong Hou U, Jianliang Xu (2023) <doi:10.1145/3555041.3589711>. Tsz Nam Chan, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.1145/3514221.3517823>. Tsz Nam Chan, Pak Lon Ip, Kaiyan Zhao, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3554821.3554855>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3503585.3503591>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3494124.3494135>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Weng Hou Tong, Shivansh Mittal, Ye Li, Reynold Cheng (2021) <doi:10.14778/3476311.3476312>. Tsz Nam Chan, Zhe Li, Leong Hou U, Jianliang Xu, Reynold Cheng (2021) <doi:10.14778/3461535.3461540>. Tsz Nam Chan, Reynold Cheng, Man Lung Yiu (2020) <doi:10.1145/3318464.3380561>. Tsz Nam Chan, Leong Hou U, Reynold Cheng, Man Lung Yiu, Shivansh Mittal (2020) <doi:10.1109/TKDE.2020.3018376>. Tsz Nam Chan, Man Lung Yiu, Leong Hou U (2019) <doi:10.1109/ICDE.2019.00055>.
Analyses government debt sustainability using the standard debt dynamics framework from Blanchard (1990) <doi:10.1787/budget-v2-art12-en>, the IMF Debt Sustainability Analysis methodology (IMF, 2013), and the Sovereign Risk and Debt Sustainability Framework (IMF, 2022). Projects debt-to-GDP paths, decomposes historical debt changes into interest, growth, and primary balance contributions, and estimates fiscal reaction functions following Bohn (1998) <doi:10.1162/003355398555793>. Produces stochastic fan charts via Monte Carlo simulation, standardised stress tests, and IMF-style heat map risk assessments. Computes S1/S2 sustainability gap indicators used by the European Commission. All methods are pure computation with no external dependencies beyond base R; works with fiscal data from any source.
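As an illustration of the debt dynamics framework, the sketch below uses the textbook recursion d[t+1] = d[t] * (1 + r) / (1 + g) - pb, where d is the debt-to-GDP ratio, r the interest rate, g the growth rate, and pb the primary balance as a share of GDP. The function names and the timing convention are assumptions for this example. The package's own interface and conventions may differ.

```python
def project_debt(d0, r, g, pb, years):
    """Project the debt-to-GDP ratio with constant r, g and primary balance."""
    path = [d0]
    for _ in range(years):
        path.append(path[-1] * (1 + r) / (1 + g) - pb)
    return path


def decompose_change(d_prev, r, g, pb):
    """Split a one-period change in the debt ratio into its three drivers."""
    return {
        "interest": r / (1 + g) * d_prev,          # interest cost raises the ratio
        "growth": -g / (1 + g) * d_prev,           # GDP growth erodes the ratio
        "primary_balance": -pb,                    # a surplus (pb > 0) lowers it
    }


# Hypothetical inputs: 90% debt ratio, 4% interest, 2% growth, 1% surplus.
path = project_debt(0.90, 0.04, 0.02, 0.01, years=5)
parts = decompose_change(0.90, 0.04, 0.02, 0.01)
print([round(d, 4) for d in path])
print(parts)
```

The three contributions add up to the one-period change in the ratio, so the same decomposition can be applied year by year to historical data.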
This package implements a Bayesian Optimal Phase II design (DTE-BOP2) for trials with delayed treatment effects, particularly relevant to immunotherapy studies where treatment benefits may emerge after a delay. The method builds upon the BOP2 framework and incorporates uncertainty in the delay timepoint through a truncated gamma prior, informed by expert knowledge or default settings. Supports two-arm trial designs with functionality for sample size determination, interim and final analyses, and comprehensive simulation under various delay and design scenarios. Ensures rigorous type I and II error control while improving trial efficiency and power when the delay effect is present. A manuscript describing the methodology is under development and will be formally referenced upon publication.
This package implements species distribution modeling and ecological niche modeling, including: bias correction, spatial cross-validation, model evaluation, raster interpolation, biotic "velocity" (speed and direction of movement of a "mass" represented by a raster), interpolating across a time series of rasters, and use of spatially imprecise records. The heart of the package is a set of "training" functions which automatically optimize model complexity based on the number of available occurrences. These algorithms include MaxEnt, MaxNet, boosted regression trees/gradient boosting machines, generalized additive models, generalized linear models, natural splines, and random forests. To enhance interoperability with other modeling packages, no new classes are created. The package works with PROJ6 geodetic objects and coordinate reference systems.
This package provides the ability to create color palettes from image files. It offers control over the type of color palette to derive from an image (qualitative, sequential or divergent) and other palette properties. Quantiles of an image color distribution can be trimmed. Near-black or near-white colors can be trimmed in RGB color space independent of trimming brightness or saturation distributions in HSV color space. Creating sequential palettes also offers control over the order of HSV color dimensions to sort by. This package differs from other related packages like RImagePalette in its approaches to quantizing and extracting colors in images to assemble color palettes and in the level of user control over palette construction.
Import, create and assemble data needed to fit spatial-statistical stream-network models using the 'SSN2' package for 'R'. Streams, observations, and prediction locations are represented as simple features, and specific tools are provided to define topological relationships between features; calculate the hydrologic distances (with flow-direction preserved) and the spatial additive function used to weight converging stream segments; and export the topological, spatial, and attribute information to an `SSN` (spatial stream network) object, which can be efficiently stored, accessed and analysed in 'R'. A detailed description of methods used to calculate and format the spatial data can be found in Peterson, E.E. and Ver Hoef, J.M., (2014) <doi:10.18637/jss.v056.i02>.
This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner "Titanic", summarized according to economic status (class), sex, age and survival. Whereas the base R Titanic data found by calling data("Titanic") is an array resulting from cross-tabulating 2201 observations, these data sets are the individual non-aggregated observations, formatted in a machine learning context with a training sample, a testing sample, and two additional data sets that can be used for deeper machine learning analysis. These data sets are also the data sets downloaded from the Kaggle competition and thus lower the barrier to entry for users new to R or machine learning.
treekoR is a novel framework that aims to utilise the hierarchical nature of single cell cytometry data to find robust and interpretable associations between cell subsets and patient clinical end points. These associations are intended to recapitulate the nested proportions prevalent in workflows involving manual gating, which are often overlooked in workflows using automatic clustering to identify cell populations. We developed treekoR to: derive a hierarchical tree structure of cell clusters; quantify each cell type as a proportion relative to all cells in a sample (%total) and as a proportion relative to a parent population (%parent); perform significance testing using the calculated proportions; and provide an interactive html visualisation to help highlight key results.
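The two proportion types can be illustrated with a small sketch. This is not treekoR's implementation. The function name, the data layout (a count per tree node, with each internal node's count including its descendants), and the example populations are assumptions made for illustration.

```python
def proportions(counts, parent):
    """Compute %total and %parent for every node of a cluster tree.

    counts: cells per node (an internal node's count includes its descendants).
    parent: maps each node to its parent; the root maps to None.
    """
    root = next(node for node, p in parent.items() if p is None)
    total = counts[root]
    result = {}
    for node, n in counts.items():
        p = parent[node]
        result[node] = {
            "pct_total": 100.0 * n / total,
            # The root has no parent, so it is reported as 100% of itself.
            "pct_parent": 100.0 * n / counts[p] if p is not None else 100.0,
        }
    return result


# Hypothetical sample: 1000 cells, split into T and B cells, T cells into CD4/CD8.
counts = {"all": 1000, "T": 400, "B": 600, "CD4": 300, "CD8": 100}
parent = {"all": None, "T": "all", "B": "all", "CD4": "T", "CD8": "T"}
props = proportions(counts, parent)
print(props["CD4"])
```

In this toy sample, CD4 cells are 30% of all cells but 75% of their parent T-cell population. That nested view is what manual gating reports and what flat clustering proportions miss.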
Fit Bayesian multivariate GARCH models using Stan for full Bayesian inference. Generate (weighted) forecasts for means, variances (volatility) and correlations. Currently DCC(P,Q), CCC(P,Q), pdBEKK(P,Q), and BEKK(P,Q) parameterizations are implemented, based on either a multivariate Gaussian or a Student-t distribution. DCC and CCC models are based on Engle (2002) <doi:10.1198/073500102288618487> and Bollerslev (1990). The BEKK parameterization follows Engle and Kroner (1995) <doi:10.1017/S0266466600009063>, while the pdBEKK parameterization and the estimation approach for this package are described in Rast et al. (2020) <doi:10.31234/osf.io/j57pk>. The fitted models contain rstan objects and can be examined with rstan functions.
This package provides Bayesian probabilistic methods for record linkage and entity resolution across multiple datasets using the Comprehensive Hit Or Miss Probabilistic Entity Resolution (CHOMPER) model. The package implements three main inference approaches: (1) Evolutionary Variational Inference for record Linkage (EVIL), (2) Coordinate Ascent Variational Inference (CAVI), and (3) Markov Chain Monte Carlo (MCMC) with a split and merge process. The model supports both discrete and continuous fields, and it applies a locally varying hit mechanism to attributes with multiple truths. It also provides tools for performance evaluation based on either approximated variational factors or posterior samples. The package supports parallel computing, with multi-threading for EVIL to estimate the linkage structure faster.
Diagnostic and prognostic models are typically evaluated with measures of accuracy that do not address clinical consequences. Decision-analytic techniques allow assessment of clinical outcomes, but often require collection of additional information and may be cumbersome to apply to models that yield a continuous result. Decision curve analysis is a method for evaluating and comparing prediction models that incorporates clinical consequences, requires only the data set on which the models are tested, and can be applied to models that have either continuous or dichotomous results. See the following references for details on the methods: Vickers (2006) <doi:10.1177/0272989X06295361>, Vickers (2008) <doi:10.1186/1472-6947-8-53>, and Pfeiffer (2020) <doi:10.1002/bimj.201800240>.
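The central quantity in decision curve analysis is the net benefit at a threshold probability pt, commonly written as TP/n - (FP/n) * pt / (1 - pt). The sketch below computes it for a model and for the "treat all" strategy. The function name and the toy data are assumptions for illustration and do not reflect the package's interface.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, risk) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, risk) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical outcomes (1 = event) and predicted risks for eight patients.
y = [1, 1, 0, 0, 1, 0, 0, 0]
risk = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1, 0.3, 0.4]

threshold = 0.5
nb_model = net_benefit(y, risk, threshold)
# "Treat all" is the same calculation with every patient above the threshold.
nb_treat_all = net_benefit(y, [1.0] * len(y), threshold)
nb_treat_none = 0.0
print(nb_model, nb_treat_all, nb_treat_none)
```

A decision curve repeats this calculation over a range of thresholds and plots the model against the "treat all" and "treat none" strategies.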
Open source data allows for reproducible research and helps advance our knowledge. The purpose of this package is to collate open source ophthalmic data sets curated for direct use. These are real-life data from people treated with intravitreal injections of anti-vascular endothelial growth factor (anti-VEGF) for age-related macular degeneration or diabetic macular edema. Associated publications of the data sets: Fu et al. (2020) <doi:10.1001/jamaophthalmol.2020.5044>, Moraes et al. (2020) <doi:10.1016/j.ophtha.2020.09.025>, Fasler et al. (2019) <doi:10.1136/bmjopen-2018-027441>, Arpa et al. (2020) <doi:10.1136/bjophthalmol-2020-317161>, Kern et al. (2020) <doi:10.1038/s41433-020-1048-0>.
Genotyping assays for bi-allelic markers (e.g. SNPs) produce signal intensities for the two alleles. fitPoly assigns genotypes (allele dosages) to a collection of polyploid samples based on these signal intensities. fitPoly replaces the older package fitTetra, which was limited, among other things, to tetraploid populations, whereas fitPoly accepts any ploidy level. Reference: Voorrips RE, Gort G, Vosman B (2011) <doi:10.1186/1471-2105-12-172>. New functions have been added for converting data from SNP array software formats, drawing XY-scatterplots with or without genotype colors, checking against expected F1 segregation patterns, comparing results from two different assays (probes) for the same SNP, and recovering from a saveMarkerModels() crash.
Network meta-analysis for survival outcome data often involves several studies that report only dichotomized outcomes (e.g., the numbers of events and sample sizes of individual arms). To combine these different outcome data, Woods et al. (2010) <doi:10.1186/1471-2288-10-54> proposed a Bayesian approach using complicated hierarchical models. Frequentist approaches, meanwhile, have been alternative standard methods for the statistical analysis of network meta-analysis, and their methodology is well established. We proposed an easy-to-implement method for network meta-analysis based on the frequentist framework in Noma and Maruo (2025) <doi:10.1101/2025.01.23.25321051>. This package provides convenient functions to implement this simple synthesis method.
A set of functions that improves the graphical presentation of the functions wave.correlation and spin.correlation (waveslim package, Whitcher 2012) and wave.multiple.correlation and wave.multiple.cross.correlation (wavemulcor package, Fernandez-Macho 2012b). The plot outputs (heatmaps) can be displayed on the screen or saved as PNG or JPG images or in PDF or EPS format. The W2CWM2C package also helps to handle the input multivariate time series easily as a list of N elements (time series) and provides a multivariate data set (dataexample) to exemplify its use. A description of the package was published in a scientific paper: Polanco-Martinez and Fernandez-Macho (2014), <doi:10.1109/MCSE.2014.96>.
This package provides functions designed to facilitate access to, and use of, large-scale, publicly available environmental data in R. The package contains functions for downloading raw data files from web URLs (download_data()), processing the raw data files into clean spatial objects (process_covariates()), and extracting values from the spatial data objects at point and polygon locations (calculate_covariates()). These functions call a series of source-specific functions which are tailored to each data source's or dataset's particular URL structure, data format, and spatial/temporal resolution. The functions are tested, versioned, open source, and open access. For sum_edc() method details, see Messier, Akita, and Serre (2012) <doi:10.1021/es203152a>.
Chemical analysis of proteins based on their amino acid compositions. Amino acid compositions can be read from FASTA files and used to calculate chemical metrics including carbon oxidation state and stoichiometric hydration state, as described in Dick et al. (2020) <doi:10.5194/bg-17-6145-2020>. Other properties that can be calculated include protein length, grand average of hydropathy (GRAVY), isoelectric point (pI), molecular weight (MW), standard molal volume (V0), and metabolic costs (Akashi and Gojobori, 2002 <doi:10.1073/pnas.062526999>; Wagner, 2005 <doi:10.1093/molbev/msi126>; Zhang et al., 2018 <doi:10.1038/s41467-018-06461-1>). A database of amino acid compositions of human proteins derived from UniProt is provided.
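As a small illustration of one of the listed properties, the sketch below reads a sequence from FASTA-formatted text and computes GRAVY as the mean Kyte-Doolittle hydropathy value over all residues. The function names and the toy peptide are assumptions for this example and are not the package's API. The other metrics, such as carbon oxidation state, need amino acid formulas and are not shown.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may wrap, so append to the current record.
            records[header] += line.upper()
    return records


def gravy(sequence):
    """Grand average of hydropathy: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# Hypothetical two-line FASTA record.
fasta = ">toy_peptide\nAILV\nKRDE\n"
records = parse_fasta(fasta)
for name, seq in records.items():
    print(name, len(seq), round(gravy(seq), 4))
```

Protein length falls out of the same parsed record as `len(seq)`. A sequence containing non-standard residue codes would raise a `KeyError` in this sketch.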
Given a multivariate dataset and some knowledge about the dependencies between its features, it is customary to fit a statistical model to the features to infer parameters of interest. Such a procedure implicitly assumes that the sample is exchangeable. This package provides a flexible non-parametric test of this exchangeability assumption, allowing the user to specify the feature dependencies by hand as long as features can be grouped into disjoint independent sets. This package also allows users to test a dual hypothesis, which is, given that the sample is exchangeable, does a proposed grouping of the features into disjoint sets also produce statistically independent sets of features? See Aw, Spence and Song (2023) for the accompanying paper.
This is a method for Allele-specific DNA Copy Number profiling for whole-Exome sequencing data. Given the allele-specific coverage and site biases at the variant loci, this program segments the genome into regions of homogeneous allele-specific copy number. It requires, as input, the read counts for each variant allele in a pair of case and control samples, as well as the site biases. For detection of somatic mutations, the case and control samples can be the tumor and normal sample from the same individual. The implemented method is based on the paper: Chen, H., Jiang, Y., Maxwell, K., Nathanson, K. and Zhang, N. (under review). Allele-specific copy number estimation by whole Exome sequencing.
Analyzes the function calls in an R package and creates a hive plot of the calls, dividing them among functions that only make outgoing calls (sources), functions that have only incoming calls (sinks), and those that have both incoming calls and make outgoing calls (managers). Function calls can be mapped by their absolute numbers, their normalized absolute numbers, or their rank. FuncMap should be useful for comparing packages at a high level for their overall design. Plus, it's just plain fun. The hive plot concept was developed by Martin Krzywinski (www.hiveplot.com) and inspired this package. Note: this package is maintained for historical reasons. HiveR is a full package for creating hive plots.
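The source/sink/manager division can be sketched from a list of caller-callee pairs. The function name and the example call list below are hypothetical. FuncMap itself works on the functions of an R package and draws the result as a hive plot, which is not shown here.

```python
def classify_functions(calls):
    """Classify functions by the direction of their calls.

    calls: iterable of (caller, callee) pairs.
    """
    callers = {caller for caller, _ in calls}
    callees = {callee for _, callee in calls}
    return {
        "sources": sorted(callers - callees),   # only make outgoing calls
        "sinks": sorted(callees - callers),     # only receive incoming calls
        "managers": sorted(callers & callees),  # both call and are called
    }


# Hypothetical call graph of a small package.
calls = [
    ("main", "load"),
    ("main", "fit"),
    ("fit", "loss"),
    ("load", "parse"),
]
roles = classify_functions(calls)
print(roles)
```

Each group would then be placed on its own axis of the hive plot, with position along the axis given by the call count, normalized count, or rank.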
This package provides an exact Goodness-of-Fit test for multinomial data with fixed probabilities. It can be used to determine whether a set of counts fits a given expected ratio. To see whether a set of observed counts fits an expectation, one can examine all possible outcomes with xmulti() or a random sample of them with xmonte() and find the probability of an observation deviating from the expectation by at least as much as the observed. As a measure of deviation from the expected, one can use the log-likelihood ratio, the multinomial probability, or the classic chi-square statistic. A histogram of the test statistic can also be plotted and compared with the asymptotic curve.
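The full-enumeration idea can be sketched in a few lines: list every possible outcome for the given total, compute its multinomial probability, and sum the probabilities of all outcomes that deviate from the expectation at least as much as the observed counts. This is an illustrative sketch, not the code behind xmulti(). All function names are assumptions, and the chi-square statistic is used here as the measure of deviation.

```python
from math import factorial, prod


def compositions(n, k):
    """Yield every way to distribute n observations over k categories."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_prob(counts, probs):
    """Probability of one outcome under the expected probabilities."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p ** c for p, c in zip(probs, counts))


def chi_square(counts, probs):
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def exact_p_value(observed, probs, statistic=chi_square):
    """Sum the probabilities of all outcomes at least as deviant as observed."""
    n, k = sum(observed), len(observed)
    obs_stat = statistic(observed, probs)
    tol = 1e-9  # guards against floating-point ties
    return sum(
        multinomial_prob(c, probs)
        for c in compositions(n, k)
        if statistic(c, probs) >= obs_stat - tol
    )


# Hypothetical example: four observations, expected ratio 1:1.
p_extreme = exact_p_value((4, 0), (0.5, 0.5))
p_perfect = exact_p_value((2, 2), (0.5, 0.5))
print(p_extreme, p_perfect)
```

The number of outcomes grows quickly with the sample size and the number of categories, which is why a random sample of outcomes, as with xmonte(), is the practical choice for larger problems.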
Differential expression analysis is a prevalent method utilised in the examination of diverse biological data. The reproducibility-optimized test statistic (ROTS) modifies a t-statistic based on the data's intrinsic characteristics and ranks features according to their statistical significance for differential expression between two or more groups (f-statistic). Focussing on proteomics and metabolomics, the current ROTS implementation cannot account for technical or biological covariates such as MS batches or gender differences among the samples. Consequently, we developed LimROTS, which employs a reproducibility-optimized test statistic utilising the limma methodology to simulate complex experimental designs. LimROTS is a hybrid method integrating empirical Bayes and reproducibility-optimized statistics for robust analysis of proteomics and metabolomics data.