Facilitates scalable spatiotemporally varying coefficient modelling with Bayesian kernelized tensor regression. The important features of this package are: (a) Enabling local temporal and spatial modeling of the relationship between the response variable and covariates. (b) Implementing the model described by Lei et al. (2023) <doi:10.48550/arXiv.2109.00046>. (c) Using a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to sample from the posterior distribution of the model parameters. (d) Employing a tensor decomposition to reduce the number of estimated parameters. (e) Accelerating tensor operations and enabling graphics processing unit (GPU) acceleration with the torch package.
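As a rough illustration of why a tensor decomposition reduces the number of estimated parameters, the sketch below compares a full coefficient tensor with a rank-R CP (CANDECOMP/PARAFAC) factorization. The CP form, the function names, and the example dimensions are assumptions made for illustration and are not taken from the package.

```python
def full_tensor_parameters(n_locations, n_times, n_covariates):
    # One coefficient per (location, time, covariate) cell.
    return n_locations * n_times * n_covariates


def cp_parameters(n_locations, n_times, n_covariates, rank):
    # A rank-R CP factorization stores one factor matrix per mode,
    # each with `rank` columns (assumed decomposition, for illustration).
    return rank * (n_locations + n_times + n_covariates)


full = full_tensor_parameters(100, 50, 5)
low_rank = cp_parameters(100, 50, 5, rank=10)
print(full, low_rank)
```

With these hypothetical dimensions the factorized form needs far fewer values than the full tensor, which is the kind of saving the description refers to.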
Different approaches to censored or truncated regression with conditional heteroscedasticity are provided. First, continuous distributions can be used for the (right and/or left censored or truncated) response with separate linear predictors for the mean and variance. Second, cumulative link models for ordinal data (obtained by interval-censoring continuous data) can be employed for heteroscedastic extended logistic regression (HXLR). In the latter type of models, the intercepts depend on the thresholds that define the intervals. Infrastructure for working with censored or truncated normal, logistic, and Student-t distributions is also provided, i.e., d/p/q/r functions and distributions3 objects.
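The first approach can be sketched as a log-likelihood for a left-censored normal response with separate linear predictors for the mean and the spread. This is a minimal sketch under stated assumptions: a single covariate per predictor, a log link for the standard deviation, and a censoring point of 0. The function names are hypothetical and this is not the package's implementation.

```python
import math


def norm_logpdf(z):
    # Log density of the standard normal distribution.
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    # Log of the standard normal CDF, computed through erfc.
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_normal_loglik(y, x, z, beta, gamma, left=0.0):
    """Log-likelihood with mean = beta[0] + beta[1]*x and
    log(sd) = gamma[0] + gamma[1]*z (assumed log link)."""
    total = 0.0
    for yi, xi, zi in zip(y, x, z):
        mu = beta[0] + beta[1] * xi
        sigma = math.exp(gamma[0] + gamma[1] * zi)
        if yi <= left:
            # Censored observation: probability mass at or below the limit.
            total += norm_logcdf((left - mu) / sigma)
        else:
            # Uncensored observation: ordinary normal density.
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total
```

Maximizing this function over `beta` and `gamma` with any numerical optimizer would give estimates for both predictors.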
This package provides a set of functions to conduct Conjunctive Analysis of Case Configurations (CACC) as described in Miethe, Hart, and Regoeczi (2008) <doi:10.1007/s10940-008-9044-8>, and identify and quantify situational clustering in dominant case configurations as described in Hart (2019) <doi:10.1177/0011128719866123>. Initially conceived as an exploratory technique for multivariate analysis of categorical data, CACC has developed to include formal statistical tests that can be applied in a wide variety of contexts. This technique allows examining composite profiles of different units of analysis in an alternative way to variable-oriented methods.
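The core tabulation behind a case-configuration analysis can be sketched as follows: every observed combination of categorical values is counted and the share of cases with the outcome is computed per combination. The record layout, the variable names, and the `min_cases` cutoff for a "dominant" configuration are assumptions for illustration. The formal tests mentioned above are not reproduced here.

```python
from collections import Counter


def case_configurations(records, variables, outcome, min_cases=1):
    """Return (configuration, n_cases, outcome_share) tuples,
    largest configurations first."""
    counts = Counter()
    positives = Counter()
    for rec in records:
        key = tuple(rec[v] for v in variables)
        counts[key] += 1
        if rec[outcome]:
            positives[key] += 1
    rows = [(key, n, positives[key] / n)
            for key, n in counts.items() if n >= min_cases]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Hypothetical incident records with two categorical predictors.
records = [
    {"night": 1, "weapon": 0, "injury": 1},
    {"night": 1, "weapon": 0, "injury": 0},
    {"night": 0, "weapon": 1, "injury": 1},
]
table = case_configurations(records, ["night", "weapon"], "injury")
```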
This package provides tools for the analysis of epidemiological and surveillance data. Contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, computation of confidence intervals around incidence risk and incidence rate estimates and sample size calculations for cross-sectional, case-control and cohort studies. Surveillance tools include functions to calculate an appropriate sample size for 1- and 2-stage representative freedom surveys, functions to estimate surveillance system sensitivity and functions to support scenario tree modelling analyses.
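Direct adjustment of a rate can be illustrated with a short sketch: stratum-specific rates are weighted by the share of each stratum in a standard population. The function name and the example figures are hypothetical, and confidence intervals are omitted.

```python
def directly_adjusted_rate(cases, person_time, standard_pop):
    """Weighted average of stratum rates, with weights taken from
    a standard population (one entry per stratum in each list)."""
    total_standard = sum(standard_pop)
    return sum((c / pt) * (s / total_standard)
               for c, pt, s in zip(cases, person_time, standard_pop))


# Two hypothetical age strata.
adjusted = directly_adjusted_rate(
    cases=[10, 20],
    person_time=[1000, 1000],
    standard_pop=[3000, 1000],
)
```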
Integrates game theory and ecological theory to construct social-ecological models that simulate the management of populations and stakeholder actions. These models build off of a previously developed management strategy evaluation (MSE) framework to simulate all aspects of management: population dynamics, manager observation of populations, manager decision making, and stakeholder responses to management decisions. The newly developed generalised management strategy evaluation (GMSE) framework uses genetic algorithms to mimic the decision-making process of managers and stakeholders under conditions of change, uncertainty, and conflict. Simulations can be run using gmse(), gmse_apply(), and gmse_gui() functions.
Allows users to create high-quality heatmaps from labelled, hierarchical data. Specifically, for data with a two-level hierarchical structure, it will produce a heatmap where each row and column represents a category at the lower level. These rows and columns are then grouped by the higher-level group each category belongs to, with the names of each category and group shown in the margins. While other packages (e.g. dendextend) allow heatmap rows and columns to be arranged by groups only, hhmR also allows the data to be labelled at both the category and group level.
This package provides basic tools and wrapper functions for computing clusters of instances described by multiple time-to-event censored endpoints. From long-format datasets, where one instance is described by one or more dated records, the main function, `make_state_matrices()`, creates state matrices. Based on these matrices, optimised procedures using the Jaccard distance between instances enable the construction of longitudinal typologies. The package is under active development, with additional tools for graphical representation of typologies planned. For methodological details, see our accompanying paper: Delord M, Douiri A (2025) <doi:10.1186/s12874-025-02476-7>.
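The Jaccard distance between two instances can be sketched as below. The sketch assumes each instance is a binary state matrix (rows as endpoints, columns as time points) that is flattened before comparison. That layout and the function name are assumptions, not the package's internal representation.

```python
def jaccard_distance(state_a, state_b):
    """1 - |A and B| / |A or B| over the cells of two binary matrices."""
    flat_a = [cell for row in state_a for cell in row]
    flat_b = [cell for row in state_b for cell in row]
    both = sum(1 for u, v in zip(flat_a, flat_b) if u and v)
    either = sum(1 for u, v in zip(flat_a, flat_b) if u or v)
    # Two all-zero matrices are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either


# Two hypothetical instances, two endpoints over two time points.
a = [[1, 1], [0, 0]]
b = [[1, 0], [1, 0]]
d = jaccard_distance(a, b)
```

A matrix of such pairwise distances could then be passed to any standard clustering routine to build a typology.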
The goal of midr is to provide a model-agnostic method for interpreting and explaining black-box predictive models by creating a globally interpretable surrogate model. The package implements Maximum Interpretation Decomposition (MID), a functional decomposition technique that finds an optimal additive approximation of the original model. This approximation is achieved by minimizing the squared error between the predictions of the black-box model and the surrogate model. The theoretical foundations of MID are described in Iwasawa & Matsumori (2025) [Forthcoming], and the package itself is detailed in Asashiba et al. (2025) <doi:10.48550/arXiv.2506.08338>.
Provides estimation for particular cases of the power series cure rate model <doi:10.1080/03610918.2011.639971>. For the distribution of the concurrent causes, the alternative models are the Poisson, logarithmic, negative binomial and Bernoulli (which are included in the original work), the polylogarithm model <doi:10.1080/00949655.2018.1451850> and the Flory-Schulz <doi:10.3390/math10244643>. The estimation procedure is based on the EM algorithm discussed in <doi:10.1080/03610918.2016.1202276>. For the distribution of the time-to-event, the alternative models are slash half-normal, Weibull, gamma and Birnbaum-Saunders distributions.
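Two of the listed cases can be sketched to show how the number of concurrent causes shapes the population survival function. The sketch assumes a Weibull time-to-event distribution and the standard textbook forms for Poisson and Bernoulli causes. The parameter names are hypothetical and the package's parameterization may differ.

```python
import math


def weibull_survival(t, scale, shape):
    # Survival function of the latent time-to-event for one cause.
    return math.exp(-((t / scale) ** shape))


def poisson_cure_survival(t, theta, scale, shape):
    # Poisson number of causes with mean theta:
    # S_pop(t) = exp(-theta * F(t)), so the cure fraction is exp(-theta).
    cdf = 1.0 - weibull_survival(t, scale, shape)
    return math.exp(-theta * cdf)


def bernoulli_cure_survival(t, p, scale, shape):
    # Bernoulli number of causes (mixture cure model):
    # S_pop(t) = (1 - p) + p * S(t), so the cure fraction is 1 - p.
    return (1.0 - p) + p * weibull_survival(t, scale, shape)
```

In both cases the population survival curve levels off at the cure fraction instead of decaying to zero.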
This package implements data processing described in <doi:10.1126/sciadv.abk3283> to align modern differentially private data with formatting of older US Census data releases. The primary goal is to read in Census Privacy Protected Microdata Files data in a reproducible way. This includes tools for aggregating to relevant levels of geography by creating geographic identifiers which match the US Census Bureau's numbering. Additionally, there are tools for grouping race numeric identifiers into categories, consistent with OMB (Office of Management and Budget) classifications. Functions exist for downloading and linking to existing sources of privacy protected microdata.
This package implements functions for working with absorbing Markov chains. The implementation is based on the framework described in "Toward a unified framework for connectivity that disentangles movement and mortality in space and time" by Fletcher et al. (2019) <doi:10.1111/ele.13333>, which applies them to spatial ecology. This framework incorporates both resistance and absorption with spatial absorbing Markov chains (SAMC) to provide several short-term and long-term predictions for metrics related to connectivity in landscapes. Despite the ecological context of the framework, this package can be used in any application of absorbing Markov chains.
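A basic absorbing Markov chain quantity, the expected number of steps before absorption, can be sketched by solving (I - Q) t = 1, where Q holds the transition probabilities among transient states. This is a generic textbook calculation with hypothetical function names, not the package's API, and it ignores the spatial structure used in the SAMC framework.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def expected_steps_to_absorption(q):
    """q[i][j] is the probability of moving from transient state i
    to transient state j; the missing row mass is absorption."""
    n = len(q)
    i_minus_q = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)]
                 for i in range(n)]
    return solve_linear(i_minus_q, [1.0] * n)


# Symmetric random walk on four states with absorbing end points:
# only the two middle states are transient.
steps = expected_steps_to_absorption([[0.0, 0.5], [0.5, 0.0]])
```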
Scale invariant version of the original PNN proposed by Specht (1990) <doi:10.1016/0893-6080(90)90049-q> with the added functionality of allowing for smoothing along multiple dimensions while accounting for covariances within the data set. It is written in the R statistical programming language. Given a data set with categorical variables, we use this algorithm to estimate the probabilities of a new observation vector belonging to a specific category. This type of neural network provides the benefits of fast training time relative to backpropagation and statistical generalization with only a small set of known observations.
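The classification idea can be sketched with the basic form of a probabilistic neural network: each category's density at a new observation is estimated by averaging Gaussian kernels centred on that category's training points, and the densities are combined with category frequencies. The sketch uses a single isotropic smoothing parameter `sigma`. The scale-invariant, covariance-aware smoothing described above is not reproduced, and all names are hypothetical.

```python
import math


def pnn_posteriors(train, x, sigma=1.0):
    """train maps each category label to a list of feature tuples.
    Returns estimated category probabilities for observation x."""
    total_n = sum(len(points) for points in train.values())
    scores = {}
    for label, points in train.items():
        kernel_sum = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p))
                     / (2.0 * sigma ** 2))
            for p in points
        )
        density = kernel_sum / len(points)   # Parzen density estimate
        prior = len(points) / total_n        # empirical category frequency
        scores[label] = prior * density
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Hypothetical two-category training set.
train = {"a": [(0.0, 0.0), (0.0, 1.0)], "b": [(5.0, 5.0), (5.0, 6.0)]}
near_a = pnn_posteriors(train, (0.0, 0.5))
```

No iterative training is involved: the training points themselves act as the network's pattern units, which is where the fast training time comes from.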
This package provides the necessary functions for performing the Partial Correlation coefficient with Information Theory (PCIT) (Reverter and Chan 2008) and Regulatory Impact Factors (RIF) (Reverter et al. 2010) algorithms. The PCIT algorithm identifies meaningful correlations to define edges in a weighted network and can be applied to any correlation-based network including but not limited to gene co-expression networks, while the RIF algorithm identifies critical Transcription Factors (TF) from gene expression data. Combined, these two algorithms provide a relevant layer of information for gene expression studies (microarray, RNA-seq and single-cell RNA-seq data).
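The partial correlation at the heart of PCIT can be illustrated with the standard first-order formula, which removes the influence of a third variable from the correlation between two others. Only that formula is shown. The information-theoretic step that decides which edges to keep is not reproduced, and the function name is hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z,
    computed from the three pairwise correlations."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


# If x and y are each strongly correlated with z, their direct
# association can vanish once z is accounted for.
explained_by_z = partial_correlation(0.64, 0.8, 0.8)
unaffected = partial_correlation(0.5, 0.0, 0.0)
```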
Implementation of the Partitioned Local Depth (PaLD) approach, which provides a measure of local depth and of the cohesion of one point to another. Together with a universal threshold for distinguishing strong and weak ties, these measures may be used to reveal local and global structure in data, based on methods described in Berenhaut, Moore, and Melvin (2022) <doi:10.1073/pnas.2003634119>. No extraneous inputs, distributional assumptions, iterative procedures, or optimization criteria are employed. This package includes functions for computing local depths and cohesion as well as flexible functions for plotting community networks and displays of cohesion against distance.
An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017) <doi:10.1080/01621459.2017.1292915>. These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an autoplot function for producing standard diagnostic plots using ggplot2 graphics. The package currently supports cumulative link models from packages MASS, ordinal, rms, and VGAM. Support for binary regression models using the standard glm function is also available.
Estimation, scoring, and plotting functions for the semi-parametric factor model proposed by Liu & Wang (2022) <doi:10.1007/s11336-021-09832-8> and Liu & Wang (2023) <arXiv:2303.10079>. Both the conditional densities of observed responses given the latent factors and the joint density of latent factors are estimated non-parametrically. Functional parameters are approximated by smoothing splines, whose coefficients are estimated by penalized maximum likelihood using an expectation-maximization (EM) algorithm. E- and M-steps can be parallelized on multi-thread computing platforms that support OpenMP. Both continuous and unordered categorical response variables are supported.
Format dates and times flexibly and to whichever locales make sense. This package parses dates, times, and date-times in various formats (including string-based ISO 8601 constructions). The formatting syntax gives the user many options for formatting the date and time output in a precise manner. Time zones in the input can be expressed in multiple ways and there are many options for formatting time zones in the output as well. Several of the provided helper functions allow for automatic generation of locale-aware formatting patterns based on date/time skeleton formats and standardized date/time formats with varying specificity.
Multi-omic Pathway Analysis of Cells (MPAC) integrates multi-omic data for understanding cellular mechanisms. It predicts novel patient groups with distinct pathway profiles and identifies key pathway proteins with potential clinical associations. From CNA and RNA-seq data, it determines genes’ DNA and RNA states (i.e., repressed, normal, or activated), which serve as the input for PARADIGM to calculate Inferred Pathway Levels (IPLs). It also permutes DNA and RNA states to create a background distribution to filter IPLs as a way to remove events observed by chance. It provides multiple methods for downstream analysis and visualization.
Response surface methods for drug synergy analysis. Available methods include generalized and classical Loewe formulations as well as Highest Single Agent methodology. Response surfaces can be plotted in an interactive 3-D plot and formal statistical tests for presence of synergistic effects are available. Implemented methods and tests are described in the article "BIGL: Biochemically Intuitive Generalized Loewe null model for prediction of the expected combined effect compatible with partial agonism and antagonism" by Koen Van der Borght, Annelies Tourny, Rytis Bagdziunas, Olivier Thas, Maxim Nazarov, Heather Turner, Bie Verbist & Hugo Ceulemans (2017) <doi:10.1038/s41598-017-18068-5>.
Bayesian survival model using Weibull regression on both scale and shape parameters. Dependence of the shape parameter on covariates permits deviation from the proportional-hazards assumption, leading to dynamic (i.e., non-constant over time) hazard ratios between subjects. Bayesian Lasso shrinkage in the form of two Laplace priors, one for scale and one for shape coefficients, allows for many covariates to be included. Cross-validation helper functions can be used to tune the shrinkage parameters. Markov chain Monte Carlo (MCMC) sampling using a Gibbs wrapper around Radford Neal's univariate slice sampler (R package MfUSampler) is used for coefficient estimation.
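The link between a covariate-dependent shape parameter and time-varying hazard ratios can be illustrated numerically. The sketch assumes the common Weibull parameterization h(t) = (shape / scale) * (t / scale)^(shape - 1). The function names and parameter values are hypothetical, and the package may parameterize the model differently.

```python
def weibull_hazard(t, scale, shape):
    # Hazard of a Weibull distribution in the scale/shape parameterization.
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def hazard_ratio(t, subject_a, subject_b):
    """Ratio of hazards at time t; each subject is (scale, shape)."""
    return weibull_hazard(t, *subject_a) / weibull_hazard(t, *subject_b)


# Same shape: the ratio does not change with time (proportional hazards).
same_shape_early = hazard_ratio(1.0, (2.0, 1.5), (1.0, 1.5))
same_shape_late = hazard_ratio(5.0, (2.0, 1.5), (1.0, 1.5))

# Different shapes: the ratio drifts as time passes.
diff_shape_early = hazard_ratio(1.0, (2.0, 2.0), (1.0, 1.5))
diff_shape_late = hazard_ratio(5.0, (2.0, 2.0), (1.0, 1.5))
```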
Finds significant pathways using network topology information. It has several advantages over current pathway enrichment tools. First, the pathway node, rather than the single gene, is taken as the basic unit when analysing networks, reflecting the fact that genes must assemble into complexes to function normally. Second, multiple network centrality measures are applied simultaneously to measure the importance of nodes from different aspects, giving a fuller view of the biological system. CePa extends standard pathway enrichment methods, including both the over-representation analysis procedure and the gene-set analysis procedure. For details, see <doi:10.1093/bioinformatics/btt008>.
This package provides a collection of several functions related to construction and analysis of incomplete split-plot designs. The package contains functions to obtain and analyze incomplete split-plot designs for three kinds of situations namely (i) when blocks are complete with respect to main plot treatments and main plots are incomplete with respect to subplot treatments, (ii) when blocks are incomplete with respect to main plot treatments and main plots are complete with respect to subplot treatments and (iii) when blocks are incomplete with respect to main plot treatments and main plots are incomplete with respect to subplot treatments.
The popular population genetic software Treemix by Pickrell and Pritchard (2012) <DOI:10.1371/journal.pgen.1002967> estimates the number of migration edges on a population tree. However, it can be difficult to determine the number of migration edges to include. Previously, it was customary to stop adding migration edges when 99.8% of variation in the data was explained, but OptM automates this process using an ad hoc statistic based on the second-order rate of change in the log likelihood. OptM also has added functionality for various threshold modeling to compare with the ad hoc statistic.
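The idea of a statistic based on the second-order rate of change in the log likelihood can be sketched as below. The sketch takes one log likelihood per number of migration edges and looks for the largest absolute second difference. Any normalization across replicate runs is omitted, the input values are invented, and the function names are hypothetical, so this is an illustration of the principle and not OptM's exact statistic.

```python
def second_order_change(loglik):
    """loglik[m] is the log likelihood with m migration edges.
    Returns |L(m+1) - 2 L(m) + L(m-1)| for each interior m."""
    return {
        m: abs(loglik[m + 1] - 2.0 * loglik[m] + loglik[m - 1])
        for m in range(1, len(loglik) - 1)
    }


def suggested_edges(loglik):
    # The m at which the likelihood curve bends most sharply.
    change = second_order_change(loglik)
    return max(change, key=change.get)


# Invented log likelihoods for m = 0..4: a large gain, then a plateau.
example = [-100.0, -60.0, -50.0, -48.0, -47.0]
best_m = suggested_edges(example)
```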
A progression model for repeated measures (PMRM) is a continuous-time nonlinear mixed-effects model for longitudinal clinical trials in progressive diseases. Unlike mixed models for repeated measures (MMRMs), which estimate treatment effects as linear combinations of additive effects on the outcome scale, PMRMs characterize treatment effects in terms of the underlying disease trajectory. This framing yields clinically interpretable quantities such as average time saved and percent reduction in decline due to treatment. This package implements frequentist PMRMs by Raket (2022) <doi:10.1002/sim.9581> using RTMB by Kristensen (2016) <doi:10.18637/jss.v070.i05>.