The landmark approach allows survival predictions to be updated dynamically as new measurements from an individual are recorded. The idea is to set predefined time points, known as "landmark times", and form a model at each landmark time using only the individuals in the risk set. This package allows the longitudinal data to be modelled either using the last observation carried forward or linear mixed effects modelling. There is also the option to model competing risks, either through cause-specific Cox regression or Fine-Gray regression. To find out more about the methods in this package, please see <https://isobelbarrott.github.io/Landmarking/articles/Landmarking>.
This package provides a flexible and easy-to use interface for the soil vegetation atmosphere transport (SVAT) model LWF-BROOK90, written in Fortran. The model simulates daily transpiration, interception, soil and snow evaporation, streamflow and soil water fluxes through a soil profile covered with vegetation, as described in Hammel & Kennel (2001, ISBN:978-3-933506-16-0) and Federer et al. (2003) <doi:10.1175/1525-7541(2003)004%3C1276:SOAETS%3E2.0.CO;2>. A set of high-level functions for model set up, execution and parallelization provides easy access to plot-level SVAT simulations, as well as multi-run and large-scale applications.
Programs to find the sample size or power of studies using the Sequential Parallel Comparison Design (SPCD) and programs to analyze such studies. This is a clinical trial design where patients initially on placebo who did not respond are re-randomized between placebo and active drug in a second phase and the results of the two phases are pooled. The method of analyzing binary data with this design is described in Fava,Evins, Dorer and Schoenfeld(2003) <doi:10.1159/000069738>, and the method of analyzing continuous data is described in Chen, Yang, Hung and Wang (2011) <doi:10.1016/j.cct.2011.04.006>.
This package provides tools to assess the association between two spatial processes. Currently, several methodologies are implemented: A modified t-test to perform hypothesis testing about the independence between the processes, a suitable nonparametric correlation coefficient, the codispersion coefficient, and an F test for assessing the multiple correlation between one spatial process and several others. Functions for image processing and computing the spatial association between images are also provided. Functions contained in the package are intended to accompany Vallejos, R., Osorio, F., Bevilacqua, M. (2020). Spatial Relationships Between Two Georeferenced Variables: With Applications in R. Springer, Cham <doi:10.1007/978-3-030-56681-4>.
This package provides a conditional independence test that can be applied both to univariate and multivariate random variables. The test is based on a weighted form of the sample covariance of the residuals after a nonlinear regression on the conditioning variables. Details are described in Scheidegger, Hoerrmann and Buehlmann (2021) "The Weighted Generalised Covariance Measure" <arXiv:2111.04361>
. The test is a generalisation of the Generalised Covariance Measure (GCM) implemented in the R package GeneralisedCovarianceMeasure
by Jonas Peters and Rajen D. Shah based on Shah and Peters (2020) "The Hardness of Conditional Independence Testing and the Generalised Covariance Measure" <arXiv:1804.07203>
.
The main aim is to further facilitate the creation of exercises based on the package exams by Grün, B., and Zeileis, A. (2009) <doi:10.18637/jss.v029.i10>. Creating effective student exercises involves challenges such as creating appropriate data sets and ensuring access to intermediate values for accurate explanation of solutions. The functionality includes the generation of univariate and bivariate data including simple time series, functions for theoretical distributions and their approximation, statistical and mathematical calculations for tasks in basic statistics courses as well as general tasks such as string manipulation, LaTeX/HTML
formatting and the editing of XML task files for Moodle'.
Function and support for medication and dosing information extraction from free-text clinical notes. Medication entities for the basic medExtractR
implementation that can be extracted include drug name, strength, dose amount, dose, frequency, intake time, dose change, and time of last dose. The basic medExtractR
is outlined in Weeks, Beck, McNeer
, Williams, Bejan, Denny, Choi (2020) <doi: 10.1093/jamia/ocz207>. The extended medExtractR_tapering
implementation is intended to extract dosing information for more tapering schedules, which are far more complex. The tapering extension allows for the extraction of additional entities including dispense amount, refills, dose schedule, time keyword, transition, and preposition.
We consider the problem where we observe k vectors (possibly of different lengths), each representing an independent multinomial random vector. For a given function that takes in the concatenated vector of multinomial probabilities and outputs a real number, this is a Monte Carlo estimation procedure of an exact p-value and confidence interval. The resulting inference is valid even in small samples, when the parameter is on the boundary, and when the function is not differentiable at the parameter value, all situations where asymptotic methods and the bootstrap would fail. For more details see Sachs, Fay, and Gabriel (2025) <doi:10.48550/arXiv.2406.19141>
.
Get z-scores, percentiles, absolute values, and percent of predicted of a reference cohort. Functionality requires installing the data packages adiposerefdata and musclerefdata'. For more information on the underlying research, please visit our website which also includes a graphical interface. The models and underlying data are described in Marquardt JP et al.(planned publication 2025; reserved doi 10.1097/RLI.0000000000001104), "Subcutaneous and Visceral adipose tissue Reference Values from Framingham Heart Study Thoracic and Abdominal CT", *Investigative Radiology* and Tonnesen PE et al. (2023), "Muscle Reference Values from Thoracic and Abdominal CT for Sarcopenia Assessment [column] The Framingham Heart Study", *Investigative Radiology*, <doi:10.1097/RLI.0000000000001012>.
Efficiently implements the Graphical Lasso algorithm, utilizing the Armadillo C++ library for rapid computation. This algorithm introduces an L1 penalty to derive sparse inverse covariance matrices from observations of multivariate normal distributions. Features include the generation of random and structured sparse covariance matrices, beneficial for simulations, statistical method testing, and educational purposes in graphical modeling. A unique function for regularization parameter selection based on predefined sparsity levels is also offered, catering to users with specific sparsity requirements in their models. The methodology for sparse inverse covariance estimation implemented in this package is based on the work of Friedman, Hastie, and Tibshirani (2008) <doi:10.1093/biostatistics/kxm045>.
Automate the explanatory analysis of machine learning predictive models. Generate advanced interactive model explanations in the form of a serverless HTML site with only one line of code. This tool is model-agnostic, therefore compatible with most of the black-box predictive models and frameworks. The main function computes various (instance and model-level) explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. It is possible to easily save the dashboard and share it with others. modelStudio
facilitates the process of Interactive Explanatory Model Analysis introduced in Baniecki et al. (2023) <doi:10.1007/s10618-023-00924-w>.
The SparseArray
package is an infrastructure package that provides an array-like container for efficient in-memory representation of multidimensional sparse data in R. The package defines the SparseArray
virtual class and two concrete subclasses: COO_SparseArray
and SVT_SparseArray
. Each subclass uses its own internal representation of the nonzero multidimensional data, the "COO layout" and the "SVT layout", respectively. SVT_SparseArray
objects mimic as much as possible the behavior of ordinary matrix and array objects in base R. In particular, they support most of the "standard matrix and array API" defined in base R and in the matrixStats
package from CRAN.
Calculates the probabilities of k successes given n trials of a binomial random variable with non-negative correlation across trials. The function takes as inputs the scalar values the level of correlation or association between trials, the success probability, the number of trials, an optional input specifying the number of bits of precision used in the calculation, and an optional input specifying whether the calculation approach to be used is from Witt (2014) <doi:10.1080/03610926.2012.725148> or from Kuk (2004) <doi:10.1046/j.1467-9876.2003.05369.x>. The output is a (trials+1)-dimensional vector containing the likelihoods of 0, 1, ..., trials successes.
This package provides a comprehensive framework for batch effect diagnostics, harmonization, and post-harmonization downstream analysis. Features include interactive visualization tools, robust statistical tests, and a range of harmonization techniques. Additionally, ComBatFamQC
enables the creation of life-span age trend plots with estimated age-adjusted centiles and facilitates the generation of covariate-corrected residuals for analytical purposes. Methods for harmonization are based on approaches described in Johnson et al., (2007) <doi:10.1093/biostatistics/kxj037>, Beer et al., (2020) <doi:10.1016/j.neuroimage.2020.117129>, Pomponio et al., (2020) <doi:10.1016/j.neuroimage.2019.116450>, and Chen et al., (2021) <doi:10.1002/hbm.25688>.
The stepwise variable selection procedure (with iterations between the forward and backward steps) can be used to obtain the best candidate final regression model in regression analysis. All the relevant covariates are put on the variable list to be selected. The significance levels for entry (SLE) and for stay (SLS) are usually set to 0.15 (or larger) for being conservative. Then, with the aid of substantive knowledge, the best candidate final regression model is identified manually by dropping the covariates with p value > 0.05 one at a time until all regression coefficients are significantly different from 0 at the chosen alpha level of 0.05.
Implementations of the multiple testing procedures for discrete tests described in the paper Döhler, Durand and Roquain (2018) "New FDR bounds for discrete and heterogeneous tests" <doi:10.1214/18-EJS1441>. The main procedures of the paper (HSU and HSD), their adaptive counterparts (AHSU and AHSD), and the HBR variant are available and are coded to take as input the results of a test procedure from package DiscreteTests
', or a set of observed p-values and their discrete support under their nulls. A shortcut function to obtain such p-values and supports is also provided, along with a wrapper allowing to apply discrete procedures directly to data.
Pooling estimates reported in meta-analyses (literature-based, LB) and estimates based on individual participant data (IPD) is not straight-forward as the details of the LB nonlinear function estimate are not usually reported. This package pools the nonlinear IPD dose-response estimates based on a natural cubic spline from lm or glm with the pointwise LB estimates and their estimated variances. Details will be presented in Härkänen, Tapanainen, Sares-Jäske, Männistö, Kaartinen and Paalanen (2025) "Novel pooling method for nonlinear cohort analysis and meta-analysis estimates: Predicting health outcomes based on climate-friendly diets" (under revision) <https://journals.lww.com/epidem/pages/default.aspx>.
It provides functions to generate a correlation matrix from a genetic dataset and to use this matrix to predict the phenotype of an individual by using the phenotypes of the remaining individuals through kriging. Kriging is a geostatistical method for optimal prediction or best unbiased linear prediction. It consists of predicting the value of a variable at an unobserved location as a weighted sum of the variable at observed locations. Intuitively, it works as a reverse linear regression: instead of computing correlation (univariate regression coefficients are simply scaled correlation) between a dependent variable Y and independent variables X, it uses known correlation between X and Y to predict Y.
It estimates the parameters of a partially linear regression censored model via maximum penalized likelihood through of ECME algorithm. The model belong to the semiparametric class, that including a parametric and nonparametric component. The error term considered belongs to the scale-mixture of normal (SMN) distribution, that includes well-known heavy tails distributions as the Student-t distribution, among others. To examine the performance of the fitted model, case-deletion and local influence techniques are provided to show its robust aspect against outlying and influential observations. This work is based in Ferreira, C. S., & Paula, G. A. (2017) <doi:10.1080/02664763.2016.1267124> but considering the SMN family.
Allows plotting data on bathymetric maps using ggplot2'. Plotting oceanographic spatial data is made as simple as feasible, but also flexible for custom modifications. Data that contain geographic information from anywhere around the globe can be plotted on maps generated by the basemap()
or qmap()
functions using ggplot2 layers separated by the + operator. The package uses spatial shape- ('sf') and raster ('stars') files, geospatial packages for R to manipulate, and the ggplot2 package to plot these files. The package ships with low-resolution spatial data files and higher resolution files for detailed maps are stored in the ggOceanMapsLargeData
repository on GitHub
and downloaded automatically when needed.
Most of the time floating point arithmetic does approximately the right thing. When adding sums or having products of numbers that greatly differ in magnitude, the floating point arithmetic may be incorrect. This package implements the Kahan (1965) sum <doi:10.1145/363707.363723>, Neumaier (1974) sum <doi:10.1002/zamm.19740540106>, pairwise-sum (adapted from NumPy
', See Castaldo (2008) <doi:10.1137/070679946> for a discussion of accuracy), and arbitrary precision sum (adapted from the fsum in Python ; Shewchuk (1997) <https://people.eecs.berkeley.edu/~jrs/papers/robustr.pdf>). In addition, products are changed to long double precision for accuracy, or changed into a log-sum for accuracy.
Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure. Frequently, the operations to be performed aim to transform a flat table into a star schema. Transformations can be carried out using professional extract, transform and load tools or tools intended for data transformation for end users. With the tools mentioned, this transformation can be carried out, but it requires a lot of work. The main objective of this package is to define transformations that allow obtaining stars from flat tables easily. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context.
Generates interactive Jellyfish plots to visualize spatiotemporal tumor evolution by integrating sample and phylogenetic trees into a unified plot. This approach provides an intuitive way to analyze tumor heterogeneity and evolution over time and across anatomical locations. The Jellyfish plot visualization design was first introduced by Lahtinen, Lavikka, et al. (2023, <doi:10.1016/j.ccell.2023.04.017>). This package also supports visualizing ClonEvol
results, a tool developed by Dang, et al. (2017, <doi:10.1093/annonc/mdx517>), for analyzing clonal evolution from multi-sample sequencing data. The clonevol package is not available on CRAN but can be installed from its GitHub
repository (<https://github.com/hdng/clonevol>).
This package provides a set of vectorised functions to calculate medical equations used in transplantation, focused mainly on transplantation of abdominal organs. These functions include donor and recipient risk indices as used by NHS Blood & Transplant, OPTN/UNOS and Eurotransplant, tools for quantifying HLA mismatches, functions for calculating estimated glomerular filtration rate (eGFR
), a function to calculate the APRI (AST to platelet ratio) score used in initial screening of suitability to receive a transplant from a hepatitis C seropositive donor and some biochemical unit converter functions. All functions are designed to work with either US or international units. References for the equations are provided in the vignettes and function documentation.