Medication adherence, defined as medication-taking behavior that aligns with the agreed-upon treatment protocol, is critical for realizing the benefits of prescription medications. Medication adherence can be assessed using electronic adherence monitoring devices (EAMDs), pill bottles or boxes that contain a computer chip that records the date and time of each opening (or "actuation"). Before researchers can use EAMD data, they must apply a series of decision rules to transform actuation data into adherence data. The purpose of this R package ('oncmap') is to transform EAMD actuations (supplied as a raw .csv file), together with information about the patient, regimen, and non-monitored periods, into two daily adherence values: Dose Taken and Correct Dose Taken.
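To make the actuation-to-adherence step concrete, here is a minimal Python sketch of turning opening timestamps into two daily values. The function name `daily_adherence` and both decision rules are assumptions chosen for illustration; they are not the decision rules that 'oncmap' applies, and non-monitored periods are ignored here.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_adherence(actuations, start, end, doses_per_day):
    """Return {day: (dose_taken, correct_dose_taken)} for each day in [start, end].

    Hypothetical rules (assumptions, not the package's actual rules):
      dose_taken         = 1 if the device was opened at least once that day
      correct_dose_taken = 1 if the number of openings equals doses_per_day
    """
    # Count how many openings fall on each calendar day.
    openings = Counter(ts.date() for ts in actuations)
    result = {}
    day = start
    while day <= end:
        n = openings.get(day, 0)
        result[day] = (int(n >= 1), int(n == doses_per_day))
        day += timedelta(days=1)
    return result


# Example: a twice-daily regimen observed over three days.
example_actuations = [
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 1, 20, 0),
    datetime(2024, 1, 2, 9, 0),
]
example = daily_adherence(example_actuations, date(2024, 1, 1), date(2024, 1, 3), 2)
```

In this sketch a day with one opening under a twice-daily regimen counts as a dose taken but not as a correct dose taken.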
Bayesian optimal design with futility and efficacy stopping boundaries (BOP2-FE) is a novel statistical framework for single-arm Phase II clinical trials. It enables early termination for efficacy when interim data are promising, while explicitly controlling Type I and Type II error rates. The design supports a variety of endpoint structures, including single binary endpoints, nested endpoints, co-primary endpoints, and joint monitoring of efficacy and toxicity. The package provides tools for enumerating stopping boundaries prior to trial initiation and for conducting simulation studies to evaluate the design's operating characteristics. Users can flexibly specify design parameters to suit their specific applications. For methodological details, refer to Xu et al. (2025) <doi:10.1080/10543406.2025.2558142>.
Implementation of network integration approaches comprising unweighted and weighted integration methods. Unweighted integration is performed considering the average, per-edge average, maximum and minimum of network edges. Weighted integration takes into account a weight for each network during the fusion process, where the weights express the predictive strength of each network for a specific predictive task. Weights can be learned with a machine learning algorithm, by tying each network's weight to the assessed accuracy of the algorithm when trained on that network alone. The implemented methods can be applied to effectively integrate different biological networks modelling a wide range of problems in bioinformatics (e.g. disease gene prioritization, protein function prediction, drug repurposing, clinical outcome prediction).
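The integration schemes named above can be sketched in a few lines of Python. The function `integrate` is hypothetical and is not the package's interface. It also assumes that "per-edge average" means averaging only over the networks in which an edge is present (non-zero), and that weighted integration is a normalized weighted sum.

```python
def integrate(networks, method="average", weights=None):
    """Fuse same-sized adjacency matrices (lists of lists) edge by edge."""
    n = len(networks[0])
    fused = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            values = [net[i][j] for net in networks]
            if method == "average":
                # Divide by the total number of networks.
                fused[i][j] = sum(values) / len(values)
            elif method == "per_edge_average":
                # Assumption: divide only by the networks that contain the edge.
                present = [v for v in values if v != 0]
                fused[i][j] = sum(present) / len(present) if present else 0.0
            elif method == "max":
                fused[i][j] = max(values)
            elif method == "min":
                fused[i][j] = min(values)
            elif method == "weighted":
                # Assumption: one non-negative weight per network, normalized to sum to 1.
                total = sum(weights)
                fused[i][j] = sum(w * v for w, v in zip(weights, values)) / total
            else:
                raise ValueError(f"unknown method: {method}")
    return fused


net_a = [[0.0, 1.0], [1.0, 0.0]]
net_b = [[0.0, 0.0], [0.0, 0.0]]
fused_avg = integrate([net_a, net_b], "average")
fused_weighted = integrate([net_a, net_b], "weighted", weights=[3.0, 1.0])
```

With these two toy networks the plain average halves the single edge, while the per-edge average keeps it at full strength because only one network contains it.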
This is a low-level package for hosting persistence data. It is part of the TDAverse suite of packages, which is designed to provide a collection of packages for enabling machine learning and data science tasks using persistent homology. It implements a class for hosting persistence data, a number of coercers from and to existing data structures used by other packages, and functions to compute distances between persistence diagrams. A formal definition and study of bottleneck and Wasserstein distances can be found in Bubenik, Scott and Stanley (2023) <doi:10.1007/s41468-022-00103-8>. Their implementation in phutil relies on the C++ Hera library developed by Kerber, Morozov and Nigmetov (2017) <doi:10.1145/3064175>.
This package provides tools to visualize ordination results in R by adding covariance-based ellipses, centroids, vectors, and confidence regions to plots created with ggplot2. The package extends the vegan framework and supports Principal Component Analysis (PCA), Redundancy Analysis (RDA), and Non-metric Multidimensional Scaling (NMDS). Ellipses can represent either group dispersion (standard deviation, SD) or centroid precision (standard error, SE), following Wang et al. (2015) <doi:10.1371/journal.pone.0118537>. Robust estimators of covariance are implemented, including the Minimum Covariance Determinant (MCD) method of Hubert et al. (2018) <doi:10.1002/wics.1421>. This approach reduces the influence of outliers. barrel is particularly useful for multivariate ecological datasets, promoting reproducible, publication-quality ordination graphics with minimal effort.
The nonparametric methods for estimating copula entropy, transfer entropy, and the statistics for multivariate normality test and two-sample test are implemented. The methods for estimating transfer entropy and the statistics for multivariate normality test and two-sample test are based on the method for estimating copula entropy. The method for change point detection with copula entropy based two-sample test is also implemented. Please refer to Ma and Sun (2011) <doi:10.1016/S1007-0214(11)70008-6>, Ma (2019) <doi:10.48550/arXiv.1910.04375>, Ma (2022) <doi:10.48550/arXiv.2206.05956>, Ma (2023) <doi:10.48550/arXiv.2307.07247>, and Ma (2024) <doi:10.48550/arXiv.2403.07892> for more information.
This is an R implementation of Fast and Scalable Learning of Sparse Changes in High-Dimensional Gaussian Graphical Model Structure (DIFFEE). The DIFFEE algorithm can be used to quickly estimate the differential network between two related datasets. For instance, it can identify a differential gene network from case and control datasets. By performing data-driven network inference from two high-dimensional data sets, this tool can help users effectively translate two aggregated data blocks into knowledge of the changes among entities between two Gaussian Graphical Models. Please run demo(diffeeDemo) to learn the basic functions provided by this package. For further details, please read the original paper: Beilun Wang, Arshdeep Sekhon, Yanjun Qi (2018) <arXiv:1710.11223>.
This package provides a user-friendly interface, using Shiny, to analyse glucose-stimulated insulin secretion (GSIS) assays in pancreatic beta cells or islets. The package allows the user to import several sets of experiments from different spreadsheets and to perform subsequent steps: summarise in a tidy format, visualise data quality and compare experimental conditions while accounting for technical confounders such as the date of the experiment or the technician. Together, insane is a comprehensive method that optimises pre-processing and analyses of GSIS experiments in a user-friendly interface. The Shiny App was initially designed for the EndoC-betaH1 cell line following the method described in Ndiaye et al., 2017 (<doi:10.1016/j.molmet.2017.03.011>).
This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are portable across platforms and use a hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, supported by the package parallel.
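The remark about sub-byte integers can be illustrated with a small Python sketch that packs four 2-bit values into each byte. The functions `pack_2bit` and `unpack_2bit` are hypothetical, and the coding (0, 1, 2 for allele counts, 3 for missing) and bit order are assumptions; this is not a description of the actual GDS on-disk layout.

```python
def pack_2bit(genotypes):
    """Pack values in 0..3 into bytes, four values per byte (lowest bits first)."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError("each genotype must fit in 2 bits (0..3)")
        # Shift the value into its 2-bit slot within the target byte.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_2bit(packed, count):
    """Recover the first `count` 2-bit values from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


# Assumed coding: 0, 1, 2 = allele count, 3 = missing.
genotypes = [0, 1, 2, 3, 2, 1]
packed = pack_2bit(genotypes)
restored = unpack_2bit(packed, len(genotypes))
```

Six genotypes fit in two bytes here instead of six, which is the kind of saving that matters when a dataset is much larger than available memory.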
The creation of effective visualizations is a fundamental component of data analysis. In biomedical research, new challenges are emerging to visualize multi-dimensional data in a 2D space, but current data visualization tools have limited capabilities. To address this problem, we leverage Gestalt principles to improve the design and interpretability of multi-dimensional data in 2D data visualizations, layering aesthetics to display multiple variables. The proposed visualization can be applied to spatially-resolved transcriptomics data, but also broadly to data visualized in 2D space, such as embedding visualizations. We provide this open source R package escheR, which is built off of the state-of-the-art ggplot2 visualization framework and can be seamlessly integrated into genomics toolboxes and workflows.
The Clinical Trials Network (CTN) of the U.S. National Institute on Drug Abuse sponsored the CTN-0094 research team to harmonize data sets from three nationally-representative clinical trials for opioid use disorder (OUD). The CTN-0094 team herein provides a coded collection of trial outcomes and endpoints used in various OUD clinical trials over the past 50 years. These coded outcome functions are used to contrast and cluster different clinical outcome functions based on daily or weekly patient urine screenings. Note that we abbreviate urine drug screen as "UDS" and urine opioid screen as "UOS". For the example data sets (based on clinical trials data harmonized by the CTN-0094 research team), UDS and UOS are largely interchangeable.
Check your R code for some of the most common layout flaws. Many tried to teach us how to write less dreadful code, be it implicitly, as B. W. Kernighan and D. M. Ritchie (1988) <ISBN:0-13-110362-8> did in The C Programming Language, or explicitly, as R.C. Martin (2008) <ISBN:0-13-235088-2> did in Clean Code: A Handbook of Agile Software Craftsmanship. So we should check our code for files too long or wide, functions with too many lines, too wide lines, too many arguments or too many levels of nesting. Note: This is not a static code analyzer like pylint or the like. Check out <https://cran.r-project.org/package=lintr> instead.
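As a rough illustration of the kind of layout checks listed above, the following Python sketch flags files that are too long and lines that are too wide. The function `check_layout` and its default thresholds are hypothetical and do not reflect the package's functions or limits.

```python
def check_layout(source, max_line_width=80, max_lines=300):
    """Return a list of human-readable layout problems found in `source`."""
    problems = []
    lines = source.splitlines()
    # File-level check: total number of lines.
    if len(lines) > max_lines:
        problems.append(f"file has {len(lines)} lines (limit {max_lines})")
    # Line-level check: width of each line.
    for number, line in enumerate(lines, start=1):
        if len(line) > max_line_width:
            problems.append(
                f"line {number} is {len(line)} characters wide (limit {max_line_width})"
            )
    return problems


sample = "x <- 1\n" + "y" * 90
report = check_layout(sample)
```

Checks for function length, argument count and nesting depth would need a parser for the target language and are left out of this sketch.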
Utilities to read and write files in the FITS (Flexible Image Transport System) format, a standard format in astronomy (see e.g. <https://en.wikipedia.org/wiki/FITS> for more information). Present low-level routines allow: reading, parsing, and modifying FITS headers; reading FITS images (multi-dimensional arrays); reading FITS binary and ASCII tables; and writing FITS images (multi-dimensional arrays). Higher-level functions allow: reading files composed of one or more headers and a single (perhaps multidimensional) image or single table; reading tables into data frames; generating vectors for image array axes; scaling and writing images as 16-bit integers. Known gaps are the reading of random group extensions, as well as of complex and array descriptor data types in binary tables.
Models for non-linear time series analysis and causality detection. The main functionalities of this package consist of an implementation of the classical causality test (C.W.J. Granger 1980) <doi:10.1016/0165-1889(80)90069-X>, and a non-linear version of it based on feed-forward neural networks. This package also contains an implementation of the Transfer Entropy <doi:10.1103/PhysRevLett.85.461>, and of the continuous Transfer Entropy using an approximation based on the k-nearest neighbors <doi:10.1103/PhysRevE.69.066138>. There are also some other useful tools, like the VARNN (Vector Auto-Regressive Neural Network) prediction model, the Augmented test of stationarity, and the discrete and continuous entropy and mutual information.
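For readers unfamiliar with discrete transfer entropy, here is a minimal plug-in estimator in Python. It assumes a history length of one step and symbol sequences of equal length; the function name `transfer_entropy` is hypothetical and this is not the package's implementation.

```python
from collections import Counter
from math import log2


def transfer_entropy(x, y):
    """Plug-in estimate of TE(X -> Y) in bits, using one step of history.

    TE = sum over (y_next, y_prev, x_prev) of
         p(y_next, y_prev, x_prev) * log2( p(y_next | y_prev, x_prev) / p(y_next | y_prev) )
    """
    triples = list(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_prev, x_prev)
    n = len(triples)
    c_full = Counter(triples)
    c_prev = Counter((yp, xp) for _, yp, xp in triples)  # (y_prev, x_prev)
    c_self = Counter((yn, yp) for yn, yp, _ in triples)  # (y_next, y_prev)
    c_y = Counter(yp for _, yp, _ in triples)            # y_prev
    te = 0.0
    for (yn, yp, xp), c in c_full.items():
        p_joint = c / n
        p_given_both = c / c_prev[(yp, xp)]
        p_given_self = c_self[(yn, yp)] / c_y[yp]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# y copies x with a one-step delay, so x carries information about y's next value.
x = [0, 1, 0, 0, 1, 1, 0, 1, 0]
y = [0] + x[:-1]
te_xy = transfer_entropy(x, y)
```

Plug-in estimates of this kind are biased for short series, which is one reason continuous, nearest-neighbor based estimators exist as an alternative.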
Fits, by Approximate Bayesian Computation (ABC), the parameters of a stochastic process modelling the phylogeny and evolution of a suite of traits following the tree. The user may define an arbitrary Markov process for the trait and phylogeny. Importantly, trait-dependent speciation models are handled and fitted to data. See K. Bartoszek, P. Lio (2019) <doi:10.5506/APhysPolBSupp.12.25>. The suggested geiger package can be obtained from CRAN's archive <https://cran.r-project.org/src/contrib/Archive/geiger/>; taking the latest version is suggested. Otherwise its required code is present in the pcmabc package. The suggested distory package can be obtained from CRAN's archive <https://cran.r-project.org/src/contrib/Archive/distory/>; taking the latest version is suggested.
This package implements a spatiotemporal boundary detection model with a dissimilarity metric for areal data with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC). The response variable can be modeled as Gaussian (no nugget), probit or Tobit link and spatial correlation is introduced at each time point through a conditional autoregressive (CAR) prior. Temporal correlation is introduced through a hierarchical structure and can be specified as exponential or first-order autoregressive. Full details of the package can be found in the accompanying vignette. Furthermore, the details of the package can be found in "Diagnosing Glaucoma Progression with Visual Field Data Using a Spatiotemporal Boundary Detection Method", by Berchuck et al (2019) <doi:10.1080/01621459.2018.1537911>.
This package provides a differential abundance analysis for the comparison of two or more conditions. Useful for analyzing data from standard RNA-seq or meta-RNA-seq assays as well as selected and unselected values from in-vitro sequence selections. Uses a Dirichlet-multinomial model to infer abundance from counts, optimized for three or more experimental replicates. The method infers biological and sampling variation to calculate the expected false discovery rate, given the variation, based on a Wilcoxon Rank Sum test and Welch's t-test, a Kruskal-Wallis test, a generalized linear model, or a correlation test. All tests report p-values and Benjamini-Hochberg corrected p-values. ALDEx2 also calculates expected standardized effect sizes for paired or unpaired study designs.
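The Benjamini-Hochberg correction mentioned above can be written out in a few lines. This Python sketch shows the standard step-up adjustment for a list of p-values; the function name `benjamini_hochberg` is chosen for illustration and the code is not taken from ALDEx2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a smaller raw p-value from receiving a larger adjusted value than a bigger one.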
Estimating and analyzing autoregressive integrated moving average (ARIMA) models. The primary function in this package is arima(), which fits an ARIMA model to univariate time series data using a random restart algorithm. This approach frequently leads to models whose likelihood is greater than or equal to the likelihood obtained by fitting the same model using the arima() function from the stats package. This package enables proper optimization of model likelihoods, which is a necessary condition for performing likelihood ratio tests. This package relies heavily on the source code of the arima() function of the stats package. For more information, please see Jesse Wheeler and Edward L. Ionides (2025) <doi:10.1371/journal.pone.0333993>.
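The general idea behind random restarts can be shown on a toy problem. The Python sketch below minimizes a one-dimensional function with two local minima, standing in for a negative log-likelihood; the objective, the gradient-descent local search and all names are assumptions for illustration and are unrelated to how the package fits ARIMA models.

```python
import random


def objective(x):
    # Toy multimodal function: a local minimum near x = 2
    # and a lower (global) minimum near x = -2.
    return (x * x - 4.0) ** 2 + x


def gradient(x):
    return 4.0 * x * (x * x - 4.0) + 1.0


def local_search(x, step=0.01, iterations=500):
    """Plain gradient descent from a single starting point."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def random_restart(default_start, restarts=10, seed=0):
    """Run local_search from the default start and from random starts; keep the best."""
    rng = random.Random(seed)
    starts = [default_start] + [rng.uniform(-3.0, 3.0) for _ in range(restarts)]
    candidates = [local_search(s) for s in starts]
    return min(candidates, key=objective)


single = local_search(1.0)
best = random_restart(1.0)
```

Because the default starting point is always among the candidates, the restarted result can never be worse than the single run, which mirrors the "greater than or equal to" likelihood property described above.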
Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.
Bayesian estimation of a covariance matrix for multivariate normal data. Assumes that the covariance matrix is sparse or banded and positive-definite. Methods implemented include the beta-mixture shrinkage prior (Lee et al. (2022) <doi:10.1016/j.jmva.2022.105067>), screened beta-mixture prior (Lee et al. (2024) <doi:10.1214/24-BA1495>), and post-processed posteriors for banded and sparse covariances (Lee et al. (2023) <doi:10.1214/22-BA1333>; Lee and Lee (2023) <doi:10.1016/j.jeconom.2023.105475>). This software has been developed using funding supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (RS-2023-00211979, NRF-2022R1A5A7033499, NRF-2020R1A4A1018207 and NRF-2020R1C1C1A01013338).
Supervised learning from a source distribution (with known segmentation into cell sub-populations) to fit a target distribution with unknown segmentation. It relies on regularized optimal transport to directly estimate the different cell population proportions from a biological sample characterized with flow cytometry measurements. It is based on the regularized Wasserstein metric to compare cytometry measurements from different samples, thus accounting for possible mis-alignment of a given cell population across samples (due to technical variability from the technology of measurements). The supervised learning technique uses the Wasserstein metric to estimate an optimal re-weighting of class proportions in a mixture model. Details are presented in Freulon P, Bigot J and Hejblum BP (2023) <doi:10.1214/22-AOAS1660>.
Inference using a class of Hidden Markov models (HMMs) called oHMMed (ordered HMM with emission densities <doi:10.1186/s12859-024-05751-4>). The oHMMed algorithms identify the number of comparably homogeneous regions within observed sequences with autocorrelation patterns. These are modelled as discrete hidden states; the observed data points are then realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting only neighbouring states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are then inferred. Relevant for application to genomic sequences, time series, or any other sequence data with serial autocorrelation.
This package implements a semi-supervised learning framework for finite mixture models under a mixed-missingness mechanism. The approach models both missing completely at random (MCAR) and entropy-based missing at random (MAR) processes using a logistic-entropy formulation. Estimation is carried out via an Expectation-Conditional Maximisation (ECM) algorithm with robust initialisation routines for stable convergence. The methodology relates to the statistical perspective and informative missingness behaviour discussed in Ahfock and McLachlan (2020) <doi:10.1007/s11222-020-09971-5> and Ahfock and McLachlan (2023) <doi:10.1016/j.ecosta.2022.03.007>. The package provides functions for data simulation, model estimation, prediction, and theoretical Bayes error evaluation for analysing partially labelled data under a mixed-missingness mechanism.
This package provides computational tools for estimating inverse regions and constructing the corresponding simultaneous outer and inner confidence regions. Acceptable input includes both one-dimensional and two-dimensional data for linear, logistic, functional, and spatial generalized least squares regression models. Functions are also available for constructing simultaneous confidence bands (SCBs) for these models. The definition of simultaneous confidence regions (SCRs) follows Sommerfeld et al. (2018) <doi:10.1080/01621459.2017.1341838>. Methods for estimating inverse regions, SCRs, and the nonparametric bootstrap are based on Ren et al. (2024) <doi:10.1093/jrsssc/qlae027>. Methods for constructing SCBs are described in Crainiceanu et al. (2024) <doi:10.1201/9781003278726> and Telschow et al. (2022) <doi:10.1016/j.jspi.2021.05.008>.