This package provides various R programming tools for data manipulation, including:
medical unit conversions
combining objects
character vector operations
factor manipulation
obtaining information about R objects
generating fixed-width format files
extricating components of date and time objects
operations on columns of data frames
matrix operations
operations on vectors and data frames
value of last evaluated expression
wrapper for sample that ensures consistent behavior for both scalar and vector arguments
Designed for the development and application of hidden Markov models and profile HMMs for biological sequence analysis. Contains functions for multiple and pairwise sequence alignment, model construction and parameter optimization, file import/export, implementation of the forward, backward and Viterbi algorithms for conditional sequence probabilities, tree-based sequence weighting, and sequence simulation. Features a wide variety of potential applications including database searching, gene-finding and annotation, phylogenetic analysis and sequence classification. Based on the models and algorithms described in Durbin et al. (1998, ISBN: 9780521629713).
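As an illustration of the Viterbi algorithm mentioned above, here is a minimal Python sketch for a plain (non-profile) HMM. It is not the package's implementation or API: the function name `viterbi`, the two-state "AT"/"GC" model, and all probabilities are made up for this example, and every probability is assumed to be strictly positive so that logarithms are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log probability).

    Hypothetical helper for illustration; assumes all probabilities > 0.
    """
    # Work in log space so that long sequences do not underflow.
    score = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    backpointers = []
    for symbol in observations[1:]:
        new_score, pointer = {}, {}
        for s in states:
            # Best predecessor state for ending in state s at this position.
            best_prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = (
                score[best_prev]
                + math.log(trans_p[best_prev][s])
                + math.log(emit_p[s][symbol])
            )
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, score[last]


# Toy model: an AT-rich and a GC-rich region, with sticky transitions.
states = ["AT", "GC"]
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}
path, log_prob = viterbi("AATTGGCC", states, start_p, trans_p, emit_p)
print(path)
```

For this toy sequence the decoded path stays in "AT" for the first four residues and switches to "GC" for the last four.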
This package provides functions for planning and conducting a clinical trial with adaptive sample size determination. Maximal statistical efficiency will be exploited even when dramatic or multiple adaptations are made. Such a trial consists of adaptive determination of the sample size at an interim analysis and implementation of a frequentist statistical test at the interim and final analyses with a prespecified significance level. The required assumptions for the stage-wise test statistics are independent and stationary increments and normality. Predetermination of the adaptation rule is not required.
Distances on dual-weighted directed graphs using priority-queue shortest paths (Padgham (2019) <doi:10.32866/6945>). Weighted directed graphs have weights from A to B which may differ from those from B to A. Dual-weighted directed graphs have two sets of such weights. A canonical example is a street network to be used for routing in which routes are calculated by weighting distances according to the type of way and mode of transport, yet lengths of routes must be calculated from direct distances.
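The dual-weighted idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the package's interface: the function name `dual_weighted_distance`, the edge tuple layout, and the toy network are assumptions made for this example. The route is chosen by the weighted cost, while the reported length is the sum of direct distances along that route.

```python
import heapq


def dual_weighted_distance(edges, source, target):
    """Route on weighted cost; report the direct length of the chosen route.

    edges: iterable of (from_node, to_node, direct_length, weighted_cost).
    The graph is directed, so A->B and B->A are separate entries.
    Returns (weighted_cost, direct_length), or None if target is unreachable.
    """
    graph = {}
    for u, v, length, cost in edges:
        graph.setdefault(u, []).append((v, length, cost))

    # The priority queue is ordered by accumulated weighted cost first;
    # the direct length is carried along and only breaks ties in cost.
    queue = [(0.0, 0.0, source)]
    settled = set()
    while queue:
        cost, length, node = heapq.heappop(queue)
        if node in settled:
            continue
        settled.add(node)
        if node == target:
            return cost, length
        for nxt, edge_length, edge_cost in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + edge_cost, length + edge_length, nxt))
    return None


# Toy street network: the direct link A->B is short but heavily penalised
# for the chosen mode of transport, and it is cheaper in the reverse direction.
edges = [
    ("A", "B", 1.0, 5.0),
    ("B", "A", 1.0, 1.0),
    ("A", "C", 2.0, 1.5),
    ("C", "B", 2.0, 1.5),
]
print(dual_weighted_distance(edges, "A", "B"))
print(dual_weighted_distance(edges, "B", "A"))
```

From A to B the preferred route goes via C, so the reported length (4.0) is longer than the direct link (1.0). From B to A the direct link is used.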
This package implements a likelihood-based method for genome polarization, identifying which alleles of SNV markers belong to either side of a barrier to gene flow. The approach co-estimates individual assignment, barrier strength, and divergence between sides, with direct application to studies of hybridization. Includes VCF-to-diem conversion and input checks, support for mixed ploidy and parallelization, and tools for visualization and diagnostic outputs. Based on diagnostic index expectation maximization as described in Baird et al. (2023) <doi:10.1111/2041-210X.14010>.
This package provides methods and tools designed to improve the forecast accuracy for a linearly constrained multiple time series, while fulfilling the linear/aggregation relationships linking the components (Girolimetto and Di Fonzo, 2024 <doi:10.48550/arXiv.2412.03429>). FoCo2 offers multi-task forecast combination and reconciliation approaches leveraging input from multiple forecasting models or experts and ensuring that the resulting forecasts satisfy specified linear constraints. In addition, linear inequality constraints (e.g., non-negativity of the forecasts) can be imposed, if needed.
Computes the power and sample size (PASS) required to test for the difference in the mean function between two groups under a repeatedly measured longitudinal or sparse functional design. See the manuscript by Koner and Luo (2023) <https://salilkoner.github.io/assets/PASS_manuscript.pdf> for details of the PASS formula and computational details. The details of the testing procedure for univariate and multivariate response are presented in Wang (2021) <doi:10.1214/21-EJS1802> and Koner and Luo (2023) <arXiv:2302.05612> respectively.
This package implements methods developed by Ding, Feller, and Miratrix (2016) <doi:10.1111/rssb.12124> <arXiv:1412.5000>, and Ding, Feller, and Miratrix (2018) <doi:10.1080/01621459.2017.1407322> <arXiv:1605.06566> for testing whether there is unexplained variation in treatment effects across observations, and for characterizing the extent of the explained and unexplained variation in treatment effects. The package includes wrapper functions implementing the proposed methods, as well as helper functions for analyzing and visualizing the results of the test.
Joint mean and dispersion effects models fit the mean and dispersion parameters of a response variable by two separate linear models, the mean and dispersion submodels, simultaneously. The package also allows users to choose either the deviance or the Pearson residuals as the response variable of the dispersion submodel. Furthermore, the package provides the possibility to nest the submodels in one another, if one of the parameters has significant explanatory power on the other. See Wu & Li (2016) <doi:10.1016/j.csda.2016.04.015>.
Calculate a multivariate functional principal component analysis for data observed on different dimensional domains. The estimation algorithm relies on univariate basis expansions for each element of the multivariate functional data (Happ & Greven, 2018) <doi:10.1080/01621459.2016.1273115>. Multivariate and univariate functional data objects are represented by S4 classes for this type of data implemented in the package 'funData'. For more details on the general concepts of both packages and a case study, see Happ-Kurz (2020) <doi:10.18637/jss.v093.i05>.
This package implements methods for estimating generalized estimating equations (GEE) with advanced options for flexible modeling and handling missing data. It provides tools to fit and analyze GEE models for longitudinal data, allowing users to address missingness using a variety of imputation techniques. It supports both univariate and multivariate modeling as well as visualization of missing data patterns, and facilitates the transformation of data for efficient statistical analysis. Designed for researchers working with complex datasets, it ensures robust estimation and inference in longitudinal and clustered data settings.
This package provides an interface to the Mapbox GL JS (<https://docs.mapbox.com/mapbox-gl-js/guides>) and the MapLibre GL JS (<https://maplibre.org/maplibre-gl-js/docs/>) interactive mapping libraries to help users create custom interactive maps in R. Users can create interactive globe visualizations; layer sf objects to create filled maps, circle maps, heatmaps, and three-dimensional graphics; and customize map styles and views. The package also includes utilities to use Mapbox and MapLibre maps in Shiny web applications.
Asymptotically efficient closed-form estimators (MLEces) are provided in this package for three multivariate distributions (gamma, Weibull and Dirichlet) whose maximum likelihood estimators (MLEs) are not available in closed form. The closed-form estimators are strongly consistent and have an asymptotic normal distribution similar to that of the MLEs, but they are much faster to compute than the corresponding MLEs. Further details and explanations of MLEces can be found in Jang et al. (2023) <doi:10.1111/stan.12299> and Kim et al. (2023) <doi:10.1080/03610926.2023.2179880>.
This ONEST software implements the method of assessing the pathologist agreement in reading PD-L1 assays (Reisenbichler et al. (2020 <doi:10.1038/s41379-020-0544-x>)), to determine the minimum number of evaluators needed to estimate agreement involving a large number of raters. Input to the program should be binary (1/0) pathology data, where "0" may stand for negative and "1" for positive. Additional examples were given using the data from Rimm et al. (2017 <doi:10.1001/jamaoncol.2017.0013>).
This software is useful for loading .fasta or .gbk files, and for retrieving sequences from the GenBank dataset <https://www.ncbi.nlm.nih.gov/genbank/>. This package makes it possible to detect differences or asymmetries based on nucleotide composition by using local linear kernel smoothers. It is also possible to draw inference about critical points (i.e. maximum or minimum points) of the derivative curves. Additionally, bootstrap methods are used for estimating confidence intervals, and computational speed-up techniques (binning techniques) are implemented in 'seq2R'.
Augmenting a matched data set by generating multiple stochastic, matched samples from the data using a multi-dimensional histogram constructed from dropping the input matched data into a multi-dimensional grid built on the full data set. The resulting stochastic, matched sets will likely provide a collectively higher coverage of the full data set compared to the single matched set. Each stochastic match is without duplication, thus allowing downstream validation techniques such as cross-validation to be applied to each set without concern for overfitting.
This package provides helper functions and wrappers to simplify authentication, data retrieval, and result processing from the VALD APIs. Designed to streamline integration for analysts and researchers working with VALD's external APIs. For further documentation on integrating with VALD APIs, see: <https://support.vald.com/hc/en-au/articles/23415335574553-How-to-integrate-with-VALD-APIs>. For a step-by-step guide to using this package, see: <https://support.vald.com/hc/en-au/articles/48730811824281-A-guide-to-using-the-valdr-R-package>.
This package provides tools to convert statistical analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like dplyr, tidyr and ggplot2. The package provides three S3 generics: tidy, which summarizes a model's statistical findings such as coefficients of a regression; augment, which adds columns to the original data such as predictions, residuals and cluster assignments; and glance, which provides a one-row summary of model-level statistics.
Comprehensive set of tools for performing system identification of both linear and nonlinear dynamical systems directly from data. The Automatic Regression for Governing Equations (ARGOS) simplifies the complex task of constructing mathematical models of dynamical systems from observed input and output data, supporting various types of systems, including those described by ordinary differential equations. It employs optimal numerical derivatives for enhanced accuracy and employs formal variable selection techniques to help identify the most relevant variables, thereby enabling the development of predictive models for system behavior analysis.
This package provides functions to compute a continuum of information-based measures for quantifying the temporal stability of populations, communities, and ecosystems, as well as their associated synchrony, based on species (or species assemblage) biomass or other key variables. When biodiversity data are available, the package also enables the assessment of the corresponding diversity-stability relationships. All measures are applicable in both temporal and spatial contexts. The theoretical and methodological background is detailed in Chao et al. (2025) <doi:10.1101/2025.08.20.671203>.
This package provides coefficients of interrater reliability that are generalized to cope with randomly incomplete (i.e. unbalanced) datasets without any imputation of missing values or any (row-wise or column-wise) omissions of actually available data. Applied to complete (balanced) datasets, these generalizations yield the same results as the common procedures, namely the Intraclass Correlation according to McGraw & Wong (1996) <doi:10.1037/1082-989X.1.1.30> and the Coefficient of Concordance according to Kendall & Babington Smith (1939) <doi:10.1214/aoms/1177732186>.
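For orientation, the classical Coefficient of Concordance for a complete (balanced) table can be sketched as below. This covers only the textbook complete-data case, W = 12 S / (m^2 (n^3 - n)), with no ties within a rater; it is not the package's generalization to incomplete data. The function name `kendalls_w` and the sample ratings are hypothetical.

```python
def kendalls_w(ratings):
    """Kendall's W for a complete table without ties.

    ratings[r][i] is the score rater r gave to item i.
    Hypothetical helper for illustration only.
    """
    m = len(ratings)      # number of raters
    n = len(ratings[0])   # number of items
    rank_sums = [0] * n
    for row in ratings:
        # Convert each rater's scores to ranks 1..n (assumes no ties).
        order = sorted(range(n), key=lambda i: row[i])
        for rank, item in enumerate(order, start=1):
            rank_sums[item] += rank
    mean_rank_sum = m * (n + 1) / 2
    # S: sum of squared deviations of the item rank sums from their mean.
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three raters who order four items identically, on different scales.
agreeing = [[10, 20, 30, 40], [1, 2, 3, 4], [5, 6, 7, 8]]
# Two raters who order three items in exactly opposite ways.
opposed = [[1, 2, 3], [3, 2, 1]]
print(kendalls_w(agreeing), kendalls_w(opposed))
```

Identical orderings give W = 1, and two exactly opposed raters give W = 0.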
Latent binary Bayesian neural networks (LBBNNs) are implemented using 'torch', an R interface to the LibTorch backend. Supports mean-field variational inference as well as flexible variational posteriors using normalizing flows. The standard LBBNN implementation follows Hubin and Storvik (2024) <doi:10.3390/math12060788>, using the local reparametrization trick as in Skaaret-Lund et al. (2024) <https://openreview.net/pdf?id=d6kqUKzG3V>. Input-skip connections are also supported, as described in Høyheim et al. (2025) <doi:10.48550/arXiv.2503.10496>.
This package provides a comprehensive suite of statistical tools for analyzing, simulating, and computing properties of the Topp-Leone Cauchy Rayleigh (TLCAR) distribution, a versatile distribution amalgamating features of the Topp-Leone, Cauchy, and Rayleigh distributions, ideal for modeling intricate, heterogeneous data across scientific domains. See Atchadé, M.N., Bogninou, M.J., and Djibril, A.M. (2023) <doi:10.1007/s44199-023-00066-4> and Atchadé, M.N., Bogninou, M.J., and Djibril, A.M. (2024) <doi:10.1007/s44199-023-00069-1> for further insights.
The tdROC package facilitates the estimation of time-dependent ROC (Receiver Operating Characteristic) curves and the Area Under the time-dependent ROC Curve (AUC) in the context of survival data, accommodating scenarios with right censored data and the option to account for competing risks. In addition to the ROC/AUC estimation, the package also estimates time-dependent Brier score and survival difference. Confidence intervals of various estimated quantities can be obtained from bootstrap. The package also offers plotting functions for visualizing time-dependent ROC curves.