This package provides a port of the web-based software DAGitty for analyzing structural causal models (also known as directed acyclic graphs or DAGs). This package computes covariate adjustment sets for estimating causal effects, enumerates instrumental variables, derives testable implications (d-separation and vanishing tetrads), generates equivalent models, and includes a simple facility for data simulation.
For tree ensembles such as random forests, regularized random forests and gradient boosted trees, this package provides functions for: extracting, measuring and pruning rules; selecting a compact rule set; summarizing rules into a learner; calculating frequent variable interactions; formatting rules in latex code. Reference: Interpreting tree ensembles with inTrees (Houtao Deng, 2019, <doi:10.1007/s41060-018-0144-8>).
After the clustering step of a single-cell RNAseq experiment, this package aims to suggest labels/cell types for the clusters, on the basis of similarity to a reference dataset. It requires a table of read counts per cell per gene, and a list of the cells belonging to each of the clusters, (for both test and reference data).
Messina is a collection of algorithms for constructing optimally robust single-gene classifiers, and for identifying differential expression in the presence of outliers or unknown sample subgroups. The methods have application in identifying lead features to develop into clinical tests (both diagnostic and prognostic), and in identifying differential expression when a fraction of samples show unusual patterns of expression.
The package generally provides methods for gene set enrichment analysis of high-throughput RNA-Seq data by integrating differential expression and splicing. It uses negative binomial distribution to model read count data, which accounts for sequencing biases and biological variation. Based on permutation tests, statistical significance can also be achieved regarding each gene's differential expression and splicing, respectively.
This package enables automated selection of group specific signature, especially for rare population. The package is developed for generating specifc lists of signature genes based on Term Frequency-Inverse Document Frequency (TF-IDF) modified methods. It can also be used as a new gene-set scoring method or data transformation method. Multiple visualization functions are implemented in this package.
This wrapper package for mgcv makes it easier to create high-performing Generalized Additive Models (GAMs). With its central function autogam(), by entering just a dataset and the name of the outcome column as inputs, AutoGAM tries to automate the procedure of configuring a highly accurate GAM which performs at reasonably high speed, even for large datasets.
This package provides a collection of Japanese text processing tools for filling Japanese iteration marks, Japanese character type conversions, segmentation by phrase, and text normalization which is based on rules for the Sudachi morphological analyzer and the NEologd (Neologism dictionary for MeCab'). These features are specific to Japanese and are not implemented in ICU (International Components for Unicode).
Computes the cosine-correlation coefficient for measuring the degree of linear dependence among variables in a multidimensional context. The package implements the generalized cosine-correlation theorem for p-1 variables, providing a quantitative assessment of interrelationships within experimental frameworks. This methodology extends classical correlation measures to higher-dimensional spaces using a dimensional exploration approach based on time scale calculus.
Package to fit diffusion-based IRT models to response and response time data. Models are fit using marginal maximum likelihood. Parameter restrictions (fixed value and equality constraints) are possible. In addition, factor scores (person drift rate and person boundary separation) can be estimated. Model fit assessment tools are also available. The traditional diffusion model can be estimated as well.
The amplitude-dependent autoregressive time series model (EXPAR) proposed by Haggan and Ozaki (1981) <doi:10.2307/2335819> was improved by incorporating the moving average (MA) framework for capturing the variability efficiently. Parameters of the EXPARMA model can be estimated using this package. The user is provided with the best fitted EXPARMA model for the data set under consideration.
This package provides tools to work with the Flexible Dirichlet distribution. The main features are an E-M algorithm for computing the maximum likelihood estimate of the parameter vector and a function based on conditional bootstrap to estimate its asymptotic variance-covariance matrix. It contains also functions to plot graphs, to generate random observations and to handle compositional data.
This package provides a simple and flexible tool designed to create enriched figures and tables by providing a way to add text around them through predefined or custom layouts. Any input which is convertible to grob is supported, like ggplot', gt or flextable'. Based on R grid graphics, for more details see Paul Murrell (2018) <doi:10.1201/9780429422768>.
Estimates treatment effects using covariate adjustment methods in Randomized Clinical Trials (RCT) motivated by higher-order influence functions (HOIF). Provides point estimates, oracle bias, variance, and approximate variance for HOIF-adjusted estimators. For methodology details, see Zhao et al. (2024) <doi:10.48550/arXiv.2411.08491> and Gu et al. (2025) <doi:10.48550/arXiv.2512.20046>.
This package provides semiparametric sufficient dimension reduction for central mean subspaces for heterogeneous data defined by combinations of binary factors (such as chronic conditions). Subspaces are estimated to be hierarchically nested to respect the structure of subpopulations with overlapping characteristics. This package is an implementation of the proposed methodology of Huling and Yu (2021) <doi:10.1111/biom.13546>.
This package provides diagnostic tools for understanding and debugging data frame joins. Analyzes key columns before joining to detect duplicates, mismatches, encoding issues, and other common problems. Explains unexpected row count changes and provides safe join wrappers with cardinality enforcement. Concepts and diagnostics build on tidy data principles as described in Wickham (2014) <doi:10.18637/jss.v059.i10>.
Implementation of the KCMeans regression estimator studied by Wiemann (2023) <arXiv:2311.17021> for expectation function estimation conditional on categorical variables. Computation leverages the unconditional KMeans implementation in one dimension using dynamic programming algorithm of Wang and Song (2011) <doi:10.32614/RJ-2011-015>, allowing for global solutions in time polynomial in the number of observed categories.
Implementations several algorithms for kernel k-means. The default OTQT algorithm is a fast alternative to standard implementations of kernel k-means, particularly in cases with many clusters. For a small number of clusters, the implemented MacQueen method typically performs the fastest. For more details and performance evaluations, see Berlinski and Maitra (2025) <doi:10.1002/sam.70032>.
This package performs Monte Carlo hypothesis tests, allowing a couple of different sequential stopping boundaries. For example, a truncated sequential probability ratio test boundary (Fay, Kim and Hachey, 2007 <DOI:10.1198/106186007X257025>) and a boundary proposed by Besag and Clifford, 1991 <DOI:10.1093/biomet/78.2.301>. Gives valid p-values and confidence intervals on p-values.
This package provides a general framework for clinical trial simulations based on the Clinical Scenario Evaluation (CSE) approach. The package supports a broad class of data models (including clinical trials with continuous, binary, survival-type and count-type endpoints as well as multivariate outcomes that are based on combinations of different endpoints), analysis strategies and commonly used evaluation criteria.
This package provides a method for fitting the entire regularization path of the principal components lasso for linear and logistic regression models. The algorithm uses cyclic coordinate descent in a path-wise fashion. See URL below for more information on the algorithm. See Tay, K., Friedman, J. ,Tibshirani, R., (2014) Principal component-guided sparse regression <arXiv:1810.04651>.
This package provides functions for estimating the potential dispersal of tree species using regeneration densities and dispersal distances to nearest seed trees. A quantile regression is implemented to determine the dispersal potential. Spatial prediction can be used to identify natural regeneration potential for forest restoration as described in Axer et al (2021) <doi:10.1016/j.foreco.2020.118802>.
This package implements stacked elastic net regression (Rauschenberger 2021 <doi:10.1093/bioinformatics/btaa535>). The elastic net generalises ridge and lasso regularisation (Zou 2005 <doi:10.1111/j.1467-9868.2005.00503.x>). Instead of fixing or tuning the mixing parameter alpha, we combine multiple alpha by stacked generalisation (Wolpert 1992 <doi:10.1016/S0893-6080(05)80023-1>).
The SAWNUTI algorithm performs sequence comparison for finite sequences of discrete events with non-uniform time intervals. Further description of the algorithm can be found in the paper: A. Murph, A. Flynt, B. R. King (2021). Comparing finite sequences of discrete events with non-uniform time intervals, Sequential Analysis, 40(3), 291-313. <doi:10.1080/07474946.2021.1940491>.