This is a package for estimation and inference from generalized linear models based on various methods for bias reduction and maximum penalized likelihood with powers of the Jeffreys prior as penalty. The brglmFit fitting method can reduce estimation bias by solving either the mean bias-reducing adjusted score equations in Firth (1993) <doi:10.1093/biomet/80.1.27> and Kosmidis and Firth (2009) <doi:10.1093/biomet/asp055>, or the median bias-reducing adjusted score equations in Kenne Pagui et al. (2017) <doi:10.1093/biomet/asx046>, or by directly subtracting an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates, as in Cordeiro and McCullagh (1991) <https://www.jstor.org/stable/2345592>.
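A minimal usage sketch of the fitting method described above (assuming the package is attached as brglm2 and that the type argument selects between the mean and median adjustments, as in the package documentation):

    # refit a logistic regression with bias-reducing adjusted score equations
    library(brglm2)
    fit_ml <- glm(am ~ mpg + wt, family = binomial, data = mtcars)     # maximum likelihood
    fit_br <- update(fit_ml, method = "brglmFit", type = "AS_mean")    # mean bias reduction
    fit_md <- update(fit_ml, method = "brglmFit", type = "AS_median")  # median bias reduction
    summary(fit_br)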
TaxSEA is an R package for Taxon Set Enrichment Analysis, which uses a Kolmogorov-Smirnov test to assess whether differential abundance analysis output shows alterations in a priori defined sets of taxa from public databases (BugSigDB, MiMeDB, GutMGene, mBodyMap, BacDive and GMRepoV2) and collated from the literature. TaxSEA takes as input a list of taxonomic identifiers (e.g. species names, NCBI IDs) and a rank (e.g. fold change, correlation coefficient). TaxSEA can be applied to output from any microbiota taxonomic profiling technology (array-based, 16S rRNA gene sequencing, shotgun metagenomics and metatranscriptomics) and enables researchers to rapidly contextualize their findings within the broader literature to accelerate interpretation of results.
Create a blended curve from two survival curves, which is particularly useful for survival extrapolation in health technology assessment. The main idea is to mix a flexible model that fits the observed data well with a parametric model that encodes assumptions about long-term survival. The two curves are blended into a single survival curve that is identical to the first model over the range of observed times and gradually approaches the parametric model over the extrapolation period based on a given weight function. This approach allows for the inclusion of external information, such as data from registries or expert opinion, to guide long-term extrapolations, especially when dealing with immature trial data. See Che et al. (2022) <doi:10.1177/0272989X221134545>.
This package provides the dose transition pathways (DTP) to project in advance the doses recommended by a model-based design for subsequent patients (stay, escalate, de-escalate or stop early) using all the accumulated toxicity information; see Yap et al. (2017) <doi:10.1158/1078-0432.CCR-17-0582>. DTP can be used as a design and an operational tool and can be displayed as a table or flow diagram. The dtpcrm package also provides the modified continual reassessment method (CRM) and time-to-event CRM (TITE-CRM) with added practical considerations to allow stopping early when there is sufficient evidence that the lowest dose is too toxic and/or there is a sufficient number of patients dosed at the maximum tolerated dose.
Researchers have been using simulated data from a multivariate linear model to compare and evaluate different methods, ideas and models. Additionally, teachers and educators have been using simulation tools to demonstrate and teach various statistical and machine learning concepts. This package helps users simulate linear model data with a wide range of properties by tuning a few parameters, such as the relevant latent components. In addition, a shiny app provided as an RStudio gadget gives users a simple interface to the simulation function. See more in Sæbø, S., Almøy, T., Helland, I.S. (2015) <doi:10.1016/j.chemolab.2015.05.012> and Rimal, R., Almøy, T., Sæbø, S. (2018) <doi:10.1016/j.chemolab.2018.02.009>.
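A minimal sketch of the simulation function described above; the argument names follow the tuning parameters mentioned (relevant predictors, relevant latent components, eigenvalue decay, R2), but exact names and output components are assumptions to check against the package documentation:

    # simulate a single-response linear model data set with 10 predictors,
    # 5 of them relevant, and relevant latent components 1-3
    library(simrel)
    sim <- simrel(n = 100, p = 10, q = 5, relpos = c(1, 2, 3),
                  gamma = 0.7,   # decay rate of the eigenvalues of the predictors
                  R2 = 0.8)      # population coefficient of determination
    dim(sim$X); head(sim$Y)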
This function produces empirical best linear unbiased predictions (EBLUPs) for zero-inflated data along with their relative standard errors. Small Area Estimation with Zero-Inflated Model (SAE-ZIP) is a model developed for zero-inflated data, which can otherwise lead to overdispersion. The model in this package is based on the Small Area Estimation with Zero-Inflated Poisson model proposed by Dian Christien Arisona (2018) <https://repository.ipb.ac.id/handle/123456789/92308>. For the sample data, a combination of the methods of Roberto Benavent and Domingo Morales (2015) <doi:10.1016/j.csda.2015.07.013> and Sabine Krieg, Harm Jan Boonstra and Marc Smeets (2016) <doi:10.1515/jos-2016-0051> is used.
Used in testing whether the indirect effect from linear regression mediation analysis is equal to 0. Includes established methods such as the Sobel Test, the Joint Significance test (maxP), and tests based on the distribution of the Product of Normal Random Variables. Additionally, this package adds more powerful tests based on Intersection-Union theory: the S-Test, the ps-test, and the ascending squares test. These new methods are uniformly more powerful than maxP, which is more powerful than Sobel and less anti-conservative than the Product of Normal Random Variables. These methods are explored in Kidd and Lin (2024) <doi:10.1007/s12561-023-09386-6> and Kidd et al. (2025) <doi:10.1007/s10260-024-00777-7>.
Analysis of multivariate data with a two-way completely randomized factorial design. The analysis is based on fully nonparametric, rank-based methods and uses test statistics based on Dempster's ANOVA, Wilks' Lambda, Lawley-Hotelling and Bartlett-Nanda-Pillai criteria. The multivariate response is allowed to be ordinal, quantitative, binary or a mixture of the different variable types. The package offers two functions performing the analysis, one for small and the other for large sample sizes. The underlying methodology is largely described in Bathke and Harrar (2016) <doi:10.1007/978-3-319-39065-9_7>, Munzel and Brunner (2000) <doi:10.1016/S0378-3758(99)00212-8>, and Kiefel and Bathke (2022) <doi:10.1515/stat-2022-0112>.
Computation of t-year survival probabilities and t-year risks with right censored survival data. The Kaplan-Meier estimator is used to provide estimates for data without competing risks and the Aalen-Johansen estimator is used when there are competing risks. Confidence intervals and p-values are obtained using either usual Wald-type inference or empirical likelihood inference, as described in Thomas and Grunkemeier (1975) <doi:10.1080/01621459.1975.10480315> and Blanche (2020) <doi:10.1007/s10985-018-09458-6>. Functions for both one-sample and two-sample inference are provided. Unlike Wald-type inference, empirical likelihood inference always leads to consistent conclusions, in terms of statistical significance, when comparing two risks (or survival probabilities) via either a ratio or a difference.
Component-wise gradient boosting for analysis of multiply imputed datasets. Implements the algorithm Boosting after Multiple Imputation (MIBoost), which enforces uniform variable selection across imputations and provides utilities for pooling. Includes a cross-validation workflow that first splits the data into training and validation sets and then performs imputation on the training data, applying the learned imputation models to the validation data to avoid information leakage. Supports Gaussian and logistic loss. Methods relate to gradient boosting and multiple imputation as in Buehlmann and Hothorn (2007) <doi:10.1214/07-STS242>, Friedman (2001) <doi:10.1214/aos/1013203451>, van Buuren (2018, ISBN:9781138588318), and van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>; see also Kuchen (2025) <doi:10.48550/arXiv.2507.21807>.
Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. The evtree package implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. CPU and memory-intensive tasks are fully computed in C++ while the partykit package is leveraged to represent the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions.
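A minimal sketch of the interface; evtree() follows the usual formula/data interface of recursive partitioning functions:

    # grow a globally optimized classification tree for the iris data
    library(evtree)
    set.seed(1)                                  # the evolutionary search is stochastic
    fit <- evtree(Species ~ ., data = iris)
    plot(fit)                                    # partykit plotting infrastructure
    table(predicted = predict(fit), observed = iris$Species)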
This package implements the Exploratory Graph Analysis (EGA) framework for dimensionality and psychometric assessment. EGA estimates the number of dimensions in psychological data using network estimation methods and community detection algorithms. A bootstrap method is provided to assess the stability of dimensions and items. Fit is evaluated using the Entropy Fit family of indices. Unique Variable Analysis evaluates the extent to which items are locally dependent (or redundant). Network loadings provide similar information to factor loadings and can be used to compute network scores. Bootstrap and permutation approaches are available to assess configural and metric invariance. Hierarchical structures can be detected using Hierarchical EGA. Time series and intensive longitudinal data can be analyzed using Dynamic EGA, supporting individual-, group-, and population-level assessments.
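A minimal sketch of the core workflow; the wmt2 example data and the exact argument names are recalled from the package documentation and should be treated as assumptions:

    # estimate dimensions and assess their stability via bootstrap
    library(EGAnet)
    items <- wmt2[, 7:24]                        # example item responses shipped with the package
    ega   <- EGA(data = items)                   # network estimation + community detection
    boot  <- bootEGA(data = items, iter = 100)   # stability of dimensions and items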
Generates efficient balanced mixed-level k-circulant supersaturated designs by interchanging the elements of the generator vector. The package attempts to generate a supersaturated design whose EfNOD efficiency exceeds a user-specified efficiency level (mef), and displays the progress of the generation of an efficient mixed-level k-circulant design through a progress bar. A progress of 100 per cent means that one full round of interchange is completed; more than one full round (typically 4-5 rounds) of interchange may be required for larger designs. For more details, please see Mandal, B.N., Gupta, V.K. and Parsad, R. (2011), Construction of Efficient Mixed-Level k-Circulant Supersaturated Designs, Journal of Statistical Theory and Practice, 5:4, 627-648, <doi:10.1080/15598608.2011.10483735>.
Efficient framework to estimate high-dimensional generalized matrix factorization models using penalized maximum likelihood under a dispersion exponential family specification. Both deterministic and stochastic methods are implemented for the numerical maximization. In particular, the package implements the stochastic gradient descent algorithm with a block-wise mini-batch strategy to speed up the computations and an efficient adaptive learning rate schedule to stabilize the convergence. All the theoretical details can be found in Castiglione et al. (2024) <doi:10.48550/arXiv.2412.20509>. Other methods considered for the optimization are alternating iteratively re-weighted least squares and the quasi-Newton method with diagonal approximation of the Fisher information matrix discussed in Kidzinski et al. (2022) <http://jmlr.org/papers/v23/20-1104.html>.
An extension to the individual claim simulator SynthETIC (on CRAN) that simulates the evolution of case estimates of incurred losses through the lifetime of an insurance claim. The transactional simulation output comprises key dates, and both claim payments and revisions of estimated incurred losses. An initial set of test parameters, designed to mirror the experience of a real insurance portfolio, was set up and is applied by default to generate a realistic test data set of incurred histories (see vignette). However, the distributional assumptions used to generate this data set can be easily modified by users to match their own experience. Reference: Avanzi B, Taylor G, Wang M (2021) "SPLICE: A Synthetic Paid Loss and Incurred Cost Experience Simulator" <arXiv:2109.04058>.
The focus is on simulating and modeling families with founders drawn from a structured population (for example, with different ancestries or other potentially non-family relatedness), in contrast to traditional pedigree analysis that treats all founders as equally unrelated. The main function simulates a random pedigree for many generations, avoiding close relatives, pairing the closest individuals according to a 1D geography and their randomly drawn sex, and with a variable number of children per pairing to result in a target population size per generation. Auxiliary functions calculate kinship matrices and admixture matrices, and draw random genotypes across arbitrary pedigree structures starting from the corresponding founder values. The code is built around the plink FAM table format for pedigrees. Described in Yao and Ochoa (2022) <doi:10.1101/2022.03.25.485885>.
Define, simulate, and validate stock-flow consistent (SFC) macroeconomic models. The godley R package offers tools to dynamically define model structures by adding variables and specifying governing systems of equations. With it, users can analyze how different macroeconomic structures affect key variables, perform parameter sensitivity analyses, introduce policy shocks, and visualize resulting economic scenarios. The accounting structure of SFC models follows the approach outlined in the seminal study by Godley and Lavoie (2007, ISBN:978-1-137-08599-3), ensuring a comprehensive integration of all economic flows and stocks. The algorithms implemented to solve the models are based on methodologies from Kinsella and O'Shea (2010) <doi:10.2139/ssrn.1729205>, Peressini and Sullivan (1988, ISBN:0-387-96614-5), and contributions by Joao Macalos.
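A hedged sketch of the workflow described above, using a toy two-equation model; the function names (create_model(), add_variable(), add_equation(), simulate_scenario()) mirror the described steps of defining variables, specifying equations and simulating scenarios, but their exact signatures are assumptions to verify against the package documentation:

    # define, populate and simulate a toy SFC-style model (function names assumed)
    library(godley)
    m <- create_model(name = "toy")
    m <- add_variable(m, "Y", desc = "Income")
    m <- add_variable(m, "C", desc = "Consumption")
    m <- add_variable(m, "G", desc = "Government spending", init = 20)
    m <- add_equation(m, "Y = C + G")
    m <- add_equation(m, "C = 0.6 * Y")
    m <- simulate_scenario(m, scenario = "baseline", periods = 100)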
This package implements methods for clustering mixed-type data, specifically combinations of continuous and nominal data. Special attention is paid to the often-overlooked problem of equitably balancing the contribution of the continuous and categorical variables. This package implements KAMILA clustering, a novel method for clustering mixed-type data in the spirit of k-means clustering. It does not require dummy coding of variables, and is efficient enough to scale to rather large data sets. Also implemented is Modha-Spangler clustering, which uses a brute-force strategy to maximize the cluster separation simultaneously in the continuous and categorical variables. For more information, see Foss, Markatou, Ray, & Heching (2016) <doi:10.1007/s10994-016-5575-7> and Foss & Markatou (2018) <doi:10.18637/jss.v083.i13>.
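A minimal sketch of a KAMILA fit on mixed-type data; the conVar/catFactor/numClust/numInit arguments and the finalMemb output component follow my reading of the package documentation and are assumptions to verify:

    # cluster a data set with continuous and categorical variables
    library(kamila)
    set.seed(5)
    con  <- as.data.frame(scale(iris[, 1:4]))                     # continuous variables
    catf <- data.frame(grp = factor(sample(c("a", "b"), nrow(iris), replace = TRUE)))
    res  <- kamila(conVar = con, catFactor = catf,
                   numClust = 3,                                  # number of clusters
                   numInit = 10)                                  # number of random initializations
    table(res$finalMemb, iris$Species)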
This package provides a suite of tools to improve the processing efficiency of SQL and R scripts. libr_unused() returns a vector of package names that are loaded within an R script but never actually used; libr_used() returns a vector of package names actively used within an R script (packages loaded with library() but not actually used are excluded); libr_called() returns a vector of all package names called within an R script; nolock() appends WITH (nolock) to all tables in SQL queries, which facilitates non-blocking reads from databases in scenarios where they are preferable, such as high-transaction environments. A usage sketch follows below.
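A hedged sketch of the script-analysis helpers named above; I assume each takes the path to an R script, which is an assumption to verify against the package documentation:

    # write a tiny script that loads two packages but uses only one
    script <- tempfile(fileext = ".R")
    writeLines(c("library(dplyr)",
                 "library(data.table)",
                 "dt <- data.table(x = 1:3)"), script)
    libr_called(script)   # all packages called in the script
    libr_used(script)     # packages actively used (expected: data.table)
    libr_unused(script)   # packages loaded but never used (expected: dplyr)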
Metapackage for implementing a variety of event-based models, with a focus on spatially explicit models. These include raster-based, event-based, and agent-based models. The core simulation components (provided by 'SpaDES.core') are built upon a discrete event simulation (DES; see Matloff (2011) ch 7.8.3 <https://nostarch.com/artofr.htm>) framework that facilitates modularity and easily enables the user to include additional functionality by running user-built simulation modules (see also 'SpaDES.tools'). Included are numerous tools to visualize rasters and other maps (via 'quickPlot'), and caching methods for reproducible simulations (via 'reproducible'). Tools for running simulation experiments are provided by 'SpaDES.experiment'. Additional functionality is provided by the 'SpaDES.addins' and 'SpaDES.shiny' packages.
Likelihood-based estimation of mixed-effects transformation models using the Template Model Builder ('TMB', Kristensen et al., 2016) <doi:10.18637/jss.v070.i05>. The technical details of transformation models are given in Hothorn et al. (2018) <doi:10.1111/sjos.12291>. Likelihood contributions of exact, randomly censored (left, right, interval) and truncated observations are supported. The random effects are assumed to be normally distributed on the scale of the transformation function, the marginal likelihood is evaluated using the Laplace approximation, and the gradients are calculated with automatic differentiation (Tamasi & Hothorn, 2021) <doi:10.32614/RJ-2021-075>. Penalized smooth shift terms can be defined using the mgcv notation. Additive mixed-effects transformation models are described in Tamasi (2025) <doi:10.18637/jss.v114.i11>.
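A minimal sketch of a mixed-effects transformation model fit, using the lme4 sleepstudy data; random effects are specified with the usual lme4-style notation:

    # normal linear mixed-effects transformation model with correlated random effects
    library(tramME)
    data("sleepstudy", package = "lme4")
    fit <- LmME(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    summary(fit)
    # censored survival responses could instead be handled by, e.g., CoxphME()
    # for a mixed-effects proportional hazards transformation model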
This package provides a collection of randomization tests, data sets and examples. The current version focuses on five testing problems and their implementation in empirical work. First, it allows the empirical researcher to test particular hypotheses, such as comparisons of means, medians, and variances from k populations, using robust permutation tests whose asymptotic validity holds under very weak assumptions, while retaining the exact rejection probability in finite samples when the underlying distributions are identical. Second, it describes and implements a permutation test for the continuity assumption of the baseline covariates in the sharp regression discontinuity design (RDD), as in Canay and Kamat (2018) <https://goo.gl/UZFqt7>. More specifically, it allows the user to select a set of covariates and test the aforementioned hypothesis using a permutation test based on the Cramér-von Mises test statistic. Graphical inspection of the empirical CDF and histograms for the variables of interest is also supported in the package. Third, it provides the practitioner with an effortless implementation of a permutation test based on the martingale decomposition of the empirical process for testing for heterogeneous treatment effects in the presence of an estimated nuisance parameter, as in Chung and Olivares (2021) <doi:10.1016/j.jeconom.2020.09.015>. Fourth, this version considers the two-sample goodness-of-fit testing problem under covariate-adaptive randomization and implements a permutation test based on a prepivoted Kolmogorov-Smirnov test statistic. Lastly, it implements an asymptotically valid permutation test based on the quantile process for the hypothesis of constant quantile treatment effects in the presence of an estimated nuisance parameter.
Chromatin looping is an essential feature of eukaryotic genomes and can bring regulatory sequences, such as enhancers or transcription factor binding sites, into close physical proximity of regulated target genes. Here, we provide sevenC, an R package that uses protein binding signals from ChIP-seq and sequence motif information to predict chromatin looping events. Cross-linking of proteins that bind close to loop anchors results in ChIP-seq signals at both anchor loci. These signals are used at CTCF motif pairs, together with their distance and orientation to each other, to predict whether they interact or not. The resulting chromatin loops might be used to associate enhancers or transcription factor binding sites (e.g., ChIP-seq peaks) with regulated target genes.
This package provides a routine to partial out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (glm). The package is based on the algorithm described in Stammann (2018) <doi:10.48550/arXiv.1707.01815> and is restricted to nonlinear glms estimated by maximum likelihood. It also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine and includes robust and multi-way clustered standard errors. Further, the package provides analytical bias corrections for binary choice models derived by Fernandez-Val and Weidner (2016) <doi:10.1016/j.jeconom.2015.12.014> and Hinz, Stammann, and Wanner (2020) <doi:10.48550/arXiv.2004.12655>.
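A minimal sketch of the estimation and bias-correction workflow on simulated data; feglm() with a two-part formula (covariates | fixed effects) and biasCorr() reflect my understanding of the package interface and should be checked against the documentation:

    # fixed-effects logit with two high-dimensional factors (individual and time)
    library(alpaca)
    set.seed(1)
    d <- data.frame(i = factor(rep(1:50, each = 20)),    # individual identifier
                    t = factor(rep(1:20, times = 50)),   # time identifier
                    x = rnorm(1000))
    d$y <- as.integer(runif(1000) < plogis(d$x))         # binary response
    mod <- feglm(y ~ x | i + t, data = d, family = binomial())
    summary(mod)
    mod_bc <- biasCorr(mod)   # analytical bias correction (Fernandez-Val and Weidner, 2016)
    summary(mod_bc)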