This is a package for estimation and inference from generalized linear models based on various methods for bias reduction and maximum penalized likelihood with powers of the Jeffreys prior as penalty. The 'brglmFit' fitting method can achieve reduction of estimation bias by solving either the mean bias-reducing adjusted score equations in Firth (1993) <doi:10.1093/biomet/80.1.27> and Kosmidis and Firth (2009) <doi:10.1093/biomet/asp055>, or the median bias-reducing adjusted score equations in Kenne Pagui et al. (2017) <doi:10.1093/biomet/asx046>, or through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates, as in Cordeiro and McCullagh (1991) <https://www.jstor.org/stable/2345592>.
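This description matches the 'brglm2' package; a minimal sketch under that assumption, using the documented pattern of passing 'brglmFit' as the method of glm() (the 'lizards' data set and the 'type' values follow the package documentation as best recalled):

```r
# Mean and median bias-reduced logistic regression via the brglmFit method
# (assumes the brglm2 package; check ?brglmFit for the exact 'type' values)
library(brglm2)
data("lizards", package = "brglm2")
fit_ml  <- glm(cbind(grahami, opalinus) ~ height + diameter,
               family = binomial("logit"), data = lizards)
fit_br  <- update(fit_ml, method = "brglmFit", type = "AS_mean")    # Firth (1993)
fit_med <- update(fit_ml, method = "brglmFit", type = "AS_median")  # median bias reduction
summary(fit_br)
```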
Used in testing whether the indirect effect from linear regression mediation analysis is equal to 0. Includes established methods such as the Sobel test, the joint significance test ('maxP'), and tests based on the distribution of the product of normal random variables. Additionally, this package adds more powerful tests based on intersection-union theory. These tests are the S-test, the ps-test, and the ascending squares test. These new methods are uniformly more powerful than 'maxP', which is more powerful than Sobel and less anti-conservative than the product of normal random variables. These methods are explored in Kidd and Lin (2024) <doi:10.1007/s12561-023-09386-6> and Kidd et al. (2025) <doi:10.1007/s10260-024-00777-7>.
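The package's exported function names are not listed above; as a generic, hypothetical illustration (not this package's API), the two classical baselines can be computed directly from the estimates and standard errors of the two mediation paths:

```r
# Generic Sobel and joint-significance (maxP) tests for an indirect effect a*b;
# a hypothetical helper, not a function from this package
sobel_maxp <- function(a, se_a, b, se_b) {
  z_sobel <- (a * b) / sqrt(a^2 * se_b^2 + b^2 * se_a^2)   # Sobel statistic
  p_sobel <- 2 * pnorm(-abs(z_sobel))
  # maxP: the indirect effect is declared significant only if both paths are
  p_maxp <- max(2 * pnorm(-abs(a / se_a)), 2 * pnorm(-abs(b / se_b)))
  c(p_sobel = p_sobel, p_maxP = p_maxp)
}
sobel_maxp(a = 0.4, se_a = 0.15, b = 0.3, se_b = 0.12)
```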
Computation of t-year survival probabilities and t-year risks with right censored survival data. The Kaplan-Meier estimator is used to provide estimates for data without competing risks and the Aalen-Johansen estimator is used when there are competing risks. Confidence intervals and p-values are obtained using either usual Wald-type inference or empirical likelihood inference, as described in Thomas and Grunkemeier (1975) <doi:10.1080/01621459.1975.10480315> and Blanche (2020) <doi:10.1007/s10985-018-09458-6>. Functions for both one-sample and two-sample inference are provided. Unlike Wald-type inference, empirical likelihood inference always leads to consistent conclusions, in terms of statistical significance, when comparing two risks (or survival probabilities) via either a ratio or a difference.
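For reference, a t-year survival probability is simply the survival curve evaluated at t; the sketch below shows the quantity using the standard 'survival' package, as a generic illustration rather than this package's own interface:

```r
# 1-year survival probability from a Kaplan-Meier fit (generic illustration
# with the survival package, not this package's functions)
library(survival)
fit <- survfit(Surv(time, status) ~ 1, data = lung)
summary(fit, times = 365)  # estimate with a Wald-type confidence interval
```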
This package implements the Exploratory Graph Analysis (EGA) framework for dimensionality and psychometric assessment. EGA estimates the number of dimensions in psychological data using network estimation methods and community detection algorithms. A bootstrap method is provided to assess the stability of dimensions and items. Fit is evaluated using the Entropy Fit family of indices. Unique Variable Analysis evaluates the extent to which items are locally dependent (or redundant). Network loadings provide similar information to factor loadings and can be used to compute network scores. A bootstrap and permutation approach are available to assess configural and metric invariance. Hierarchical structures can be detected using Hierarchical EGA. Time series and intensive longitudinal data can be analyzed using Dynamic EGA, supporting individual, group, and population level assessments.
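A minimal sketch of a typical workflow, assuming the 'EGAnet' package with its EGA() and bootEGA() entry points (the example data set and column selection follow the package documentation as best recalled):

```r
# Estimate the number of dimensions, then bootstrap their stability
# (assumes the EGAnet package exposes EGA() and bootEGA())
library(EGAnet)
data("wmt2", package = "EGAnet")
items <- wmt2[, 7:24]                      # item-level responses
ega   <- EGA(data = items)                 # network estimation + community detection
boot  <- bootEGA(data = items, iter = 100) # stability of dimensions and items
```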
Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. The evtree package implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. CPU and memory-intensive tasks are fully computed in C++ while the partykit package is leveraged to represent the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions.
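A minimal usage sketch, assuming the formula interface shown in the package's vignette (the 'BBBClub' data set is assumed to ship with 'evtree'):

```r
# Learn an approximately globally optimal classification tree
library(evtree)
data("BBBClub", package = "evtree")
set.seed(1)                                  # the evolutionary search is stochastic
fit <- evtree(choice ~ ., data = BBBClub,
              minbucket = 10, maxdepth = 3)  # control parameters via evtree.control()
plot(fit)                                    # rendered through partykit infrastructure
```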
Generates efficient balanced mixed-level k-circulant supersaturated designs by interchanging the elements of the generator vector. Attempts to generate a supersaturated design whose 'EfNOD' efficiency exceeds a user-specified efficiency level ('mef'). Displays the progress of generation of an efficient mixed-level k-circulant design through a progress bar; progress of 100 per cent means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs. For more details, please see Mandal, B.N., Gupta, V.K. and Parsad, R. (2011), "Construction of Efficient Mixed-Level k-Circulant Supersaturated Designs", Journal of Statistical Theory and Practice, 5:4, 627-648, <doi:10.1080/15598608.2011.10483735>.
The focus is on simulating and modeling families with founders drawn from a structured population (for example, with different ancestries or other potentially non-family relatedness), in contrast to traditional pedigree analysis, which treats all founders as equally unrelated. The main function simulates a random pedigree for many generations, avoiding close relatives, pairing the closest individuals according to a 1D geography and their randomly drawn sex, and with variable numbers of children so that each generation reaches a target population size. Auxiliary functions calculate kinship matrices and admixture matrices, and draw random genotypes across arbitrary pedigree structures starting from the corresponding founder values. The code is built around the 'plink' FAM table format for pedigrees. Described in Yao and Ochoa (2022) <doi:10.1101/2022.03.25.485885>.
Efficient framework to estimate high-dimensional generalized matrix factorization models using penalized maximum likelihood under a dispersion exponential family specification. Both deterministic and stochastic methods are implemented for the numerical maximization. In particular, the package implements a stochastic gradient descent algorithm with a block-wise mini-batch strategy to speed up the computations and an efficient adaptive learning rate schedule to stabilize the convergence. All the theoretical details can be found in Castiglione et al. (2024) <doi:10.48550/arXiv.2412.20509>. Other methods considered for the optimization are alternated iteratively re-weighted least squares and a quasi-Newton method with a diagonal approximation of the Fisher information matrix, discussed in Kidzinski et al. (2022) <http://jmlr.org/papers/v23/20-1104.html>.
An extension to the individual claim simulator 'SynthETIC' (on CRAN) that simulates the evolution of case estimates of incurred losses through the lifetime of an insurance claim. The transactional simulation output now comprises key dates, and both claim payments and revisions of estimated incurred losses. An initial set of test parameters, designed to mirror the experience of a real insurance portfolio, is set up and applied by default to generate a realistic test data set of incurred histories (see vignette). However, the distributional assumptions used to generate this data set can be easily modified by users to match their experiences. Reference: Avanzi B, Taylor G, Wang M (2021), "SPLICE: A Synthetic Paid Loss and Incurred Cost Experience Simulator", <arXiv:2109.04058>.
Define, simulate, and validate stock-flow consistent (SFC) macroeconomic models. The godley R package offers tools to dynamically define model structures by adding variables and specifying governing systems of equations. With it, users can analyze how different macroeconomic structures affect key variables, perform parameter sensitivity analyses, introduce policy shocks, and visualize resulting economic scenarios. The accounting structure of SFC models follows the approach outlined in the seminal study by Godley and Lavoie (2007, ISBN:978-1-137-08599-3), ensuring a comprehensive integration of all economic flows and stocks. The algorithms implemented to solve the models are based on methodologies from Kinsella and O'Shea (2010) <doi:10.2139/ssrn.1729205>, Peressini and Sullivan (1988, ISBN:0-387-96614-5), and contributions by Joao Macalos.
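A minimal sketch of the model-building verbs described above; the function names (create_model(), add_variable(), add_equation(), simulate_scenario()) and their arguments are assumptions about the godley API and should be checked against the package documentation:

```r
# Sketch of defining and simulating a toy SFC model; names and arguments
# below are assumptions, and a real model needs the full equation system
library(godley)
model <- create_model(name = "SIM")                 # empty model container (assumed)
model <- add_variable(model, "alpha1", init = 0.6)  # behavioural parameter (assumed)
model <- add_equation(model, "C_d = alpha1 * YD")   # one equation of the system (assumed)
model <- simulate_scenario(model, periods = 100)    # solve over 100 periods (assumed)
```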
This package implements methods for clustering mixed-type data, specifically combinations of continuous and nominal data. Special attention is paid to the often-overlooked problem of equitably balancing the contribution of the continuous and categorical variables. This package implements KAMILA clustering, a novel method for clustering mixed-type data in the spirit of k-means clustering. It does not require dummy coding of variables, and is efficient enough to scale to rather large data sets. Also implemented is Modha-Spangler clustering, which uses a brute-force strategy to maximize the cluster separation simultaneously in the continuous and categorical variables. For more information, see Foss, Markatou, Ray, & Heching (2016) <doi:10.1007/s10994-016-5575-7> and Foss & Markatou (2018) <doi:10.18637/jss.v083.i13>.
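A minimal sketch of a KAMILA run, assuming the 'kamila' package's interface of separate continuous and categorical blocks (argument names follow the package documentation as best recalled):

```r
# KAMILA clustering: numeric columns in one data frame, factors in another
library(kamila)
con <- data.frame(scale(iris[, 1:4]))                      # continuous block, standardized
cat <- data.frame(size = factor(iris$Sepal.Length > 5.8))  # illustrative categorical block
set.seed(1)
res <- kamila(conVar = con, catFactor = cat, numClust = 3, numInit = 10)
table(res$finalMemb)                                       # cluster assignments
```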
This package provides a suite of tools that can assist in enhancing the processing efficiency of SQL and R scripts; a usage sketch follows the list.
- libr_unused() retrieves a vector of package names that are called within an R script but are never actually used in the script.
- libr_used() retrieves a vector of package names actively utilized within an R script; packages loaded using library() but not actually used in the script are not included.
- libr_called() retrieves a vector of all package names that are called within an R script.
- nolock() appends WITH (nolock) to all tables in SQL queries. This facilitates reading from databases in scenarios where non-blocking reads are preferable, such as in high-transaction environments.
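A usage sketch of the four functions; the script path and query are placeholders, and passing them as the first argument is an assumption:

```r
# Audit package usage in an R script, then make a SQL query non-blocking;
# "analysis.R" is a placeholder path, and the argument form is assumed
libr_used("analysis.R")            # packages loaded and actually used
libr_unused("analysis.R")          # packages loaded but never used
libr_called("analysis.R")          # every package called in the script
nolock("SELECT * FROM dbo.sales")  # appends WITH (nolock) to the table reference
```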
Computes the standard error and confidence interval of various descriptive statistics under various designs and sampling schemes. The main function, superbPlot(), returns a plot. superbData() returns a data frame with the statistic and its precision interval so that other plotting packages can be used. See Cousineau and colleagues (2021) <doi:10.1177/25152459211035109> or Cousineau (2017) <doi:10.5709/acp-0214-z> for a review, as well as Cousineau (2005) <doi:10.20982/tqmp.01.1.p042>, Morey (2008) <doi:10.20982/tqmp.04.2.p061>, Baguley (2012) <doi:10.3758/s13428-011-0123-7>, Cousineau & Laurencelle (2016) <doi:10.1037/met0000055>, Cousineau & O'Brien (2014) <doi:10.3758/s13428-013-0441-z>, and Calderini & Harding <doi:10.20982/tqmp.15.1.p001> for specific references.
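A minimal sketch, assuming the 'superb' package and the superbPlot() arguments shown in its documentation (as best recalled):

```r
# Mean with 95% confidence intervals for each cell of a two-factor design
library(superb)
superbPlot(ToothGrowth,
           BSFactors = c("dose", "supp"),  # between-subject factors
           variables = "len",              # measured variable
           statistic = "mean",             # descriptive statistic to display
           errorbar  = "CI")               # interval type
```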
Metapackage for implementing a variety of event-based models, with a focus on spatially explicit models. These include raster-based, event-based, and agent-based models. The core simulation components (provided by 'SpaDES.core') are built upon a discrete event simulation (DES; see Matloff (2011) ch 7.8.3 <https://nostarch.com/artofr.htm>) framework that facilitates modularity and easily enables the user to include additional functionality by running user-built simulation modules (see also 'SpaDES.tools'). Included are numerous tools to visualize rasters and other maps (via 'quickPlot'), and caching methods for reproducible simulations (via 'reproducible'). Tools for running simulation experiments are provided by 'SpaDES.experiment'. Additional functionality is provided by the 'SpaDES.addins' and 'SpaDES.shiny' packages.
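A minimal sketch of the core workflow, assuming 'SpaDES.core' exposes simInit() and spades() as its entry points; the module name and paths are placeholders, not modules shipped with the package:

```r
# Initialize and run a discrete event simulation
library(SpaDES.core)
sim <- simInit(times   = list(start = 0, end = 10),
               modules = list("myModule"),             # placeholder user-built module
               paths   = list(modulePath = "modules")) # placeholder directory
sim <- spades(sim)  # execute scheduled events until the end time
```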
This package provides a general purpose simulation-based power analysis API for routine and customized simulation experimental designs. The package focuses exclusively on Monte Carlo simulation experiment variants of (expected) prospective power analyses, criterion analyses, compromise analyses, sensitivity analyses, and a priori/post-hoc analyses. The default simulation experiment functions defined within the package provide stochastic variants of the power analysis subroutines in G*Power 3.1 (Faul, Erdfelder, Buchner, and Lang, 2009) <doi:10.3758/brm.41.4.1149>, along with various other parametric and non-parametric power analysis applications (e.g., mediation analyses). Additional functions for building empirical power curves, reanalyzing simulation information, and for increasing the precision of the resulting power estimates are also included, each of which utilize similar API structures.
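As a generic illustration of the Monte Carlo principle underlying such tools (not this package's API), a prospective power estimate for a two-sample t-test is just the empirical rejection rate over simulated experiments:

```r
# Generic Monte Carlo power estimate for a two-sample t-test;
# a hypothetical helper, not a function from this package
power_sim <- function(n, d, nsim = 5000, alpha = 0.05) {
  p <- replicate(nsim, {
    x <- rnorm(n)
    y <- rnorm(n, mean = d)
    t.test(x, y)$p.value
  })
  mean(p < alpha)  # proportion of rejections = estimated power
}
set.seed(1)
power_sim(n = 64, d = 0.5)  # close to the analytic value of about 0.80
```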
This package provides a collection of randomization tests, data sets, and examples. The current version focuses on five testing problems and their implementation in empirical work. First, it allows the empirical researcher to test particular hypotheses, such as comparisons of means, medians, and variances from k populations, using robust permutation tests whose asymptotic validity holds under very weak assumptions, while retaining the exact rejection probability in finite samples when the underlying distributions are identical. Second, it describes and implements a permutation test for the continuity assumption of the baseline covariates in the sharp regression discontinuity design (RDD), as in Canay and Kamat (2018) <https://goo.gl/UZFqt7>. More specifically, it allows the user to select a set of covariates and test the aforementioned hypothesis using a permutation test based on the Cramér-von Mises test statistic. Graphical inspection of the empirical CDF and histograms for the variables of interest is also supported in the package. Third, it provides the practitioner with an effortless implementation of a permutation test based on the martingale decomposition of the empirical process for testing for heterogeneous treatment effects in the presence of an estimated nuisance parameter, as in Chung and Olivares (2021) <doi:10.1016/j.jeconom.2020.09.015>. Fourth, this version considers the two-sample goodness-of-fit testing problem under covariate-adaptive randomization and implements a permutation test based on a prepivoted Kolmogorov-Smirnov test statistic. Lastly, it implements an asymptotically valid permutation test based on the quantile process for the hypothesis of constant quantile treatment effects in the presence of an estimated nuisance parameter.
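As a generic illustration of the robust permutation idea in the first testing problem (not this package's interface), studentizing the statistic is what keeps the test asymptotically valid when the underlying distributions differ:

```r
# Generic studentized permutation test for equality of two means;
# not this package's API
perm_test <- function(x, y, B = 2000) {
  t_obs <- t.test(x, y)$statistic   # Welch-studentized statistic
  z <- c(x, y); n <- length(x)
  t_perm <- replicate(B, {
    idx <- sample(length(z), n)     # random relabeling of the pooled sample
    t.test(z[idx], z[-idx])$statistic
  })
  mean(abs(t_perm) >= abs(t_obs))   # permutation p-value
}
set.seed(1)
perm_test(rnorm(50), rnorm(50, mean = 0.3))
```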
Estimates the probability matrix for the R×C Ecological Inference problem using the Expectation-Maximization algorithm, with four approximation methods for the E-step as well as an exact method. It also provides a bootstrap function to estimate the standard deviation of the estimated probabilities. In addition, it has functions that aggregate rows optimally to obtain more reliable estimates when there are few data points. For comparing the probability estimates of two groups, a Wald test routine is implemented. The library includes data from the first round of the 2021 Chilean presidential election and can also generate synthetic election data. Methods described in Thraves, Charles; Ubilla, Pablo; Hermosilla, Daniel (2024), "A Fast Ecological Inference Algorithm for the R×C Case", <doi:10.2139/ssrn.4832834>.
The 'gRbase' package provides graphical modelling features used by e.g. the packages 'gRain', 'gRim' and 'gRc'. 'gRbase' implements graph algorithms including (i) maximum cardinality search (for marked and unmarked graphs), (ii) moralization, (iii) triangulation, and (iv) creation of a junction tree. 'gRbase' also facilitates array operations, implements functions for testing for conditional independence, illustrates how hierarchical log-linear models may be implemented, and describes the concept of graphical meta data. The facilities of the package are documented in the book by Højsgaard, Edwards and Lauritzen (2012) <doi:10.1007/978-1-4614-2299-0> and in the paper by Dethlefsen and Højsgaard (2005) <doi:10.18637/jss.v014.i17>. Please see citation("gRbase") for citation details.
'LMoFit' provides a complete framework for frequency analysis. It has functions related to the determination of sample L-moments as in Hosking, J.R.M. (1990) <doi:10.1111/j.2517-6161.1990.tb01775.x>, and the fitting of various distributions as in Zaghloul et al. (2020) <doi:10.1016/j.advwatres.2020.103720> and Hosking, J.R.M. (2019) <https://CRAN.R-project.org/package=lmom>, besides plotting and manipulating L-space diagrams as in Papalexiou, S.M. & Koutsoyiannis, D. (2016) <doi:10.1016/j.advwatres.2016.05.005> for two-shape parametric distributions on the L-moment ratio diagram. Additionally, the quantile, probability density, and cumulative probability functions of various distributions are provided in a user-friendly manner.
Statistical decisions in proteomics data using a hierarchical Bayesian model. There are two regression models for describing the mean-variance trend: a gamma regression or a latent gamma mixture regression. The regression model is then used as an empirical Bayes estimator for the prior on the variance of a peptide. Further, it assumes that each measurement has an associated uncertainty (increased variance) that is also inferred. Finally, it estimates the posterior distribution (by Hamiltonian Monte Carlo) of the difference in means for each peptide in the data. Once the posterior is inferred, it integrates the tails to estimate the probability of error, from which a statistical decision can be made. See Berg and Popescu for details (<doi:10.1101/2023.05.11.540411>).
This package provides a system for calculating the optimal sampling effort, based on the ideas of "Ecological cost-benefit optimization" as developed by A. Underwood (1997, ISBN 0 521 55696 1). Data is obtained from simulated ecological communities with prep_data(), which formats and arranges the initial data; the optimization then follows a procedure of four functions (sketched below): (1) scompvar() calculates the variation components necessary for (2) sim_cbo() to calculate the optimal combination of number of sites and samples depending on either an economic budget or a desired statistical accuracy. Additionally, (3) sim_beta() estimates statistical power and Type II error using Permutational Multivariate Analysis of Variance, and (4) plot_power() represents the results of the previous function.
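A workflow sketch chaining the four functions named above; only the function names come from this description, so the package name ('ecocbo' is assumed here) and all argument names are assumptions:

```r
# Cost-benefit optimization workflow; argument names are assumed
library(ecocbo)
dat  <- prep_data(my_community_matrix)      # placeholder input data
cv   <- scompvar(data = dat)                # variation components
plan <- sim_cbo(comp.var = cv, ct = 20000)  # optimal sites/samples for a budget (assumed args)
pw   <- sim_beta(data = dat)                # power and Type II error via PERMANOVA
plot_power(pw)                              # visualize the results
```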
Fits a geographically weighted regression model with different scales for each covariate. Uses the negative binomial distribution as default, but also accepts the normal, Poisson, or logistic distributions. Can fit the global versions of each regression and also the geographically weighted alternatives with only one scale, since they are all particular cases of the multiscale approach. Hanchen Yu (2024). "Exploring Multiscale Geographically Weighted Negative Binomial Regression", Annals of the American Association of Geographers <doi:10.1080/24694452.2023.2289986>. Fotheringham AS, Yang W, Kang W (2017). "Multiscale Geographically Weighted Regression (MGWR)", Annals of the American Association of Geographers <doi:10.1080/24694452.2017.1352480>. Da Silva AR, Rodrigues TCV (2014). "Geographically Weighted Negative Binomial Regression - incorporating overdispersion", Statistics and Computing <doi:10.1007/s11222-013-9401-9>.
Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for problems of order r x r (r being the number of records), or using the Henderson-based average information algorithm for problems of order c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.
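This description matches the 'sommer' package (the first citation is its paper); a minimal sketch under that assumption, using the mmer() interface and an example data set from its documentation:

```r
# REML fit with a random genotype effect; assumes the sommer package
library(sommer)
data(DT_example)
DT  <- DT_example
fit <- mmer(Yield ~ Env,      # fixed effects
            random = ~ Name,  # random effect for genotype
            rcov   = ~ units, # residual variance structure
            data   = DT)
summary(fit)$varcomp          # REML variance component estimates
```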
This package implements methods for variable selection in linear regression based on the "Sum of Single Effects" ('SuSiE') model, as described in Wang et al. (2020) <DOI:10.1101/501114> and Zou et al. (2021) <DOI:10.1101/2021.11.03.467167>. These methods provide simple summaries, called "Credible Sets", for accurately quantifying uncertainty in which variables should be selected. The methods are motivated by genetic fine-mapping applications and are particularly well suited to settings where variables are highly correlated and detectable effects are sparse. The fitting algorithm, a Bayesian analogue of stepwise selection methods called "Iterative Bayesian Stepwise Selection" (IBSS), is simple and fast, allowing the 'SuSiE' model to be fit to large data sets (thousands of samples and hundreds of thousands of variables).
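A minimal sketch of a SuSiE fit on simulated sparse data, assuming the 'susieR' package and its susie() entry point:

```r
# Variable selection with the Sum of Single Effects model
library(susieR)
set.seed(1)
n <- 500; p <- 1000
X <- matrix(rnorm(n * p), n, p)
beta <- rep(0, p)
beta[c(100, 400)] <- 1      # two true effects among 1000 variables
y <- drop(X %*% beta + rnorm(n))
fit <- susie(X, y, L = 10)  # L caps the number of single effects
fit$sets$cs                 # credible sets for the selected variables
```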