An embedded proximal interior point quadratic programming solver, which can solve dense and sparse quadratic programs, described in Schwan, Jiang, Kuhn, and Jones (2023) <doi:10.48550/arXiv.2304.00290>. Combining an infeasible interior point method with the proximal method of multipliers, the algorithm can handle ill-conditioned convex quadratic programming problems without the need for linear independence of the constraints. The solver is written in header-only C++14, leveraging the Eigen library for vectorized linear algebra. For small dense problems, vectorized instructions and cache locality can be exploited more efficiently. Allocation-free problem updates and re-solves are also provided.
The detection of troubling approximate collinearity in a multiple linear regression model is a classical problem in Econometrics. This package is focused on determining whether or not the degree of approximate multicollinearity in a multiple linear regression model is of concern, meaning that it affects the statistical analysis (i.e. individual significance tests) of the model. This objective is achieved by using the variance inflation factor redefined and the scatterplot between the variance inflation factor and the coefficient of variation. For more details see Salmerón R., García C.B. and García J. (2018) <doi:10.1080/00949655.2018.1463376>, Salmerón, R., Rodríguez, A. and García C. (2020) <doi:10.1007/s00180-019-00922-x>, Salmerón, R., García, C.B, Rodríguez, A. and García, C. (2022) <doi:10.32614/RJ-2023-010>, Salmerón, R., García, C.B. and García, J. (2025) <doi:10.1007/s10614-024-10575-8> and Salmerón, R., García, C.B, García J. (2023, working paper) <doi:10.48550/arXiv.2005.02245>. You can also view the package vignette using browseVignettes("rvif"), the package website (<https://www.ugr.es/local/romansg/rvif/index.html>) using browseURL(system.file("docs/index.html", package = "rvif")) or version control on GitHub (<https://github.com/rnoremlas/rvif_package>).
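As a rough illustration of the two quantities mentioned above, the following Python sketch computes the classical variance inflation factor for a model with exactly two regressors, where VIF = 1 / (1 - r^2), and the coefficient of variation of a regressor. This is not the redefined VIF used by the package, and the function names are hypothetical; it only shows the textbook definitions under those assumptions.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def classical_vif_two_regressors(x1, x2):
    """Classical VIF when the model has exactly two regressors.

    With two regressors, the R^2 of regressing one on the other is r^2,
    so both regressors share the same VIF = 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def coefficient_of_variation(x):
    """Population standard deviation divided by the absolute mean."""
    return pstdev(x) / abs(mean(x))


if __name__ == "__main__":
    x1 = [1, 2, 3, 4, 5]
    x2 = [2, 1, 4, 3, 5]
    print(classical_vif_two_regressors(x1, x2))
    print(coefficient_of_variation(x1))
```

A VIF close to 1 indicates nearly orthogonal regressors, while a small coefficient of variation indicates a regressor that varies little relative to its mean.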
This package implements algorithms for calculating microarray enrichment (ACME), and it is a set of tools for analysing tiling array data from chromatin immunoprecipitation combined with DNA microarray (ChIP/chip), DNAse hypersensitivity, or other experiments that result in regions of the genome showing enrichment. It does not rely on a specific array technology (although the array should be a tiling array), is very general (can be applied in experiments resulting in regions of enrichment), and is very insensitive to array noise or normalization methods. It is also very fast and, given enough memory, can be applied to whole-genome tiling array experiments quite easily.
The purpose of this package is to discover the genes that are differentially expressed between two conditions in RNA-seq experiments. Gene expression is measured in counts of transcripts and modeled with the Negative Binomial (NB) distribution using a shrinkage approach for dispersion estimation. The method of moments (MM) estimates for dispersion are shrunk towards an estimated target, which minimizes the average squared difference between the shrinkage estimates and the initial estimates. The exact per-gene probability under the NB model is calculated, and used to test the hypothesis that the expected expression of a gene in the two conditions identically follows a NB distribution.
Estimate, assess, test, and study linear, nonlinear, hierarchical and multigroup structural equation models using composite-based approaches and procedures, including estimation techniques such as partial least squares path modeling (PLS-PM) and its derivatives (PLSc, ordPLSc, robustPLSc), generalized structured component analysis (GSCA), generalized structured component analysis with uniqueness terms (GSCAm), generalized canonical correlation analysis (GCCA), principal component analysis (PCA), factor score regression (FSR) using sum score, regression or Bartlett scores (including bias correction using Croon's approach), as well as several tests and typical postestimation procedures (e.g., verify admissibility of the estimates, assess the model fit, test the model fit etc.).
Providing six different algorithms that can be used to split the available data into training, test and validation subsets with similar distribution for hydrological model developments. The dataSplit() function will help you divide the data according to specific requirements, and you can refer to the par.default() function to set the parameters for data splitting. The getAUC() function will help you measure the similarity of distribution features between the data subsets. For more information about the data splitting algorithms, please refer to: Chen et al. (2022) <doi:10.1016/j.jhydrol.2022.128340>, Zheng et al. (2022) <doi:10.1029/2021WR031818>.
Specify, solve, and estimate dynamic stochastic general equilibrium (DSGE) models by maximum likelihood and Bayesian methods. Supports both linear models via an equation-based formula interface and nonlinear models via string-based equations with first-order perturbation (linearization around deterministic steady state). Solution uses the method of undetermined coefficients (Klein, 2000 <doi:10.1016/S0165-1889(99)00045-7>). Likelihood evaluated via the Kalman filter. Bayesian estimation uses adaptive Random-Walk Metropolis-Hastings with prior specification. Additional tools include Kalman smoothing, historical shock decomposition, local identification diagnostics, parameter sensitivity analysis, second-order perturbation, occasionally binding constraints, impulse-response functions, forecasting, and robust standard errors.
It implements many univariate and multivariate permutation (and rotation) tests. Allowed tests: one- and two-sample t-tests, ANOVA, linear models, Chi-squared test, rank tests (i.e. Wilcoxon, Mann-Whitney, Kruskal-Wallis), Sign test and McNemar test. Tests on linear models can also be performed in the presence of covariates (i.e. nuisance parameters). Both permutation and rotation methods for obtaining the null distribution of the test statistics are available. It also implements methods for multiplicity control such as the Westfall & Young minP procedure, Closed Testing (Marcus, 1976) and k-FWER. Moreover, it allows testing for fixed effects in mixed effects models.
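To make the idea of a permutation null distribution concrete, here is a minimal Python sketch of an exact two-sample permutation test on the absolute difference in means. It enumerates every reassignment of the pooled observations to the two groups, so it is only practical for small samples. The function name is hypothetical and this is not the package's implementation.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(x, y):
    """Two-sided exact permutation p-value for a difference in means.

    Every way of choosing len(x) observations from the pooled sample is
    treated as equally likely under the null hypothesis of exchangeability.
    """
    pooled = list(x) + list(y)
    n, nx = len(pooled), len(x)
    total = sum(pooled)
    observed = abs(mean(x) - mean(y))

    extreme = 0
    n_splits = 0
    for idx in combinations(range(n), nx):
        sx = sum(pooled[i] for i in idx)
        diff = abs(sx / nx - (total - sx) / (n - nx))
        # Small tolerance so floating-point ties count as "at least as extreme".
        if diff >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits


if __name__ == "__main__":
    print(exact_permutation_pvalue([1, 2, 3], [4, 5, 6]))
```

For larger samples, the full enumeration is usually replaced by a random subset of permutations, which gives an approximate rather than exact p-value.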
This package provides Gaussian process (GP) regression tools for social science inference problems. GPs combine flexible nonparametric regression with principled uncertainty quantification: rather than committing to a single model fit, the posterior reflects lesser knowledge at the edge of or beyond the observed data, where other approaches become highly model-dependent. The package reduces user-chosen hyperparameters from three to zero and supplies convenience functions for regression discontinuity (gp_rdd()), interrupted time-series (gp_its()), and general GP fitting (gpss(), gp_train(), gp_predict()). Methods are described in Cho, Kim, and Hazlett (2026) <doi:10.1017/pan.2026.10032>.
Non-parametric estimators for causal effects based on longitudinal modified treatment policies as described in Diaz, Williams, Hoffman, and Schenck <doi:10.1080/01621459.2021.1955691>, traditional point treatment, and traditional longitudinal effects. Continuous, binary, categorical, and multivariate treatments are allowed, as well as censored outcomes. The treatment mechanism is estimated via a density ratio classification procedure irrespective of treatment variable type. For both continuous and binary outcomes, additive treatment effects can be calculated; for binary outcomes, relative risks and odds ratios can also be calculated. Supports survival outcomes with competing risks (Diaz, Hoffman, and Hejazi; <doi:10.1007/s10985-023-09606-7>).
Climate-sensitive, single-tree forest simulator based on data-driven machine learning. It simulates the main forest processes (radial growth, height growth, mortality, crown recession, regeneration, and harvesting) so users can assess stand development under climate and management scenarios. The height model is described by Skudnik and Jevšenak (2022) <doi:10.1016/j.foreco.2022.120017>, the basal-area increment model by Jevšenak and Skudnik (2021) <doi:10.1016/j.foreco.2020.118601>, and an overview of the MLFS package, workflow, and applications is provided by Jevšenak, Arnič, Krajnc, and Skudnik (2023), Ecological Informatics <doi:10.1016/j.ecoinf.2023.102115>.
This package provides functionality to process text files created by Emacs Org mode, and decompose the content into its smallest components (headlines, body, tags, clock entries etc). Emacs is an extensible, customizable text editor and Org mode is for keeping notes, maintaining TODO lists, and planning projects. Allows users to analyze org files as data frames in R, e.g., to conveniently group tasks by tag into projects and calculate total working hours. Also provides some helper functions like search.parent, gg.pie (visualise working hours in ggplot2) and tree.headlines (visualise headline structure in tree format) to help users manage their complex org files.
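The decomposition described above can be sketched in a few lines of Python: headlines are recognised by their leading stars, tags by the trailing `:tag1:tag2:` group, and `CLOCK:` lines are attributed to the most recent headline. The function names and the record layout are assumptions made for illustration; the sketch ignores tag inheritance, open clocks, and other Org features, and it does not reproduce the package's actual data frame structure.

```python
import re
from collections import defaultdict
from datetime import datetime

# "* Title :tag1:tag2:" -> stars, title, optional tag group.
HEADLINE = re.compile(r"^(\*+)\s+(.*?)(?:\s+:([\w@#%:]+):)?\s*$")
# "[2024-01-15 Mon 09:00]" -> date and time; the day name is skipped.
STAMP = r"\[(\d{4}-\d{2}-\d{2})(?: [^\]\s]+)? (\d{1,2}:\d{2})\]"
CLOCK = re.compile(r"^\s*CLOCK:\s*" + STAMP + r"--" + STAMP)


def parse_org(text):
    """Return one record per headline with its level, tags and clocked minutes."""
    rows = []
    for line in text.splitlines():
        h = HEADLINE.match(line)
        if h:
            tags = h.group(3).split(":") if h.group(3) else []
            rows.append({"level": len(h.group(1)), "title": h.group(2),
                         "tags": tags, "minutes": 0})
            continue
        c = CLOCK.match(line)
        if c and rows:
            start = datetime.strptime(f"{c.group(1)} {c.group(2)}", "%Y-%m-%d %H:%M")
            end = datetime.strptime(f"{c.group(3)} {c.group(4)}", "%Y-%m-%d %H:%M")
            rows[-1]["minutes"] += int((end - start).total_seconds() // 60)
    return rows


def hours_by_tag(rows):
    """Sum clocked hours over the tags written directly on each headline."""
    totals = defaultdict(float)
    for row in rows:
        for tag in row["tags"]:
            totals[tag] += row["minutes"] / 60
    return dict(totals)


if __name__ == "__main__":
    sample = "\n".join([
        "* TODO Write report :work:",
        "  CLOCK: [2024-01-15 Mon 09:00]--[2024-01-15 Mon 10:30] =>  1:30",
        "** Notes",
        "* Gym :health:",
        "  CLOCK: [2024-01-15 Mon 18:00]--[2024-01-15 Mon 18:45] =>  0:45",
    ])
    print(hours_by_tag(parse_org(sample)))
```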
This package provides a comprehensive framework for planning and executing analyses in R. It provides a structured approach to running the same function multiple times with different arguments, executing multiple functions on the same datasets, and creating systematic analyses across multiple strata or variables. The framework is particularly useful for applying the same analysis across multiple strata (e.g., locations, age groups), running statistical methods on multiple variables (e.g., exposures, outcomes), generating multiple tables or graphs for reports, and creating systematic surveillance analyses. Key features include efficient data management, structured analysis planning, flexible execution options, built-in debugging tools, and hash-based caching.
Set of tools to find coherent patterns in gene expression (microarray) data using a Bayesian Sparse Latent Factor Model (SLFM) <DOI:10.1007/978-3-319-12454-4_15>. Considerable effort has been put into building a fast and memory-efficient package, which makes this proposal an interesting and computationally convenient alternative for studying patterns of gene expression exhibited in matrices. The package contains the implementation of two versions of the model based on different mixture priors for the loadings: one relies on a degenerate component at zero and the other uses a small variance normal distribution for the spike part of the mixture.
This package implements the Vine Copula Change Point (VCCP) methodology for the estimation of the number and location of multiple change points in the vine copula structure of multivariate time series. The method uses vine copulas, various state-of-the-art segmentation methods to identify multiple change points, and a likelihood ratio test or the stationary bootstrap for inference. The vine copulas allow for various forms of dependence between time series including tail, symmetric and asymmetric dependence. The functions have been extensively tested on simulated multivariate time series data and fMRI data. For details on the VCCP methodology, please see Xiong & Cribben (2021).
To achieve accurate estimation without a sparsity assumption on the precision matrix, element-wise inference on the precision matrix, and joint estimation of multiple Gaussian graphical models, a novel method is proposed and an efficient algorithm is implemented. FLAG() is the main function, which takes a data matrix, and FlagOneEdge() is used when a single pair of random variables is of interest, in which case their indices should be given. For details, see "Flexible and Accurate Methods for Estimation and Inference of Gaussian Graphical Models with Applications", Qian Y (2023) <doi:10.14711/thesis-991013223054603412>, and Qian Y, Hu X, Yang C (2023) <doi:10.48550/arXiv.2306.17584>.
This package provides a collection of string functions designed for writing compact and expressive R code. yasp (Yet Another String Package) is simple, fast, dependency-free, and written in pure R. The package provides: a coherent set of abbreviations for paste() from package base with a variety of defaults, such as p() for "paste" and pcc() for "paste and collapse with commas"; wrap(), bracket(), and others for wrapping a string in flanking characters; unwrap() for removing pairs of characters (at any position in a string); and sentence() for cleaning whitespace around punctuation and capitalization appropriate for prose sentences.
This package provides methods for manipulating regression models and for describing these in a style adapted for medical journals. It contains functions for generating an HTML table with crude and adjusted estimates, plotting hazard ratios, and plotting model estimates and confidence intervals using forest plots, including comparing multiple models in a single forest plot. In addition to the descriptive methods, there are functions for the robust covariance matrix provided by the sandwich package, a function for adding non-linearities to a model, and a wrapper around the Epi package's Lexis() functions for time-splitting a dataset when modeling non-proportional hazards in Cox regressions.
Dino normalizes single-cell, mRNA sequencing data to correct for technical variation, particularly sequencing depth, prior to downstream analysis. The approach produces a matrix of corrected expression for which the dependency between sequencing depth and the full distribution of normalized expression has been removed; many existing methods aim to remove only the dependency between sequencing depth and the mean of the normalized expression. This is particularly useful in the context of highly sparse datasets such as those produced by 10X genomics and other unique molecular identifier (UMI) based microfluidics protocols for which the depth-dependent proportion of zeros in the raw expression data can otherwise present a challenge.
Statistical methods and related graphical representations for the Desirability of Outcome Ranking (DOOR) methodology. The DOOR is a paradigm for the design, analysis, and interpretation of clinical trials and other research studies based on patient-centric benefit-risk evaluation. The package provides functions for generating summary statistics from individual-level or summary-level datasets, conducting DOOR probability-based inference, and visualizing the results. For more details of DOOR methodology, see Hamasaki and Evans (2025) <doi:10.1201/9781003390855>. For more explanation of the statistical methods and the graphics, see the technical document and user manual of the DOOR Shiny apps at <https://methods.bsc.gwu.edu>.
Computes characteristics of independent rainfall events (duration, total rainfall depth, and intensity) extracted from a sub-daily rainfall time series based on the inter-event time definition (IETD) method. To have a reference value of IETD, it also analyzes/computes IETD values through three methods: autocorrelation analysis, the average annual number of events analysis, and coefficient of variation analysis. Ideal for analyzing the sensitivity of IETD to characteristics of independent rainfall events. Adams B, Papa F (2000) <ISBN: 978-0-471-33217-6>. Joo J et al. (2014) <doi:10.3390/w6010045>. Restrepo-Posada P, Eagleson P (1982) <doi:10.1016/0022-1694(82)90136-6>.
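The event separation behind the IETD method can be illustrated with a short Python sketch. It assumes a regularly spaced series of rainfall depths, treats any step with a positive depth as wet, and starts a new event whenever the dry gap since the last wet step is at least the IETD (some definitions use a strict inequality instead). The function names and the returned fields are hypothetical and do not mirror the package's interface.

```python
def summarize_event(depths, start, end, step_hours):
    """Duration, total depth and average intensity of one event (inclusive indices)."""
    duration = (end - start + 1) * step_hours
    total = sum(depths[start:end + 1])
    return {"start": start, "duration_h": duration,
            "depth": total, "intensity": total / duration}


def rainfall_events(depths, ietd_steps, step_hours=1.0):
    """Split a regular rainfall series into independent events.

    Two wet steps belong to the same event when the number of dry steps
    between them is smaller than ietd_steps.
    """
    events = []
    start = last_wet = None
    for i, depth in enumerate(depths):
        if depth <= 0:
            continue
        # Close the running event if the dry gap reached the IETD.
        if start is not None and i - last_wet - 1 >= ietd_steps:
            events.append(summarize_event(depths, start, last_wet, step_hours))
            start = None
        if start is None:
            start = i
        last_wet = i
    if start is not None:
        events.append(summarize_event(depths, start, last_wet, step_hours))
    return events


if __name__ == "__main__":
    hourly = [0, 2, 3, 0, 1, 0, 0, 0, 4, 0]
    for event in rainfall_events(hourly, ietd_steps=3):
        print(event)
```

Re-running the function with different values of `ietd_steps` shows how the number and characteristics of events depend on the chosen IETD.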
This package provides a general framework of two-directional simultaneous inference for high-dimensional as well as fixed-dimensional models with manifest variable or latent variable structure, such as high-dimensional mean models, high-dimensional sparse regression models, and high-dimensional latent factor models. It performs simultaneous inference on a set of parameters from two directions: one tests whether the parameters estimated as zero are indeed zero, and the other tests whether there are any zeros in the set of parameters estimated as non-zero. For more details, see Wei Liu, et al. (2022) <doi:10.48550/arXiv.2012.11100>.
The package obtains parameter estimates, i.e., maximum likelihood estimators (MLE), via the Expectation-Maximization (EM) algorithm for the Finite Mixture of Regression (FMR) models with Normal distribution, and MLE for the Finite Mixture of Accelerated Failure Time Regression (FMAFTR) subject to right censoring with Log-Normal and Weibull distributions via the EM algorithm and the Newton-Raphson algorithm (for Weibull distribution). More importantly, the package obtains the maximum penalized likelihood estimator (MPLE) for both FMR and FMAFTR models (collectively called FMRs). A component-wise tuning parameter selection based on a component-wise BIC is implemented in the package. Furthermore, this package provides Ridge Regression and Elastic Net.
This package provides tools for the analysis of interval-valued data, including construction, visualization, and statistical modeling. The package provides the intData class for representing interval-valued data, along with functions to aggregate microdata and to estimate parameters of latent distributions. Barycenter and covariance matrix estimation is implemented based on the Mallows distance (Oliveira et al. (2025) <doi:10.48550/arXiv.2407.05105>). Robust estimation of the symbolic covariance matrix is implemented via the Interval Minimum Covariance Determinant (IMCD) estimator, enabling outlier detection based on the robust squared Interval-Mahalanobis distance, as proposed by Loureiro et al. (2026) <doi:10.48550/arXiv.2604.26769>.