This package implements the Fixed Effect Jackknife Instrumental Variables ('FEJIV') estimator of Chao, Swanson, and Woutersen (2023) <doi:10.1016/j.jeconom.2022.12.011>, allowing consistent IV estimation with many (possibly weak) instruments, cluster fixed effects, heteroskedastic errors, and many exogenous covariates. The estimator is recommended by Słoczyński (2024) <doi:10.48550/arXiv.2011.06695> as an alternative to two-stage least squares when estimating the interacted specification of Angrist and Imbens (1995) <doi:10.1080/01621459.1995.10476535>.
Kernel regularized least squares, also known as kernel ridge regression, is a flexible machine learning method. This package implements this method by providing a smooth term for use with mgcv and uses random sketching to facilitate scalable estimation on large datasets. It provides additional functions for calculating marginal effects after estimation and for use with ensembles ('SuperLearning'), double/debiased machine learning ('DoubleML'), and robust/clustered standard errors ('sandwich'). Chang and Goplerud (2024) <doi:10.1017/pan.2023.27> provide further details.
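The core idea of kernel regularized least squares can be sketched in a few lines. The following is a minimal illustration only, assuming a Gaussian kernel and the standard closed-form solution (K + ridge * I) c = y; it omits random sketching and the mgcv smooth-term interface entirely, and the names `gaussian_kernel`, `fit_krls`, `predict_krls`, `bandwidth`, and `ridge` are hypothetical, not part of the package.

```python
import math

def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / bandwidth)

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_krls(X, y, bandwidth, ridge):
    """Return dual coefficients c solving (K + ridge * I) c = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], bandwidth) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += ridge  # ridge penalty on the diagonal
    return solve_linear_system(K, y)

def predict_krls(X_train, coefs, bandwidth, x_new):
    """Predict at x_new as a kernel-weighted sum over the training points."""
    return sum(c * gaussian_kernel(xi, x_new, bandwidth) for xi, c in zip(X_train, coefs))

# Toy data (assumed for illustration): one period of a sine curve on a grid.
X = [[i / 10.0] for i in range(11)]
y = [math.sin(2 * math.pi * x[0]) for x in X]
coefs = fit_krls(X, y, bandwidth=0.02, ridge=1e-3)
pred = predict_krls(X, coefs, 0.02, [0.25])  # prediction between grid points
```

Solving the full n-by-n system costs on the order of n^3 operations, which is the bottleneck that sketching methods aim to reduce on large datasets.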
This package implements methods for analyzing latent variable models with measurement error correction, including Item Response Theory (IRT) models. Provides tools for various correction methods such as Bayesian Markov Chain Monte Carlo (MCMC), over-imputation, bootstrapping for robust standard errors, Ordinary Least Squares (OLS), and Instrumental Variables (IV) based approaches. Supports flexible specification of observable indicators and groupings for latent variable analyses in social sciences and other fields. Methods are described in a working paper (2025) <doi:10.48550/arXiv.2507.22218>.
Generates efficient balanced non-aliased multi-level k-circulant supersaturated designs by interchanging the elements of the generator vector. Attempts to generate a supersaturated design whose chi-square efficiency exceeds the user-specified efficiency level (mef). Displays the progress of generating an efficient multi-level k-circulant design through a progress bar. A progress of 100% means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.
This package performs multivariate nonparametric regression/classification by the method of sieves (using an orthogonal basis). The method is suitable for moderately high-dimensional features (dimension < 100). The l1-penalized sieve estimator, a nonparametric generalization of Lasso, is adaptive to the feature dimension with provable theoretical guarantees. We also include a nonparametric stochastic gradient descent estimator, Sieve-SGD, for online or large scale batch problems. Details of the methods can be found in: <arXiv:2206.02994> <arXiv:2104.00846> <arXiv:2310.12140>.
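To make the idea of an l1-penalized sieve estimator concrete, here is a minimal univariate sketch: the feature is expanded in a cosine basis on [0, 1] and the basis coefficients are fitted by Lasso via coordinate descent. This is an illustration under stated assumptions, not the package's implementation: the choice of cosine basis, the unpenalized intercept, and the names `cosine_basis`, `fit_l1_sieve`, `n_terms`, and `penalty` are all hypothetical, and the multivariate (tensor-product) case is not covered.

```python
import math

def cosine_basis(x, n_terms):
    """Evaluate the first n_terms cosine basis functions on [0, 1] at x."""
    return [1.0] + [math.sqrt(2.0) * math.cos(math.pi * j * x) for j in range(1, n_terms)]

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_l1_sieve(xs, ys, n_terms, penalty, n_sweeps=200):
    """Minimize (1/2n)||y - B beta||^2 + penalty * sum_{j>=1} |beta_j|
    by cyclic coordinate descent, where B holds the basis expansion."""
    n = len(xs)
    design = [cosine_basis(x, n_terms) for x in xs]
    beta = [0.0] * n_terms
    residual = list(ys)  # y - B beta, kept up to date after each update
    for _ in range(n_sweeps):
        for j in range(n_terms):
            col = [row[j] for row in design]
            scale = sum(v * v for v in col) / n
            # Correlation of column j with the residual that excludes term j.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            # The intercept (j == 0) is left unpenalized in this sketch.
            new_beta = rho / scale if j == 0 else soft_threshold(rho, penalty) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(col, residual)]
                beta[j] = new_beta
    return beta

# Toy data (assumed): a noiseless signal using only basis terms 0 and 2.
n_obs = 50
xs = [(i + 0.5) / n_obs for i in range(n_obs)]
ys = [0.5 + math.cos(2 * math.pi * x) for x in xs]
beta = fit_l1_sieve(xs, ys, n_terms=8, penalty=0.05)
```

In this toy setup the penalty sets the coefficients of unused basis functions exactly to zero and shrinks the active one, which is the sparsity behavior that lets the estimator adapt when only a few basis terms matter.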
Parsing (R)Markdown files with numerous regular expressions can be fraught with peril, but it does not have to be this way. Converting (R)Markdown files to XML using the commonmark package allows in-memory editing of markdown elements via XPath through the extensible R6 class called yarn. These modified XML representations can be written to (R)Markdown documents via an xslt stylesheet which implements an extended version of GitHub-flavoured markdown so that you can tinker to your heart's content.
This package provides a framework for statistical analysis in content analysis. In addition to a pipeline for preprocessing text corpora and linking to the latent Dirichlet allocation from the lda package, plots are offered for the descriptive analysis of text corpora and topic models. In addition, an implementation of Chang's intruder words and intruder topics is provided. Sample data for the vignette is included in the toscaData package, which is available on GitHub: <https://github.com/Docma-TU/toscaData>.
Define and use graphical elements of corporate design manuals in R. The unikn package provides color functions (by defining dedicated colors and color palettes, and commands for finding, changing, viewing, and using them) and styled text elements (e.g., for marking, underlining, or plotting colored titles). The pre-defined range of colors and text decoration functions is based on the corporate design of the University of Konstanz <https://www.uni-konstanz.de/>, but can be adapted and extended for other purposes or institutions.
The xtdml package implements partially linear panel regression (PLPR) models with high-dimensional confounding variables and an exogenous treatment variable within the double machine learning framework. The package is used to estimate the structural parameter (treatment effect) in static panel data models with fixed effects using the approaches established in Clarke and Polselli (2025) <doi:10.1093/ectj/utaf011>. xtdml is built on the object-oriented package DoubleML (Bach et al., 2024) <doi:10.18637/jss.v108.i03> using the mlr3 ecosystem.
RStudio is an integrated development environment (IDE) for the R programming language. Some of its features include: Customizable workbench with all of the tools required to work with R in one place (console, source, plots, workspace, help, history, etc.); syntax highlighting editor with code completion; execute code directly from the source editor (line, selection, or file); full support for authoring Sweave and TeX documents. RStudio can also be run as a server, enabling multiple users to access the RStudio IDE using a web browser.
The objective of AGDEX is to evaluate whether the results of a pair of two-group differential expression analysis comparisons show a level of agreement that is greater than expected if the group labels for each two-group comparison are randomly assigned. The agreement is evaluated for the entire transcriptome and (optionally) for a collection of pre-defined gene-sets. Additionally, the procedure performs permutation-based differential expression and meta analysis at both gene and gene-set levels of the data from each experiment.
This package provides a normalization and copy number variation calling procedure for whole exome DNA sequencing data. CODEX relies on the availability of multiple samples processed using the same sequencing pipeline for normalization, and does not require matched controls. The normalization model in CODEX includes terms that specifically remove biases due to GC content, exon length and targeting and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data.
The cmgnd package implements the constrained mixture of generalized normal distributions model, a flexible statistical framework for modelling univariate data exhibiting non-normal features such as skewness, multi-modality, and heavy tails. By imposing constraints on model parameters, cmgnd reduces estimation complexity while maintaining high descriptive power, offering an efficient solution in the presence of distributional irregularities. For more details see Duttilo and Gattone (2025) <doi:10.1007/s00180-025-01638-x> and Duttilo et al. (2025) <doi:10.48550/arXiv.2506.03285>.
Interactive tools to explore topographic-like data sets. Such data sets take the form of a matrix in which the rows and columns provide location/frequency information, and the matrix elements contain altitude/response information. Such data is found in cartography, 2D spectroscopy and chemometrics. The functions in this package create interactive web pages showing the contoured data, possibly with slices from the original matrix parallel to each dimension. The interactive behavior is created using the D3.js JavaScript library by Mike Bostock.
Implementation of Energy Trees, a statistical model to perform classification and regression with structured and mixed-type data. The model has a similar structure to Conditional Trees, but brings in Energy Statistics to test independence between variables that are possibly structured and of different nature. Currently, the package covers functions and graphs as structured covariates. It builds upon partykit to provide functionalities for fitting, printing, plotting, and predicting with Energy Trees. Energy Trees are described in Giubilei et al. (2022) <arXiv:2207.04430>.
This package implements readers and writers for file formats associated with genetics data. Reading and writing Plink BED/BIM/FAM and GCTA binary GRM formats is fully supported, including lightning-fast BED reader and writer implementations. Other functions are readr wrappers that are more constrained, user-friendly, and efficient for these particular applications; they handle Plink and Eigenstrat tables (FAM, BIM, IND, and SNP files). There are also make functions for FAM and BIM tables with default values to go with simulated genotype data.
This package provides a functional programming based implementation of the super learner algorithm with an emphasis on supporting the use of formulas to specify learners. This approach offers several improvements compared to past implementations including the ability to easily use random-effects specified in formulas (like y ~ (age | strata) + ...) and construction of new learners is as simple as writing and passing a new function. The super learner algorithm was originally described in van der Laan et al. (2007) <https://biostats.bepress.com/ucbbiostat/paper222/>.
This package provides functions for the analysis of occupational and environmental data with non-detects. Maximum likelihood (ML) methods for censored log-normal data and non-parametric methods based on the product limit estimate (PLE) for left censored data are used to calculate all of the statistics recommended by the American Industrial Hygiene Association (AIHA) for the complete data case. Functions for the analysis of complete samples using exact methods are also provided for the lognormal model. Revised from 2007-11-05 survfit~1.
XBSeq is a novel algorithm for testing RNA-seq differential expression (DE). Its statistical model is based on the assumption that observed signals are the convolution of true expression signals and sequencing noise. Reads mapped to non-exonic regions are treated as sequencing noise, which follows a Poisson distribution. Given the measurable observed signal and background noise from RNA-seq data, the true expression signals, assumed to follow a negative binomial distribution, can be delineated, enabling accurate detection of differentially expressed genes.
The AnVIL is a cloud computing resource developed in part by the National Human Genome Research Institute. The AnVIL package provides end-user and developer functionality. AnVIL provides fast binary package installation, utilities for working with Terra/AnVIL table and data resources, and convenient functions for file movement to and from Google cloud storage. For developers, AnVIL provides programmatic access to the Terra, Leonardo, Rawls, Dockstore, and Gen3 RESTful programming interface, including helper functions to transform JSON responses to formats more amenable to manipulation in R.
This package provides various R programming tools for data manipulation, including:
medical unit conversions
combining objects
character vector operations
factor manipulation
obtaining information about R objects
generating fixed-width format files
extricating components of date and time objects
operations on columns of data frames
matrix operations
operations on vectors and data frames
value of last evaluated expression
wrapper for sample() that ensures consistent behavior for both scalar and vector arguments
This package provides functions for planning and conducting a clinical trial with adaptive sample size determination. Maximal statistical efficiency will be exploited even when dramatic or multiple adaptations are made. Such a trial consists of adaptive determination of sample size at an interim analysis and implementation of a frequentist statistical test at the interim and final analyses with a prefixed significance level. The required assumptions for the stage-wise test statistics are independent and stationary increments and normality. Predetermination of the adaptation rule is not required.
Designed for the development and application of hidden Markov models and profile HMMs for biological sequence analysis. Contains functions for multiple and pairwise sequence alignment, model construction and parameter optimization, file import/export, implementation of the forward, backward and Viterbi algorithms for conditional sequence probabilities, tree-based sequence weighting, and sequence simulation. Features a wide variety of potential applications including database searching, gene-finding and annotation, phylogenetic analysis and sequence classification. Based on the models and algorithms described in Durbin et al. (1998, ISBN: 9780521629713).
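As an illustration of the Viterbi algorithm mentioned above, the following is a minimal sketch for a standard (non-profile) hidden Markov model, computed in log space to avoid numerical underflow. It is not the package's implementation: the function name `viterbi`, the two-state GC-rich/AT-rich model, and all probabilities are assumptions chosen for the example.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path and its log-probability."""
    # best[t][s]: log-probability of the best path ending in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical two-state model of local base composition.
states = ["GC_rich", "AT_rich"]
start_p = {"GC_rich": 0.5, "AT_rich": 0.5}
trans_p = {
    "GC_rich": {"GC_rich": 0.9, "AT_rich": 0.1},
    "AT_rich": {"GC_rich": 0.1, "AT_rich": 0.9},
}
emit_p = {
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

path, log_prob = viterbi("GCGATAT", states, start_p, trans_p, emit_p)
# path is three "GC_rich" states followed by four "AT_rich" states
```

The forward algorithm has the same recursion with the maximum replaced by a sum over predecessor states, which yields the total probability of the sequence rather than the single best path.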
This package implements a likelihood-based method for genome polarization, identifying which alleles of SNV markers belong to either side of a barrier to gene flow. The approach co-estimates individual assignment, barrier strength, and divergence between sides, with direct application to studies of hybridization. Includes VCF-to-diem conversion and input checks, support for mixed ploidy and parallelization, and tools for visualization and diagnostic outputs. Based on diagnostic index expectation maximization as described in Baird et al. (2023) <doi:10.1111/2041-210X.14010>.