Implementation of Kendall functional principal component analysis, a robust functional principal component analysis technique for non-Gaussian functional/longitudinal data. The key functions of this package are KFPCA() and KFPCA_reg(). Least squares estimates of functional principal component scores are also provided. Refer to Rou Zhong, Shishi Liu, Haocheng Li, Jingxiao Zhang (2021) <arXiv:2102.01286> and Rou Zhong, Shishi Liu, Haocheng Li, Jingxiao Zhang (2021) <doi:10.1016/j.jmva.2021.104864>.
We introduce a generalized factor model designed to jointly analyze high-dimensional multi-modality data from multiple studies by extracting study-shared and study-specified factors. Our factor models account for heterogeneous noise and overdispersion among modality variables with augmented covariates. We propose a computationally efficient variational estimation procedure for estimating model parameters, along with a novel criterion for selecting the optimal number of factors. More details can be found in Liu et al. (2025) <doi:10.48550/arXiv.2507.09889>.
The n-vector framework uses the normal vector to the Earth ellipsoid (called n-vector) as a non-singular position representation that turns out to be very convenient for practical position calculations. The n-vector is simple to use and gives exact answers for all global positions, and all distances, for both ellipsoidal and spherical Earth models. This package is a translation of the Matlab library from FFI, the Norwegian Defence Research Establishment, as described in Gade (2010) <doi:10.1017/S0373463309990415>.
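As a minimal sketch of the idea, the following Python snippet converts geodetic latitude/longitude to an n-vector and computes a great-circle distance on a spherical Earth model. The function names, the axis convention (z toward the North Pole, x toward latitude 0, longitude 0) and the default radius are assumptions made for illustration; they are not the package's or the Matlab library's API, and the original library may use a different axis convention.

```python
import math

def lat_lon_to_n_vector(lat_deg, lon_deg):
    """Unit normal to the ellipsoid surface for a geodetic lat/lon in degrees.

    Assumed axis convention: z points to the North Pole, x to (lat=0, lon=0).
    """
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def n_vector_to_lat_lon(n):
    """Inverse conversion: n-vector back to geodetic lat/lon in degrees."""
    x, y, z = n
    lat = math.atan2(z, math.hypot(x, y))
    lon = math.atan2(y, x)  # longitude is arbitrary at the poles
    return math.degrees(lat), math.degrees(lon)

def great_circle_distance(n_a, n_b, radius_m=6371e3):
    """Surface distance on a sphere of the given radius (assumed mean radius).

    atan2(|a x b|, a . b) stays well conditioned for both very small
    and near-antipodal angles.
    """
    cross = (n_a[1] * n_b[2] - n_a[2] * n_b[1],
             n_a[2] * n_b[0] - n_a[0] * n_b[2],
             n_a[0] * n_b[1] - n_a[1] * n_b[0])
    cross_norm = math.sqrt(sum(c * c for c in cross))
    dot = sum(a * b for a, b in zip(n_a, n_b))
    return radius_m * math.atan2(cross_norm, dot)

# Example usage with two illustrative positions.
n_a = lat_lon_to_n_vector(59.9, 10.7)
n_b = lat_lon_to_n_vector(69.7, 19.0)
print(round(great_circle_distance(n_a, n_b) / 1000.0, 1), "km")
```

Because positions are stored as unit vectors rather than angle pairs, the same code works at the poles and across the ±180° meridian without special cases. Ellipsoidal distances need additional steps that are not shown here.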
Identifying spatially variable genes is critical in linking molecular cell functions with tissue phenotypes. This package provides a granularity-based, dimension-agnostic tool, single-cell big-small patch (scBSP), which uses sparse matrix operations and KD-tree methods for distance calculation to identify spatially variable genes in large-scale data. A detailed description of the method is available in Wang, J. and Li, J. et al. (2023) <doi:10.1038/s41467-023-43256-5>.
This package provides functions that automate accessing, downloading and exploring Soil Moisture and Ocean Salinity (SMOS) Level 4 (L4) data developed by the Barcelona Expert Center (BEC). In particular, it includes functions to search for, acquire, extract, and plot BEC-SMOS L4 soil moisture data downscaled to ~1 km spatial resolution. Note that SMOS is one of the Earth Explorer Opportunity missions of the European Space Agency (ESA). More information about SMOS products can be found at <https://earth.esa.int/eogateway/missions/smos/data>.
Bio-Layer Interferometry (BLI) is a technology to determine the binding kinetics between biomolecules. BLI signals are small and noisy when small molecules are investigated as ligands (analytes). We developed this package to process and analyze BLI data acquired on the Octet Red96 from Fortebio more accurately; see Sun Q., Li X., et al. (2020) <doi:10.1038/s41467-019-14238-3>. In this new version, we organize the BLI experiment data and analysis methods into an S4 class with a self-explanatory structure.
Processes data from Molecular Dynamics simulations using Self Organising Maps. Features include the ability to read different input formats. Trajectories can be analysed to identify groups of important frames. Output visualisation can be generated for maps and pathways. Methodological details can be found in Motta S. et al. (2022) <doi:10.1021/acs.jctc.1c01163>. I/O functions for xtc format files were implemented using the xdrfile library, available under an open source license. The relevant information can be found in inst/COPYRIGHT.
Combining Predictive Analytics and Experimental Design to Optimize Results. The package is used to select a training population calibrated to the test data in high-dimensional prediction problems, and it assumes that the explanatory variables are observed for all individuals. Once a "good" training set is identified, the response variable needs to be obtained only for this set in order to build a model for predicting the response in the test set. The algorithms in the package can be adapted to solve some other subset selection problems.
dplyr is the next iteration of plyr. It is focused on tools for working with data frames. It has three main goals: 1) identify the most important data manipulation tools needed for data analysis and make them easy to use in R; 2) provide fast performance for in-memory data by writing key pieces of code in C++; 3) use the same code interface to work with data no matter where it is stored, whether in a data frame, a data table or database.
The smurf package contains the implementation of the Sparse Multi-type Regularized Feature (SMuRF) modeling algorithm to fit generalized linear models (GLMs) with multiple types of predictors via regularized maximum likelihood. In addition to the fitting procedure, the following functionality is available:
Selection of the regularization tuning parameter lambda using three different approaches: in-sample, out-of-sample or using cross-validation.
S3 methods to handle the fitted object including visualization of the coefficients and a model summary.
Iteratively Adjusted Surrogate Variable Analysis (IA-SVA) is a statistical framework to uncover hidden sources of variation even when these sources are correlated. IA-SVA provides a flexible methodology to i) identify a hidden factor for unwanted heterogeneity while adjusting for all known factors; ii) test the significance of the putative hidden factor for explaining the unmodeled variation in the data; and iii), if significant, use the estimated factor as an additional known factor in the next iteration to uncover further hidden factors.
This package provides a comprehensive set of functions designed for multivariate mean monitoring using the Critical-to-X Control Chart. These functions enable the determination of optimal control limits based on a specified in-control Average Run Length (ARL), the calculation of out-of-control ARL for a given control limit, and post-signal analysis to identify the specific variable responsible for a detected shift in the mean. This suite of tools provides robust support for precise and effective process monitoring and analysis.
This package implements the Fixed Effect Jackknife Instrumental Variables ('FEJIV') estimator of Chao, Swanson, and Woutersen (2023) <doi:10.1016/j.jeconom.2022.12.011>, allowing consistent IV estimation with many (possibly weak) instruments, cluster fixed effects, heteroskedastic errors, and many exogenous covariates. The estimator is recommended by Słoczyński (2024) <doi:10.48550/arXiv.2011.06695> as an alternative to two-stage least squares when estimating the interacted specification of Angrist and Imbens (1995) <doi:10.1080/01621459.1995.10476535>.
Kernel regularized least squares, also known as kernel ridge regression, is a flexible machine learning method. This package implements this method by providing a smooth term for use with mgcv and uses random sketching to facilitate scalable estimation on large datasets. It provides additional functions for calculating marginal effects after estimation and for use with ensembles ('SuperLearning'), double/debiased machine learning ('DoubleML'), and robust/clustered standard errors ('sandwich'). Chang and Goplerud (2024) <doi:10.1017/pan.2023.27> provide further details.
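To make the underlying estimator concrete, here is a minimal pure-Python sketch of plain kernel ridge regression with a Gaussian kernel: the coefficients c solve (K + lambda I) c = y, and predictions are kernel-weighted sums of c. All function names, the kernel scaling exp(-||a - b||^2 / bandwidth) and the parameter values are assumptions for illustration. The sketch does not show the package's interface, its mgcv smooth term, or its random sketching step.

```python
import math

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel; the scaling by `bandwidth` is an assumed convention."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / bandwidth)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_krls(X, y, lam, bandwidth):
    """Return coefficients c solving (K + lam * I) c = y."""
    n = len(X)
    A = [[rbf_kernel(X[i], X[j], bandwidth) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, y)

def predict_krls(X_train, coef, x_new, bandwidth):
    """Prediction is the kernel-weighted sum of the fitted coefficients."""
    return sum(c * rbf_kernel(xi, x_new, bandwidth)
               for xi, c in zip(X_train, coef))

# Example usage on a small one-dimensional toy problem.
X = [[i / 10.0] for i in range(11)]
y = [math.sin(2 * math.pi * x[0]) for x in X]
coef = fit_krls(X, y, lam=1e-3, bandwidth=0.1)
print(predict_krls(X, coef, [0.25], bandwidth=0.1))
```

The exact solve costs on the order of n^3 operations for n observations, which is why approximations such as sketching matter on large datasets.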
This package implements methods for analyzing latent variable models with measurement error correction, including Item Response Theory (IRT) models. Provides tools for various correction methods such as Bayesian Markov Chain Monte Carlo (MCMC), over-imputation, bootstrapping for robust standard errors, Ordinary Least Squares (OLS), and Instrumental Variables (IV) based approaches. Supports flexible specification of observable indicators and groupings for latent variable analyses in social sciences and other fields. Methods are described in a working paper (2025) <doi:10.48550/arXiv.2507.22218>.
Generates efficient balanced non-aliased multi-level k-circulant supersaturated designs by interchanging the elements of the generator vector. Attempts to generate a supersaturated design whose chi-square efficiency exceeds a user-specified efficiency level (mef). Displays the progress of generating an efficient multi-level k-circulant design through a progress bar. A progress of 100% means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.
This package performs multivariate nonparametric regression/classification by the method of sieves (using orthogonal bases). The method is suitable for moderately high-dimensional features (dimension < 100). The l1-penalized sieve estimator, a nonparametric generalization of the Lasso, is adaptive to the feature dimension with provable theoretical guarantees. We also include a nonparametric stochastic gradient descent estimator, Sieve-SGD, for online or large-scale batch problems. Details of the methods can be found in <arXiv:2206.02994>, <arXiv:2104.00846> and <arXiv:2310.12140>.
This package provides a framework for statistical analysis in content analysis. In addition to a pipeline for preprocessing text corpora and linking to the latent Dirichlet allocation from the lda package, plots are offered for the descriptive analysis of text corpora and topic models. An implementation of Chang's intruder words and intruder topics is also provided. Sample data for the vignette is included in the toscaData package, which is available on GitHub: <https://github.com/Docma-TU/toscaData>.
Parsing (R)Markdown files with numerous regular expressions can be fraught with peril, but it does not have to be this way. Converting (R)Markdown files to XML using the commonmark package allows in-memory editing of markdown elements via XPath through the extensible R6 class called 'yarn'. These modified XML representations can be written to (R)Markdown documents via an xslt stylesheet which implements an extended version of GitHub-flavoured markdown so that you can tinker to your heart's content.
Define and use graphical elements of corporate design manuals in R. The unikn package provides color functions (by defining dedicated colors and color palettes, and commands for finding, changing, viewing, and using them) and styled text elements (e.g., for marking, underlining, or plotting colored titles). The pre-defined range of colors and text decoration functions is based on the corporate design of the University of Konstanz <https://www.uni-konstanz.de/>, but can be adapted and extended for other purposes or institutions.
The xtdml package implements partially linear panel regression (PLPR) models with high-dimensional confounding variables and an exogenous treatment variable within the double machine learning framework. The package is used to estimate the structural parameter (treatment effect) in static panel data models with fixed effects using the approaches established in Clarke and Polselli (2025) <doi:10.1093/ectj/utaf011>. xtdml is built on the object-oriented package DoubleML (Bach et al., 2024) <doi:10.18637/jss.v108.i03> using the mlr3 ecosystem.
RStudio is an integrated development environment (IDE) for the R programming language. Some of its features include: Customizable workbench with all of the tools required to work with R in one place (console, source, plots, workspace, help, history, etc.); syntax highlighting editor with code completion; execute code directly from the source editor (line, selection, or file); full support for authoring Sweave and TeX documents. RStudio can also be run as a server, enabling multiple users to access the RStudio IDE using a web browser.
The objective of AGDEX is to evaluate whether the results of a pair of two-group differential expression analysis comparisons show a level of agreement that is greater than expected if the group labels for each two-group comparison are randomly assigned. The agreement is evaluated for the entire transcriptome and (optionally) for a collection of pre-defined gene-sets. Additionally, the procedure performs permutation-based differential expression and meta analysis at both gene and gene-set levels of the data from each experiment.
This package provides a normalization and copy number variation calling procedure for whole exome DNA sequencing data. CODEX relies on the availability of multiple samples processed using the same sequencing pipeline for normalization, and does not require matched controls. The normalization model in CODEX includes terms that specifically remove biases due to GC content, exon length and targeting and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data.