Implementation of Kendall functional principal component analysis, a robust functional principal component analysis technique for non-Gaussian functional/longitudinal data. The crucial functions of this package are KFPCA() and KFPCA_reg(). Moreover, least squares estimates of functional principal component scores are also provided. Refer to Rou Zhong, Shishi Liu, Haocheng Li, Jingxiao Zhang. (2021) <arXiv:2102.01286> and Rou Zhong, Shishi Liu, Haocheng Li, Jingxiao Zhang. (2021) <doi:10.1016/j.jmva.2021.104864>.
We introduce a generalized factor model designed to jointly analyze high-dimensional multi-modality data from multiple studies by extracting study-shared and study-specified factors. Our factor models account for heterogeneous noise and overdispersion among modality variables with augmented covariates. We propose an efficient variational estimation procedure for estimating model parameters, along with a novel criterion for selecting the optimal number of factors. For more details, see Liu et al. (2025) <doi:10.48550/arXiv.2507.09889>.
The n-vector framework uses the normal vector to the Earth ellipsoid (called n-vector) as a non-singular position representation that turns out to be very convenient for practical position calculations. The n-vector is simple to use and gives exact answers for all global positions, and all distances, for both ellipsoidal and spherical Earth models. This package is a translation of the Matlab library from FFI, the Norwegian Defence Research Establishment, as described in Gade (2010) <doi:10.1017/S0373463309990415>.
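The idea of a non-singular position representation can be illustrated with a minimal sketch. It is not the package's API: the function names, the axis convention (z towards the North Pole, x towards latitude 0 and longitude 0) and the mean Earth radius are assumptions made for this example, and only the spherical Earth model is covered.

```python
import math

# Assumed mean Earth radius in metres, used for the spherical model only.
EARTH_RADIUS_M = 6_371_000.0


def lat_lon_to_n_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit normal vector.

    Assumed axis convention: z points to the North Pole and x points to
    latitude 0, longitude 0. Other conventions only permute the components.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def n_vector_to_lat_lon(n_vector):
    """Convert a unit normal vector back to latitude/longitude in degrees."""
    x, y, z = n_vector
    lat = math.atan2(z, math.hypot(x, y))
    lon = math.atan2(y, x)
    return math.degrees(lat), math.degrees(lon)


def great_circle_distance(n_a, n_b, radius=EARTH_RADIUS_M):
    """Surface distance between two n-vectors on a sphere.

    atan2 of the cross-product norm and the dot product stays accurate
    for both very small and near-antipodal separations.
    """
    cross = (
        n_a[1] * n_b[2] - n_a[2] * n_b[1],
        n_a[2] * n_b[0] - n_a[0] * n_b[2],
        n_a[0] * n_b[1] - n_a[1] * n_b[0],
    )
    cross_norm = math.sqrt(sum(c * c for c in cross))
    dot = sum(a * b for a, b in zip(n_a, n_b))
    return radius * math.atan2(cross_norm, dot)
```

Because the position is stored as a vector, two points at the pole with different longitudes map to essentially the same n-vector, so the distance calculation needs no special case there.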
Processes data from Molecular Dynamics simulations using Self Organising Maps. Features include the ability to read different input formats. Trajectories can be analysed to identify groups of important frames. Output visualisation can be generated for maps and pathways. Methodological details can be found in Motta S et al (2022) <doi:10.1021/acs.jctc.1c01163>. I/O functions for xtc format files were implemented using the xdrfile library available under open source license. The relevant information can be found in inst/COPYRIGHT.
Bio-Layer Interferometry (BLI) is a technology to determine the binding kinetics between biomolecules. BLI signals are small and noisy when small molecules are investigated as ligands (analytes). We develop this package to process and analyze the BLI data acquired on the Octet Red96 from Fortebio more accurately; see Sun Q., Li X., et al (2020) <doi:10.1038/s41467-019-14238-3>. In this new version, we organize the BLI experiment data and analysis methods into an S4 class with a self-explanatory structure.
Identifying spatially variable genes is critical in linking molecular cell functions with tissue phenotypes. This package utilizes a granularity-based dimension-agnostic tool, single-cell big-small patch (scBSP), implementing sparse matrix operations and KD-tree methods for distance calculation, for the identification of spatially variable genes on large-scale data. A detailed description of the method is available in Wang, J. and Li, J. et al. (2023) <doi:10.1038/s41467-023-43256-5>.
Combining Predictive Analytics and Experimental Design to Optimize Results. The package selects a training population calibrated to the test data in high-dimensional prediction problems, assuming that the explanatory variables are observed for all individuals. Once a "good" training set is identified, the response variable needs to be obtained only for this set in order to build a model for predicting the response in the test set. The algorithms in the package can be adapted to solve other subset selection problems.
This package provides functions that automate accessing, downloading and exploring Soil Moisture and Ocean Salinity (SMOS) Level 4 (L4) data developed by the Barcelona Expert Center (BEC). In particular, it includes functions to search for, acquire, extract, and plot BEC-SMOS L4 soil moisture data downscaled to ~1 km spatial resolution. Note that SMOS is one of the Earth Explorer Opportunity missions of the European Space Agency (ESA). More information about SMOS products can be found at <https://earth.esa.int/eogateway/missions/smos/data>.
SGSeq is a package for analyzing splice events from RNA-seq data. Input data are RNA-seq reads mapped to a reference genome in BAM format. Genes are represented as a splice graph, which can be obtained from existing annotation or predicted from the mapped sequence reads. Splice events are identified from the graph and are quantified locally using structurally compatible reads at the start or end of each splice variant. The software includes functions for splice event prediction, quantification, visualization and interpretation.
This package wires together large collections of single-cell RNA-seq datasets, which allows for both the identification of recurrent cell clusters and the propagation of information between datasets in multi-sample or atlas-scale collections. Conos focuses on the uniform mapping of homologous cell types across heterogeneous sample collections. For instance, users could investigate a collection of dozens of peripheral blood samples from cancer patients combined with dozens of controls, which perhaps includes samples of a related tissue such as lymph nodes.
The ggbio package extends and specializes the grammar of graphics for biological data. The graphics are designed to answer common scientific questions, in particular those often asked of high throughput genomics data. All core Bioconductor data structures are supported, where appropriate. The package supports detailed views of particular genomic regions, as well as genome-wide overviews. Supported overviews include ideograms and grand linear views. High-level plots include sequence fragment length, edge-linked interval to data view, mismatch pileup, and several splicing summaries.
RStudio is an integrated development environment (IDE) for the R programming language. Some of its features include: Customizable workbench with all of the tools required to work with R in one place (console, source, plots, workspace, help, history, etc.); syntax highlighting editor with code completion; execute code directly from the source editor (line, selection, or file); full support for authoring Sweave and TeX documents. RStudio can also be run as a server, enabling multiple users to access the RStudio IDE using a web browser.
This package provides a comprehensive set of functions designed for multivariate mean monitoring using the Critical-to-X Control Chart. These functions enable the determination of optimal control limits based on a specified in-control Average Run Length (ARL), the calculation of out-of-control ARL for a given control limit, and post-signal analysis to identify the specific variable responsible for a detected shift in the mean. This suite of tools provides robust support for precise and effective process monitoring and analysis.
Simulates a population under the Fisher-Wright model (fixed or stochastic population size) with a one-step neutral mutation process (stepwise mutation model, logistic mutation model and exponential mutation model supported). The stochastic population sizes are Poisson distributed and different kinds of population growth are supported. For the stepwise mutation model, it is possible to specify locus- and direction-specific mutation rates (in terms of upwards and downwards mutation rates). Intermediate generations can be saved in order to study, e.g., drift.
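The simulation scheme can be sketched roughly as follows. This is an illustration, not the package's interface: all names are hypothetical, individuals are assumed to be haploid, the population size is fixed, and a single symmetric mutation rate is applied to every locus, whereas the description above also covers stochastic sizes and locus- and direction-specific rates.

```python
import random


def mutate_allele(allele, mutation_rate, rng):
    """Stepwise mutation: with probability mutation_rate move one step up or down."""
    if rng.random() < mutation_rate:
        return allele + rng.choice((-1, 1))
    return allele


def simulate_fisher_wright(founder, pop_size, generations, mutation_rate, rng=None):
    """Simulate a fixed-size haploid population (assumed setup for this sketch).

    founder        tuple of integer alleles, one per locus
    pop_size       number of individuals in every generation
    generations    number of generations to simulate
    mutation_rate  per-locus, per-generation mutation probability
    """
    rng = rng or random.Random()
    population = [tuple(founder)] * pop_size
    for _ in range(generations):
        next_generation = []
        for _ in range(pop_size):
            # Each child picks its parent uniformly at random with replacement.
            parent = rng.choice(population)
            child = tuple(mutate_allele(a, mutation_rate, rng) for a in parent)
            next_generation.append(child)
        population = next_generation
    return population
```

Calling `simulate_fisher_wright((14, 12, 28), pop_size=100, generations=50, mutation_rate=0.003, rng=random.Random(1))` returns the final generation as a list of haplotypes; storing `population` at each iteration would give the intermediate generations needed to study drift.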
Kernel regularized least squares, also known as kernel ridge regression, is a flexible machine learning method. This package implements this method by providing a smooth term for use with mgcv and uses random sketching to facilitate scalable estimation on large datasets. It provides additional functions for calculating marginal effects after estimation and for use with ensembles ('SuperLearning'), double/debiased machine learning ('DoubleML'), and robust/clustered standard errors ('sandwich'). Chang and Goplerud (2024) <doi:10.1017/pan.2023.27> provide further details.
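For readers unfamiliar with kernel ridge regression itself, a minimal dependency-free sketch is given below. It is not the package's implementation: the function names, the Gaussian kernel form exp(-||x - z||^2 / bandwidth) and the dense linear solve are assumptions for illustration, and the random sketching that the package uses for scalability is omitted.

```python
import math


def gaussian_kernel(x, z, bandwidth):
    """Gaussian kernel exp(-||x - z||^2 / bandwidth) (assumed parameterisation)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / bandwidth)


def solve_linear_system(matrix, rhs):
    """Solve matrix @ solution = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_krr(xs, ys, ridge, bandwidth):
    """Return dual coefficients alpha solving (K + ridge * I) alpha = y."""
    n = len(xs)
    gram = [
        [gaussian_kernel(xs[i], xs[j], bandwidth) + (ridge if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve_linear_system(gram, ys)


def predict_krr(xs, alphas, x_new, bandwidth):
    """Predict at x_new as the kernel-weighted sum of the dual coefficients."""
    return sum(a * gaussian_kernel(x, x_new, bandwidth) for a, x in zip(alphas, xs))
```

The dense solve costs on the order of n^3 operations for n observations, which is the bottleneck that sketching methods are meant to relieve on large datasets.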
Allows for the computation of mSHAP values on two-part models as proposed by Matthews, S. and Hartman, B. (2021) <arXiv:2106.08990>. Also contains functions for simple plotting of the results (or any SHAP values). For information about the TreeSHAP algorithm that mSHAP builds on, see Lundberg, S.M., Erion, G., Chen, H., DeGrave, A., Prutkin, J.M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., Lee, S.I. (2020) <doi:10.1038/s42256-019-0138-9>.
Generates efficient balanced non-aliased multi-level k-circulant supersaturated designs by interchanging the elements of the generator vector. Attempts to generate a supersaturated design whose chi-square efficiency exceeds a user-specified efficiency level (mef). Displays the progress of generating an efficient multi-level k-circulant design through a progress bar. A progress of 100% means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.
This package performs multivariate nonparametric regression/classification by the method of sieves (using orthogonal bases). The method is suitable for moderately high-dimensional features (dimension < 100). The l1-penalized sieve estimator, a nonparametric generalization of Lasso, is adaptive to the feature dimension with provable theoretical guarantees. We also include a nonparametric stochastic gradient descent estimator, Sieve-SGD, for online or large-scale batch problems. Details of the methods can be found in <arXiv:2206.02994>, <arXiv:2104.00846> and <arXiv:2310.12140>.
Parsing (R)Markdown files with numerous regular expressions can be fraught with peril, but it does not have to be this way. Converting (R)Markdown files to XML using the commonmark package allows in-memory editing of markdown elements via XPath through the extensible R6 class called 'yarn'. These modified XML representations can be written to (R)Markdown documents via an xslt stylesheet which implements an extended version of 'GitHub'-flavoured markdown so that you can tinker to your heart's content.
This package provides a framework for statistical analysis in content analysis. In addition to a pipeline for preprocessing text corpora and linking to the latent Dirichlet allocation from the lda package, plots are offered for the descriptive analysis of text corpora and topic models. An implementation of Chang's intruder words and intruder topics is also provided. Sample data for the vignette is included in the toscaData package, which is available on GitHub: <https://github.com/Docma-TU/toscaData>.
Define and use graphical elements of corporate design manuals in R. The unikn package provides color functions (by defining dedicated colors and color palettes, and commands for finding, changing, viewing, and using them) and styled text elements (e.g., for marking, underlining, or plotting colored titles). The pre-defined range of colors and text decoration functions is based on the corporate design of the University of Konstanz <https://www.uni-konstanz.de/>, but can be adapted and extended for other purposes or institutions.
The smurf package contains the implementation of the Sparse Multi-type Regularized Feature (SMuRF) modeling algorithm to fit generalized linear models (GLMs) with multiple types of predictors via regularized maximum likelihood. In addition to the fitting procedure, the following functionality is available: 1) selection of the regularization tuning parameter lambda using three different approaches: in-sample, out-of-sample or cross-validation; 2) S3 methods to handle the fitted object, including visualization of the coefficients and a model summary.
dplyr is the next iteration of plyr. It is focused on tools for working with data frames. It has three main goals: 1) identify the most important data manipulation tools needed for data analysis and make them easy to use in R; 2) provide fast performance for in-memory data by writing key pieces of code in C++; 3) use the same code interface to work with data no matter where it is stored, whether in a data frame, a data table or database.
The cmgnd package implements the constrained mixture of generalized normal distributions model, a flexible statistical framework for modelling univariate data exhibiting non-normal features such as skewness, multi-modality, and heavy tails. By imposing constraints on model parameters, cmgnd reduces estimation complexity while maintaining high descriptive power, offering an efficient solution in the presence of distributional irregularities. For more details see Duttilo and Gattone (2025) <doi:10.1007/s00180-025-01638-x> and Duttilo et al (2025) <doi:10.48550/arXiv.2506.03285>.