This package provides functions for testing phylogenetic niche conservatism, a key prerequisite in community assembly studies. The package integrates global functional trait data across major taxonomic groups and implements methods such as Pagel's Lambda and Blomberg's K to quantify phylogenetic signals in ecological communities. Methods are described in Münkemüller et al. (2012) <doi:10.1111/j.2041-210X.2012.00196.x>.
This package creates an S4 class "SSM" and defines functions for fitting smooth supersaturated models, a polynomial model with spline-like behaviour. Functions are defined for the computation of Sobol indices for sensitivity analysis and plotting the main effects using FANOVA methods. It also implements the estimation of the SSM metamodel error using a GP model with a variety of defined correlation functions.
The msa package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and Muscle. All three algorithms are integrated in the package, therefore, they do not depend on any external software tools and are available for all major platforms. The multiple sequence alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.
This package implements latent Dirichlet allocation (LDA) and related models. This includes (but is not limited to) sLDA, corrLDA, and the mixed-membership stochastic blockmodel. Inference for all of these models is implemented via a fast collapsed Gibbs sampler written in C. Utility functions for reading/writing data typically used in topic models, as well as tools for examining posterior distributions are also included.
This package provides functions for fitting continuous-time Markov and hidden Markov multi-state models to longitudinal data. It was designed for processes observed at arbitrary times in continuous time (panel data) but some other observation schemes are supported. Both Markov transition rates and the hidden Markov output process can be modelled in terms of covariates, which may be constant or piecewise-constant in time.
The dks package consists of a set of diagnostic functions for multiple testing methods. The functions can be used to determine if the p-values produced by a multiple testing procedure are correct. These functions are designed to be applied to simulated data. The functions require the entire set of p-values from multiple simulated studies, so that the joint distribution can be evaluated.
Utility functions for manipulating, processing, and analyzing mass spectrometry-based single-cell proteomics data. The package is an extension to the QFeatures package and relies on SingleCellExpirement to enable single-cell proteomics analyses. The package offers the user the functionality to process quantitative table (as generated by MaxQuant, Proteome Discoverer, and more) into data tables ready for downstream analysis and data visualization.
This package provides a method for pattern discovery in weighted graphs as outlined in Thistlethwaite et al. (2021) <doi:10.1371/journal.pcbi.1008550>. Two use cases are achieved: 1) Given a weighted graph and a subset of its nodes, do the nodes show significant connectedness? 2) Given a weighted graph and two subsets of its nodes, are the subsets close neighbors or distant?
An interface to the Python InterpretML framework for fitting explainable boosting machines (EBMs); see Nori et al. (2019) <doi:10.48550/arXiv.1909.09223> for details. EBMs are a modern type of generalized additive model that use tree-based, cyclic gradient boosting with automatic interaction detection. They are often as accurate as state-of-the-art blackbox models while remaining completely interpretable.
Higher-order latent trait theory (item response theory). We implement the generalized partial credit model with a second-order latent trait structure. Latent regression can be done on the second-order latent trait. For a pre-print of the methods, see, "Latent Regression in Higher-Order Item Response Theory with the R Package hlt" <https://mkleinsa.github.io/doc/hlt_proof_draft_brmic.pdf>.
The book "Semiparametric Regression with R" by J. Harezlak, D. Ruppert & M.P. Wand (2018, Springer; ISBN: 978-1-4939-8851-8) makes use of datasets and scripts to explain semiparametric regression concepts. Each of the book's scripts are contained in this package as well as datasets that are not within other R packages. Functions that aid semiparametric regression analysis are also included.
Fit Gaussian Multinomial mixed-effects models for small area estimation: Model 1, with one random effect in each category of the response variable (Lopez-Vizcaino,E. et al., 2013) <doi:10.1177/1471082X13478873>; Model 2, introducing independent time effect; Model 3, introducing correlated time effect. mme calculates direct and parametric bootstrap MSE estimators (Lopez-Vizcaino,E et al., 2014) <doi:10.1111/rssa.12085>.
Processing Chlorophyll Fluorescence & P700 Absorbance data generated by WALZ hardware. Four models are provided for the regression of Pi curves, which can be compared with each other in order to select the most suitable model for the data set. Control plots ensure the successful verification of each regression. Bundled output of alpha, ETRmax, Ik etc. enables fast and reliable further processing of the data.
This package provides functions to extract and handle commonly occurring principal phrases obtained from collections of texts. Major speed improvements - core functions rewritten in C++ for faster phrase-document parsing, clustering, and text distance computations. Based on, Small, E., & Cabrera, J. (2025). Principal phrase mining, an automated method for extracting meaningful phrases from text. International Journal of Computers and Applications, 47(1), 84â 92.
An extensive set of functions to perform Qualitative Comparative Analysis: crisp sets ('csQCA'), temporal ('tQCA'), multi-value ('mvQCA') and fuzzy sets ('fsQCA'), using a GUI - graphical user interface. QCA is a methodology that bridges the qualitative and quantitative divide in social science research. It uses a Boolean minimization algorithm, resulting in a minimal causal configuration associated with a given phenomenon.
The Simulation-based Sampling Protocol (SSP) is an R package designed to estimate sampling effort in studies of ecological communities. It is based on the concept of pseudo-multivariate standard error (MultSE) (Anderson & Santana-Garcon, 2015, <doi:10.1111/ele.12385>) and the simulation of ecological data. The theoretical background is described in Guerra-Castro et al. (2020, <doi:10.1111/ecog.05284>).
Transmission Ratio Distortion (TRD) is a genetic phenomenon where the two alleles from either parent are not transmitted to the offspring at the expected 1:1 ratio under Mendelian inheritance, leading to spurious signals in genetic association studies. Functions in this package are developed to account for this phenomenon using loglinear model and Transmission Disequilibrium Test (TDT). Some population information can also be calculated.
The base class VirtualArray is defined, which acts as a wrapper around lists allowing users to fold arbitrary sequential data into n-dimensional, R-style virtual arrays. The derived XArray class is defined to be used for homogeneous lists that contain a single class of objects. The RasterArray and SfArray classes enable the use of stacked spatial data instead of lists.
Empirical models for runoff, erosion, and phosphorus loss across a vegetated filter strip, given slope, soils, climate, and vegetation (Gall et al., 2018) <doi:10.1007/s00477-017-1505-x>. It also includes functions for deriving climate parameters from measured daily weather data, and for simulating rainfall. Models implemented include MUSLE (Williams, 1975) and APLE (Vadas et al., 2009 <doi:10.2134/jeq2008.0337>).
The basic idea of latent semantic analysis (LSA) is, that text do have a higher order (=latent semantic) structure which, however, is obscured by word usage (e.g. through the use of synonyms or polysemy). By using conceptual indices that are derived statistically via a truncated singular value decomposition (a two-mode factor analysis) over a given document-term matrix, this variability problem can be overcome.
The TIN package implements a set of tools for transcriptome instability analysis based on exon expression profiles. Deviating exon usage is studied in the context of splicing factors to analyse to what degree transcriptome instability is correlated to splicing factor expression. In the transcriptome instability correlation analysis, the data is compared to both random permutations of alternative splicing scores and expression of random gene sets.
This package provides a fast, lightweight, and vectorized base 64 engine to encode and decode character and raw vectors as well as files stored on disk. Common base 64 alphabets are supported out of the box including the standard, URL-safe, bcrypt, crypt, BinHex', and IMAP-modified UTF-7 alphabets. Custom engines can be created to support unique base 64 encoding and decoding needs.
Supervised learning using Boltzmann Bayes model inference, which extends naive Bayes model to include interactions. Enables classification of data into multiple response groups based on a large number of discrete predictors that can take factor values of heterogeneous levels. Either pseudo-likelihood or mean field inference can be used with L2 regularization, cross-validation, and prediction on new data. <doi:10.18637/jss.v101.i05>.
This package implements the Centroid Decision Forest (CDF) as a single user-facing function CDF(). The method selects discriminative features via a multi-class class separability score (CSS), splits by nearest class centroid, and aggregates tree votes to produce predictions and class probabilities. Returns CSS-based feature importance as well. Amjad Ali, Saeed Aldahmani, Zardad Khan (2025) <doi:10.48550/arXiv.2503.19306>.