Splits data into Gaussian type clusters using the Cross-Entropy Clustering ('CEC') method. This method allows for the simultaneous use of various types of Gaussian mixture models, for performing the reduction of unnecessary clusters, and for discovering new clusters by splitting them. CEC is based on the work of Spurek, P. and Tabor, J. (2014) <doi:10.1016/j.patcog.2014.03.006>.
This package provides a flexible framework for calculating Elo ratings and resulting rankings of any two-team-per-matchup system (chess, sports leagues, Go', etc.). This implementation is capable of evaluating a variety of matchups, Elo rating updates, and win probabilities, all based on the basic Elo rating system. It also includes methods to benchmark performance, including logistic regression and Markov chain models.
Compute maximum likelihood estimators of parameters in a Gaussian factor model using the the matrix-free methodology described in Dai et al. (2020) <doi:10.1080/10618600.2019.1704296>. In contrast to the factanal()
function from stats package, fad()
can handle high-dimensional datasets where number of variables exceed the sample size and is also substantially faster than the EM algorithms.
This package implements the generalized integration model, which integrates individual-level data and summary statistics under a generalized linear model framework. It supports continuous and binary outcomes to be modeled by the linear and logistic regression models. For binary outcome, data can be sampled in prospective cohort studies or case-control studies. Described in Zhang et al. (2020)<doi:10.1093/biomet/asaa014>.
Implementation of Tyler, Critchley, Duembgen and Oja's (JRSS B, 2009, <doi:10.1111/j.1467-9868.2009.00706.x>) and Oja, Sirkia and Eriksson's (AJS, 2006, <https://www.ajs.or.at/index.php/ajs/article/view/vol35,%20no2%263%20-%207>) method of two different scatter matrices to obtain an invariant coordinate system or independent components, depending on the underlying assumptions.
This package provides methods and tools for model selection and multi-model inference (Burnham and Anderson (2002) <doi:10.1007/b97636>, among others). SUR (for parameter estimation), logit'/'probit (for binary classification), and VARMA (for time-series forecasting) are implemented. Evaluations are both in-sample and out-of-sample. It is designed to be efficient in terms of CPU usage and memory consumption.
Calculates the amplification efficiency and curves from real-time quantitative PCR (Polymerase Chain Reaction) data. Estimates the relative expression from PCR data using the double delta CT and the standard curve methods Livak & Schmittgen (2001) <doi:10.1006/meth.2001.1262>. Tests for statistical significance using two-group tests and linear regression Yuan et al. (2006) <doi: 10.1186/1471-2105-7-85>.
This package creates an S4 class "SSM" and defines functions for fitting smooth supersaturated models, a polynomial model with spline-like behaviour. Functions are defined for the computation of Sobol indices for sensitivity analysis and plotting the main effects using FANOVA methods. It also implements the estimation of the SSM metamodel error using a GP model with a variety of defined correlation functions.
TOP constructs a transferable model across gene expression platforms for prospective experiments. Such a transferable model can be trained to make predictions on independent validation data with an accuracy that is similar to a re-substituted model. The TOP procedure also has the flexibility to be adapted to suit the most common clinical response variables, including linear response, binomial and Cox PH models.
Rygel is a home media solution (UPnP AV MediaServer and MediaRenderer) for GNOME that allows you to easily share audio, video, and pictures, and to control a media player on your home network.
Rygel achieves interoperability with other devices by trying to conform to the strict requirements of DLNA and by converting media on-the-fly to formats that client devices can handle.
This package provides a method for pattern discovery in weighted graphs as outlined in Thistlethwaite et al. (2021) <doi:10.1371/journal.pcbi.1008550>. Two use cases are achieved: 1) Given a weighted graph and a subset of its nodes, do the nodes show significant connectedness? 2) Given a weighted graph and two subsets of its nodes, are the subsets close neighbors or distant?
An interface to the Python InterpretML
framework for fitting explainable boosting machines (EBMs); see Nori et al. (2019) <doi:10.48550/arXiv.1909.09223>
for details. EBMs are a modern type of generalized additive model that use tree-based, cyclic gradient boosting with automatic interaction detection. They are often as accurate as state-of-the-art blackbox models while remaining completely interpretable.
Higher-order latent trait theory (item response theory). We implement the generalized partial credit model with a second-order latent trait structure. Latent regression can be done on the second-order latent trait. For a pre-print of the methods, see, "Latent Regression in Higher-Order Item Response Theory with the R Package hlt" <https://mkleinsa.github.io/doc/hlt_proof_draft_brmic.pdf>.
The book "Semiparametric Regression with R" by J. Harezlak, D. Ruppert & M.P. Wand (2018, Springer; ISBN: 978-1-4939-8851-8) makes use of datasets and scripts to explain semiparametric regression concepts. Each of the book's scripts are contained in this package as well as datasets that are not within other R packages. Functions that aid semiparametric regression analysis are also included.
This package provides tools for cleaning, processing, and preparing microbiome sequencing data (e.g., 16S rRNA
) for downstream analysis. Supports CSV, TXT, and Excel file formats. The main function, ezclean()
, automates microbiome data transformation, including format validation, transposition, numeric conversion, and metadata integration. Also ensures efficient handling of taxonomic levels, resolves duplicated taxa entries, and outputs a well-structured, analysis-ready dataset.
Fit Gaussian Multinomial mixed-effects models for small area estimation: Model 1, with one random effect in each category of the response variable (Lopez-Vizcaino,E. et al., 2013) <doi:10.1177/1471082X13478873>; Model 2, introducing independent time effect; Model 3, introducing correlated time effect. mme calculates direct and parametric bootstrap MSE estimators (Lopez-Vizcaino,E et al., 2014) <doi:10.1111/rssa.12085>.
Processing Chlorophyll Fluorescence & P700 Absorbance data generated by WALZ hardware. Four models are provided for the regression of Pi curves, which can be compared with each other in order to select the most suitable model for the data set. Control plots ensure the successful verification of each regression. Bundled output of alpha, ETRmax, Ik etc. enables fast and reliable further processing of the data.
An extensive set of functions to perform Qualitative Comparative Analysis: crisp sets ('csQCA
'), temporal ('tQCA
'), multi-value ('mvQCA
') and fuzzy sets ('fsQCA
'), using a GUI - graphical user interface. QCA is a methodology that bridges the qualitative and quantitative divide in social science research. It uses a Boolean minimization algorithm, resulting in a minimal causal configuration associated with a given phenomenon.
The Simulation-based Sampling Protocol (SSP) is an R package designed to estimate sampling effort in studies of ecological communities. It is based on the concept of pseudo-multivariate standard error (MultSE
) (Anderson & Santana-Garcon, 2015, <doi:10.1111/ele.12385>) and the simulation of ecological data. The theoretical background is described in Guerra-Castro et al. (2020, <doi:10.1111/ecog.05284>).
Transmission Ratio Distortion (TRD) is a genetic phenomenon where the two alleles from either parent are not transmitted to the offspring at the expected 1:1 ratio under Mendelian inheritance, leading to spurious signals in genetic association studies. Functions in this package are developed to account for this phenomenon using loglinear model and Transmission Disequilibrium Test (TDT). Some population information can also be calculated.
The base class VirtualArray
is defined, which acts as a wrapper around lists allowing users to fold arbitrary sequential data into n-dimensional, R-style virtual arrays. The derived XArray class is defined to be used for homogeneous lists that contain a single class of objects. The RasterArray
and SfArray
classes enable the use of stacked spatial data instead of lists.
Empirical models for runoff, erosion, and phosphorus loss across a vegetated filter strip, given slope, soils, climate, and vegetation (Gall et al., 2018) <doi:10.1007/s00477-017-1505-x>. It also includes functions for deriving climate parameters from measured daily weather data, and for simulating rainfall. Models implemented include MUSLE (Williams, 1975) and APLE (Vadas et al., 2009 <doi:10.2134/jeq2008.0337>).
The dks package consists of a set of diagnostic functions for multiple testing methods. The functions can be used to determine if the p-values produced by a multiple testing procedure are correct. These functions are designed to be applied to simulated data. The functions require the entire set of p-values from multiple simulated studies, so that the joint distribution can be evaluated.
Utility functions for manipulating, processing, and analyzing mass spectrometry-based single-cell proteomics data. The package is an extension to the QFeatures package and relies on SingleCellExpirement
to enable single-cell proteomics analyses. The package offers the user the functionality to process quantitative table (as generated by MaxQuant
, Proteome Discoverer, and more) into data tables ready for downstream analysis and data visualization.