The dks package consists of a set of diagnostic functions for multiple testing methods. The functions can be used to determine if the p-values produced by a multiple testing procedure are correct. These functions are designed to be applied to simulated data. The functions require the entire set of p-values from multiple simulated studies, so that the joint distribution can be evaluated.
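As a rough illustration of the kind of check described above (not the dks API), p-values from a continuous test statistic computed under a true null hypothesis should follow a Uniform(0, 1) distribution, so one common diagnostic is the Kolmogorov-Smirnov distance between their empirical distribution and the uniform. The function name below is hypothetical and the sketch handles a single set of p-values only.

```python
def ks_uniform_statistic(p_values):
    """Kolmogorov-Smirnov distance between the empirical CDF of
    p_values and the Uniform(0, 1) CDF. Hypothetical helper name."""
    p = sorted(p_values)
    n = len(p)
    if n == 0:
        raise ValueError("need at least one p-value")
    d = 0.0
    for i, value in enumerate(p, start=1):
        # Compare the uniform CDF (which equals value) with the
        # empirical CDF just after and just before this point.
        d = max(d, i / n - value, value - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Simulated null p-values would normally be used here.
    print(ks_uniform_statistic([0.25, 0.5, 0.75]))
```

A small distance is consistent with uniform null p-values; a large one suggests the procedure producing them is miscalibrated.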
This package provides methods to analyse the stability of non-deterministic prediction models. Prediction stability is quantified either as data-based prediction stability (phi) or as model-based prediction stability (psi). The package implements measures for categorical, ordinal, and metric predictions based on repeated model fitting and corresponding predictions. Methods are based on Lange et al. (2025) <doi:10.1186/s12859-025-06097-1>.
This package provides a method for pattern discovery in weighted graphs as outlined in Thistlethwaite et al. (2021) <doi:10.1371/journal.pcbi.1008550>. Two use cases are supported: 1) Given a weighted graph and a subset of its nodes, do the nodes show significant connectedness? 2) Given a weighted graph and two subsets of its nodes, are the subsets close neighbors or distant?
An interface to the Python InterpretML framework for fitting explainable boosting machines (EBMs); see Nori et al. (2019) <doi:10.48550/arXiv.1909.09223> for details. EBMs are a modern type of generalized additive model that use tree-based, cyclic gradient boosting with automatic interaction detection. They are often as accurate as state-of-the-art blackbox models while remaining completely interpretable.
The book "Semiparametric Regression with R" by J. Harezlak, D. Ruppert & M.P. Wand (2018, Springer; ISBN: 978-1-4939-8851-8) makes use of datasets and scripts to explain semiparametric regression concepts. Each of the book's scripts is contained in this package, as are datasets that are not available in other R packages. Functions that aid semiparametric regression analysis are also included.
Higher-order latent trait theory (item response theory). We implement the generalized partial credit model with a second-order latent trait structure. Latent regression can be done on the second-order latent trait. For a pre-print of the methods, see "Latent Regression in Higher-Order Item Response Theory with the R Package hlt" <https://mkleinsa.github.io/doc/hlt_proof_draft_brmic.pdf>.
Fit Gaussian Multinomial mixed-effects models for small area estimation: Model 1, with one random effect in each category of the response variable (Lopez-Vizcaino, E. et al., 2013) <doi:10.1177/1471082X13478873>; Model 2, introducing an independent time effect; Model 3, introducing a correlated time effect. mme calculates direct and parametric bootstrap MSE estimators (Lopez-Vizcaino, E. et al., 2014) <doi:10.1111/rssa.12085>.
An extensive set of functions to perform Qualitative Comparative Analysis: crisp sets ('csQCA'), temporal ('tQCA'), multi-value ('mvQCA') and fuzzy sets ('fsQCA'), using a GUI - graphical user interface. QCA is a methodology that bridges the qualitative and quantitative divide in social science research. It uses a Boolean minimization algorithm, resulting in a minimal causal configuration associated with a given phenomenon.
The Simulation-based Sampling Protocol (SSP) is an R package designed to estimate sampling effort in studies of ecological communities. It is based on the concept of pseudo-multivariate standard error (MultSE) (Anderson & Santana-Garcon, 2015, <doi:10.1111/ele.12385>) and the simulation of ecological data. The theoretical background is described in Guerra-Castro et al. (2020, <doi:10.1111/ecog.05284>).
Calculates vote-specific and traditional Shapley-Owen power indices (vs-SOVs and SOVs) for spatial voting games in one to four dimensions. Evaluates voter influence through an a posteriori analysis of relative preferences. Supports weighted voting and various voting thresholds. Compatible with ideal point estimates from NOMINATE, Optimal Classification, and MCMCpack. The method builds on Bibina and Dougherty (2025) <doi:10.2139/ssrn.6324519>.
Transmission Ratio Distortion (TRD) is a genetic phenomenon where the two alleles from either parent are not transmitted to the offspring at the expected 1:1 ratio under Mendelian inheritance, leading to spurious signals in genetic association studies. Functions in this package are developed to account for this phenomenon using a loglinear model and the Transmission Disequilibrium Test (TDT). Some population information can also be calculated.
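For orientation, the classic (unadjusted) TDT can be sketched as a McNemar-type chi-square on transmission counts from heterozygous parents. This is the textbook statistic only, not this package's TRD-adjusted method, and the function and argument names are hypothetical.

```python
def tdt_statistic(transmitted, untransmitted):
    """Classic TDT chi-square statistic (1 degree of freedom).

    transmitted   -- times heterozygous parents passed on the allele of interest
    untransmitted -- times heterozygous parents passed on the other allele
    Under Mendelian 1:1 transmission and no association, the two counts
    are expected to be equal.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    return (transmitted - untransmitted) ** 2 / total


if __name__ == "__main__":
    # Illustrative counts, not real data.
    print(tdt_statistic(60, 40))
```

Because the statistic assumes a 1:1 transmission ratio under the null, any TRD at the locus shifts the counts on its own, which is why an adjustment of the kind described above is needed.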
Empirical models for runoff, erosion, and phosphorus loss across a vegetated filter strip, given slope, soils, climate, and vegetation (Gall et al., 2018) <doi:10.1007/s00477-017-1505-x>. It also includes functions for deriving climate parameters from measured daily weather data, and for simulating rainfall. Models implemented include MUSLE (Williams, 1975) and APLE (Vadas et al., 2009 <doi:10.2134/jeq2008.0337>).
The base class VirtualArray is defined, which acts as a wrapper around lists allowing users to fold arbitrary sequential data into n-dimensional, R-style virtual arrays. The derived XArray class is defined to be used for homogeneous lists that contain a single class of objects. The RasterArray and SfArray classes enable the use of stacked spatial data instead of lists.
The basic idea of latent semantic analysis (LSA) is that texts have a higher-order (latent semantic) structure which, however, is obscured by word usage (e.g. through the use of synonyms or polysemy). By using conceptual indices that are derived statistically via a truncated singular value decomposition (a two-mode factor analysis) over a given document-term matrix, this variability problem can be overcome.
The TIN package implements a set of tools for transcriptome instability analysis based on exon expression profiles. Deviating exon usage is studied in the context of splicing factors to analyse to what degree transcriptome instability is correlated to splicing factor expression. In the transcriptome instability correlation analysis, the data is compared to both random permutations of alternative splicing scores and expression of random gene sets.
Supervised learning using Boltzmann Bayes model inference, which extends the naive Bayes model to include interactions. Enables classification of data into multiple response groups based on a large number of discrete predictors that can take factor values of heterogeneous levels. Either pseudo-likelihood or mean field inference can be used with L2 regularization, cross-validation, and prediction on new data. <doi:10.18637/jss.v101.i05>.
This package provides a fast, lightweight, and vectorized base 64 engine to encode and decode character and raw vectors as well as files stored on disk. Common base 64 alphabets are supported out of the box including the standard, URL-safe, bcrypt, crypt, BinHex, and IMAP-modified UTF-7 alphabets. Custom engines can be created to support unique base 64 encoding and decoding needs.
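To illustrate how base 64 alphabets differ, the sketch below uses Python's standard library `base64` module (not this package) to encode the same two bytes with the standard and the URL-safe alphabets. Only the characters for values 62 and 63 change.

```python
import base64

# Two bytes chosen so that the encoded text uses alphabet positions 62 and 63.
raw = b"\xfb\xff"

standard = base64.b64encode(raw)          # standard alphabet: '+' and '/'
url_safe = base64.urlsafe_b64encode(raw)  # URL-safe alphabet: '-' and '_'

print(standard.decode("ascii"))
print(url_safe.decode("ascii"))

# Decoding with the matching alphabet recovers the original bytes.
assert base64.b64decode(standard) == raw
assert base64.urlsafe_b64decode(url_safe) == raw
```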
This package implements the Centroid Decision Forest (CDF) as a single user-facing function CDF(). The method selects discriminative features via a multi-class class separability score (CSS), splits by nearest class centroid, and aggregates tree votes to produce predictions and class probabilities. Returns CSS-based feature importance as well. Amjad Ali, Saeed Aldahmani, Zardad Khan (2025) <doi:10.48550/arXiv.2503.19306>.
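The "splits by nearest class centroid" step can be pictured with a plain nearest-centroid rule. The sketch below is only that single idea under simplifying assumptions (Euclidean distance, all features used); it is not the CDF algorithm and omits the CSS feature selection and the forest. All names are hypothetical.

```python
def class_centroids(X, y):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid(row, centroids):
    """Label of the centroid closest to row (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(row, centroids[label]))


if __name__ == "__main__":
    # Toy data for illustration only.
    X = [[0, 0], [0, 2], [10, 10], [10, 12]]
    y = ["a", "a", "b", "b"]
    centroids = class_centroids(X, y)
    print(nearest_centroid([1, 1], centroids))
```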
This MCMC method takes a numeric data vector (Y) and assigns the elements of Y to a (potentially infinite) number of normal distributions. The individual normal distributions from a mixture of normals can be inferred. Following the method described in Escobar (1994) <doi:10.2307/2291223>, we use a Dirichlet Process Prior (DPP) to describe stochastically our prior assumptions about the dimensionality of the data.
Functions and data sets from the book "R ile Temel Ekonometri" by S. Guris, E.C. Akay, and B. Guris (2020). The book is published in Turkish. The functions in this package apply the Durbin two-stage method for autocorrelation and the generalized differencing method for correcting autocorrelation, perform the Hausman test for identification, and compute LM, LR, and Wald test statistics for redundant variables.
This package provides an interface to the Gibbs SeaWater ('TEOS-10') C library, version 3.06-16-0 (commit 657216dd4f5ea079b5f0e021a4163e2d26893371, dated 2022-10-11, available at <https://github.com/TEOS-10/GSW-C>), which stems from Matlab and other code written by members of Working Group 127 of SCOR/IAPSO (Scientific Committee on Oceanic Research / International Association for the Physical Sciences of the Oceans).
This package implements a group screening procedure that is based on maximum Lq-likelihood estimation, to simultaneously account for the group structure and data contamination in variable screening. The methods are described in Li, Y., Li, R., Qin, Y., Lin, C., & Yang, Y. (2021) Robust Group Variable Screening Based on Maximum Lq-likelihood Estimation. Statistics in Medicine, 40:6818-6834 <doi:10.1002/sim.9212>.
The purpose of this package is to share a collection of functions the author wrote during weekends for managing kitchen and garden tasks, e.g. making plant growth charts or Thanksgiving kitchen schedule charts. Functions include, but are not limited to: (1) helping summarize time-related data; (2) generating axis transformations from data; and (3) aiding Markdown (with HTML output) and Shiny file editing.
Performs spatial pattern analysis using a T-square sampling procedure. This method is based on two measures, "x" and "y". "x" - Distance from the random point to the nearest individual. "y" - Distance from that individual to its nearest neighbor. This methodology is commonly used in phytosociology and marine benthos ecology to analyze species distributions (random, uniform or clumped patterns). Ludwig & Reynolds (1988, ISBN:0471832359).
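As a hedged sketch of how the two distances can be combined, the T-square index of spatial pattern is commonly written as C = (1/N) * sum( x_i^2 / (x_i^2 + 0.5 * y_i^2) ). The formula is stated here from general knowledge of the method and has not been checked against this package's code; the function name is hypothetical.

```python
def t_square_index(x, y):
    """T-square index of spatial pattern C.

    x -- distances from each random point to its nearest individual
    y -- distances from that individual to its nearest neighbor
    Values near 0.5 suggest a random pattern, larger values a clumped
    pattern, and smaller values a uniform pattern.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    terms = [xi ** 2 / (xi ** 2 + 0.5 * yi ** 2) for xi, yi in zip(x, y)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Illustrative distances, not field data.
    print(t_square_index([1.0, 2.0], [2.0, 2.0]))
```

In practice the index is accompanied by a significance test, which this sketch omits.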