This package provides functions to extract and handle commonly occurring principal phrases obtained from collections of texts. Core functions are rewritten in C++ for faster phrase-document parsing, clustering, and text distance computations. Based on Small, E., & Cabrera, J. (2025). Principal phrase mining, an automated method for extracting meaningful phrases from text. International Journal of Computers and Applications, 47(1), 84–92.
Processing Chlorophyll Fluorescence & P700 Absorbance data generated by WALZ hardware. Four models are provided for the regression of Pi curves, which can be compared with each other in order to select the most suitable model for the data set. Control plots ensure the successful verification of each regression. Bundled output of alpha, ETRmax, Ik etc. enables fast and reliable further processing of the data.
An extensive set of functions to perform Qualitative Comparative Analysis: crisp sets ('csQCA'), temporal ('tQCA'), multi-value ('mvQCA') and fuzzy sets ('fsQCA'), using a graphical user interface (GUI). QCA is a methodology that bridges the qualitative and quantitative divide in social science research. It uses a Boolean minimization algorithm, resulting in a minimal causal configuration associated with a given phenomenon.
The Simulation-based Sampling Protocol (SSP) is an R package designed to estimate sampling effort in studies of ecological communities. It is based on the concept of pseudo-multivariate standard error (MultSE) (Anderson & Santana-Garcon, 2015, <doi:10.1111/ele.12385>) and the simulation of ecological data. The theoretical background is described in Guerra-Castro et al. (2020, <doi:10.1111/ecog.05284>).
Transmission Ratio Distortion (TRD) is a genetic phenomenon where the two alleles from either parent are not transmitted to the offspring at the expected 1:1 ratio under Mendelian inheritance, leading to spurious signals in genetic association studies. Functions in this package are developed to account for this phenomenon using a loglinear model and the Transmission Disequilibrium Test (TDT). Some population information can also be calculated.
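For orientation, the classic (unadjusted) TDT compares how often heterozygous parents transmit an allele against how often they do not, using a McNemar-type chi-square statistic with one degree of freedom. The sketch below shows only that textbook statistic, which assumes the 1:1 Mendelian ratio under the null; it is not the TRD-adjusted method of this package, and the function name `tdt_statistic` is hypothetical.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classic TDT: (b - c)^2 / (b + c), chi-square with 1 degree of freedom.

    `transmitted` (b) and `untransmitted` (c) count how often heterozygous
    parents did or did not pass the allele of interest to affected offspring.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("at least one informative transmission is required")
    stat = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square(1) variable: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = tdt_statistic(60, 40)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

When TRD is present, the 1:1 null no longer holds, which is why an unadjusted statistic like this one can produce spurious signals.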
The base class VirtualArray is defined, which acts as a wrapper around lists allowing users to fold arbitrary sequential data into n-dimensional, R-style virtual arrays. The derived XArray class is defined to be used for homogeneous lists that contain a single class of objects. The RasterArray and SfArray classes enable the use of stacked spatial data instead of lists.
Empirical models for runoff, erosion, and phosphorus loss across a vegetated filter strip, given slope, soils, climate, and vegetation (Gall et al., 2018) <doi:10.1007/s00477-017-1505-x>. It also includes functions for deriving climate parameters from measured daily weather data, and for simulating rainfall. Models implemented include MUSLE (Williams, 1975) and APLE (Vadas et al., 2009 <doi:10.2134/jeq2008.0337>).
The dks package consists of a set of diagnostic functions for multiple testing methods. The functions can be used to determine if the p-values produced by a multiple testing procedure are correct. These functions are designed to be applied to simulated data. The functions require the entire set of p-values from multiple simulated studies, so that the joint distribution can be evaluated.
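As a rough illustration of what "correct" means here: under the null hypothesis, valid p-values are uniformly distributed on [0, 1], so one basic check is the Kolmogorov-Smirnov distance between the empirical distribution of simulated p-values and the uniform distribution. The sketch below computes that distance with the standard library only; the function name is hypothetical and this is not the package's own interface.

```python
def ks_uniform_statistic(p_values):
    """Kolmogorov-Smirnov distance between p-values and Uniform(0, 1)."""
    ps = sorted(p_values)
    n = len(ps)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Largest gap where the empirical CDF lies above the uniform CDF...
    d_plus = max((i + 1) / n - p for i, p in enumerate(ps))
    # ...and where it lies below.
    d_minus = max(p - i / n for i, p in enumerate(ps))
    return max(d_plus, d_minus)


# Evenly spread p-values stay close to uniform; p-values piled up
# near zero (an anti-conservative procedure) give a large distance.
print(ks_uniform_statistic([0.125, 0.375, 0.625, 0.875]))
print(ks_uniform_statistic([0.01, 0.02, 0.03]))
```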
Utility functions for manipulating, processing, and analyzing mass spectrometry-based single-cell proteomics data. The package is an extension to the QFeatures package and relies on SingleCellExperiment to enable single-cell proteomics analyses. The package offers the user the functionality to process quantitative tables (as generated by MaxQuant, Proteome Discoverer, and more) into data tables ready for downstream analysis and data visualization.
The msa package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and Muscle. All three algorithms are integrated in the package, so they do not depend on any external software tools and are available for all major platforms. The multiple sequence alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.
This package implements latent Dirichlet allocation (LDA) and related models. This includes (but is not limited to) sLDA, corrLDA, and the mixed-membership stochastic blockmodel. Inference for all of these models is implemented via a fast collapsed Gibbs sampler written in C. Utility functions for reading/writing data typically used in topic models, as well as tools for examining posterior distributions are also included.
This package provides functions for fitting continuous-time Markov and hidden Markov multi-state models to longitudinal data. It was designed for processes observed at arbitrary times in continuous time (panel data) but some other observation schemes are supported. Both Markov transition rates and the hidden Markov output process can be modelled in terms of covariates, which may be constant or piecewise-constant in time.
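In the usual formulation of such models (the notation below is introduced here for illustration and does not appear in the description), the process is governed by a transition intensity matrix Q whose rows sum to zero. The transition probabilities over an interval of length t follow from the matrix exponential, and covariates z(t) are commonly assumed to act multiplicatively on each intensity:

```latex
P(t) = \exp(tQ), \qquad
q_{rs}\bigl(z(t)\bigr) = q_{rs}^{(0)} \exp\!\bigl(\beta_{rs}^{\top} z(t)\bigr), \quad r \neq s .
```

Here q_{rs}^{(0)} is a baseline intensity from state r to state s and \beta_{rs} is a vector of log-linear covariate effects. With piecewise-constant covariates, P(t) over a longer interval is the product of the matrix exponentials for each constant segment.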
This package provides a fast, lightweight, and vectorized base 64 engine to encode and decode character and raw vectors as well as files stored on disk. Common base 64 alphabets are supported out of the box including the standard, URL-safe, bcrypt, crypt, BinHex, and IMAP-modified UTF-7 alphabets. Custom engines can be created to support unique base 64 encoding and decoding needs.
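To illustrate how base 64 alphabets differ, the sketch below uses Python's standard `base64` module rather than this package. It shows that the standard and URL-safe alphabets differ only in their last two symbols, and that a custom alphabet can be obtained by remapping the standard one. The helper names `encode_with_alphabet` and `decode_with_alphabet` are hypothetical.

```python
import base64

STANDARD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
URL_SAFE = STANDARD[:-2] + b"-_"  # only the last two symbols differ


def encode_with_alphabet(data: bytes, alphabet: bytes) -> bytes:
    """Encode with the standard alphabet, then remap to a custom one."""
    if len(alphabet) != 64 or len(set(alphabet)) != 64:
        raise ValueError("alphabet must contain 64 distinct bytes")
    table = bytes.maketrans(STANDARD, alphabet)
    return base64.b64encode(data).translate(table)


def decode_with_alphabet(encoded: bytes, alphabet: bytes) -> bytes:
    """Map a custom alphabet back to the standard one, then decode."""
    table = bytes.maketrans(alphabet, STANDARD)
    return base64.b64decode(encoded.translate(table))


payload = b"\xfb\xff\xfe"
print(encode_with_alphabet(payload, STANDARD))  # uses '+' and '/'
print(encode_with_alphabet(payload, URL_SAFE))  # uses '-' and '_' instead
```

Remapping after encoding keeps the padding character `=` untouched, which is enough for alphabets that share the standard bit layout. Encodings with a different bit order or padding rule would need their own engine.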
Supervised learning using Boltzmann Bayes model inference, which extends the naive Bayes model to include interactions. Enables classification of data into multiple response groups based on a large number of discrete predictors that can take factor values of heterogeneous levels. Either pseudo-likelihood or mean field inference can be used with L2 regularization, cross-validation, and prediction on new data. <doi:10.18637/jss.v101.i05>.
This MCMC method takes a numeric data vector (Y) and assigns the elements of Y to a (potentially infinite) number of normal distributions. The individual normal distributions from a mixture of normals can be inferred. Following the method described in Escobar (1994) <doi:10.2307/2291223> we use a Dirichlet Process Prior (DPP) to describe stochastically our prior assumptions about the dimensionality of the data.
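To give a feel for what a Dirichlet process prior says about the number of mixture components, the sketch below draws cluster assignments from the prior alone using the Chinese restaurant process representation. It does not look at any data and is not the package's MCMC sampler; the function name and the concentration parameter `alpha` are assumptions made for illustration.

```python
import random


def sample_crp_assignments(n: int, alpha: float, seed: int = 0):
    """Draw cluster labels for n items from a Chinese restaurant process."""
    rng = random.Random(seed)
    assignments = []  # cluster label of each item
    counts = []       # current size of each cluster
    for i in range(n):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, size in enumerate(counts):
            cumulative += size
            if u < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts


labels, sizes = sample_crp_assignments(n=50, alpha=1.0)
print(f"{len(sizes)} clusters with sizes {sizes}")
```

Larger values of `alpha` favor more clusters a priori, which is how the prior encodes assumptions about the dimensionality of the data.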
Functions and data sets from the book "R ile Temel Ekonometri" by S. Guris, E.C. Akay, and B. Guris (2020). The book is published in Turkish. The functions in this package implement the Durbin two-stage method for autocorrelation, the generalized differencing method for correcting autocorrelation, and the Hausman test for identification, and compute LM, LR, and Wald test statistics for redundant variables.
This package provides an interface to the Gibbs SeaWater ('TEOS-10') C library, version 3.06-16-0 (commit 657216dd4f5ea079b5f0e021a4163e2d26893371, dated 2022-10-11, available at <https://github.com/TEOS-10/GSW-C>), which stems from Matlab and other code written by members of Working Group 127 of SCOR/IAPSO (Scientific Committee on Oceanic Research / International Association for the Physical Sciences of the Oceans).
This package provides a group screening procedure based on maximum Lq-likelihood estimation, which simultaneously accounts for the group structure and data contamination in variable screening. The methods are described in Li, Y., Li, R., Qin, Y., Lin, C., & Yang, Y. (2021) Robust Group Variable Screening Based on Maximum Lq-likelihood Estimation. Statistics in Medicine, 40:6818-6834. <doi:10.1002/sim.9212>.
The purpose of this package is to share a collection of functions the author wrote during weekends for managing kitchen and garden tasks, e.g. making plant growth charts or Thanksgiving kitchen schedule charts. Functions include, but are not limited to: (1) aiding in summarizing time-related data; (2) generating axis transformations from data; and (3) aiding Markdown (with HTML output) and Shiny file editing.
Performs spatial pattern analysis using a T-square sampling procedure. This method is based on two measures, "x" and "y": "x" is the distance from a random point to the nearest individual, and "y" is the distance from that individual to its nearest neighbor. This methodology is commonly used in phytosociology or marine benthos ecology to analyze the species distribution (random, uniform or clumped patterns). See Ludwig & Reynolds (1988, ISBN:0471832359).
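As a hedged illustration of how the two distances can be combined, the sketch below computes an index of spatial pattern, C = (1/N) Σ x_i² / (x_i² + y_i²/2), that is often used with T-square data. Whether the package reports exactly this index is an assumption, and the function name is hypothetical. Values near 0.5 suggest a random pattern, values well above 0.5 suggest clumping, and values well below 0.5 suggest a uniform pattern.

```python
def t_square_index(x, y):
    """Index of spatial pattern C from paired T-square distances.

    x[i]: distance from random point i to its nearest individual.
    y[i]: distance from that individual to its nearest neighbor.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    terms = []
    for xi, yi in zip(x, y):
        denom = xi ** 2 + 0.5 * yi ** 2
        if denom == 0:
            raise ValueError("x and y cannot both be zero for a sample point")
        terms.append(xi ** 2 / denom)
    return sum(terms) / len(terms)


# Hypothetical field measurements (same units for x and y).
x = [1.2, 0.8, 2.5, 1.9]
y = [0.6, 0.5, 0.9, 0.7]
print(round(t_square_index(x, y), 3))
```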
Predicate helper functions for testing atomic vectors in R. All functions take a single argument x and check whether it is a base-R atomic vector of the target type (i.e. no class extensions nor attributes other than names), returning TRUE or FALSE. Some additionally check for value (e.g. absence of missing values, infinities, blank characters, or names attribute; or having length 1).
The TIN package implements a set of tools for transcriptome instability analysis based on exon expression profiles. Deviating exon usage is studied in the context of splicing factors to analyse to what degree transcriptome instability is correlated to splicing factor expression. In the transcriptome instability correlation analysis, the data is compared to both random permutations of alternative splicing scores and expression of random gene sets.
The basic idea of latent semantic analysis (LSA) is that texts have a higher-order (= latent semantic) structure which, however, is obscured by word usage (e.g. through the use of synonyms or polysemy). By using conceptual indices that are derived statistically via a truncated singular value decomposition (a two-mode factor analysis) over a given document-term matrix, this variability problem can be overcome.
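In symbols (the notation is introduced here for illustration), a document-term matrix X is factored by the singular value decomposition, and only the k largest singular values are kept:

```latex
X = U \Sigma V^{\top}, \qquad
X_k = U_k \Sigma_k V_k^{\top} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top},
\quad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k .
```

The rank-k matrix X_k is the best rank-k approximation of X in the least-squares sense, so documents and terms that use different words for similar content end up close together in the reduced k-dimensional space.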
Implementation of the categorical instrumental variable (CIV) estimator proposed by Wiemann (2023) <arXiv:2311.17021>. CIV allows for optimal instrumental variable estimation in settings with relatively few observations per category. To obtain valid inference in these challenging settings, CIV leverages a regularization assumption that implies existence of a latent categorical variable with fixed finite support achieving the same first stage fit as the observed instrument.