This package provides fast generalized edit distance and string alignment computation, primarily for linguistic applications. As a generalization of the classic edit distance algorithms, the package allows users to define custom costs for every symbol's insertion, deletion, and substitution. It also allows character combinations of any length to be treated as a single symbol, which is very useful for International Phonetic Alphabet (IPA) transcriptions with diacritics. In addition to the edit distance itself, users can obtain detailed alignment information, such as all possible alignment scenarios between two strings, which is useful for testing, illustration, or further processing. Either the distance matrix or its long table form can be obtained, and tools for such conversions are provided. All functions in the package are implemented in C++, and the distance matrix computation is parallelized using the RcppThread package.
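To illustrate the underlying idea (a minimal base R sketch, not this package's API), here is a weighted edit distance where the cost of each insertion, deletion, and substitution is supplied by user-defined functions; tokenization into multi-character symbols is assumed to have happened already:

```r
# Minimal weighted edit distance between two symbol vectors (illustrative
# sketch only). Costs are supplied as functions of the symbols involved.
weighted_edit_distance <- function(a, b,
                                   ins = function(s) 1,
                                   del = function(s) 1,
                                   sub = function(s, t) if (s == t) 0 else 1) {
  n <- length(a); m <- length(b)
  d <- matrix(0, n + 1, m + 1)
  for (i in seq_len(n)) d[i + 1, 1] <- d[i, 1] + del(a[i])
  for (j in seq_len(m)) d[1, j + 1] <- d[1, j] + ins(b[j])
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d[i + 1, j + 1] <- min(d[i, j + 1] + del(a[i]),   # delete a[i]
                             d[i + 1, j] + ins(b[j]),   # insert b[j]
                             d[i, j] + sub(a[i], b[j])) # substitute
    }
  }
  d[n + 1, m + 1]
}

# IPA-style usage: "t͡ʃ" is treated as one symbol after tokenization.
weighted_edit_distance(c("t͡ʃ", "a"), c("ʃ", "a"))
```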
This package provides a set of tools to facilitate data sonification and handle the musicXML format <https://usermanuals.musicxml.com/MusicXML/Content/XS-MusicXML.htm>. Several classes are defined for basic musical objects such as note pitch, note duration, note, measure, and score. Moreover, sonification utility functions are provided, e.g. to map data into musical attributes such as pitch, loudness, or duration. A typical sonification workflow hence looks like: get data; map them to musical attributes; create and write the musicXML score, which can then be further processed using specialized music software (e.g. 'MuseScore', 'GuitarPro', etc.). Examples can be found in the blog <https://globxblog.github.io/>, the presentation by Renard and Le Bescond (2022, <https://hal.science/hal-03710340v1>), or the poster by Renard et al. (2023, <https://hal.inrae.fr/hal-04388845v1>).
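As a rough illustration of the "map data to musical attributes" step (a generic sketch, not this package's API; map_to_pitch is a hypothetical helper), one can rescale a numeric series onto a range of MIDI note numbers:

```r
# Hypothetical helper: linearly rescale data onto MIDI note numbers 60-84
# (C4 to C6). This sketches the mapping step only; score creation and
# musicXML export are handled by the package's own classes and writers.
map_to_pitch <- function(x, low = 60, high = 84) {
  scaled <- (x - min(x)) / (max(x) - min(x))   # rescale to [0, 1]
  round(low + scaled * (high - low))           # integer MIDI note numbers
}

flow <- c(3.2, 4.8, 12.5, 9.1, 2.7)            # e.g. streamflow data
map_to_pitch(flow)
```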
There is variation across AgNPs (silver nanoparticles) due to differences in the characterization techniques and testing metrics employed in studies. To address this problem, we have developed a systematic evaluation framework called 'sysAgNPs'. Within this framework, Distribution Entropy (DE) measures the uncertainty of feature categories of AgNPs, Proclivity Entropy (PE) assesses the preference of these categories, and Combination Entropy (CE) quantifies the uncertainty of feature combinations of AgNPs. Additionally, a Markov chain model is employed to examine the relationships among the sub-features of AgNPs and to derive a Transition Score (TS) scoring standard based on steady-state probabilities. The sysAgNPs framework provides metrics for evaluating AgNPs, which helps to unravel their complexity and facilitates effective comparisons among different AgNPs, thereby advancing the scientific research and application of these nanoparticles.
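The entropy-based metrics are variations on Shannon entropy over category frequencies; here is a minimal sketch, assuming DE is essentially Shannon entropy of the empirical category distribution (the package's exact definition may differ):

```r
# Shannon entropy of an empirical category distribution (base-2 bits).
# A generic sketch of the idea behind Distribution Entropy; the package's
# exact definition may differ in detail.
shannon_entropy <- function(categories) {
  p <- table(categories) / length(categories)  # empirical frequencies
  -sum(p * log2(p))
}

shannon_entropy(c("TEM", "TEM", "DLS", "UV-Vis", "DLS"))
```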
This package implements Lagrangian multiplier smoothing splines for flexible nonparametric regression and function estimation. It provides tools for fitting, prediction, and inference using a constrained optimization approach to enforce smoothness. It supports generalized linear models, Weibull accelerated failure time (AFT) models, quadratic programming problems, and customizable correlation structures. Options for fitting in parallel are provided. The method builds upon the framework described by Ezhov et al. (2018) <doi:10.1515/jag-2017-0029>, using Lagrangian multipliers to fit cubic splines. For more information on correlation structure estimation, see Searle et al. (2009) <ISBN:978-0470009598>. For quadratic programming and constrained optimization in general, see Nocedal & Wright (2006) <doi:10.1007/978-0-387-40065-5>. For a comprehensive background on smoothing splines, see Wahba (1990) <doi:10.1137/1.9781611970128> and Wood (2006) <ISBN:978-1584884743>, "Generalized Additive Models: An Introduction with R".
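To sketch the core idea of fitting with Lagrangian multipliers (a generic equality-constrained least-squares example in base R, not this package's spline basis or API), smoothness-type constraints C b = 0 can be enforced by solving the KKT system:

```r
# Equality-constrained least squares via Lagrange multipliers (generic
# sketch). Minimize ||y - X b||^2 subject to C b = 0 via the KKT system:
#   [ 2 X'X  C' ] [ b      ]   [ 2 X'y ]
#   [ C      0  ] [ lambda ] = [ 0     ]
constrained_ls <- function(X, y, C) {
  p <- ncol(X); q <- nrow(C)
  K <- rbind(cbind(2 * crossprod(X), t(C)),
             cbind(C, matrix(0, q, q)))
  rhs <- c(2 * crossprod(X, y), rep(0, q))
  solve(K, rhs)[1:p]              # coefficients; multipliers discarded
}

set.seed(1)
X <- cbind(1, rnorm(50), rnorm(50))
y <- X %*% c(2, 1, -1) + rnorm(50)
C <- matrix(c(0, 1, 1), 1)        # constrain b2 + b3 = 0
constrained_ls(X, y, C)
```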
Dichotomous responses, which have two categories, can be analyzed with stats::glm() or lme4::glmer() using the family=binomial option. Unfortunately, polytomous responses with three or more unordered categories cannot be analyzed similarly because there is no analogous family=multinomial option. For between-subjects data, nnet::multinom() can address this need, but it cannot handle random factors and therefore cannot handle repeated measures. To address this gap, we transform nominal response data into counts for each categorical alternative. These counts are then analyzed using (mixed) Poisson regression, as per Baker (1994) <doi:10.2307/2348134>. Omnibus analyses of variance can be run along with post hoc pairwise comparisons. For users wishing to analyze nominal responses from surveys or experiments, the functions in this package essentially act as though stats::glm() or lme4::glmer() provided a family=multinomial option.
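A minimal sketch of the Baker (1994) multinomial-Poisson trick in base R (generic, not this package's wrappers): each nominal response is expanded into one 0/1 count per alternative, and a Poisson model with per-case intercepts plus an alternative-by-predictor interaction recovers the multinomial fit.

```r
# The multinomial-Poisson trick (generic sketch): expand each nominal
# response into one count per alternative, then fit Poisson GLMs. Per-case
# intercepts (subject) make the Poisson fit match the multinomial model;
# group's main effect is absorbed by them, so only alt:group matters.
df <- data.frame(subject = factor(1:6),
                 group   = factor(c("A", "A", "A", "B", "B", "B")),
                 choice  = factor(c("x", "y", "x", "z", "z", "y")))

long <- expand.grid(subject = levels(df$subject), alt = levels(df$choice))
long <- merge(long, df, by = "subject")
long$count <- as.integer(as.character(long$alt) == as.character(long$choice))

m0 <- glm(count ~ subject + alt,         data = long, family = poisson)
m1 <- glm(count ~ subject + alt * group, data = long, family = poisson)
anova(m0, m1, test = "Chisq")  # omnibus test: does choice depend on group?
```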
This package provides a set of functions that use the Expectation-Maximisation (EM) algorithm (Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977) <doi:10.1111/j.2517-6161.1977.tb01600.x>, "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society, 39(1), 1--22) to take a finite mixture model approach to clustering. The package is designed to cluster multivariate data that contain both categorical and continuous variables and that possibly contain missing values. The method is described in Hunt, L. and Jorgensen, M. (1999) <doi:10.1111/1467-842X.00071>, Australian & New Zealand Journal of Statistics, 41(2), 153--171, and Hunt, L. and Jorgensen, M. (2003) <doi:10.1016/S0167-9473(02)00190-1>, "Mixture model clustering for mixed data with missing information", Computational Statistics & Data Analysis, 41(3-4), 429--440.
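As a minimal illustration of the EM algorithm itself (a generic two-component univariate Gaussian mixture in base R, far simpler than this package's mixed-type models with missing data):

```r
# EM for a two-component univariate Gaussian mixture (illustrative only;
# the package handles multivariate mixed-type data with missing values).
em_gauss2 <- function(x, iters = 100) {
  mu <- quantile(x, c(0.25, 0.75)); s <- rep(sd(x), 2); pi1 <- 0.5
  for (it in seq_len(iters)) {
    # E-step: posterior probability that each point belongs to component 1
    d1 <- pi1 * dnorm(x, mu[1], s[1])
    d2 <- (1 - pi1) * dnorm(x, mu[2], s[2])
    r  <- d1 / (d1 + d2)
    # M-step: update mixing proportion, means, and standard deviations
    pi1 <- mean(r)
    mu  <- c(weighted.mean(x, r), weighted.mean(x, 1 - r))
    s   <- c(sqrt(weighted.mean((x - mu[1])^2, r)),
             sqrt(weighted.mean((x - mu[2])^2, 1 - r)))
  }
  list(prop = pi1, mean = mu, sd = s)
}

set.seed(42)
x <- c(rnorm(100, 0, 1), rnorm(100, 5, 1))
em_gauss2(x)
```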
In a typical microarray setting with gene expression data observed under two conditions, the local false discovery rate describes the probability that a gene is not differentially expressed between the two conditions given its corresponding observed score or p-value level. The resulting curve of p-values versus local false discovery rate offers an insight into the twilight zone between clear differential and clear non-differential gene expression. Package twilight contains two main functions: twilight.pval performs a two-condition test on differences in means for a given input matrix or expression set and computes permutation-based p-values, and twilight performs a stochastic downhill search to estimate local false discovery rates and effect size distributions. The package further provides means to filter for permutations that describe the null distribution correctly. Using filtered permutations, the influence of hidden confounders can be diminished.
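A minimal sketch of permutation-based p-values for a two-condition difference in means (generic base R, not twilight's optimized implementation):

```r
# Permutation p-value for a difference in means between two conditions
# (generic sketch; twilight.pval computes these for a whole expression
# matrix and adds null-distribution filtering).
perm_pval <- function(x, labels, B = 1000) {
  obs <- mean(x[labels == 1]) - mean(x[labels == 0])
  perm <- replicate(B, {
    l <- sample(labels)                  # permute condition labels
    mean(x[l == 1]) - mean(x[l == 0])
  })
  mean(abs(perm) >= abs(obs))            # two-sided permutation p-value
}

set.seed(7)
gene <- c(rnorm(10, 0), rnorm(10, 1))    # one gene, two conditions
perm_pval(gene, labels = rep(0:1, each = 10))
```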
Simulation of the stochastic 3D structure model for the nanoporous binder-conductive additive phase in battery cathodes introduced in P. Gräfensteiner, M. Osenberg, A. Hilger, N. Bohn, J. R. Binder, I. Manke, V. Schmidt, M. Neumann (2024) <doi:10.48550/arXiv.2409.11080>. The model is developed for a binder-conductive additive phase consisting of carbon black, polyvinylidene difluoride binder, and graphite particles. For its stochastic 3D modeling, a three-step procedure based on methods from stochastic geometry is used. First, the graphite particles are described by a Boolean model with ellipsoidal grains. Second, the mixture of carbon black and binder is modeled by an excursion set of a Gaussian random field in the complement of the graphite particles. Third, large pore regions within the mixture of carbon black and binder are described by a Boolean model with spherical grains.
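To illustrate the third step, here is a minimal base R sketch of a Boolean model with spherical grains on a voxel grid (generic; the paper's model is calibrated to tomographic image data and uses the other steps as well):

```r
# Boolean model with spherical grains on a 3D voxel grid (generic sketch).
# Sphere centers follow a Poisson process; the union of spheres is the phase.
set.seed(1)
n <- 64                                 # grid size (n x n x n voxels)
k <- rpois(1, lambda = 30)              # Poisson number of spheres
centers <- matrix(runif(3 * k, 1, n), ncol = 3)
radii <- runif(k, 2, 5)                 # random sphere radii (in voxels)

vox <- expand.grid(x = 1:n, y = 1:n, z = 1:n)
phase <- rep(FALSE, nrow(vox))
for (i in seq_len(k)) {
  d2 <- (vox$x - centers[i, 1])^2 + (vox$y - centers[i, 2])^2 +
        (vox$z - centers[i, 3])^2
  phase <- phase | (d2 <= radii[i]^2)   # voxel lies inside sphere i
}
mean(phase)                             # volume fraction of the phase
```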
Linear cross-section factor model fitting with least squares, and robust fitting via the lmrobdetMM() function from 'RobStatTM'; related volatility, Value-at-Risk, and Expected Shortfall risk and performance attribution (factor-contributed vs. idiosyncratic returns); tabular displays of risk and performance reports; and factor model Monte Carlo. The package authors would like to thank the Center for Research in Security Prices, LLC (CRSP) for the cross-section of about 300 CRSP stocks data (in the data.table object 'stocksCRSP'), and S&P Global Market Intelligence for contributing 14 factor scores (a.k.a. "alpha factors" and "factor exposures") as fundamental data on the 300 companies in the data.table object 'factorsSPGMI'. The stocksCRSP and factorsSPGMI data are not covered by the GPL-2 license, are not provided as open source of any kind, and are not to be redistributed in any form.
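The core fitting step is a cross-section regression of asset returns on factor exposures at each time point; below is a minimal sketch with simulated data (generic, using lm() rather than the package's fitting functions; per the description, the robust variant swaps in lmrobdetMM() from RobStatTM):

```r
# Cross-section factor model by least squares (generic sketch): at one time
# point, regress the cross-section of stock returns on factor exposures.
set.seed(123)
n_stocks <- 300
exposures <- data.frame(value    = rnorm(n_stocks),  # e.g. value score
                        momentum = rnorm(n_stocks))  # e.g. momentum score
factor_returns <- c(value = 0.02, momentum = -0.01)  # true factor returns
ret <- drop(as.matrix(exposures) %*% factor_returns) +
       rnorm(n_stocks, sd = 0.05)                    # idiosyncratic noise

fit <- lm(ret ~ value + momentum, data = exposures)
coef(fit)                           # estimated factor returns, this period
var(residuals(fit))                 # idiosyncratic return variance
```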
This package provides convenience functions for common data modification and analysis tasks in communication research. This includes functions for univariate and bivariate data analysis, index generation and reliability computation, and intercoder reliability tests. All functions follow the style and syntax of the tidyverse and are designed to perform their computations on multiple variables at once. Functions for univariate and bivariate data analysis comprise summary statistics for continuous and categorical variables, as well as several tests of bivariate association including effect sizes. Functions for data modification comprise index generation and automated reliability analysis of index variables. Functions for intercoder reliability comprise tests of several intercoder reliability estimates, including simple and mean pairwise percent agreement, Krippendorff's Alpha (Krippendorff 2004, ISBN: 9780761915454), and various Kappa coefficients (Brennan & Prediger 1981 <doi:10.1177/001316448104100307>; Cohen 1960 <doi:10.1177/001316446002000104>; Fleiss 1971 <doi:10.1037/h0031619>).
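As a sketch of what an intercoder reliability test computes, here is Cohen's kappa for two coders in base R (generic; the package computes this and other estimates across many variables at once):

```r
# Cohen's kappa for two coders (generic sketch). Assumes both coders use
# the same set of categories, so the contingency table is square.
# kappa = (observed agreement - chance agreement) / (1 - chance agreement)
cohens_kappa <- function(coder1, coder2) {
  tab <- table(coder1, coder2)
  po  <- sum(diag(tab)) / sum(tab)                      # observed agreement
  pe  <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # expected by chance
  (po - pe) / (1 - pe)
}

c1 <- c("pos", "neg", "pos", "neu", "pos", "neg")
c2 <- c("pos", "neg", "neu", "neu", "pos", "pos")
cohens_kappa(c1, c2)
```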
The textrank algorithm is an extension of the PageRank algorithm for text. The algorithm allows you to summarize text by calculating how sentences are related to one another. This is done by looking at overlapping terminology used in sentences in order to set up links between them. The resulting sentence network is next plugged into the PageRank algorithm, which identifies the most important sentences in your text and ranks them. In a similar way, textrank can also be used to extract keywords. A word network is constructed by looking at whether words follow one another. On top of that network the PageRank algorithm is applied to extract relevant words, after which relevant words that follow one another are combined to get keywords. More information can be found in the paper by Mihalcea & Tarau (2004) <https://www.aclweb.org/anthology/W04-3252/>.
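A minimal sketch of the sentence-ranking idea using igraph (generic; textrank's own functions handle tokenization, edge weighting, and keyword extraction):

```r
# Rank sentences by linking those that share terminology, then running
# PageRank on the sentence network (generic sketch using igraph).
library(igraph)

sentences <- list(s1 = c("cat", "sits", "mat"),
                  s2 = c("cat", "chases", "mouse"),
                  s3 = c("weather", "nice", "today"))

# Link every pair of sentences that shares at least one term.
pairs <- combn(names(sentences), 2)
keep <- apply(pairs, 2, function(p)
  length(intersect(sentences[[p[1]]], sentences[[p[2]]])) > 0)
edges <- as.data.frame(t(pairs[, keep, drop = FALSE]))
g <- graph_from_data_frame(edges, directed = FALSE,
                           vertices = data.frame(name = names(sentences)))
sort(page_rank(g)$vector, decreasing = TRUE)  # most central sentences first
```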
An R frontend for the WhiteboxTools library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. WhiteboxTools can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. WhiteboxTools also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.
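A short usage sketch (hedged: wbt_init(), wbt_slope(), and wbt_hillshade() follow the package's wbt_* wrapper convention for WhiteboxTools tools of the same names, but argument details should be checked against the package documentation):

```r
# Usage sketch of the wbt_* wrappers; tool and argument names follow
# WhiteboxTools conventions but should be verified against the docs.
library(whitebox)

wbt_init()                                        # locate the WhiteboxTools binary
wbt_slope(dem = "dem.tif", output = "slope.tif")  # terrain analysis example
wbt_hillshade(dem = "dem.tif", output = "hillshade.tif")
```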
LOBSTAHS is a multifunction package for screening, annotation, and putative identification of mass spectral features in large, HPLC-MS lipid datasets. In silico data for a wide range of lipids, oxidized lipids, and oxylipins can be generated from user-supplied structural criteria with a database generation function. LOBSTAHS then applies these databases to assign putative compound identities to features in any high-mass accuracy dataset that has been processed using xcms and CAMERA. Users can then apply a series of orthogonal screening criteria based on adduct ion formation patterns, chromatographic retention time, and other properties, to evaluate and assign confidence scores to this list of preliminary assignments. During the screening routine, LOBSTAHS rejects assignments that do not meet the specified criteria, identifies potential isomers and isobars, and assigns a variety of annotation codes to assist the user in evaluating the accuracy of each assignment.
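A hypothetical workflow sketch (the function names generateLOBdbase(), doLOBscreen(), and getLOBpeaklist() are recalled from the package vignette and should be verified there; xsA is assumed to be a CAMERA xsAnnotate object built from xcms-processed data):

```r
# Hypothetical LOBSTAHS workflow sketch; verify function names and
# arguments against the package vignette before use.
library(LOBSTAHS)

db <- generateLOBdbase(polarity = "positive")    # in silico lipid database
peaks <- doLOBscreen(xsA, polarity = "positive", # xsA: CAMERA xsAnnotate
                     database = db)
head(getLOBpeaklist(peaks))                      # screened assignments table
```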
Synapsis is a Bioconductor software package for automated (unbiased and reproducible) analysis of meiotic immunofluorescence datasets. Its primary functions can (i) identify cells in meiotic prophase that are labelled by a synaptonemal complex axis or central element protein, (ii) isolate individual synaptonemal complexes and measure their physical length, (iii) quantify foci and colocalise them with synaptonemal complexes, and (iv) measure interference between synaptonemal complex-associated foci. The software has applications that extend to multiple species and to the analysis of other proteins that label meiotic prophase chromosomes. The software converts meiotic immunofluorescence images into R data frames that are compatible with machine learning methods. Given a set of microscopy images of meiotic spread slides, synapsis crops images around individual cells, counts colocalising foci on strands on a per-cell basis, and measures the distance between foci on any given strand.
Implementation of No-Effect-Concentration estimation that uses brms (see Burkner (2017) <doi:10.18637/jss.v080.i01>; Burkner (2018) <doi:10.32614/RJ-2018-017>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>) to fit concentration (dose)-response data using Bayesian methods for the purpose of estimating ECx values, but more particularly NEC (see Fox (2010) <doi:10.1016/j.ecoenv.2009.09.012>), NSEC (see Fisher and Fox (2023) <doi:10.1002/etc.5610>), and N(S)EC (see Fisher et al. (2023) <doi:10.1002/ieam.4809>). A full description of this package can be found in Fisher et al. (2024) <doi:10.18637/jss.v110.i05>. This package expands and supersedes an original version implemented in R2jags (see Su and Yajima (2020) <https://CRAN.R-project.org/package=R2jags>; Fisher et al. (2020) <doi:10.5281/ZENODO.3966864>).
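A brief usage sketch (hedged: the bnec() formula interface with the crf() term follows the Fisher et al. (2024) JSS description, but should be checked against the package documentation; the data columns here are invented):

```r
# Hypothetical usage sketch of the main fitting function; see the package
# documentation for the full formula interface and available model names.
library(bayesnec)

dat <- data.frame(x = rep(c(0.1, 1, 10, 100), each = 5),       # concentration
                  y = rnorm(20, mean = rep(c(9, 9, 5, 1), each = 5)))

fit <- bnec(y ~ crf(x, model = "nec3param"), data = dat)
summary(fit)    # reports the NEC estimate among other quantities
```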
Computes confidence intervals for the positive predictive value (PPV) and negative predictive value (NPV) under a variety of scenarios. In situations where the proportion of diseased subjects does not correspond to the disease prevalence (e.g. case-control studies), this package provides two types of solutions: 1) five methods for estimating confidence intervals for PPV and NPV via the ratio of two binomial proportions, including Gart & Nam (1988), Walter (1975), MOVER-J (Laud, 2017), Fieller (1954), and Bootstrap (Efron, 1979); 2) three direct methods that compute the confidence intervals, including Pepe (2003), Zhou (2007), and Delta. In prospective studies where the proportion of diseased subjects is an unbiased estimate of the disease prevalence, this package provides several methods for calculating confidence intervals for PPV and NPV, including Clopper-Pearson, Wald, Wilson, Agresti-Coull, and Beta. See the Details and References sections of the corresponding functions.
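The point estimates behind these intervals follow from Bayes' theorem; a minimal sketch (generic formulas only, not the package's interval methods):

```r
# PPV and NPV from sensitivity, specificity, and prevalence via Bayes'
# theorem (point estimates only; the package adds confidence intervals).
ppv <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}
npv <- function(sens, spec, prev) {
  spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
}

ppv(sens = 0.9, spec = 0.95, prev = 0.01)  # low prevalence drags PPV down
npv(sens = 0.9, spec = 0.95, prev = 0.01)
```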
Deep compositional spatial models are standard spatial covariance models coupled with an injective warping function of the spatial domain. The warping function is constructed through a composition of multiple elemental injective functions in a deep-learning framework. The package implements two cases for the univariate setting: first, when these warping functions are known up to some weights that need to be estimated, and second, when the weights in each layer are random. In the multivariate setting, only the former case is available. Estimation and inference are done using `tensorflow`, which makes use of graphics processing units. For more details see Zammit-Mangion et al. (2022) <doi:10.1080/01621459.2021.1887741>, Vu et al. (2022) <doi:10.5705/ss.202020.0156>, Vu et al. (2023) <doi:10.1016/j.spasta.2023.100742>, and Shao et al. (2025) <doi:10.48550/arXiv.2505.12548>.
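To illustrate the warping idea (a generic 1-D sketch; the package's elemental warping units are more structured and carry estimable or random weights), a deep warping is just a composition of simple injective maps of the spatial domain:

```r
# A deep warping as a composition of simple injective (monotone) maps of a
# 1-D spatial domain (generic sketch, not the package's warping units).
layer1 <- function(s) s + 0.3 * tanh(5 * (s - 0.5))  # smooth local stretch
layer2 <- function(s) s^3                            # monotone, injective
warp   <- function(s) layer2(layer1(s))              # composition of layers

s <- seq(0, 1, length.out = 5)
cbind(original = s, warped = warp(s))

# A stationary covariance evaluated at warped locations induces a
# nonstationary covariance on the original domain.
K <- exp(-as.matrix(dist(warp(s))) / 0.3)
```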
Illustrate graphically the most common Null Hypothesis Significance Testing procedures. More specifically, this package provides functions to plot Chi-Squared, F, t (one- and two-tailed) and z (one- and two-tailed) tests, by plotting the probability density under the null hypothesis as a function of the test statistic value. Although highly flexible (color theme, fonts, etc.), only a minimal number of arguments (observed test statistic, degrees of freedom) is necessary for a clear and useful graph to be plotted, with the observed test statistic and the p-value, as well as their corresponding value labels. The axes are automatically scaled to present the relevant part and the overall shape of the probability density function. This package is especially intended for educational purposes, as it provides helpful support for explaining the Null Hypothesis Significance Testing process, its use, and/or its shortcomings.
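A minimal base R sketch of the kind of plot produced (generic; the package adds theming, value labels, and automatic axis scaling):

```r
# Shade the upper-tail region beyond an observed t statistic (generic
# sketch of an NHST illustration; the package automates labels and scaling).
t_obs <- 2.3; df <- 18
x <- seq(-4, 4, length.out = 400)
plot(x, dt(x, df), type = "l", xlab = "t", ylab = "density",
     main = sprintf("t(%d), observed t = %.1f", df, t_obs))
tail_x <- seq(t_obs, 4, length.out = 100)
polygon(c(t_obs, tail_x, 4), c(0, dt(tail_x, df), 0), col = "grey")
abline(v = t_obs, lty = 2)                      # observed test statistic
text(t_obs, 0.3, sprintf("p = %.3f", pt(t_obs, df, lower.tail = FALSE)),
     pos = 4)
```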
This package is designed to interactively and reproducibly visualize and filter SNP (single-nucleotide polymorphism) datasets. This R-based implementation of SNP and genotype filters facilitates an interactive and iterative SNP filtering pipeline, which can be documented reproducibly via 'rmarkdown'. SNPfiltR contains functions for visualizing various quality and missing data metrics for a SNP dataset, and then filtering the dataset based on user-specified cutoffs. All functions take vcfR objects as input, which can easily be generated by reading standard vcf (variant call format) files into R using the R package vcfR, authored by Knaus and Grünwald (2017) <doi:10.1111/1755-0998.12549>. Each SNPfiltR function can return a newly filtered vcfR object, which can then be written to a local directory in standard vcf format using the vcfR package, for downstream population genetic and phylogenetic analyses.
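A generic sketch of one such filter written directly with vcfR (SNPfiltR wraps steps like this with visualizations and sensible defaults; the 50% missingness cutoff here is arbitrary):

```r
# Filter SNPs by per-site missingness using vcfR (generic sketch of the
# kind of step SNPfiltR wraps; cutoff chosen arbitrarily for illustration).
library(vcfR)

vcf <- read.vcfR("input.vcf.gz")
gt  <- extract.gt(vcf, element = "GT")       # genotype matrix, NA = missing
site_missing <- rowMeans(is.na(gt))
vcf_filtered <- vcf[site_missing <= 0.5, ]   # keep SNPs with <= 50% missing
write.vcf(vcf_filtered, file = "filtered.vcf.gz")
```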
Algorithms for computing and generating plots, with and without error bars, for the Bayesian cluster validity index (BCVI) (O. Preedasawakul and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025, <doi:10.1016/j.csda.2024.108053>) based on several underlying cluster validity indices (CVIs), including the Calinski-Harabasz, Chou-Su-Lai, Davies-Bouldin, Dunn, Pakhira-Bandyopadhyay-Maulik, point-biserial correlation, score function, Starczewski, and Wiroonsri indices for hard clustering, and the correlation cluster validity, generalized C, HF, KWON, KWON2, modified Pakhira-Bandyopadhyay-Maulik, Pakhira-Bandyopadhyay-Maulik, Tang, Wiroonsri-Preedasawakul, Wu-Li, and Xie-Beni indices for soft clustering. The package is compatible with K-means, fuzzy C-means, EM clustering, and hierarchical clustering (single, average, and complete linkage). Although BCVI is compatible with any underlying existing CVI, we recommend using either the Wiroonsri (WI) or the Wiroonsri-Preedasawakul (WP) index as the underlying CVI.
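As a sketch of one underlying CVI, here is the Calinski-Harabasz index computed by hand for a k-means partition (generic; BCVI then places a Bayesian layer over such index values across candidate numbers of clusters):

```r
# Calinski-Harabasz index for a hard partition (generic sketch): ratio of
# between-cluster to within-cluster dispersion, each scaled by its df.
calinski_harabasz <- function(X, cluster) {
  n <- nrow(X); k <- length(unique(cluster))
  centroid <- colMeans(X)
  centers <- apply(X, 2, function(col) tapply(col, cluster, mean))
  sizes <- as.vector(table(cluster))
  ssb <- sum(sizes * rowSums((centers -
           matrix(centroid, k, ncol(X), byrow = TRUE))^2))
  ssw <- sum((X - centers[cluster, ])^2)   # distances to own cluster center
  (ssb / (k - 1)) / (ssw / (n - k))
}

km <- kmeans(iris[, 1:4], centers = 3)
calinski_harabasz(as.matrix(iris[, 1:4]), km$cluster)
```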
The four functions svdcp() ('cp' for column partitioned), svdbip() or svdbip2() ('bip' for bipartitioned), and svdbips() ('s' for a simultaneous optimization of a set of r solutions) correspond to a notion of singular value decomposition (SVD) by blocks, in which each block depends on relative subspaces, rather than on two whole spaces as the usual SVD does. The other functions, based on this notion, are relative to two column-partitioned data matrices x and y, defining two sets of subsets x_i and y_j of variables, and amount to estimating a link between x_i and y_j for each pair (x_i, y_j) relative to the links associated with all the other pairs. These methods were first presented in Lafosse, R. & Hanafi, M. (1997) <https://eudml.org/doc/106424> and Hanafi, M. & Lafosse, R. (2001) <https://eudml.org/doc/106494>.
Fit data to an ellipse, hyperbola, or parabola. Bootstrapping is available when needed. The conic curve can be rotated through an arbitrary angle and the fit will still succeed. Helper functions are provided to convert generator coefficients from one style to another, generate test data sets, rotate conic section parameters, and so on. References include Nikolai Chernov (2014) "Fitting ellipses, circles, and lines by least squares" <https://people.cas.uab.edu/~mosya/cl/>; A. W. Fitzgibbon, M. Pilu, R. B. Fisher (1999) "Direct Least Squares Fitting of Ellipses", IEEE Trans. PAMI, Vol. 21, pages 476-480; N. Chernov, Q. Huang, and H. Ma (2014) "Fitting quadratic curves to data points", British Journal of Mathematics & Computer Science, 4, 33-60; N. Chernov and H. Ma (2011) "Least squares fitting of quadratic curves and surfaces", in Computer Vision, editor S. R. Yoshida, Nova Science Publishers, pp. 285-302.
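A minimal sketch of a simple algebraic least-squares conic fit (generic, fixing the constant term to -1; the package's methods from the cited references are numerically more careful):

```r
# Algebraic conic fit (generic sketch): solve A x^2 + B xy + C y^2 + D x +
# E y = 1 in least squares, i.e. F fixed at -1. The cited references handle
# the degeneracies this simple approach ignores.
fit_conic <- function(x, y) {
  D <- cbind(x^2, x * y, y^2, x, y)
  coef <- qr.solve(D, rep(1, length(x)))   # least-squares solution
  setNames(c(coef, -1), c("A", "B", "C", "D", "E", "F"))
}

theta <- seq(0, 2 * pi, length.out = 40)
x <- 3 * cos(theta) + rnorm(40, sd = 0.05)  # noisy ellipse, semi-axes 3 and 1
y <- sin(theta) + rnorm(40, sd = 0.05)
fit_conic(x, y)                             # expect approx A = 1/9, C = 1
```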
The DaMiRseq package offers a tidy pipeline of data mining procedures to identify transcriptional biomarkers and exploit them for both binary and multi-class classification purposes. The package accepts any kind of data presented as a table of raw counts and allows the inclusion of both continuous and factorial variables that occur in the experimental setting. A series of functions enables the user to clean up the data by filtering genomic features and samples, to adjust the data by identifying and removing unwanted sources of variation (i.e. batches and confounding factors), and to select the best predictors for modeling. Finally, a "stacking" ensemble learning technique is applied to build a robust classification model. Every step includes a checkpoint that the user may exploit to assess the effects of data management by looking at diagnostic plots, such as clustering and heatmaps, RLE boxplots, MDS, or correlation plots.
omicsGMF is a Bioconductor package that uses the sgdGMF framework of the 'sgdGMF' package for fast, high-performance matrix factorization that can be used for dimensionality reduction, visualization, and imputation of omics data. It considers data from the general exponential family as input, and therefore suits both RNA-seq (Poisson or negative binomial data) and proteomics data (Gaussian data). It does not require prior transformation of counts to the log scale, because it instead optimizes the deviance of the specified data family. Also, it allows correcting for known sample-level and feature-level covariates, thereby enabling visualization and dimensionality reduction after batch correction. Last but not least, it deals with missing values and allows imputing these after matrix factorization, which is useful for proteomics data. This Bioconductor package accepts SummarizedExperiment, SingleCellExperiment, and QFeatures classes as input.