Analyzing responses to check-all-that-apply survey items often requires data transformations and subjective decisions for combining categories. CATAcode contains tools for exploring response patterns, facilitating data transformations, applying a set of decision rules for coding responses, and summarizing response frequencies.
Expectation-Maximization (EM) algorithm for point estimation and variance estimation of the nonparametric maximum likelihood estimator (NPMLE) for the logistic-Cox cure-rate model with left truncation and right censoring. See Hou, Chambers and Xu (2017) <doi:10.1007/s10985-017-9415-2>.
Biclustering, row clustering and column clustering using the proportional odds model (POM), ordered stereotype model (OSM) or binary model for ordinal categorical data. Fernández, D., Arnold, R., Pledger, S., Liu, I., & Costilla, R. (2019) <doi:10.1007/s11634-018-0324-3>.
Computes a new measure, DNSL betweenness, via the creation of a new graph from an existing one, duplicating nodes with self-loops. Unlike a betweenness computation that discards self-loops, this centrality measure retains the information they carry. Implements Merelo & Molinari (2024) <doi:10.1007/s42001-023-00245-4>.
Semi-Binary and Semi-Ternary Matrix Decomposition are performed based on Non-negative Matrix Factorization (NMF) and Singular Value Decomposition (SVD). For the details of the methods, see the reference section of GitHub README.md <https://github.com/rikenbit/dcTensor>.
This package performs an exploratory data analysis through a shiny interface. It includes basic methods such as the mean, median, mode, normality test, among others. It also includes clustering techniques such as Principal Components Analysis, Hierarchical Clustering and the K-Means Method.
Clinical coding and diagnosis of patients with kidney disease using clinical practice guidelines. The guidelines used are the evidence-based KDIGO guidelines; see <https://kdigo.org/guidelines/> for more information. This package covers acute kidney injury (AKI), anemia, and chronic kidney disease (CKD).
Binary segmentation methods for detecting and estimating multiple change-points in the mean or second-order structure of high-dimensional time series as described in Cho and Fryzlewicz (2014) <doi:10.1111/rssb.12079> and Cho (2016) <doi:10.1214/16-EJS1155>.
This package provides user-friendly and configurable print debugging via a single function, ic(). Wrap an expression in ic() to print the expression, its value and (where available) its source location. Debugging output can be toggled globally without modifying code.
Implementation of tandem clustering with invariant coordinate selection with different scatter matrices and several choices for the selection of components as described in Alfons, A., Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2024) <doi:10.1016/j.ecosta.2024.03.002>.
Josa in Korean is often determined by the preceding word. When writing reports in Rmd, a function that prints the appropriate josa for each case is helpful. The josaplay package evaluates the preceding word to determine which josa is appropriate.
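As an illustration of how a josa can be chosen from the preceding word, here is a minimal Python sketch. It assumes the common rule that the choice depends on whether the last syllable of the word ends in a final consonant (batchim), and it only handles pairs written as "with-final/without-final" such as 은/는, 이/가 and 을/를. The function names are hypothetical and this is not the josaplay implementation.

```python
# Hangul syllable blocks occupy U+AC00..U+D7A3; each block encodes one of
# 28 possible finals, where index 0 means "no final consonant".
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
FINALS_PER_BLOCK = 28


def has_final_consonant(word: str) -> bool:
    """Return True if the last syllable of `word` ends in a final consonant."""
    if not word:
        raise ValueError("word must not be empty")
    code = ord(word[-1])
    if not (HANGUL_BASE <= code <= HANGUL_LAST):
        raise ValueError("last character is not a Hangul syllable")
    return (code - HANGUL_BASE) % FINALS_PER_BLOCK != 0


def pick_josa(word: str, pair: str) -> str:
    """Pick one form from a pair such as '은/는' (hypothetical helper).

    The first form is used after a final consonant, the second otherwise.
    """
    with_final, without_final = pair.split("/")
    return with_final if has_final_consonant(word) else without_final


def attach_josa(word: str, pair: str) -> str:
    """Return the word followed by the appropriate josa."""
    return word + pick_josa(word, pair)


if __name__ == "__main__":
    for noun in ("책", "사과"):
        print(attach_josa(noun, "은/는"), attach_josa(noun, "을/를"))
```

Pairs with extra exceptions, such as 으로/로 after a final ㄹ, would need an additional rule that this sketch leaves out.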
This package provides a fast, computationally efficient algorithm that lets researchers extract semantically related keywords using a fitted embedding model. For more details about the methods applied, see Chester (2025) <doi:10.17605/OSF.IO/5B7RQ>.
Offers a graphical user interface for the evaluation of inter-rater agreement with Cohen's and Fleiss' kappa. The calculation of kappa statistics is done using the R package irr, so that KappaGUI is essentially a Shiny front-end for irr.
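For reference, unweighted Cohen's kappa for two raters is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the raters' marginal frequencies. The following Python sketch computes it directly from two rating vectors. The function name is hypothetical, and the code is neither the irr nor the KappaGUI implementation.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters over the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    a = ["yes", "yes", "no", "no"]
    b = ["yes", "yes", "no", "yes"]
    print(cohens_kappa(a, b))
```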
This comprehensive toolkit provides a consistent and extensible framework for working with missing values in vectors. The companion package tidyimpute provides similar functionality for list-like and table-like structures. Functions exist for detection, removal, replacement, imputation, recollection, etc. of NAs.
Helper functions for Org files (<https://orgmode.org/>): a generic function toOrg for transforming R objects into Org markup (most useful for data frames; there are also methods for Dates/POSIXt) and a function to read Org tables into data frames.
SigClust is a statistical method for testing the significance of clustering results. SigClust can be applied to assess the statistical significance of splitting a data set into two clusters. For more than two clusters, SigClust can be used iteratively.
An implementation of image processing effects that convert a photo into a line drawing image. For details, please refer to Tsuda, H. (2020). sketcher: An R package for converting a photo into a sketch style image. <doi:10.31234/osf.io/svmw5>.
This package implements Bayesian methods, described in Hugh-Jones (2019) <doi:10.1007/s40881-019-00069-x>, for estimating the proportion of liars in coin flip-style experiments, where subjects report a random outcome and are paid for reporting a "good" outcome.
Defines the classes used to identify outliers (threshing) and compute the number of significant principal components and number of clusters (reaping) in a joint application of PCA and hierarchical clustering. See Wang et al., 2018, <doi:10.1186/s12859-017-1998-9>.
This package provides methods and tools for analysing and validating the outputs and modelled functions of artificial neural networks (ANNs) in terms of predictive, replicative and structural validity. Also provides a method for fitting feed-forward ANNs with a single hidden layer.
CellMixS provides metrics and functions to evaluate batch effects, data integration and batch effect correction in single cell transcriptome data with single cell resolution. Results can be visualized and summarised on different levels, e.g. on cell, celltype or dataset level.
The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply Bioconductor tools to these data. GEOquery is the bridge between GEO and Bioconductor.
This package provides utilities based on libpoppler for extracting text, fonts, attachments and metadata from a PDF file. It also supports high quality rendering of PDF documents into PNG, JPEG, TIFF format, or into raw bitmap vectors for further processing in R.
The goal of this method is to identify associations between bacteria and an environmental variable in 16S or other compositional data. The environmental variable is any variable that is measured for each microbiome sample, for example, a butyrate measurement paired with every sample in the data. Microbiome data is compositional, meaning that the abundances within each sample sum to 1, and this introduces severe statistical distortions. This method takes a Bayesian approach to correcting for these distortions, in which the total abundance is treated as an unknown variable. This package runs the Python implementation using reticulate.
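To make the compositional constraint concrete, the short Python sketch below rescales raw counts so that each sample sums to 1. It shows that two samples with very different total abundances can yield identical proportions, which is why the total has to be treated as unknown. The function name and the counts are made up for illustration, and the sketch does not reproduce the package's Bayesian correction.

```python
def close_composition(counts):
    """Rescale non-negative counts so that they sum to 1 (hypothetical helper)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return [c / total for c in counts]


if __name__ == "__main__":
    # Illustrative counts: the second sample has twice the total abundance.
    sample_low = [10, 30, 60]
    sample_high = [20, 60, 120]
    print(close_composition(sample_low))
    print(close_composition(sample_high))
```

Both calls print the same proportions, so the information about total abundance is lost once the data are expressed as a composition.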