This package provides a set of tools for evaluating clustering robustness using proportion of ambiguously clustered pairs (Senbabaoglu et al. (2014) <doi:10.1038/srep06207>), as well as similarity across methods and method stability using element-centric clustering comparison (Gates et al. (2019) <doi:10.1038/s41598-019-44892-y>). Additionally, this package enables stability-based parameter assessment for graph-based clustering pipelines typical in single-cell data analysis.
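The PAC score can be illustrated with a short sketch. This is written in Python and is not the package's R interface. It assumes a consensus matrix built from repeated clusterings, where each entry is the fraction of runs in which two items share a cluster. PAC is then the fraction of item pairs whose consensus index falls in an ambiguous middle interval. The thresholds 0.1 and 0.9 are assumed defaults chosen for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def consensus_matrix(labelings):
    """Fraction of clustering runs in which each pair of items shares a cluster.

    `labelings` is a list of equal-length label sequences, one per run
    (e.g. from resampling or different random seeds).
    """
    n = len(labelings[0])
    runs = len(labelings)
    cm = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(1 for lab in labelings if lab[i] == lab[j])
        cm[i][j] = cm[j][i] = together / runs
    return cm


def pac(cm, lower=0.1, upper=0.9):
    """Proportion of item pairs with lower < consensus <= upper,
    i.e. CDF(upper) - CDF(lower) of the off-diagonal consensus values."""
    pairs = list(combinations(range(len(cm)), 2))
    ambiguous = sum(1 for i, j in pairs if lower < cm[i][j] <= upper)
    return ambiguous / len(pairs)
```

Perfectly reproducible clusterings give a PAC of 0. A lower PAC therefore indicates a more stable choice of clustering parameters.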
Assignment of cell type labels to single-cell RNA sequencing (scRNA-seq) clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.
Simplifies the creation of essential visualizations in R, offering plotting functions for common chart types such as violin plots, pie charts, and histograms. Colors, labels, and styles can be customized through a simple interface, making the package suitable for both beginners and experienced data analysts. Whether exploring datasets or producing quick visual summaries, this package provides a streamlined solution for fundamental graphics in R.
It predicts a categorical attribute from a set of numeric predictor values. Only numeric predictors can be given as inputs, because categorical attributes are not accepted as inputs. The categorical attribute to be predicted is selected from the dropdown provided. The value of k can be chosen based on the reported accuracies. A handsontable is also provided for entering the input predictor values.
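The description mentions a tunable k and accuracies for choosing it. Together these suggest a k-nearest-neighbours classifier, although the source does not name the method, so that reading is an assumption. The Python sketch below classifies a numeric input by majority vote among its k nearest training rows, using Euclidean distance. It also computes leave-one-out accuracy, one plausible way of producing the accuracies used to pick k. All function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training rows closest to x (Euclidean distance)."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each row from all the other rows."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += knn_predict(train_X, train_y, X[i], k) == y[i]
    return correct / len(X)
```

Comparing `loo_accuracy` across several values of k is a simple way to choose k before predicting new inputs.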
This package implements multiple contrast tests for functional data (Munko et al., 2023, <arXiv:2306.15259>). These procedures test the global hypothesis of equality as well as specific hypotheses defined by contrasts. In particular, post hoc tests can be performed to examine particular comparisons of interest. Different experimental designs are supported, e.g., one-way and multi-way analysis of variance for functional data.
This package performs support vector analysis for data sets with a survival outcome. Three approaches are available: the regression approach takes censoring into account when formulating the inequality constraints of the support vector problem; in the ranking approach, the inequality constraints set the objective to maximize the concordance index for comparable pairs of observations; and the hybrid approach combines the regression and ranking constraints in the same model.
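The ranking approach targets the concordance index. The Python sketch below shows how that index is computed and does not reproduce the package's support vector solver. It assumes Harrell's definition for predicted survival times, where a higher prediction means longer survival. A pair (i, j) is comparable when subject i had an event and j was observed for longer. The pair is concordant when the prediction for i is also smaller. Tied predictions count one half, and tied observed times are ignored for simplicity.

```python
def concordance_index(times, events, predicted):
    """Harrell's C for predicted survival times (higher = longer survival).

    times     : observed follow-up times
    events    : 1 if the event was observed, 0 if censored
    predicted : model predictions on the survival-time scale
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 1 means every comparable pair is ordered correctly. A value of 0.5 corresponds to uninformative predictions.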
The cfToolsData package supplies the data for the cfTools package. It contains two pre-trained deep neural network (DNN) models for the cfSort function. It also includes the shape parameters of the beta distributions characterizing methylation markers associated with four tumor types, used by the CancerDetector function, as well as the parameters characterizing methylation markers specific to 29 primary human tissue types, used by the cfDeconvolve function.
This package contains the data for the paper by L. David et al. in PNAS 2006 (PMID 16569694): 8 CEL files of Affymetrix genechips, an ExpressionSet object with the raw feature data, a probe annotation data structure for the chip and the yeast genome annotation (GFF file) that was used. In addition, some custom-written analysis functions are provided, as well as R scripts in the scripts directory.
Rank results by confident effect sizes, while maintaining False Discovery Rate and False Coverage-statement Rate control. Topconfects is an alternative presentation of TREAT results with improved usability, eliminating p-values and instead providing confidence bounds. The main application is differential gene expression analysis, providing genes ranked in order of confident log2 fold change, but it can be applied to any collection of effect sizes with associated standard errors.
This package provides tools for working with Type S (Sign) and Type M (Magnitude) errors, as proposed in Gelman and Tuerlinckx (2000) <doi:10.1007/s001800000040> and Gelman & Carlin (2014) <doi:10.1177/1745691614551642>. In addition to calculating the probability of a Type S/M error, the package includes functions for calculating these errors across a range of effect sizes for comparison, and for recommending a sample size given "tolerances" for Type S/M errors. To speed up these calculations, the closed-form solutions for the probability of a Type S/M error from Lu, Qiu, and Deng (2018) <doi:10.1111/bmsp.12132> are implemented. As of version 1.0.0, only simple research designs are supported. See the package vignette for a fuller exposition of how Type S/M errors arise in research, and how to analyze them using the type of design analysis proposed in the above papers.
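A minimal design-analysis sketch follows, written in Python rather than with the package's R functions. It assumes a normally distributed estimate with a known true effect and standard error, tested two-sided at the 5% level. Power and the Type S rate follow Gelman & Carlin's definitions. The Type M exaggeration ratio uses the closed-form truncated-normal moment E[|estimate| | significant] in place of simulation. The function names are hypothetical.

```python
import math


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def design_analysis(true_effect, se, z_crit=1.959963984540054):
    """Power, Type S rate and Type M (exaggeration ratio) for an estimate
    ~ Normal(true_effect, se), tested two-sided with critical value z_crit."""
    if true_effect == 0:
        raise ValueError("Type S/M errors are undefined for a zero true effect")
    lam = true_effect / se
    a = z_crit - lam    # standardized upper rejection boundary
    b = -z_crit - lam   # standardized lower rejection boundary
    p_hi = 1 - _cdf(a)  # P(significant and positive)
    p_lo = _cdf(b)      # P(significant and negative)
    power = p_hi + p_lo
    # Type S: probability that a significant estimate has the wrong sign
    type_s = (p_lo if true_effect > 0 else p_hi) / power
    # E[|estimate| * 1{significant}] from truncated-normal first moments
    abs_part = se * (lam * (1 - _cdf(a)) + _pdf(a) - lam * _cdf(b) + _pdf(b))
    type_m = abs_part / (power * abs(true_effect))
    return power, type_s, type_m
```

For a true effect of 2.8 standard errors, power is about 0.80 and significant estimates overstate the true effect by roughly 12%. Underpowered designs inflate both error types sharply.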
Tracking accrual in clinical trials is important for trial success. If accrual is too slow, the trial will take too long and be too expensive. If accrual is much faster than expected, time-sensitive tasks such as the writing of statistical analysis plans might need to be rushed. accrualPlot provides functions to aid the tracking of accrual and to predict when a trial will reach its intended sample size.
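A very simple way to predict when a trial will reach its target is to project the average daily accrual rate forward. The Python sketch below does this and is not accrualPlot's method or API. It assumes accrual continues at the constant rate observed since the first enrollment. The function name and arguments are hypothetical.

```python
import math
from datetime import date, timedelta


def predict_completion(enrollment_dates, target_n, as_of=None):
    """Predict the date on which target_n participants will be enrolled,
    assuming accrual continues at the average daily rate observed so far."""
    dates = sorted(enrollment_dates)
    as_of = as_of or dates[-1]
    enrolled = len(dates)
    if enrolled >= target_n:
        return as_of  # target already reached
    elapsed_days = (as_of - dates[0]).days + 1  # include the first day
    rate = enrolled / elapsed_days  # participants per day
    remaining_days = (target_n - enrolled) / rate
    return as_of + timedelta(days=math.ceil(remaining_days))
```

A constant-rate projection ignores ramp-up as sites open. Comparing it with the observed cumulative accrual curve shows whether that assumption is reasonable.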
Israeli baby names provided by Israel's Central Bureau of Statistics. The package contains only names used for at least 5 children in at least one gender and sector ("Jewish", "Muslim", "Christian", "Druze" and "Other"). Data was downloaded from: <https://www.cbs.gov.il/he/publications/LochutTlushim/2020/%D7%A9%D7%9E%D7%95%D7%AA-%D7%A4%D7%A8%D7%98%D7%99%D7%99%D7%9D.xlsx>.
An implementation of several functions for feature extraction in categorical time series datasets. Specifically, features related to marginal distributions and serial dependence patterns can be computed. These features can be used, among other things, as inputs to clustering and classification algorithms for categorical time series. The package also includes several datasets containing biological sequences. Practitioners from a broad variety of fields could benefit from the general framework provided by ctsfeatures.
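As an illustration of both feature types, the Python sketch below computes marginal category probabilities and one serial-dependence measure. It is not ctsfeatures' API. The measure assumed here is the lag-k Cramér's v used in the categorical time-series literature, which compares lagged joint frequencies with the product of the marginals. The function names are hypothetical.

```python
import math
from collections import Counter


def marginal_probs(series, categories):
    """Relative frequency of each category in the series."""
    counts = Counter(series)
    n = len(series)
    return {c: counts[c] / n for c in categories}


def cramers_v(series, lag, categories):
    """Lag-k Cramer's v: 0 under independence, near 1 for strong dependence."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    p = marginal_probs(series, categories)
    # joint frequencies of (X_t, X_{t-lag})
    pairs = Counter(zip(series[lag:], series[:-lag]))
    m = len(series) - lag
    total = 0.0
    for i in categories:
        for j in categories:
            if p[i] == 0 or p[j] == 0:
                continue  # skip categories that never occur
            p_ij = pairs[(i, j)] / m
            total += (p_ij - p[i] * p[j]) ** 2 / (p[i] * p[j])
    return math.sqrt(total / (len(categories) - 1))
```

Evaluating `cramers_v` over several lags, together with the marginal probabilities, yields a fixed-length feature vector that a clustering or classification algorithm can use.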
The gasanalyzer R package offers methods for importing, preprocessing, and analyzing data related to photosynthetic characteristics (gas exchange, chlorophyll fluorescence and isotope ratios). It translates variable names into a standard format, and can recalculate derived physiological quantities using imported or predefined equations. The package also allows users to assess the sensitivity of their results to different assumptions used in the calculations. See also Tholen (2024) <doi:10.1093/aobpla/plae035>.
This package provides matrix Gaussian mixture models, matrix transformation mixture models and their model-based clustering results. Parsimonious models of the mean matrices and variance-covariance matrices are implemented, giving a total of 196 model variants. For more information, please see: Xuwen Zhu, Shuchismita Sarkar, and Volodymyr Melnykov (2021), "MatTransMix: an R package for matrix model-based clustering and parsimonious mixture modeling", <doi:10.1007/s00357-021-09401-9>.
This package provides a latent variable model based on factor analytic and mixture of experts models, designed to infer food intake from multiple-biomarker data. The model is framed within a Bayesian hierarchical framework, which provides flexibility to adapt to different biomarker distributions and facilitates inference on food intake from biomarker data alone, along with the associated uncertainty. Details are in D'Angelo, et al. (2020) <arXiv:2006.02995>.
Greedy Bayesian algorithm to fit the noisy stochastic block model to an observed sparse graph. The package also provides a graph inference procedure to recover a Gaussian Graphical Model (GGM) from real data, with control of the false discovery rate. The method is described in the article "Enhancing the Power of Gaussian Graphical Model Inference by Modeling the Graph Structure" by Kilian, Rebafka, and Villers (2024) <arXiv:2402.19021>.
Analyze repertory grids, a qualitative-quantitative data collection technique devised by George A. Kelly in the 1950s. Today, grids are used across various domains ranging from clinical psychology to marketing. The package contains functions to quantitatively analyze and visualize repertory grid data (e.g., Fransella, Bell, & Bannister, 2004, ISBN: 978-0-470-09080-0). The package is part of the <https://openrepgrid.org/> project.
An implementation of the selectboost algorithm (Bertrand et al. 2020, Bioinformatics, <doi:10.1093/bioinformatics/btaa855>), a general algorithm that improves the precision of any existing variable selection method. The algorithm relies on computationally intensive simulations and takes into account the correlation structure of the data. It can either produce a confidence index for variable selection or be used for experimental design planning.
A probability tree allows the computation of probabilities of complex events, such as genotype probabilities in intermediate generations of inbreeding through recurrent self-fertilization (selfing). This package implements functionality to compute probability trees for two- and three-marker genotypes in the F2 to F7 selfing generations. The conditional probabilities are derived automatically and in symbolic form. The package also provides functionality to extract and evaluate the relevant probabilities.
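The simplest case of such a computation is a single marker, with no recombination to track. The Python sketch below covers only this case and is not the package's R API, which handles two- and three-marker genotypes symbolically. It propagates genotype probabilities from an F1 heterozygote through repeated selfing using exact fractions. Homozygotes breed true, and a heterozygote yields AA, Aa and aa in a 1:2:1 ratio. The function name is hypothetical.

```python
from fractions import Fraction


def selfing_genotypes(generation):
    """Single-locus genotype probabilities in generation F_n (n >= 1),
    starting from an F1 heterozygote Aa and selfing every generation."""
    probs = {"AA": Fraction(0), "Aa": Fraction(1), "aa": Fraction(0)}  # F1
    for _ in range(generation - 1):
        het = probs["Aa"]
        probs = {
            "AA": probs["AA"] + het / 4,  # homozygotes breed true
            "Aa": het / 2,                # half the heterozygotes' offspring stay Aa
            "aa": probs["aa"] + het / 4,
        }
    return probs
```

Heterozygosity halves each generation, so F7 retains 1/64 heterozygotes. With two or more markers, the branches additionally depend on the recombination fraction, which is why the package derives those tree probabilities symbolically.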
This package implements inverse and augmented inverse probability weighted estimators for common treatment effect parameters at an interim analysis with a time-lagged outcome that may not be available for all enrolled subjects. It produces estimators, standard errors, and information that can be used to compute stopping boundaries using software that assumes that the estimators/test statistics have independent increments. See Tsiatis, A. A. and Davidian, M. (2022) <doi:10.1002/sim.9580>.
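The basic inverse probability weighting idea can be sketched in Python as follows. This is a simple Horvitz-Thompson-style estimator of a difference in means. It assumes that the randomization probability and each subject's probability of having an observed outcome at the interim are known. The augmented (AIPW) version, the estimation of those probabilities, and the standard errors described above are not shown. The function and argument names are hypothetical.

```python
def ipw_effect(treated, observed, outcomes, obs_prob, p_treat=0.5):
    """IPW estimate of E[Y | treatment] - E[Y | control].

    treated[i]  : 1 if subject i was randomized to treatment, else 0
    observed[i] : 1 if subject i's outcome is available at the interim, else 0
    outcomes[i] : outcome value (ignored when observed[i] == 0)
    obs_prob[i] : assumed-known probability that subject i's outcome is observed
    p_treat     : randomization probability for the treatment arm
    """
    n = len(treated)
    mu1 = mu0 = 0.0
    for a, r, y, pi in zip(treated, observed, outcomes, obs_prob):
        if not r:
            continue  # missing outcomes contribute 0; the weights compensate
        if a:
            mu1 += y / (p_treat * pi)
        else:
            mu0 += y / ((1 - p_treat) * pi)
    return (mu1 - mu0) / n
```

Each observed outcome is up-weighted by the inverse of its probability of being observed and assigned. This keeps the estimator unbiased when outcome availability depends only on factors captured in `obs_prob`.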
This package contains selected data from two publications, Campbell et al. (2016) <DOI:10.1080/14486563.2015.1028486> and Pacioni et al. (2017) <DOI:10.1071/PC17002>. The data are provided both as raw outputs from the population viability analysis software Vortex and as R objects. The R package vortexR uses the raw data provided here to illustrate its functionality for parsing raw Vortex output into R objects.
An approach to filter out and/or identify phytoplankton cells among all particles measured by flow cytometry, using pigment and cell complexity information. It does this by applying a sequence of one-dimensional gates on predefined channels that measure pigmentation and complexity. The package is especially tuned for cyanobacteria, but will also work for phytoplankton communities in which at least one cell characteristic differentiates each phytoplankton population in the community.
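Sequential one-dimensional gating can be sketched in a few lines of Python. This is not the package's R interface. Each gate keeps only the particles whose value on one channel lies within a range, and the gates are applied in order. The channel names and ranges below are hypothetical placeholders. In practice they would correspond to instrument-specific pigment fluorescence and scatter channels.

```python
def apply_gates(particles, gates):
    """Apply one-dimensional gates in sequence.

    particles : list of dicts mapping channel name -> measured value
    gates     : list of (channel, low, high) tuples; a particle passes a gate
                when low <= value <= high on that channel
    Returns the particles passing all gates and the count left after each gate.
    """
    kept = particles
    counts = []
    for channel, low, high in gates:
        kept = [p for p in kept if low <= p[channel] <= high]
        counts.append(len(kept))
    return kept, counts
```

Recording the count after each gate shows which step removes the most non-target particles. This helps when tuning gate limits for a new community.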
This R package helps the user identify k-mers (e.g., di- or tri-nucleotides) that occur periodically in a set of genomic loci, typically regulatory elements. Its functions provide a straightforward approach to finding periodic occurrences of k-mers in DNA sequences. It is not aimed at identifying motifs separated by a conserved distance; for that type of analysis, please see the MEME website.
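One way to look for periodic k-mer occurrences is to count the distances between all pairs of occurrences within each sequence. The Python sketch below does this. It is an assumed illustration of the general idea, not the package's algorithm or API, and the function names are hypothetical. A periodic signal shows up as peaks in the resulting distance histogram at multiples of the period.

```python
from collections import Counter


def kmer_positions(seq, kmer):
    """Start positions of (possibly overlapping) occurrences of kmer in seq."""
    k = len(kmer)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer]


def pairwise_distance_counts(seqs, kmer, max_dist=50):
    """Histogram of distances between occurrences of kmer, pooled over sequences."""
    counts = Counter()
    for seq in seqs:
        pos = kmer_positions(seq.upper(), kmer.upper())
        for a in range(len(pos)):
            for b in range(a + 1, len(pos)):
                d = pos[b] - pos[a]
                if d > max_dist:
                    break  # positions are sorted, so later distances are larger
                counts[d] += 1
    return counts
```

In real loci the raw histogram also reflects sequence composition and length. A background correction, for example against shuffled sequences, would therefore be needed before calling a peak significant.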