The algorithm of semi-supervised learning is based on finite Gaussian mixture models and includes a mechanism for handling missing data. It aims to fit a g-class Gaussian mixture model using maximum likelihood. The algorithm treats the labels of unclassified features as missing data, building on the framework introduced by Rubin (1976) <doi:10.2307/2335739> for missing data analysis. By taking into account the dependencies in the missing pattern, the algorithm provides more information for determining the optimal classifier, as specified by Bayes rule.
This package provides HE plot and other functions for visualizing hypothesis tests in multivariate linear models. HE plots represent sums-of-squares-and-products matrices for linear hypotheses and for error using ellipses (in two dimensions) and ellipsoids (in three dimensions). It also provides other tools for analysis and graphical display of the models such as robust methods and homogeneity of variance covariance matrices. The related candisc package provides visualizations in a reduced-rank canonical discriminant space when there are more than a few response variables.
This package provides efficient implementation of the Isolate-Detect methodology for the consistent estimation of the number and location of multiple change-points in one-dimensional data sequences from the "deterministic + noise" model. For details on the Isolate-Detect methodology, please see Anastasiou and Fryzlewicz (2018) <https://docs.wixstatic.com/ugd/24cdcc_6a0866c574654163b8255e272bc0001b.pdf>. Currently implemented scenarios are: piecewise-constant signal with Gaussian noise, piecewise-constant signal with heavy-tailed noise, continuous piecewise-linear signal with Gaussian noise, continuous piecewise-linear signal with heavy-tailed noise.
Nonparametric Failure Time (NFT) Bayesian Additive Regression Trees (BART): Time-to-event Machine Learning with Heteroskedastic Bayesian Additive Regression Trees (HBART) and Low Information Omnibus (LIO) Dirichlet Process Mixtures (DPM). An NFT BART model is of the form Y = mu + f(x) + sd(x) E where functions f and sd have BART and HBART priors, respectively, while E is a nonparametric error distribution due to a DPM LIO prior hierarchy. See the following for a complete description of the model at <doi:10.1111/biom.13857>.
This package implements a method that builds the coefficients of a polynomial model that performs almost equivalently as a given neural network (densely connected). This is achieved using Taylor expansion at the activation functions. The obtained polynomial coefficients can be used to explain features (and their interactions) importance in the neural network, therefore working as a tool for interpretability or eXplainable
Artificial Intelligence (XAI). See Morala et al. 2021 <doi:10.1016/j.neunet.2021.04.036>, and 2023 <doi:10.1109/TNNLS.2023.3330328>.
Deduplicates datasets by retaining the most complete and informative records. Identifies duplicated entries based on a specified key column, calculates completeness scores for each row, and compares values within groups. When differences between duplicates exceed a user-defined threshold, records are split into unique IDs; otherwise, they are coalesced into a single, most complete entry. Returns a list containing the original duplicates, the split entries, and the final coalesced dataset. Useful for cleaning survey or administrative data where duplicated IDs may reflect minor data entry inconsistencies.
Estimation of two- and three-way dynamic panel threshold regression models (Di Lascio and Perazzini (2024) <https://repec.unibz.it/bemps104.pdf>; Di Lascio and Perazzini (2022, ISBN:978-88-9193-231-0); Seo and Shin (2016) <doi:10.1016/j.jeconom.2016.03.005>) through the generalized method of moments based on the first difference transformation and the use of instrumental variables. The models can be used to find a change point detection in the time series. In addition, random number generation is also implemented.
Determine sample sizes, draw samples, and conduct data analysis using data frames. It specifically enables you to determine simple random sample sizes, stratified sample sizes, and complex stratified sample sizes using a secondary variable such as population; draw simple random samples and stratified random samples from sampling data frames; determine which observations are missing from a random sample, missing by strata, duplicated within a dataset; and perform data analysis, including proportions, margins of error and upper and lower bounds for simple, stratified and cluster sample designs.
Identifying cell types based on expression profiles is a pillar of single cell analysis. scROSHI
identifies cell types based on expression profiles of single cell analysis by utilizing previously obtained cell type specific gene sets. It takes into account the hierarchical nature of cell type relationship and does not require training or annotated data. A detailed description of the method can be found at: Prummer, Bertolini, Bosshard, Barkmann, Yates, Boeva, The Tumor Profiler Consortium, Stekhoven, and Singer (2022) <doi:10.1101/2022.04.05.487176>.
Calculates a Satorra-Bentler scaled chi-squared difference test between nested models that were estimated using maximum likelihood (ML) with robust standard errors, which cannot be calculated the traditional way. For details see Satorra & Bentler (2001) <doi:10.1007/bf02296192> and Satorra & Bentler (2010) <doi:10.1007/s11336-009-9135-y>. This package may be particularly helpful when used in conjunction with Mplus software, specifically when implementing the complex survey option. In such cases, the model estimator in Mplus defaults to ML with robust standard errors.
This package provides tools to convert from specific formats to more general forms of spatial data. Using tables to store the actual entities present in spatial data provides flexibility, and the functions here deliberately minimize the level of interpretation applied, leaving that for specific applications. Includes support for simple features, round-trip for Spatial classes and long-form tables, analogous to ggplot2::fortify'. There is also a more normal form representation that decomposes simple features and their kin to tables of objects, parts, and unique coordinates.
Computes the t* statistic corresponding to the tau* population coefficient introduced by Bergsma and Dassios (2014) <DOI:10.3150/13-BEJ514> and does so in O(n^2) time following the algorithm of Heller and Heller (2016) <DOI:10.48550/arXiv.1605.08732>
building off of the work of Weihs, Drton, and Leung (2016) <DOI:10.1007/s00180-015-0639-x>. Also allows for independence testing using the asymptotic distribution of t* as described by Nandy, Weihs, and Drton (2016) <DOI:10.1214/16-EJS1166>.
This package provides functions for the selection of thresholds for use in extreme value models, based mainly on the methodology in Northrop, Attalides and Jonathan (2017) <doi:10.1111/rssc.12159>. It also performs predictive inferences about future extreme values, based either on a single threshold or on a weighted average of inferences from multiple thresholds, using the revdbayes package <https://cran.r-project.org/package=revdbayes>. At the moment only the case where the data can be treated as independent identically distributed observations is considered.
Density, distribution function, quantile function, and random generation function, maximum likelihood estimation (MLE), penalized maximum likelihood estimation (PMLE), the quartiles method estimation (QM), and median rank estimation (MEDRANK) for the two-parameter exponential distribution. MLE and PMLE are based on Mengjie Zheng (2013)<https://scse.d.umn.edu/sites/scse.d.umn.edu/files/mengjie-thesis_masters-1.pdf>. QM is based on Entisar Elgmati and Nadia Gregni (2016)<doi:10.5539/ijsp.v5n5p12>. MEDRANK is based on Matthew Reid (2022)<doi:10.5281/ZENODO.3938000>.
This package provides a collection of tools for cancer genomic data clustering analyses, including those for single cell RNA-seq. Cell clustering and feature gene selection analysis employ Bayesian (and maximum likelihood) non-negative matrix factorization (NMF) algorithm. Input data set consists of RNA count matrix, gene, and cell bar code annotations. Analysis outputs are factor matrices for multiple ranks and marginal likelihood values for each rank. The package includes utilities for downstream analyses, including meta-gene identification, visualization, and construction of rank-based trees for clusters.
This package provides a scripting and command-line front-end is provided by r (aka littler) as a lightweight binary wrapper around the GNU R language and environment for statistical computing and graphics. While R can be used in batch mode, the r binary adds full support for both shebang-style scripting (i.e. using a hash-mark-exclamation-path expression as the first line in scripts) as well as command-line use in standard pipelines. In other words, r provides the R language without the environment.
The rust libc crate provides all of the definitions necessary to easily interoperate with C code (or "C-like" code) on each of the platforms that Rust supports. This includes type definitions (e.g., c_int), constants (e.g., EINVAL) as well as function headers (e.g., malloc).
This crate exports all underlying platform types, functions, and constants under the crate root, so all items are accessible as libc::foo. The types and values of all the exported APIs match the platform that libc is compiled for.
This package provides a suite of functions for analyzing and visualizing the health economic outputs of mathematical models. This package was developed with funding from the National Institutes of Allergy and Infectious Diseases of the National Institutes of Health under award no. R01AI138783. The content of this package is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The theoretical underpinnings of dampack''s functionality are detailed in Hunink et al. (2014) <doi:10.1017/CBO9781139506779>.
Interval estimation of the population allele frequency from qPCR
analysis based on the restriction enzyme digestion (RED)-DeltaDeltaCq
method (Osakabe et al. 2017, <doi:10.1016/j.pestbp.2017.04.003>), as well as general DeltaDeltaCq
analysis. Compatible with the Cq measurement of DNA extracted from multiple individuals at once, so called "group-testing", this model assumes that the quantity of DNA extracted from an individual organism follows a gamma distribution. Therefore, the point estimate is robust regarding the uncertainty of the DNA yield.
Filling in the missing entries of a partially observed data is one of fundamental problems in various disciplines of mathematical science. For many cases, data at our interests have canonical form of matrix in that the problem is posed upon a matrix with missing values to fill in the entries under preset assumptions and models. We provide a collection of methods from multiple disciplines under Matrix Completion, Imputation, and Inpainting. See Davenport and Romberg (2016) <doi:10.1109/JSTSP.2016.2539100> for an overview of the topic.
This package provides tools are provided to expand vectors of short URLs into long URLs'. No API services are used, which may mean that this operates more slowly than API services do (since they usually cache results of expansions that every user of the service requests). You can setup your own caching layer with the memoise package if you wish to have a speedup during single sessions or add larger dependencies, such as Redis', to gain a longer-term performance boost at the expense of added complexity.
Designed to automate the calculation of Emergency Medical Service (EMS) quality metrics, nemsqar implements measures defined by the National EMS Quality Alliance (NEMSQA). By providing reliable, evidence-based quality assessments, the package supports EMS agencies, healthcare providers, and researchers in evaluating and improving patient outcomes. Users can find details on all approved NEMSQA measures at <https://www.nemsqa.org/measures>. Full technical specifications, including documentation and pseudocode used to develop nemsqar', are available on the NEMSQA website after creating a user profile at <https://www.nemsqa.org>.
This package provides a systematic bioinformatics tool to perform single-sample mutation-based pathway analysis by integrating somatic mutation data with the Protein-Protein Interaction (PPI) network. In this method, we use local and global weighted strategies to evaluate the effects of network genes from mutations according to the network topology and then calculate the mutation-based pathway enrichment score (ssMutPES
) to reflect the accumulated effect of mutations of each pathway. Subsequently, the ssMutPES
profiles are used for unsupervised spectral clustering to identify cancer subtypes.
Fit Thurstonian forced-choice models (CFA (simple and factor) and IRT) in R. This package allows for the analysis of item response modeling (IRT) as well as confirmatory factor analysis (CFA) in the Thurstonian framework. Currently, estimation can be performed by Mplus and lavaan'. References: Brown & Maydeu-Olivares (2011) <doi:10.1177/0013164410375112>; Jansen, M. T., & Schulze, R. (in review). The Thurstonian linked block design: Improving Thurstonian modeling for paired comparison and ranking data.; Maydeu-Olivares & Böckenholt (2005) <doi:10.1037/1082-989X.10.3.285>.