This package implements methods to automate the Auer-Gervini graphical Bayesian approach for determining the number of significant principal components. Automation uses clustering, change points, or simple statistical models to distinguish "long" from "short" steps in a graph showing the posterior number of components as a function of a prior parameter. See <doi:10.1101/237883>.
An R package providing extended biological annotations for the SomaScan Assay, a proteomics platform developed by SomaLogic Operating Co., Inc. The annotations in this package were assembled using data from public repositories. For more information about the SomaScan assay and its data, please refer to the SomaLogic/SomaLogic-Data GitHub repository.
This package provides tools for assessing and selecting auxiliary variables using LASSO. The package includes functions for variable selection and diagnostics, facilitating survey calibration analysis with emphasis on robust auxiliary vector selection. For more details see Tibshirani (1996) <doi:10.1111/j.2517-6161.1996.tb02080.x> and Caughey and Hartman (2017) <doi:10.2139/ssrn.3494436>.
This package implements Bayesian estimation and inference for alpha-mixture survival models, including Weibull- and Exponential-based components, with tools for simulation and posterior summaries. The methods target applications in reliability and biomedical survival analysis. The alpha-mixture methodology was introduced in Asadi et al. (2019) <doi:10.1017/jpr.2019.72>.
Counts colors within color range(s) in images, and provides a masked version of the image with targeted pixels changed to a different color. Output includes the locations of the pixels in the images, and the proportion of the image within the target color range with optional background masking. Users can specify multiple color ranges for masking.
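The counting and masking described above can be sketched in a few lines of Python. This is an illustration only, not the package's API: the function name `count_in_range`, the nested-list image representation, and the magenta replacement color are all assumptions, and the optional background masking is left out.

```python
def count_in_range(image, lower, upper, replacement=(255, 0, 255)):
    """Find pixels whose channels all fall within [lower, upper].

    image: list of rows, each row a list of (r, g, b) tuples (hypothetical layout).
    Returns the (row, column) locations of matching pixels, the fraction of
    the image they cover, and a masked copy with those pixels recolored.
    """
    hits = []
    masked = [list(row) for row in image]  # copy so the input stays untouched
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper)):
                hits.append((y, x))
                masked[y][x] = replacement
    total = sum(len(row) for row in image)
    fraction = len(hits) / total if total else 0.0
    return hits, fraction, masked


# Usage: a 2x2 image with two reddish pixels.
image = [[(250, 10, 10), (0, 0, 0)],
         [(240, 20, 5), (10, 200, 10)]]
hits, fraction, masked = count_in_range(image, (200, 0, 0), (255, 50, 50))
```

Several color ranges could be handled by calling the function once per range and combining the hit lists.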
Simulates bivariate survival data from copula models and estimates the association parameter in copula models. Two different ways to estimate the association parameter are implemented, along with a goodness-of-fit test for a given copula model. See Emura, Lin and Wang (2010) <doi:10.1016/j.csda.2010.03.013> for details.
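As a rough illustration of simulating bivariate survival data from a copula, the sketch below draws pairs from a Clayton copula by conditional inversion and applies exponential margins. The choice of the Clayton family, the exponential margins, the function names, and the Kendall's tau moment estimate are all assumptions made for the example; they are not taken from the package or the cited paper.

```python
import math
import random


def simulate_clayton_survival(n, theta, rate1=1.0, rate2=1.0, seed=0):
    """Draw n pairs of survival times whose survival functions are linked
    by a Clayton copula with association parameter theta > 0."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        w = 1.0 - rng.random()
        # Invert the conditional distribution of V given U = u.
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        # Exponential margins: S(t) = exp(-rate * t), so t = -log(S) / rate.
        pairs.append((-math.log(u) / rate1, -math.log(v) / rate2))
    return pairs


def kendall_tau(pairs):
    """Plain O(n^2) Kendall's tau (ties count as neither concordant nor discordant)."""
    concordant = discordant = 0
    n = len(pairs)
    for i in range(n):
        for j in range(i + 1, n):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


data = simulate_clayton_survival(400, theta=2.0)
tau = kendall_tau(data)
theta_hat = 2.0 * tau / (1.0 - tau)  # inverts tau = theta / (theta + 2) for Clayton
```

With theta = 2 the population Kendall's tau is 0.5, so the sample value should land near that figure.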
Quick and easy access to datasets that let you replicate the empirical examples in Cameron and Trivedi (2005) "Microeconometrics: Methods and Applications" (ISBN: 9780521848053). The data are available as data frames as soon as you install and load the package (lazy-loading). The documentation includes references to the chapter sections and page numbers where the datasets are used.
Set of functions for step-wise generation of (weighted) graphs, intended for research in the field of single- and multi-objective combinatorial optimization. Graphs are generated by adding nodes, edges and weights. Each step may be repeated multiple times with different predefined and custom generators, resulting in high flexibility regarding the graph topology and the structure of edge weights.
This package is a sparklyr <https://spark.rstudio.com/> extension that provides an R interface for GraphFrames <https://graphframes.github.io/>. GraphFrames is a package for Apache Spark that provides a DataFrame-based API for working with graphs. Functionality includes motif finding and common graph algorithms, such as PageRank and breadth-first search.
This package provides a bridge between the loon and ggplot2 packages. Extends the grammar of ggplot to add clauses to create interactive loon plots. Existing ggplot(s) can be turned into interactive loon plots and loon plots into static ggplot(s); the function loon.ggplot() is the bridge from one plot structure to the other.
This package contains functions intended to facilitate the production of plant taxonomic monographs. The package includes functions to convert tables into taxonomic descriptions, lists of collectors, examined specimens, and identification keys (dichotomous and interactive), and to generate a monograph skeleton. Wrapper functions to batch the production of phenology histograms and distributional and diversity maps are also available.
R Client for the Microsoft Cognitive Services Text Analytics REST API, including Sentiment Analysis, Topic Detection, Language Detection, and Key Phrase Extraction. An account MUST be registered at the Microsoft Cognitive Services website <https://www.microsoft.com/cognitive-services/> in order to obtain a (free) API key. Without an API key, this package will not work properly.
This package provides a collection of oversampling techniques developed from SMOTE. SMOTE is an oversampling technique that synthesizes a new minority instance between a minority instance and one of its K nearest neighbors. Other techniques adopt this concept with other criteria in order to generate a balanced dataset for class imbalance problems.
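The core SMOTE step described above, interpolating between a minority instance and one of its K nearest minority neighbors, can be sketched as follows. This is a minimal illustration in Python, not the package's interface: the function name `smote`, its parameters, and the tuple-based data layout are assumptions, and it expects at least two minority instances.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # K nearest minority neighbors of the base point (excluding itself).
        others = [p for i, p in enumerate(minority) if i != base_index]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbors)
        # Place the new point at a random position on the segment base -> mate.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=10, k=2)
```

Because each synthetic point lies on a segment between two existing minority points, it always stays inside the region the minority class already occupies.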
Balancing computational and statistical efficiency, subsampling techniques offer a practical solution for handling large-scale data analysis. Subsampling methods enhance statistical modeling for massive datasets by efficiently drawing representative subsamples from the full dataset based on tailored sampling probabilities. These probabilities are optimized for specific goals, such as minimizing the variance of coefficient estimates or reducing prediction error.
This package is for designing CRISPR/Cas9 and Prime Editing experiments. It contains functions to (1) define and transform genomic targets, (2) find spacers, (3) count offtarget (mis)matches, and (4) compute Doench2016/2014 targeting efficiency. Care has been taken for multicrispr to scale well towards large target sets, enabling the design of large CRISPR/Cas9 libraries.
Suppose we have data that has so many series that it is hard to identify them by their colors as the differences are so subtle. With gghighlight we can highlight those lines that match certain criteria. The result is a usual ggplot object, so it is fully customizable and can be used with custom themes and facets.
This package performs angle-based outlier detection on a given data frame. It offers three methods to process data:
a full but slow implementation with cubic complexity that uses all the data;
a fully randomized method;
a method using k-nearest neighbours.
These algorithms are well suited for outlier detection in high-dimensional data.
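To make the cubic cost of the full method concrete, the sketch below computes an angle-based outlier factor for every point by looping over all pairs of other points. It follows the commonly cited distance-weighted variance definition of the factor; the function name `abof`, the list-of-tuples input, and the assumption that all points are distinct are choices made for this example and do not describe the package's functions.

```python
from itertools import combinations


def abof(points):
    """Angle-based outlier factor per point; a low value suggests an outlier.

    For each point A, take every pair (B, C) of other points and compute
    <AB, AC> / (|AB|^2 * |AC|^2), then take the variance of these values
    weighted by 1 / (|AB| * |AC|). Needs at least three distinct points.
    """
    scores = []
    for i, a in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        w_sum = wv_sum = wv2_sum = 0.0
        for b, c in combinations(others, 2):  # n^2 pairs per point: cubic overall
            ab = [bj - aj for aj, bj in zip(a, b)]
            ac = [cj - aj for aj, cj in zip(a, c)]
            dot = sum(x * y for x, y in zip(ab, ac))
            sq_ab = sum(x * x for x in ab)  # squared length of AB
            sq_ac = sum(x * x for x in ac)  # squared length of AC
            value = dot / (sq_ab * sq_ac)
            weight = 1.0 / (sq_ab * sq_ac) ** 0.5
            w_sum += weight
            wv_sum += weight * value
            wv2_sum += weight * value * value
        mean = wv_sum / w_sum
        scores.append(wv2_sum / w_sum - mean * mean)
    return scores


# Usage: a 3x3 grid of points plus one distant point at index 9.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
scores = abof(grid + [(10.0, 10.0)])
most_outlying = min(range(len(scores)), key=scores.__getitem__)
```

A point far from the rest sees all other points within a narrow range of directions, so its factor is small; the randomized and k-nearest-neighbour variants reduce the cost by evaluating only a subset of the pairs.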
Constructs an explainable nomogram for a machine learning (ML) model to improve the availability of an ML prediction model in addition to a computer application, particularly in situations where a computer, a mobile phone, an internet connection, or access to the application is unreliable. This package enables nomogram creation for any ML prediction model, whereas nomograms are conventionally limited to linear/logistic regression models. The nomogram may indicate the explainability value per feature, e.g., the Shapley additive explanation value, for each individual. However, this package only supports nomogram creation for models whose predictors are categorical, with or without a single numerical predictor. Detailed methodologies and examples are documented in our vignette, available at <https://htmlpreview.github.io/?https://github.com/herdiantrisufriyana/rmlnomogram/blob/master/doc/ml_nomogram_exemplar.html>.
This package provides tools to process and analyze chest expansion using 3D marker data from motion capture systems. Includes functions for data processing, marker position adjustment, volume calculation using convex hulls, and visualization in 2D and 3D. See Barber et al. (1996) <doi:10.1145/235815.235821> and Tamiya et al. (2021) <doi:10.1038/s41598-021-01033-8>.
Utility functions to be used to analyse datasets obtained from seed germination/emergence assays. Fits several types of seed germination/emergence models, including those reported in Onofri et al. (2018) "Hydrothermal-time-to-event models for seed germination", European Journal of Agronomy, 101, 129-139 <doi:10.1016/j.eja.2018.08.011>. Contains several datasets for practicing.
Open, read data from and modify Data Packages. Data Packages are an open standard for bundling and describing data sets (<https://datapackage.org>). When data is read from a Data Package, care is taken to convert the data as much as possible to R-appropriate data types. The package can be extended with plugins for additional data types.
Analyzes group patterns using discourse analysis data with graph theory mathematics. Takes the order in which individuals talk and converts it to a network edge and weight list. Returns the density, centrality, centralization, and subgroup information for each group. Based on the analytical framework laid out in Chai et al. (2019) <doi:10.1187/cbe.18-11-0222>.
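One plausible way to turn a speaking order into a weighted edge list is to draw a directed edge from each speaker to the person who talks next and to count how often each such hand-off occurs. The Python sketch below does exactly that and computes the directed density; the hand-off rule, the function names, and the density formula used here are assumptions for illustration and may differ from what the package and Chai et al. (2019) actually do.

```python
from collections import Counter


def talk_order_to_edges(order):
    """Map a sequence of speakers to {(from, to): weight} hand-off counts."""
    edges = Counter()
    for speaker, next_speaker in zip(order, order[1:]):
        if speaker != next_speaker:  # ignore a speaker following themselves
            edges[(speaker, next_speaker)] += 1
    return edges


def density(edges, n_people):
    """Share of possible directed edges that occur at least once."""
    possible = n_people * (n_people - 1)
    return len(edges) / possible if possible else 0.0


# Usage: three people, six turns.
edges = talk_order_to_edges(["A", "B", "A", "C", "B", "A"])
group_density = density(edges, n_people=3)
```

In this example the hand-off from B to A happens twice, so that edge carries weight 2, and four of the six possible directed edges are present.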
Four fertility models are fitted using non-linear least squares: the Hadwiger, the Gamma, Model1 and Model2, following the terminology of Peristera P. and Kostaki A. (2007). "Modeling fertility in modern populations". Demographic Research, 16(6): 141-194. <doi:10.4054/DemRes.2007.16.6>. Model-based averaging is also supported.
Harmony is an AI-based tool that allows you to compare items from questionnaires and identify similar content. You can try Harmony at <https://harmonydata.ac.uk/app/> and you can read our blog at <https://harmonydata.ac.uk/blog/> or at <https://fastdatascience.com/how-does-harmony-work/>. Documentation is available at <https://harmonydata.ac.uk/harmony-r-released/>.