This package provides functions to assist in performing probabilistic record linkage and deduplication: generating pairs, comparing records, em-algorithm for estimating m- and u-probabilities (I. Fellegi & A. Sunter (1969) <doi:10.1080/01621459.1969.10501049>, T.N. Herzog, F.J. Scheuren, & W.E. Winkler (2007), "Data Quality and Record Linkage Techniques", ISBN:978-0-387-69502-0), forcing one-to-one matching. Can also be used for pre- and post-processing for machine learning methods for record linkage. Focus is on memory, CPU performance and flexibility.
The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The rdmulti package provides tools to analyze RD designs with multiple cutoffs or scores: rdmc() estimates pooled and cutoff specific effects for multi-cutoff designs, rdmcplot() draws RD plots for multi-cutoff designs and rdms() estimates effects in cumulative cutoffs or multi-score designs. See Cattaneo, Titiunik and Vazquez-Bare (2020) <https://rdpackages.github.io/references/Cattaneo-Titiunik-VazquezBare_2020_Stata.pdf> for further methodological details.
This package provides a portable Shiny tool to explore patient-level electronic health record data and perform chart review in a single integrated framework. This tool supports browsing clinical data in many different formats including multiple versions of the OMOP common data model as well as the MIMIC-III data model. In addition, chart review information is captured and stored securely via the Shiny interface in a REDCap (Research Electronic Data Capture) project using the REDCap API. See the ReviewR website for additional information, documentation, and examples.
RTG Tools is a subset of RTG Core that includes several useful utilities for dealing with VCF files and sequence data. Probably the most interesting is the vcfeval command which performs sophisticated comparison of VCF files.
This is a legacy project, do not use it for new projects. Ruby 2.3 and later should make this obsolete. kgio provides non-blocking I/O methods for Ruby without raising exceptions on EAGAIN and EINPROGRESS.
This package provides kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods kernlab includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
Rasterize only specific layers of a ggplot2 plot while simultaneously keeping all labels and text in vector format. This allows users to keep plots within the reasonable size limit without losing vector properties of the scale-sensitive information.
Maximum likelihood computations for Tweedie families, including the series expansion (Dunn and Smyth, 2005; <doi10.1007/s11222-005-4070-y>) and the Fourier inversion (Dunn and Smyth, 2008; <doi:10.1007/s11222-007-9039-6>), and related methods.
This package provides a new class Formula, which extends the base class formula. It supports extended formulas with multiple parts of regressors on the right-hand side and/or multiple responses on the left-hand side.
This package provides a reticulate wrapper for the Python package anndata. It provides a scalable way of keeping track of data and learned annotations. It is used to read from and write to the h5ad file format.
This package provides helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, "anonymizing", and manually "recoding").
Reprotest builds the same source code twice in different environments, and then checks the binaries produced by each build for differences. If any are found, then diffoscope or diff is used to display them in detail for later analysis.
This package provides tools to convert the output of segmentation analysis using DNAcopy to a matrix structure with overlapping segments as rows and samples as columns so that other computational analyses can be applied to segmented data.
miaDash provides a Graphical User Interface for the exploration of microbiome data. This way, no knowledge of programming is required to perform analyses. Datasets can be imported, manipulated, analysed and visualised with a user-friendly interface.
This package provides an R interface for various subsampling algorithms implemented in python packages. Currently, interfaces to the geosketch and scSampler python packages are implemented. In addition it also provides diagnostic plots to evaluate the subsampling.
Decision tree algorithm with a major feature added. Allows for users to define an ordering on the partitioning process. Resulting in Branch-Exclusive Splits Trees (BEST). Cedric Beaulac and Jeffrey S. Rosentahl (2019) <arXiv:1804.10168>.
This package provides tools for extraction and analysis of various n-grams (k-mers) derived from biological sequences (proteins or nucleic acids). Contains QuiPT (quick permutation test) for fast feature-filtering of the n-gram data.
Every research team have their own script for calculation of hemodynamic indexes. This package makes it possible to insert a long-format dataframe, and add both periods of interest (trigger-periods), and delete artifacts with deleter-files.
This package provides methods and utilities for testing, identifying, selecting and mutating objects as categorical or continous types. These functions work on both atomic vectors as well as recursive objects: data.frames, data.tables, tibbles, lists, etc..
Data cleaning functions for classes logical, factor, numeric, character, currency and Date to make data cleaning fast and easy. Relying on very few dependencies, it provides smart guessing, but with user options to override anything if needed.
This package contains the function calendR() for creating fully customizable monthly and yearly calendars (colors, fonts, formats, ...) and even heatmap calendars. In addition, it allows saving the calendars in ready to print A4 format PDF files.
Germline and somatic locus data which contain the total read depth and B allele read depth using Bayesian model (Dirichlet Process) to cluster. Meanwhile, the cluster model can deal with the SNVs mutation and the CNAs mutation.
Compare baseline characteristics between two or more groups. The variables being compared can be factor and numeric variables. The function will automatically judge the type and distribution of the variables, and make statistical description and bivariate analysis.
This package provides a unified interface for simplifying cloud storage interactions, including uploading, downloading, reading, and writing files, with functions for both Google Drive (<https://www.google.com/drive/>) and Amazon S3 (<https://aws.amazon.com/s3/>).