Twelve confidence intervals for one binomial proportion or a vector of binomial proportions are computed. The confidence intervals are: Jeffreys, Wald, Wald corrected, Wald, Blyth and Still, Agresti and Coull, Wilson, Score, Score corrected, Wald logit, Wald logit corrected, Arcsine and Exact binomial. References include, among others: Vollset, S. E. (1993). "Confidence intervals for a binomial proportion". Statistics in Medicine, 12(9): 809-824. <doi:10.1002/sim.4780120902>.
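As a rough illustration of two of the interval types named above, the following Python sketch computes the textbook Wald interval and the textbook Wilson score interval for a single proportion. The function names `wald_ci` and `wilson_ci` are hypothetical and are not the package's API, and how the package's own "Wilson" and "Score" labels map onto these formulas is not asserted here.

```python
import math
from statistics import NormalDist


def wald_ci(x, n, conf=0.95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_ci(x, n, conf=0.95):
    """Wilson score interval without continuity correction (textbook form)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical data: 5 successes in 10 trials.
print(wald_ci(5, 10))
print(wilson_ci(5, 10))
```

With zero successes the Wald interval collapses to a single point, whereas the Wilson interval still has positive width, which is one common reason several alternatives are offered side by side.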
This package provides a first-principle, phylogeny-aware comparative genomics tool for investigating associations between terms used to annotate genomic components (e.g., Pfam IDs, Gene Ontology terms) and quantitative or rank variables such as number of cell types, genome size, or density of specific genomic elements. See the project website for more information, documentation and examples, and <doi:10.1016/j.patter.2023.100728> for the full paper.
An intuitive, cross-platform graphical data analysis system. It uses menus and dialogs to guide the user efficiently through the data manipulation and analysis process, and has an Excel-like spreadsheet for easy data frame visualization and editing. Deducer works best when used with the Java-based R GUI JGR, but the dialogs can also be called from the command line. Dialogs have also been integrated into the Windows Rgui.
Spatial analyses involving binning require that every bin have the same area, but this is impossible using a rectangular grid laid over the Earth or over any projection of the Earth. Discrete global grids use hexagons, triangles, and diamonds to overcome this issue, overlaying the Earth with equally-sized bins. This package provides utilities for working with discrete global grids, along with utilities to aid in plotting such data.
An implementation of 14 parsimonious mixture models for model-based clustering or model-based classification. Gaussian, Student's t, generalized hyperbolic, variance-gamma or skew-t mixtures are available. All approaches work with missing data. Celeux and Govaert (1995) <doi:10.1016/0031-3203(94)00125-6>, Browne and McNicholas (2014) <doi:10.1007/s11634-013-0139-1>, Browne and McNicholas (2015) <doi:10.1002/cjs.11246>.
This package provides a collection of machine learning helper functions, particularly assisting in the Exploratory Data Analysis phase. Makes heavy use of the data.table package for optimal speed and memory efficiency. Highlights include a versatile bin_data() function, sparsify() for converting a data.table to sparse matrix format with one-hot encoding, fast evaluation metrics, and empirical_cdf() for calculating empirical Multivariate Cumulative Distribution Functions.
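To make the idea of an empirical multivariate cumulative distribution function concrete, here is a minimal Python sketch: the value at a point is the fraction of observations whose every coordinate is at most the corresponding bound. The helper `empirical_cdf_at` and the sample data are hypothetical and do not reproduce the signature or behaviour of the package's empirical_cdf().

```python
def empirical_cdf_at(points, upper):
    """Fraction of rows in `points` whose every coordinate is <= the matching bound in `upper`."""
    if not points:
        raise ValueError("need at least one observation")
    hits = sum(all(v <= u for v, u in zip(row, upper)) for row in points)
    return hits / len(points)


# Hypothetical two-dimensional sample.
data = [(1, 5), (2, 3), (3, 1), (4, 4)]
print(empirical_cdf_at(data, (3, 4)))  # rows (2, 3) and (3, 1) qualify
```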
This package provides tools to help convert credit risk data at two timepoints into traditional credit state migration (aka, "transition") matrices. At a higher level, migrate is intended to help an analyst understand how risk moved in their credit portfolio over a time interval. References to this methodology include: 1. Schuermann, T. (2008) <doi:10.1002/9780470061596.risk0409>. 2. Perederiy, V. (2017) <doi:10.48550/arXiv.1708.00062>.
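The core idea of a state migration matrix can be sketched in a few lines of Python: pair each item's state at the first timepoint with its state at the second, count the pairs, and normalise each row. The function `migration_matrix`, the rating labels, and the count-based weighting are illustrative assumptions; the package itself may weight by other quantities and exposes its own R interface.

```python
from collections import Counter


def migration_matrix(start, end, states):
    """Row-normalised matrix: entry [i][j] is the share of items that
    start in states[i] and end in states[j]."""
    counts = Counter(zip(start, end))
    matrix = []
    for s in states:
        row_total = sum(counts[(s, e)] for e in states)
        # Rows with no starting observations are left as all zeros.
        matrix.append([counts[(s, e)] / row_total if row_total else 0.0 for e in states])
    return matrix


# Hypothetical portfolio of six items rated at two timepoints.
start = ["A", "A", "A", "B", "B", "C"]
end = ["A", "A", "B", "B", "C", "C"]
m = migration_matrix(start, end, ["A", "B", "C"])
for row in m:
    print(row)
```

Each row sums to one whenever at least one item started in that state, so the rows can be read as empirical transition probabilities over the interval.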
Simulate DNA sequences for the node substitution model. In the node substitution model, substitutions accumulate additionally during a speciation event, providing a potential mechanistic explanation for substitution rate variation. This package provides tools to simulate such a process, simulate a reference process with only substitutions along the branches, and provides tools to infer phylogenies from alignments. More information can be found in Janzen (2021) <doi:10.1093/sysbio/syab085>.
Format numbers and plots for publication; includes the removal of leading zeros, standardization of number of digits, addition of affixes, and a p-value formatter. These tools combine the functionality of several base functions such as paste(), format(), and sprintf() into specific use case functions that are named in a way that is consistent with usage, making their names easy to remember and easy to deploy.
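The kind of formatting described above (dropping the leading zero and reporting p-values to a fixed number of digits) might look like the following Python sketch. The helpers `drop_leading_zero` and `format_p` are hypothetical names chosen for illustration, and the reporting convention shown (a "<" bound below the smallest displayable value) is an assumption rather than the package's documented behaviour.

```python
def drop_leading_zero(x, digits=2):
    """Format x with a fixed number of decimals and strip the zero before the decimal point."""
    text = f"{x:.{digits}f}"
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


def format_p(p, digits=3):
    """Report p to `digits` decimals, or as a '<' bound when it is below the smallest displayable value."""
    threshold = 10 ** -digits
    if p < threshold:
        return "p < " + drop_leading_zero(threshold, digits)
    return "p = " + drop_leading_zero(p, digits)


print(drop_leading_zero(0.456))
print(format_p(0.0312))
print(format_p(0.0004))
```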
Likelihood based optimal partitioning and indicator species analysis. Finding the best binary partition for each species based on model selection, with the possibility to take into account modifying/confounding variables as described in Kemencei et al. (2014) <doi:10.1556/ComEc.15.2014.2.6>. The package implements binary and multi-level response models, various measures of uncertainty, Lorenz-curve based thresholding, with native support for parallel computations.
Generates multivariate data with count and continuous variables with a pre-specified correlation matrix. The count and continuous variables are assumed to have Poisson and normal marginals, respectively. The data generation mechanism is a combination of the normal to anything principle and a connection between Poisson and normal correlations in the mixture. The details of the method are explained in Yahav et al. (2012) <DOI:10.1002/asmb.901>.
Hexadecimal codes are typically used to represent colors in R. Connecting these codes to their colors requires practice or memorization. palette provides a vctrs class for working with color palettes, including printing and plotting functions. The goal of the class is to place visual representations of color palettes directly on or, at least, next to their corresponding character representations. Palette extensions also are provided for data frames using pillar.
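The link between a hexadecimal code and the color it encodes can be illustrated with a short Python sketch that decodes the code into red, green and blue components and prints a colored block next to the text. The helpers `hex_to_rgb` and `swatch` are hypothetical, and the 24-bit ANSI escape sequence assumes a terminal that supports true color; this is not how the package itself renders palettes.

```python
def hex_to_rgb(code):
    """Convert '#RRGGBB' (or 'RRGGBB') to an (r, g, b) tuple of ints in 0..255."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {code!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def swatch(code):
    """Prefix the hex code with a colored block using a 24-bit ANSI background escape."""
    r, g, b = hex_to_rgb(code)
    return f"\x1b[48;2;{r};{g};{b}m  \x1b[0m {code}"


for color in ["#FF8800", "#1B9E77"]:
    print(swatch(color), hex_to_rgb(color))
```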
Computes nonparametric p-values for the potential class memberships of new observations as well as cross-validated p-values for the training data. The p-values are based on permutation tests applied to an estimated Bayesian likelihood ratio, using a plug-in statistic for the Gaussian model, k nearest neighbors, weighted nearest neighbors or penalized logistic regression. Additionally, it provides graphical displays and quantitative analyses of the p-values.
Complex graphical representations of data are best explored using interactive elements. parcats adds interactive graphing capabilities to the easyalluvial package. The plotly.js parallel categories diagrams offer a good framework for creating interactive flow graphs that allow manual drag and drop sorting of dimensions and categories, highlighting single flows and displaying mouse over information. The plotly.js dependency is quite heavy and therefore is outsourced into a separate package.
For a single, known pathogen phylogeny, provides functions for enumeration of the set of compatible epidemic transmission trees, and for uniform sampling from that set. Optional arguments allow for incomplete sampling with a known number of missing individuals, multiple sampling, and known infection time limits. Always assumed are a complete transmission bottleneck and no superinfection or reinfection. See Hall and Colijn (2019) <doi:10.1093/molbev/msz058> for methodology.
Different multiple testing procedures for correlation tests are implemented. These procedures were shown to theoretically control asymptotically the Family Wise Error Rate (Roux (2018) <https://tel.archives-ouvertes.fr/tel-01971574v1>) or the False Discovery Rate (Cai & Liu (2016) <doi:10.1080/01621459.2014.999157>). The package gathers four test statistics used in correlation testing, four FWER procedures with either single step or stepdown versions, and four FDR procedures.
martini deals with the low power inherent to GWAS studies by using prior knowledge represented as a network. SNPs are the vertices of the network, and the edges represent biological relationships between them (genomic adjacency, belonging to the same gene, physical interaction between protein products). The network is scanned using SConES, which looks for groups of SNPs maximally associated with the phenotype, that form a close subnetwork.
Identifies motifs that are significantly co-enriched from enhancer-promoter interaction data. While enhancer-promoter annotation is commonly used to define groups of interaction anchors, spatzie also supports co-enrichment analysis between preprocessed interaction anchors. Supports BEDPE interaction data derived from genome-wide assays such as HiC, ChIA-PET, and HiChIP. Can also be used to look for differentially enriched motif pairs between two interaction experiments.
This package provides functions for handling and analyzing immune receptor repertoire data, such as produced by the CellRanger V(D)J pipeline. This includes reading the data into R, merging it with paired single-cell data, quantifying clonotype abundances, calculating diversity metrics, and producing common plots. It implements the E-M Algorithm for clonotype assignment, along with other methods, which makes use of ambiguous cells for improved quantification.
It generates summary measures and graphs for discrete or continuous numerical data in simple series. It also enables the creation of classic frequency tables and graphs when analyzing grouped series. Its purpose is for educational application in an introductory Biostatistics course using the R software, aimed at undergraduate programs and other educational offerings of the Faculty of Agricultural Sciences at the National University of Jujuy (UNJu).
This package provides a fast integrative genetic association test for rare diseases based on a model for disease status given allele counts at rare variant sites. Probability of association, mode of inheritance and probability of pathogenicity for individual variants are all inferred in a Bayesian framework. See 'A Fast Association Test for Identifying Pathogenic Variants Involved in Rare Diseases', Greene et al. (2017) <doi:10.1016/j.ajhg.2017.05.015>.
This package provides functions for downloading data from the Bank for International Settlements (BIS; <https://www.bis.org/>) in Basel. Only full datasets, typically in CSV format, are supported. The package is lightweight and without dependencies; suggested packages are used only if data is to be transformed into particular data structures, for instance into zoo objects. Downloaded data can optionally be cached to avoid repeated downloads of the same files.
This package provides a flexible tool for calculating carbon-equivalent emissions. Mostly using data from the UK Government's Greenhouse Gas Conversion Factors report <https://www.gov.uk/government/publications/greenhouse-gas-reporting-conversion-factors-2023>, it facilitates transparent emissions calculations for various sectors, including travel, accommodation, and clinical activities. The package is designed for easy integration into R workflows, with additional support for shiny applications and community-driven extensions.
This package implements a specific form of segmented linear regression with two independent variables. The visualization of that function looks like a quarter segment of a cowbell, giving the package its name. The package has been specifically constructed for the case where the minimum and maximum values of the dependent and two independent variables are known a priori, which is usually the case when those values are derived from Likert scales.