Fast computation of the required sample size or the achieved power for GWAS with different types of covariate effects and different types of covariate-gene dependency structures. For the detailed description of the methodology, see Zhang (2022) "Power and Sample Size Computation for Genetic Association Studies of Binary Traits: Accounting for Covariate Effects" <arXiv:2203.15641>.
This package provides a facility to generate sliced (orthogonal) Latin hypercube designs with four and five slices. For details about sliced and orthogonal Latin hypercube designs, see Yang, J. F., Lin, C. D., Qian, P. Z., and Lin, D. K. (2013). "Construction of sliced orthogonal Latin hypercube designs". Statistica Sinica, 1117-1130, <doi:10.5705/ss.2012.037>.
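As a rough illustration of the underlying idea, the sketch below builds a plain (unsliced, non-orthogonal) Latin hypercube design: each dimension is cut into n equal strata and every stratum is used exactly once. It is not the sliced orthogonal construction of Yang et al. (2013), and the function name `latin_hypercube` and its parameters are hypothetical, not part of the package.

```python
import random

def latin_hypercube(n, d, seed=None):
    """Return n points in [0, 1)^d forming a plain Latin hypercube design.

    Hypothetical helper for illustration only; it does not implement the
    sliced or orthogonal constructions described in the cited paper.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(d):
        # One random permutation of the n strata per dimension.
        strata = list(range(n))
        rng.shuffle(strata)
        # Place each point uniformly at random inside its assigned stratum.
        columns.append([(s + rng.random()) / n for s in strata])
    # Transpose the d columns into a list of n points.
    return [tuple(col[i] for col in columns) for i in range(n)]

design = latin_hypercube(n=8, d=2, seed=1)
for point in design:
    print(point)
```

Projecting the design onto any single axis gives exactly one point per stratum, which is the defining property that the sliced and orthogonal variants build upon.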
Bindings for the Tabula <https://tabula.technology/> Java library, which can extract tables from PDF files. This tool can reduce time and effort in data extraction processes in fields like investigative journalism. It allows for automatic and manual table extraction, the latter facilitated through a Shiny interface that lets the user select areas with a computer mouse for data retrieval.
Create highly customized tables with this simple and dependency-free package. Data frames can be converted to HTML, LaTeX, Markdown, Word, PNG, PDF, or Typst tables. The user interface is minimalist and easy to learn. The syntax is concise. HTML tables can be customized using the flexible Bootstrap framework, and LaTeX code with the tabularray package.
Create, store, read and manage structured collections of datasets and other objects using a workspace, then bundle it into a compressed archive. Using open and interoperable formats makes it possible to exchange bundled data from R to other languages such as Python or Julia. Multiple formats are supported: Parquet, JSON, YAML, spatial data and raster data.
This package provides a set of wrappers intended to check, read and download information from the Wikimedia sources. It is specifically created to work with names of celebrities, in which case their information and statistics can be downloaded. It also builds links and snippets to use in combination with the function gallery() in the netCoin package.
This package is a parser to import HiC data into R. It accepts several types of data: tabular files, Cooler `.cool` or `.mcool` files, Juicer `.hic` files or HiC-Pro `.matrix` and `.bed` files. The HiC data can be spread over several files, for several replicates and conditions. The data is formatted in an InteractionSet object.
MethylSig is a package for testing for differentially methylated cytosines (DMCs) or regions (DMRs) in whole-genome bisulfite sequencing (WGBS) or reduced representation bisulfite sequencing (RRBS) experiments. MethylSig uses a beta binomial model to test for significant differences between groups of samples. Several options exist for either site-specific or sliding window tests, and variance estimation.
This package contains data required to run examples in the prebs package. The data files include: 1) small sample BAM files for demonstration purposes; 2) probe sequence mappings for Custom CDF (taken from http://brainarray.mbni.med.umich.edu/brainarray/Database/CustomCDF/genomic_curated_CDF.asp); 3) probe sequence mappings for the manufacturer's CDF (manually created using bowtie).
This package implements several functions useful for analysis of gene expression data by sequencing tags as done in SAGE (Serial Analysis of Gene Expression) data, i.e. extraction of a SAGE library from sequence files, sequence error correction, and library comparison. Sequencing error correction is implemented using an Expectation Maximization algorithm based on a mixture model of tag counts.
The package AlphaBeta is a computational method for estimating epimutation rates and spectra from high-throughput DNA methylation data in plants. The method has been specifically designed to:
analyze germline epimutations in the context of multi-generational mutation accumulation lines;
analyze somatic epimutations in the context of plant development and aging.
This package provides a suite of helper functions for checking and manipulating TCGA data, including data obtained from the curatedTCGAData experiment package. These functions aim to make working with TCGA data simpler and more manageable. Exported functions include those that import data from flat files into Bioconductor objects, convert row annotations, and translate identifiers via the GDC API.
This package provides an R interface to the nanoarrow C library and the Apache Arrow application binary interface. Functions to import and export ArrowArray, ArrowSchema, and ArrowArrayStream C structures to and from R objects are provided alongside helpers to facilitate zero-copy data transfer among R bindings to libraries implementing the Arrow C data interface.
Application of reinsurance treaties to claims portfolios. The package creates a class Claims whose objective is to store claims and premiums, on which different treaties can be applied. A statistical analysis can then be applied to measure the impact of reinsurance, producing a table or graphical output. This package can be used for estimating the impact of reinsurance on several portfolios or for pricing treaties through statistical analysis. Documentation for the implemented methods can be found in "Reinsurance: Actuarial and Statistical Aspects" by Hansjörg Albrecher, Jan Beirlant, Jozef L. Teugels (2017, ISBN: 978-0-470-77268-3) and "REINSURANCE: A Basic Guide to Facultative and Treaty Reinsurance" by Munich Re (2010) <https://www.munichre.com/site/mram/get/documents_E96160999/mram/assetpool.mr_america/PDFs/3_Publications/reinsurance_basic_guide.pdf>.
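To make "applying a treaty to claims" concrete, here is a minimal sketch of two textbook treaty types, quota share and excess of loss, applied claim by claim. It does not use the package's Claims class or any of its functions; the names `quota_share` and `excess_of_loss`, the parameter names, and the claim amounts are all hypothetical.

```python
def quota_share(claim, ceded_share):
    """Ceded amount under a quota share treaty: a fixed proportion of the claim."""
    return ceded_share * claim

def excess_of_loss(claim, retention, limit):
    """Ceded amount under an excess-of-loss layer 'limit xs retention'.

    The reinsurer pays the part of the claim above the retention,
    capped at the layer limit.
    """
    return min(max(claim - retention, 0.0), limit)

# Illustrative claim amounts (made up for this example).
claims = [50.0, 120.0, 400.0, 1000.0]

ceded_qs = [quota_share(c, ceded_share=0.25) for c in claims]
ceded_xl = [excess_of_loss(c, retention=100.0, limit=500.0) for c in claims]
retained_xl = [c - x for c, x in zip(claims, ceded_xl)]

print("quota share ceded:   ", ceded_qs)
print("excess of loss ceded:", ceded_xl)
print("retained after XL:   ", retained_xl)
```

Comparing the retained amounts with the original claims is the simplest way to see the impact of a treaty on a portfolio; the package's statistical analysis goes well beyond this.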
The main purpose of this package is to propose a transparent methodological framework to compare bioregionalisation methods based on hierarchical and non-hierarchical clustering algorithms (Kreft & Jetz (2010) <doi:10.1111/j.1365-2699.2010.02375.x>) and network algorithms (Lenormand et al. (2019) <doi:10.1002/ece3.4718> and Leroy et al. (2019) <doi:10.1111/jbi.13674>).
This package performs simple correspondence analysis on a two-way contingency table, or multiple correspondence analysis (homogeneity analysis) on data with p categorical variables, and produces bootstrap-based elliptical confidence regions around the projected coordinates for the category points. Includes routines to plot the results in a variety of styles. Also reports the standard numerical output for correspondence analysis.
Given the non-negative data and its distribution, the package estimates the rank parameter for Non-negative Matrix Factorization. The method is based on hypothesis testing, using a deconvolved bootstrap distribution to assess the significance level accurately despite the large amount of optimization error. The non-negative data can be either normally distributed or Poisson distributed.
This package provides an interactive viewer for data.frame and tibble objects using shiny <https://shiny.posit.co/> and DT <https://rstudio.github.io/DT/>. It supports complex filtering, column selection, and automatic generation of reproducible dplyr <https://dplyr.tidyverse.org/> code for data manipulation. The package is designed for ease of use in data exploration and reporting workflows.
Links datasets through fuzzy string matching using pretrained text embeddings. Produces more accurate record linkage when lexical string distance metrics are a poor guide to match quality (e.g., "Patricia" is more lexically similar to "Patrick" than it is to "Trish"). Capable of performing multilingual record linkage. Methods are described in Ornstein (2025) <doi:10.1017/pan.2025.10016>.
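The "Patricia" example can be checked with a small, self-contained Levenshtein edit distance, one common lexical string distance. This sketch only illustrates why lexical metrics can mislead; it is not the package's embedding-based method, and the function name `levenshtein` is an illustrative choice.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between strings a and b (dynamic programming)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Compare in lower case so that capitalisation does not count as an edit.
print(levenshtein("patricia", "patrick"))  # 2
print(levenshtein("patricia", "trish"))    # 5
```

By edit distance "Patrick" is the closer match, even though "Trish" is the plausible nickname; this is the kind of case where embedding-based similarity is meant to help.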
Spatio-temporal causal inference based on point process data. You provide the raw data of locations and timings of treatment and outcome events, specify counterfactual scenarios, and the package estimates causal effects over specified spatial and temporal windows. See Papadogeorgou, et al. (2022) <doi:10.1111/rssb.12548> and Mukaigawara, et al. (2024) <doi:10.31219/osf.io/5kc6f>.
An implementation of the nonnegative garrote method that incorporates hierarchical relationships among variables. The core function, HiGarrote(), offers an automated approach for analyzing experiments while respecting hierarchical structures among effects. For methodological details, refer to Yu and Joseph (2025) <doi:10.1080/00224065.2025.2513508>. This work is supported by U.S. National Science Foundation grant DMS-2310637.
Based on the large margin principle, this package implements the following feature selection methods: "IM4E" (Iterative Margin-Maximization under Max-Min Entropy Algorithm); "Immigrate" (Iterative Max-Min Entropy Margin-Maximization with Interaction Terms Algorithm); "BIM" (Boosted version of IMMIGRATE algorithm); "Simba" (Iterative Search Margin Based Algorithm); "LFE" (Local Feature Extraction Algorithm). This package also performs prediction for the above feature selection methods.
Pipeline for Genome-Wide Association Study using Multi-Locus Mixed Model from Segura V, Vilhjálmsson BJ et al. (2012) <doi:10.1038/ng.2314>. The pipeline includes detection of associated SNPs with MLMM, model selection by lowest eBIC and p-value threshold, estimation of the effects of the SNPs in the selected model, and graphical functions.
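For readers unfamiliar with eBIC, the sketch below computes the extended BIC in its general form, BIC plus 2 * gamma * log(choose(p, k)), where p is the number of candidate SNPs and k the number selected. This is the generic definition, not code from the pipeline: the function name, the default gamma = 1, and the choice to let k count only selected SNPs are assumptions, and the log-likelihood values are made up.

```python
import math

def extended_bic(log_likelihood, n_obs, n_selected, n_candidates, gamma=1.0):
    """Extended BIC in its generic form (hypothetical helper).

    n_selected counts only the selected SNPs here; a real implementation
    may also count other model parameters in the BIC term.
    """
    bic = -2.0 * log_likelihood + n_selected * math.log(n_obs)
    # Extra penalty for the number of models of this size that could be chosen.
    model_space_penalty = 2.0 * gamma * math.log(math.comb(n_candidates, n_selected))
    return bic + model_space_penalty

# Made-up log-likelihoods for models with 0, 1 and 2 selected SNPs.
candidates = {0: -520.0, 1: -495.0, 2: -493.0}
scores = {k: extended_bic(ll, n_obs=200, n_selected=k, n_candidates=100_000)
          for k, ll in candidates.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Selecting the model with the lowest score penalises large models more heavily than plain BIC when the pool of candidate SNPs is large.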
Fit Maximum Entropy Optimality Theory models to data sets, generate the predictions made by such models for novel data, and compare the fit of different models using a variety of metrics. The package is described in Mayer, C., Tan, A., Zuraw, K. (in press) <https://sites.socsci.uci.edu/~cjmayer/papers/cmayer_et_al_maxent_ot_accepted.pdf>.