An R package that tests for enrichment and depletion of user-defined pathways using Fisher's exact test. The method is designed for versatile pathway annotation formats (e.g., gmt, txt, xlsx) to allow the user to run pathway analysis on custom annotations. This package is also integrated with Cytoscape to provide network-based pathway visualization that enhances the interpretability of the results.
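As a rough illustration of the test described above, the following Python sketch computes a one-sided Fisher's exact test p-value from the hypergeometric distribution. It is not the package's R interface; the function names and arguments (`k`, `N`, `K`, `n`) are hypothetical and chosen only for this example.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): k pathway genes among n selected genes, drawn from a
    background of N genes of which K belong to the pathway."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def fisher_one_sided(k, N, K, n, alternative="greater"):
    """One-sided Fisher's exact test p-value.

    alternative="greater" tests for enrichment (overlap at least k),
    alternative="less" tests for depletion (overlap at most k).
    """
    lo = max(0, n - (N - K))  # smallest possible overlap
    hi = min(K, n)            # largest possible overlap
    if alternative == "greater":
        ks = range(k, hi + 1)
    elif alternative == "less":
        ks = range(lo, k + 1)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return sum(hypergeom_pmf(i, N, K, n) for i in ks)


# Toy numbers (made up for illustration): 20 background genes,
# 5 in the pathway, 5 selected genes, 3 of them in the pathway.
p_enrichment = fisher_one_sided(3, N=20, K=5, n=5, alternative="greater")
p_depletion = fisher_one_sided(3, N=20, K=5, n=5, alternative="less")
```

A small enrichment p-value suggests the pathway is over-represented among the selected genes; a small depletion p-value suggests it is under-represented. In a real analysis the p-values would normally be corrected for testing many pathways.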
This package defines interfaces from R to scvi-tools. A vignette works through the totalVI tutorial for analyzing CITE-seq data. Another vignette compares outputs of Chapter 12 of the OSCA book with analogous outputs based on totalVI quantifications. Future work will address other components of scvi-tools, with a focus on building understanding of probabilistic methods based on variational autoencoders.
SimBu can be used to simulate bulk RNA-seq datasets with known cell type fractions. You can either use your own single-cell study for the simulation or the sfaira database. Different pre-defined simulation scenarios exist, as do options to run custom simulations. Additionally, expression values can be adapted by adding an mRNA bias, which produces more biologically relevant simulations.
The TMSig package contains tools to prepare, analyze, and visualize named lists of sets, with an emphasis on molecular signatures (such as gene or kinase sets). It includes fast, memory-efficient functions to construct sparse incidence and similarity matrices and to filter, cluster, invert, and decompose sets. Additionally, bubble heatmaps can be created to visualize the results of any differential or molecular signatures analysis.
Estimation and interpretation of Bayesian distributed lag interaction models (BDLIMs). A BDLIM regresses a scalar outcome on repeated measures of exposure and allows for modification by a categorical variable under four specific patterns of modification. The main function is bdlim(). There are also summary and plotting files. Details on methodology are described in Wilson et al. (2017) <doi:10.1093/biostatistics/kxx002>.
R functions for "The Basics of Item Response Theory Using R" by Frank B. Baker and Seock-Ho Kim (Springer, 2017, ISBN-13: 978-3-319-54204-1) including iccplot(), icccal(), icc(), iccfit(), groupinv(), tcc(), ability(), tif(), and rasch(). For example, iccplot() plots an item characteristic curve under the two-parameter logistic model.
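For orientation, the two-parameter logistic item characteristic curve mentioned above can be sketched in a few lines of Python. This is not the package's iccplot() function; the name `icc_2pl` and the symbols `a` (discrimination) and `b` (difficulty) follow standard item response theory notation and are assumptions of this example.

```python
from math import exp


def icc_2pl(theta, a, b):
    """Probability of a correct response under the two-parameter
    logistic model: P(theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


# Evaluate the curve on a grid of ability values from -3 to 3
# for an illustrative item with a = 1.2 and b = 0.5.
abilities = [x / 2 for x in range(-6, 7)]
curve = [icc_2pl(t, a=1.2, b=0.5) for t in abilities]
```

At `theta == b` the probability is exactly 0.5, and larger values of `a` make the curve steeper around that point.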
Arrays of structured data types can require large volumes of disk space to store. Blosc is a library that provides a fast and efficient way to compress such data. It is often applied in storage of n-dimensional arrays, such as in the case of the geo-spatial zarr file format. This package can be used to compress and decompress data using Blosc.
Different tools for describing and analysing paired comparison data are presented. The main methods estimate product scores according to the Bradley-Terry-Luce model. Individuals can be segmented on the basis of a mixture distribution approach. The number of classes can be tested using Monte Carlo simulations. This package also handles multi-criteria paired comparison data.
Ecological Metadata Language (EML) is a long-established format for describing ecological datasets to facilitate sharing and re-use. Because EML is effectively a modified XML schema, however, it is challenging for non-expert users to write and manipulate. delma helps users write metadata statements in R Markdown or Quarto markdown format, and parse them to EML and (optionally) back again.
Analysis of items and persons in data. Identifies and removes person misfit in polytomous item-response data using either mokken or a graded response model (GRM, via mirt). Provides automatic thresholds, visual diagnostics (2D/3D), and export utilities. Methods build on Mokken scaling as in Mokken (1971, ISBN:9789027968821) and on the graded response model of Samejima (1969) <doi:10.1007/BF03372160>.
Allows calculation on, and sampling from, Gibbs random fields, and more precisely the general homogeneous Potts model. The primary tool is the exact computation of the intractable normalising constant for small rectangular lattices. Besides this function, it contains methods that draw exact samples from the likelihood for small enough rectangular lattices, or approximate samples from the likelihood using MCMC samplers for large lattices.
This package creates and plots 2D and 3D hive plots. Hive plots are a unique method of displaying networks of many types in which node properties are mapped to axes using meaningful properties rather than being arbitrarily positioned. The hive plot concept was invented by Martin Krzywinski at the Genome Science Center (www.hiveplot.net/). Keywords: networks, food webs, linnet, systems biology, bioinformatics.
Simulate expected equilibrium length composition, yield-per-recruit, and the spawning potential ratio (SPR) using the length-based SPR (LBSPR) model. Fit the LBSPR model to length data to estimate selectivity, relative apical fishing mortality, and the spawning potential ratio for data-limited fisheries. See Hordyk et al. (2016) <doi:10.1139/cjfas-2015-0422> for more information about the LBSPR assessment method.
K-nearest neighbor search for projected and non-projected sf spatial layers. Nearest neighbor search uses (1) C code from GeographicLib for lon-lat point layers, (2) function knn() from package nabor for projected point layers, or (3) function st_distance() from package sf for line or polygon layers. The package also includes several other utility functions for spatial analysis.
The aim of neo2R is to provide simple and low-level connectors for querying neo4j graph databases (<https://neo4j.com/>). The objects returned by the query functions are either lists or data.frames with very little post-processing. This allows fast processing of queries returning many records, and lets users handle post-processing according to the data model and their needs.
Next-Generation Clustered Heat Maps (NG-CHMs) allow for dynamic exploration of heat map data in a web browser. NGCHM allows users to create both stand-alone HTML files containing a Next-Generation Clustered Heat Map, and .ngchm files to view in the NG-CHM viewer. See Ryan MC, Stucky M, et al (2020) <doi:10.12688/f1000research.20590.2> for more details.
Provides methods for estimating optimal treatment regimes in survival contexts with Kaplan-Meier-like estimators when the no-unmeasured-confounding assumption is satisfied (Jiang, R., Lu, W., Song, R., and Davidian, M. (2017) <doi:10.1111/rssb.12201>) and when that assumption fails to hold and a binary instrument is available (Xia, J., Zhan, Z., Zhang, J. (2022) <arXiv:2210.05538>).
Plot malaria parasite genetic data on two or more episodes. Compute per-person posterior probabilities that each Plasmodium vivax (Pv) recurrence is a recrudescence, relapse, or reinfection (3Rs) using per-person P. vivax genetic data on two or more episodes and a statistical model described in Taylor, Foo and White (2022) <doi:10.1101/2022.11.23.22282669>. Plot per-recurrence posterior probabilities.
Village potential statistics (PODES) collects various information on village potential and challenges faced by villages in Indonesia. Information related to village potential includes economy, security, health, employment, communication and information, sports, entertainment, development, community empowerment, education, socio-culture, and transportation in the village. Information related to challenges includes natural disasters, public health, environmental pollution, social problems, and security disturbances that occur in the village.
This package provides a visual exploration tool for multiple sequence alignment and associated data. Supports MSA of DNA, RNA, and protein sequences using ggplot2. Multiple sequence alignment can easily be combined with other ggplot2 plots, such as a phylogenetic tree visualized by ggtree, a boxplot, a genome map, and so on. More features: visualization of sequence logos, sequence bundles, and RNA secondary structures, and detection of sequence recombinations.
All alleles from the IPD IMGT/HLA <https://www.ebi.ac.uk/ipd/imgt/hla/> and IPD KIR <https://www.ebi.ac.uk/ipd/kir/> databases for Homo sapiens. Reference: Robinson J, Maccari G, Marsh SGE, Walter L, Blokhuis J, Bimber B, Parham P, De Groot NG, Bontrop RE, Guethlein LA, and Hammond JA. KIR Nomenclature in non-human species. Immunogenetics (2018), in preparation.
This package provides methods and tools for estimating, simulating, and forecasting so-called BEKK models (named after Baba, Engle, Kraft and Kroner) based on the fast Berndt–Hall–Hall–Hausman (BHHH) algorithm described in Hafner and Herwartz (2008) <doi:10.1007/s00184-007-0130-y>. For an overview, we refer the reader to Fülle et al. (2024) <doi:10.18637/jss.v111.i04>.
Distributes Gaussian process calculations across nodes in a distributed memory setting, using Rmpi. The bigGP class provides high-level methods for maximum likelihood with normal data, prediction, calculation of uncertainty (i.e., posterior covariance calculations), and simulation of realizations. In addition, bigGP provides an API for basic matrix calculations with distributed covariance matrices, including Cholesky decomposition, back/forwardsolve, crossproduct, and matrix multiplication.
This package provides access to consolidated information from the Brazilian Federal Government Payment Card. Includes functions to retrieve, clean, and organize data directly from the Transparency Portal <https://portaldatransparencia.gov.br/download-de-dados/cpgf/> and a curated dataset hosted on the Open Science Framework <https://osf.io/z2mxc/>. Useful for public spending analysis, transparency research, and reproducible workflows in auditing or investigative journalism.