TCGA processed RNA-Seq data for 9264 tumor and 741 normal samples across 24 cancer types and made them available as GEO accession [GSE62944](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62944). GSE62944 data have been parsed into a SummarizedExperiment
object available in ExperimentHub
.
Useful functions to work with sequence motifs in the analysis of genomics data. These include methods to annotate genomic regions or sequences with predicted motif hits and to identify motifs that drive observed changes in accessibility or expression. Functions to produce informative visualizations of the obtained results are also provided.
This package installs a self-contained Conda instance that is managed by the R/Bioconductor installation machinery. This aims to provide a consistent Python environment that can be used reliably by Bioconductor packages. Functions are also provided to enable smooth interoperability of multiple Python environments in a single R session.
Genome level Trellis graph visualizes genomic data conditioned by genomic categories (e.g. chromosomes). For each genomic category, multiple dimensional data which are represented as tracks describe different features from different aspects. This package provides high flexibility to arrange genomic categories and to add self-defined graphics in the plot.
This package provides fundamental abstractions for doing asynchronous programming in R using promises. Asynchronous programming is useful for allowing a single R process to orchestrate multiple tasks in the background while also attending to something else. Semantics are similar to JavaScript promises, but with a syntax that is idiomatic R.
This package simplifies the creation of Excel .xlsx
files by providing a high level interface to writing, styling and editing worksheets. Through the use of Rcpp, read/write times are comparable to the xlsx
and XLConnect
packages with the added benefit of removing the dependency on Java.
mlr3misc
provides frequently used helper functions and assertions used in mlr3
and its companion packages. It comes with helper functions for functional programming, for printing, to work with data.table
, as well as some generally useful R6
classes. This package also supersedes the package BBmisc
.
Deciding what resolution to use can be a difficult question when approaching a clustering analysis. One way to approach this problem is to look at how samples move as the number of clusters increases. This package allows you to produce clustering trees, a visualization for interrogating clusterings as resolution increases.
Calculates risk differences (or prevalence differences for cross-sectional data) using generalized linear models with automatic link function selection. Provides robust model fitting with fallback methods, support for stratification and adjustment variables, and publication-ready output formatting. Handles model convergence issues gracefully and provides confidence intervals using multiple approaches. Methods are based on approaches described in Mark W. Donoghoe and Ian C. Marschner (2018) "logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model" <doi:10.18637/jss.v086.i09> for robust GLM fitting, and standard epidemiological methods for risk difference estimation as described in Kenneth J. Rothman, Sander Greenland and Timothy L. Lash (2008, ISBN:9780781755641) "Modern Epidemiology".
Model based simulation of dynamic networks under tie-oriented (Butts, C., 2008, <doi:10.1111/j.1467-9531.2008.00203.x>) and actor-oriented (Stadtfeld, C., & Block, P., 2017, <doi:10.15195/v4.a14>) relational event models. Supports simulation from a variety of relational event model extensions, including temporal variability in effects, heterogeneity through dyadic latent class relational event models (DLC-REM), random effects, blockmodels, and memory decay in relational event models (Lakdawala, R., 2024 <doi:10.48550/arXiv.2403.19329>
). The development of this package was supported by a Vidi Grant (452-17-006) awarded by the Netherlands Organization for Scientific Research (NWO) Grant and an ERC Starting Grant (758791).
REDCap Data Management - REDCapDM
is an R package that allows users to manage data exported directly from REDCap or using an API connection. This package includes several functions designed for pre-processing data, generating reports of queries such as outliers or missing values, and following up on the identified queries. REDCap (Research Electronic Data CAPture; <https://projectredcap.org>) is a web application developed at Vanderbilt University, designed for creating and managing online surveys and databases and the REDCap API is an interface that allows external applications to connect to REDCap remotely, and is used to programmatically retrieve or modify project data or settings within REDCap, such as importing or exporting data.
PADRINO houses textual representations of Integral Projection Models which can be converted from their table format into full kernels to reproduce or extend an already published analysis. Rpadrino is an R interface to this database. For more information on Integral Projection Models, see Easterling et al. (2000) <doi:10.1890/0012-9658(2000)081[0694:SSSAAN]2.0.CO;2>, Merow et al. (2013) <doi:10.1111/2041-210X.12146>, Rees et al. (2014) <doi:10.1111/1365-2656.12178>, and Metcalf et al. (2015) <doi:10.1111/2041-210X.12405>. See Levin et al. (2021) for more information on ipmr', the engine that powers model reconstruction <doi:10.1111/2041-210X.13683>.
Facilitates the importation of the Boston Blue Bike trip data since 2015. Functions include the computation of trip distances of given trip data. It can also map the location of stations within a given radius and calculate the distance to nearby stations. Data is from <https://www.bluebikes.com/system-data>.
Extend lasso and elastic-net model fitting for large data sets that cannot be loaded into memory. Designed to be more memory- and computation-efficient than existing lasso-fitting packages like glmnet and ncvreg', thus allowing the user to analyze big data with limited RAM <doi:10.32614/RJ-2021-001>.
BEAST2 (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool, that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAST2 is a command-line tool. This package provides a way to call BEAST2 from an R function call.
The reliability of clusters is estimated using random projections. A set of stability measures is provided to assess the reliability of the clusters discovered by a generic clustering algorithm. The stability measures are taylored to high dimensional data (e.g. DNA microarray data) (Valentini, G (2005), <doi:10.1093/bioinformatics/bti817>.
Engines for survival models from the parsnip package. These include parametric models (e.g., Jackson (2016) <doi:10.18637/jss.v070.i08>), semi-parametric (e.g., Simon et al (2011) <doi:10.18637/jss.v039.i05>), and tree-based models (e.g., Buehlmann and Hothorn (2007) <doi:10.1214/07-STS242>).
Direction analysis is a set of tools designed to identify combinatorial effects of multiple treatments/conditions on pathways and kinases profiled by microarray, RNA-seq, proteomics, or phosphoproteomics data. See Yang P et al (2014) <doi:10.1093/bioinformatics/btt616>; and Yang P et al. (2016) <doi:10.1002/pmic.201600068>.
Discretely-sampled function is first smoothed. Features of the smoothed function are then extracted. Some of the key features include mean value, first and second derivatives, critical points (i.e. local maxima and minima), curvature of cunction at critical points, wiggliness of the function, noise in data, and outliers in data.
The ggplot2 package provides a powerful set of tools for visualising and investigating data. The ggsoccer package provides a set of functions for elegantly displaying and exploring soccer event data with ggplot2'. Providing extensible layers and themes, it is designed to work smoothly with a variety of popular sports data providers.
This package provides shortcuts in extracting useful data points and summarizing waveform data. It is optimized for speed to work efficiently with large data sets so you can get to the analysis phase more quickly. It also utilizes a user-friendly format for use by both beginners and seasoned R users.
This package contains techniques for mining large and high-dimensional data sets by using the concept of Intrinsic Dimension (ID). Here the ID is not necessarily an integer. It is extended to fractal dimensions. And the Morisita estimator is used for the ID estimation, but other tools are included as well.
This package performs Invariant Coordinate Selection (ICS) (Tyler, Critchley, Duembgen and Oja (2009) <doi:10.1111/j.1467-9868.2009.00706.x>) and especially ICS for multivariate outlier detection with application to quality control (Archimbaud, Nordhausen, Ruiz-Gazen (2018) <doi:10.1016/j.csda.2018.06.011>) using a shiny app.
This package provides methods for calculating and testing the significance of pairwise monotonic association from and based on the work of Pimentel (2009) <doi:10.4135/9781412985291.n2>. Computation of association of vectors from one or multiple sets can be performed in parallel thanks to the packages foreach and doMC
'.