TCGA RNA-Seq data for 9264 tumor and 741 normal samples across 24 cancer types were processed and made available as GEO accession [GSE62944](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62944). GSE62944 data have been parsed into a SummarizedExperiment object available in ExperimentHub.
Useful functions to work with sequence motifs in the analysis of genomics data. These include methods to annotate genomic regions or sequences with predicted motif hits and to identify motifs that drive observed changes in accessibility or expression. Functions to produce informative visualizations of the obtained results are also provided.
RAD is a package that defines schemas for the Nancy Grace Roman Space Telescope shared attributes for processing and archive. These schemas are written for the ASDF file format and are used by ASDF to serialize and deserialize data for the Nancy Grace Roman Space Telescope.
Semantic comparisons of Gene Ontology (GO) annotations provide quantitative ways to compute similarities between genes and gene groups, and have become an important basis for many bioinformatics analysis approaches. GOSemSim is an R package for semantic similarity computation among GO terms, sets of GO terms, gene products and gene clusters.
This package supports data management of large-scale whole-genome sequencing variant calls with thousands of individuals: genotypic data (e.g., SNVs, indels and structural variation calls) and annotations in SeqArray GDS files are stored in an array-oriented and compressed manner, with efficient data access using the R programming language.
This package is used for demultiplexing single-cell sequencing experiments of pooled cells. These cells are labeled with barcode oligonucleotides. The package implements methods to fit regression mixture models for a probabilistic classification of cells, including multiplet detection. Demultiplexing error rates can be estimated, and methods for quality control are provided.
In S3 generics, it's useful to take ... so that methods can have additional arguments. But this flexibility comes at a cost: misspelled arguments will be silently ignored. The ellipsis package is an experiment that allows a generic to warn if any arguments passed in ... are not used.
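The problem described above is not specific to R. As a rough analogy only, the following Python sketch shows a catch-all `**kwargs` parameter swallowing a misspelled argument, plus a check that warns about it. This is not the ellipsis package's API; the names `summarize`, `summarize_list` and `unused_arguments` are hypothetical.

```python
import inspect
import warnings


def summarize_list(x, digits=2, **kwargs):
    """Hypothetical method: unknown keyword arguments are silently swallowed."""
    return round(sum(x) / len(x), digits)


def unused_arguments(method, kwargs):
    """Return the names in kwargs that are not named parameters of method."""
    named_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    params = inspect.signature(method).parameters
    named = {name for name, p in params.items() if p.kind in named_kinds}
    return sorted(set(kwargs) - named)


def summarize(x, **kwargs):
    """Hypothetical generic: warn about unused arguments, then dispatch."""
    unused = unused_arguments(summarize_list, kwargs)
    if unused:
        warnings.warn("unused arguments: " + ", ".join(unused))
    return summarize_list(x, **kwargs)
```

Calling `summarize([1, 2, 4], digts=1)` still uses the default `digits=2`, but the misspelled `digts` now triggers a warning instead of passing unnoticed.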
This package provides functions that:
find the minimum/maximum of a linear or quadratic function,
sample an underdetermined or overdetermined system,
solve a linear system Ax=B for the unknown x.
It includes banded and tridiagonal linear systems. The package calls Fortran functions from LINPACK.
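To illustrate what solving a tridiagonal system Ax=B involves, here is a minimal Python sketch of the Thomas algorithm. It is not the package's implementation, which calls LINPACK Fortran routines; the function name `solve_tridiagonal` and its argument layout are hypothetical, and no pivoting is performed.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm.

    lower: sub-diagonal (length n-1)
    diag:  main diagonal (length n)
    upper: super-diagonal (length n-1)
    rhs:   right-hand side (length n)

    Assumes a well-conditioned system (e.g. diagonally dominant),
    because no pivoting is done.
    """
    n = len(diag)
    if n == 0 or len(lower) != n - 1 or len(upper) != n - 1 or len(rhs) != n:
        raise ValueError("inconsistent band lengths")

    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination.
    if n > 1:
        c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The bands are passed as separate lists rather than as a full matrix, which is the storage saving that makes banded solvers attractive for large systems.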
Fits the reduced-rank multinomial logistic regression model for Markov chains developed by Wang, Abner, Fardo, Schmitt, Jicha, Eldik and Kryscio (2021) <doi:10.1002/sim.8923> in R. It combines multinomial logistic regression in Markov chains with rank reduction. It is useful in studies where a multi-state model is assumed and each transition among the states is controlled by a series of covariates. The key advantage is a reduction in the number of parameters to be estimated. The final coefficients for all covariates and the p-values for the covariates of interest are reported. The p-values for the whole coefficient matrix can be calculated by two bootstrap methods.
Fits a multivariate value-added model (VAM), see Broatch, Green, and Karl (2018) <doi:10.32614/RJ-2018-033> and Broatch and Lohr (2012) <doi:10.3102/1076998610396900>, with normally distributed test scores and a binary outcome indicator. A pseudo-likelihood approach, Wolfinger (1993) <doi:10.1080/00949659308811554>, is used for the estimation of this joint generalized linear mixed model. The inner loop of the pseudo-likelihood routine (estimation of a linear mixed model) occurs in the framework of the EM algorithm presented by Karl, Yang, and Lohr (2013) <doi:10.1016/j.csda.2012.10.004>. This material is based upon work supported by the National Science Foundation under grants DRL-1336027 and DRL-1336265.
Implementation of jQuery <https://jquery.com> and CSS styles to allow easy incorporation of various social media elements on a page. The elements include share buttons, "connect with us" buttons, and hyperlink buttons for Shiny applications or dashboards and R Markdown documents. Sharing is supported on social media platforms including Facebook <https://www.facebook.com>, LinkedIn <https://www.linkedin.com>, X/Twitter <https://x.com>, Tumblr <https://www.tumblr.com>, Pinterest <https://www.pinterest.com>, WhatsApp <https://www.whatsapp.com>, Reddit <https://www.reddit.com>, Baidu <https://www.baidu.com>, Blogger <https://www.blogger.com>, Weibo <https://www.weibo.com>, Instagram <https://www.instagram.com>, Telegram <https://www.telegram.me>, and YouTube <https://www.youtube.com>.
BEAST2 (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAST2 is a command-line tool. This package provides a way to call BEAST2 from an R function call.
Extends lasso and elastic-net model fitting to large data sets that cannot be loaded into memory. Designed to be more memory- and computation-efficient than existing lasso-fitting packages like glmnet and ncvreg, thus allowing the user to analyze big data with limited RAM <doi:10.32614/RJ-2021-001>.
Facilitates the importation of the Boston Blue Bike trip data since 2015. Functions include the computation of trip distances of given trip data. It can also map the location of stations within a given radius and calculate the distance to nearby stations. Data is from <https://www.bluebikes.com/system-data>.
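The description does not say how trip or station distances are computed. As one plausible approach, the Python sketch below uses the haversine great-circle formula; the function names, the station dictionary layout, and the Earth radius constant are all assumptions made for illustration, not the package's interface.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def stations_within_radius(stations, lat, lon, radius_km):
    """Return (name, distance_km) pairs within radius_km, nearest first.

    stations is a hypothetical mapping of station name -> (lat, lon).
    """
    nearby = []
    for name, (s_lat, s_lon) in stations.items():
        dist = haversine_km(lat, lon, s_lat, s_lon)
        if dist <= radius_km:
            nearby.append((name, dist))
    return sorted(nearby, key=lambda item: item[1])
```

Haversine distances are straight-line distances over the Earth's surface, so they understate the length of a real bike route through city streets.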
The reliability of clusters is estimated using random projections. A set of stability measures is provided to assess the reliability of the clusters discovered by a generic clustering algorithm. The stability measures are tailored to high-dimensional data (e.g. DNA microarray data) (Valentini, G (2005), <doi:10.1093/bioinformatics/bti817>).
Engines for survival models from the parsnip package. These include parametric models (e.g., Jackson (2016) <doi:10.18637/jss.v070.i08>), semi-parametric models (e.g., Simon et al. (2011) <doi:10.18637/jss.v039.i05>), and tree-based models (e.g., Buehlmann and Hothorn (2007) <doi:10.1214/07-STS242>).
Direction analysis is a set of tools designed to identify combinatorial effects of multiple treatments/conditions on pathways and kinases profiled by microarray, RNA-seq, proteomics, or phosphoproteomics data. See Yang P et al. (2014) <doi:10.1093/bioinformatics/btt616> and Yang P et al. (2016) <doi:10.1002/pmic.201600068>.
A discretely sampled function is first smoothed. Features of the smoothed function are then extracted. Some of the key features include mean value, first and second derivatives, critical points (i.e. local maxima and minima), curvature of the function at critical points, wiggliness of the function, noise in data, and outliers in data.
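The smooth-then-extract workflow can be sketched in a few lines of Python. The description does not name the smoother the package uses, so a centered moving average stands in for it here; all function names are hypothetical, and only the first derivative and critical points are covered.

```python
def moving_average(values, window=3):
    """Smooth a sequence with a centered moving average (window must be odd).

    Near the ends the window is truncated to the available samples.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def first_derivative(values, step=1.0):
    """Central differences in the interior, one-sided differences at the ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    deriv = []
    for i in range(n):
        if i == 0:
            deriv.append((values[1] - values[0]) / step)
        elif i == n - 1:
            deriv.append((values[-1] - values[-2]) / step)
        else:
            deriv.append((values[i + 1] - values[i - 1]) / (2 * step))
    return deriv


def critical_points(values):
    """Return (index, kind) for interior strict local maxima and minima."""
    points = []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            points.append((i, "max"))
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            points.append((i, "min"))
    return points
```

Smoothing before differentiating matters because finite differences amplify noise; a wider `window` suppresses more noise at the cost of flattening genuine peaks.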
The ggplot2 package provides a powerful set of tools for visualising and investigating data. The ggsoccer package provides a set of functions for elegantly displaying and exploring soccer event data with ggplot2. Providing extensible layers and themes, it is designed to work smoothly with a variety of popular sports data providers.
This package provides shortcuts in extracting useful data points and summarizing waveform data. It is optimized for speed to work efficiently with large data sets so you can get to the analysis phase more quickly. It also utilizes a user-friendly format for use by both beginners and seasoned R users.
This package contains techniques for mining large and high-dimensional data sets by using the concept of Intrinsic Dimension (ID). Here the ID is not necessarily an integer; it is extended to fractal dimensions. The Morisita estimator is used for ID estimation, but other tools are included as well.
This package performs Invariant Coordinate Selection (ICS) (Tyler, Critchley, Duembgen and Oja (2009) <doi:10.1111/j.1467-9868.2009.00706.x>) and especially ICS for multivariate outlier detection with application to quality control (Archimbaud, Nordhausen, Ruiz-Gazen (2018) <doi:10.1016/j.csda.2018.06.011>) using a shiny app.
This package provides a series of statistical and plotting approaches in microbial community ecology based on the R6 class. The classes are designed for data preprocessing, taxa abundance plotting, alpha diversity analysis, beta diversity analysis, differential abundance test, null model analysis, network analysis, machine learning, environmental data analysis and functional analysis.
Administrative boundaries of Spain at several levels (Autonomous Communities, Provinces, Municipalities) based on the GISCO Eurostat database <https://ec.europa.eu/eurostat/web/gisco> and CartoBase SIANE from Instituto Geografico Nacional <https://www.ign.es/>. It also provides a leaflet plugin and the ability to download and process static tiles.