saseR is a high-performance framework for aberrant expression and splicing analyses. The main functions are: BamtoAspliCounts, which processes BAM files into ASpli counts; convertASpli, which extracts gene, bin, or junction counts from an ASpli SummarizedExperiment; calculateOffsets, which creates an offsets assay for aberrant expression or splicing analysis; saseRfindEncodingDim, which estimates the optimal number of latent factors to include when estimating the mean expression; and saseRfit, which estimates the parameters of the negative binomial distribution and computes p-values for aberrant expression and splicing. For information on how to use these functions, see our vignette at https://github.com/statOmics/saseR/blob/main/vignettes/Vignette.Rmd and the saseR paper: Segers, A. et al. (2023). Juggling offsets unlocks RNA-seq tools for fast scalable differential usage, aberrant splicing and expression analyses. bioRxiv. https://doi.org/10.1101/2023.06.29.547014.
Aster models (Geyer, Wagenius, and Shaw, 2007, <doi:10.1093/biomet/asm030>; Shaw, Geyer, Wagenius, Hangelbroek, and Etterson, 2008, <doi:10.1086/588063>; Geyer, Ridley, Latta, Etterson, and Shaw, 2013, <doi:10.1214/13-AOAS653>) are exponential family regression models for life history analysis. They are like generalized linear models except that elements of the response vector can have different families (e.g., some Bernoulli, some Poisson, some zero-truncated Poisson, some normal) and can be dependent, with the dependence indicated by a graphical structure. Discrete time survival analysis, life table analysis, zero-inflated Poisson regression, and generalized linear models that are exponential family (e.g., logistic regression and Poisson regression with log link) are special cases. The main use is for data in which there is survival over discrete time periods and there is additional data about what happens conditional on survival (e.g., number of offspring). The models use the exponential family canonical parameterization (the aster transform of the usual parameterization). There are also random effects versions of these models.
NuPoP is an R package for nucleosome positioning prediction. It is built upon a duration hidden Markov model proposed in Xi et al. (2010) and Wang et al. (2008). The core of the package is written in Fortran. In addition to the R package, a stand-alone Fortran software tool with the same functionality as the R package is available at https://github.com/jipingw. Note: NuPoP has two separate functions for predicting nucleosome positioning, one for MNase-map-trained models and the other for chemical-map-trained models. The latter is implemented for four species (yeast, S. pombe, mouse, and human) and was trained based on our recent publications. Another package, nuCpos, developed by a different group, also predicts nucleosome positioning using models trained on chemical maps. A report comparing recent versions of NuPoP with nuCpos can be found at https://github.com/jiping/NuPoP_doc. More information is available, and will continue to be posted, at https://github.com/jipingw/NuPoP.
Whole genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy number profiles at the cellular level. This circumvents the averaging effects associated with bulk-tissue sequencing and offers increased resolution and reduced ambiguity in deconvolving cancer subclones and elucidating cancer evolutionary history. scDNA-seq data are, however, sparse, noisy, and highly variable even within a homogeneous cell population, due to the biases and artifacts introduced during library preparation and sequencing. Here, we propose SCOPE, a normalization and copy number estimation method for scDNA-seq data. The distinguishing features of SCOPE include: (i) utilization of cell-specific Gini coefficients for quality control and for identification of normal/diploid cells, which are further used as negative control samples in a Poisson latent factor model for normalization; (ii) modeling of GC content bias using an expectation-maximization algorithm embedded in the Poisson generalized linear models, which accounts for the different copy number states along the genome; (iii) a cross-sample iterative segmentation procedure to identify breakpoints that are shared across cells from the same genetic background.
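The Gini coefficient mentioned in (i) measures how unevenly a cell's reads are spread across genomic bins: near 0 when coverage is even, approaching 1 when a few bins hold most reads. The following is a minimal Python sketch of that statistic on per-bin read counts. The helper `flag_low_quality`, the 0.5 cutoff, and the example counts are hypothetical illustrations, not SCOPE's actual quality-control criteria or API.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on ascending-sorted data, ranks i = 1..n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def flag_low_quality(cells, threshold=0.5):
    """Hypothetical QC rule: flag cells whose bin-count Gini exceeds threshold."""
    return [name for name, counts in cells.items() if gini(counts) > threshold]


if __name__ == "__main__":
    # Hypothetical per-bin read counts for two cells
    cells = {
        "even_cell": [10, 12, 9, 11, 10, 8],
        "uneven_cell": [0, 0, 1, 0, 40, 2],
    }
    for name, counts in cells.items():
        print(name, round(gini(counts), 3))
    print("flagged:", flag_low_quality(cells))
```

In this sketch a cell whose reads pile up in a handful of bins gets a high Gini coefficient and is flagged, while a cell with roughly uniform coverage passes; a real pipeline would tune any cutoff to its own data.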
While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close together on the genome are often regulated together, so genomic proximity can hint at a functional association. We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets as part of its data repository, including cell-line-specific TAD boundaries, functional gene sets, and cancer-specific expression data for coding genes and ncRNAs. Additionally, users can supply custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast.
Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes that can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP), are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. Because all PDATK classes and methods accept existing Bioconductor classes as input, PDATK integrates with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al. (2019) [https://doi.org/10.1200/cci.18.00102], and an additional paper in preparation uses CSPC to validate subtypes from the included published classifiers; both use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK.
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.
This package selects genes associated with survival.
RubyRC4 is a pure Ruby implementation of the RC4 algorithm.
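RC4 consists of a key-scheduling algorithm (KSA) that permutes a 256-byte state using the key, followed by a pseudo-random generation algorithm (PRGA) that emits keystream bytes to XOR with the data. The following Python sketch shows that structure; it is not RubyRC4's code or interface, and the function names `rc4_keystream` and `rc4` are illustrative. RC4 is considered cryptographically broken, so this is only an illustration of the algorithm.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes for a key of 1 to 256 bytes."""
    # Key-scheduling algorithm (KSA): permute the identity state using the key
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): keep swapping and emit bytes
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (the same operation) by XOR with the keystream."""
    stream = rc4_keystream(key)
    return bytes(b ^ next(stream) for b in data)


if __name__ == "__main__":
    ciphertext = rc4(b"Key", b"Plaintext")
    print(ciphertext.hex())  # commonly cited test vector: bbf316e8d940af0ad3
    # Applying RC4 again with the same key recovers the plaintext
    print(rc4(b"Key", ciphertext))
```

Because encryption is an XOR with a key-dependent keystream, the same function both encrypts and decrypts.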
rGADEM is an efficient de novo motif discovery tool for large-scale genomic sequence data.
RJB is a bridge program that connects Ruby and Java via the Java Native Interface.
This package provides a web interface to compute transcriptional regulatory modules with rTRM.
None.
RDF.rb is a pure-Ruby library for working with Resource Description Framework (RDF) data.
This package is used for the analysis of long-range chromatin interactions from 3C-seq assays.
This package provides an implementation of the DEoptim function, which performs global optimization by differential evolution.
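Differential evolution searches a continuous space by keeping a population of candidate vectors, building each trial vector from scaled differences between other members (mutation), mixing it with the current member (crossover), and keeping whichever scores better (selection). The following is a minimal Python sketch of the classic DE/rand/1/bin scheme, assuming a box-bounded minimization problem. It is not DEoptim's implementation, and the parameter names and defaults (`F`, `CR`, `pop_size`, `generations`) are illustrative choices rather than DEoptim's own defaults.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over a box with DE/rand/1/bin; returns (best_x, best_f)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize the population uniformly inside the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct members other than i, donor = a + F*(b - c)
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            donor = [
                min(max(pop[a][d] + F * (pop[b][d] - pop[c][d]), bounds[d][0]),
                    bounds[d][1])  # clip each coordinate back into its bounds
                for d in range(dim)
            ]
            # Binomial crossover; j_rand guarantees at least one donor coordinate
            j_rand = rng.randrange(dim)
            trial = [donor[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            # Greedy selection, updating the population in place
            trial_f = f(trial)
            if trial_f <= fitness[i]:
                pop[i], fitness[i] = trial, trial_f
    best = min(range(pop_size), key=lambda k: fitness[k])
    return pop[best], fitness[best]


def sphere(x):
    """Simple test function with its global minimum 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
    print(best_x, best_f)
```

The scale factor `F` controls step size and `CR` controls how many coordinates come from the donor; real implementations offer several mutation strategies beyond the one shown here.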
Headers and some wrapper functions from the SeqAn C++ library for ease of use in R.
This package provides fast machine learning algorithms including matrix factorization and divisive clustering for large sparse and dense matrices.
This package implements the QUBIC algorithm introduced by Li et al. for the qualitative biclustering with gene expression data.
"Spaced Words Projection (SWeeP)" is a method for representing biological sequences using vectors preserving inter-sequence comparability.
The Rsolnp package implements a general non-linear augmented Lagrange multiplier method solver, which is based on sequential quadratic programming (SQP).
Ghidra decompiler for Rizin.
This package provides R implementations of generalized survival models (GSMs), smooth accelerated failure time (AFT) models and Markov multi-state models.
This package makes it easy to use React in R with htmlwidget scaffolds, helper dependency functions, an embedded Babel transpiler, and examples.
Find and replace text in source files.