Expression levels of mRNA molecules are regulated by different processes, comprising inhibition or activation by transcription factors and post-transcriptional degradation by microRNAs. birta (Bayesian Inference of Regulation of Transcriptional Activity) uses the regulatory networks of transcription factors and miRNAs together with mRNA and miRNA expression data to predict switches in regulatory activity between two conditions. A Bayesian network is used to model the regulatory structure, and Markov chain Monte Carlo (MCMC) is applied to sample the activity states.
This package provides a high-level R interface to data files written using Unidata's netCDF library (version 4 or earlier), which are binary data files that are portable across platforms and include metadata information in addition to the data sets. Using this package, netCDF files can be opened and data sets read in easily. It is also easy to create new netCDF dimensions, variables, and files, in either version 3 or 4 format, and manipulate existing netCDF files.
RocksDB is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database. RocksDB is partially based on LevelDB.
An R package that offers a workflow to predict condition-specific enhancers from ChIP-seq data. The prediction of regulatory units is done in four main steps: Step 1 - the normalization of the ChIP-seq counts. Step 2 - the prediction of active enhancers binwise on the whole genome. Step 3 - the condition-specific clustering of the putative active enhancers. Step 4 - the detection of possible target genes of the condition-specific clusters using RNA-seq counts.
An implementation of the model in Steorts (2015) <DOI:10.1214/15-BA965SI>, which performs Bayesian entity resolution for categorical and text data, for any distance function defined by the user. In addition, functions for computing precision and recall are included so that results can be compared with other methods such as logistic regression, Bayesian additive regression trees (BART), or random forests. The experiments are reproducible and illustrated using a simple vignette. LICENSE: GPL-3 + file license.
Estimates the Concordance Correlation Coefficient to assess agreement. The scenarios considered are non-repeated measures, non-longitudinal repeated measures (replicates) and longitudinal repeated measures. It also includes the estimation of the one-way intraclass correlation coefficient, also known as the reliability index. The estimation approaches implemented are the variance components and U-statistics approaches. Descriptions of the methods can be found in Fleiss (1986) <doi:10.1002/9781118032923> and Carrasco et al. (2013) <doi:10.1016/j.cmpb.2012.09.002>.
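As a rough illustration of the simplest scenario (non-repeated measures), the sketch below computes Lin's concordance correlation coefficient from sample moments. It is not the package's variance components or U-statistics estimator, and the function name `concordance_correlation` is hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Moment-based concordance correlation coefficient for paired measures.

    Hypothetical helper for illustration only; it uses plain sample moments
    (divisor n), not a variance components or U-statistics estimator.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired sequences with at least two values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalises systematic shifts between methods.
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("both measures are constant and identical; CCC is undefined")
    return 2 * cov_xy / denominator


if __name__ == "__main__":
    method_a = [1.0, 2.0, 3.0, 4.0]
    method_b = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but shifted by one unit
    print(concordance_correlation(method_a, method_b))
```

Unlike the Pearson correlation, which would be 1 for the shifted data in the usage example, the coefficient drops below 1 because the two methods disagree systematically.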
Analysis of Fluorescence Recovery After Photobleaching (FRAP) experiments using nonlinear mixed-effects regression models and analysis of the results. FRApp is not limited to the analysis of FRAP experiments: any nonlinear mixed-effects model with an asymptotic exponential functional form can be fitted to hierarchical data from various domains. The analysis of data available in the package is presented in Di Credico, G., Pelucchi, S., Pauli, F. et al. (2025) <doi:10.1038/s41598-025-87154-w>.
Diagnostic plots for optimisation, with a focus on projection pursuit. These show paths the optimiser takes in the high-dimensional space in multiple ways: by reducing the dimension using principal component analysis, and also using the tour to show the path on the high-dimensional space. Several botanical colour palettes are included, reflecting the name of the package. A paper describing the methodology can be found at <https://journal.r-project.org/articles/RJ-2021-105/index.html>.
Set of functions designed to solve inverse problems. The direct problem is used to calculate a cost function to be minimized. Some papers that use inverse problem solvers and sensitivity analysis are: (Jader Lugon Jr.; Antonio J. Silva Neto 2011) <doi:10.1590/S1678-58782011000400003>. (Jader Lugon Jr.; Antonio J. Silva Neto; Pedro P.G.W. Rodrigues 2008) <doi:10.1080/17415970802082864>. (Jader Lugon Jr.; Antonio J. Silva Neto; Cesar C. Santana 2008) <doi:10.1080/17415970802082922>.
This is a C++ mutual information (MI) library based on the k-nearest neighbor (KNN) algorithm. Three functions are provided for computing MI for continuous values, MI for mixed continuous and discrete values, and conditional MI for continuous values. They are based on algorithms by A. Kraskov et al. (2004) <doi:10.1103/PhysRevE.69.066138>, BC Ross (2014) <doi:10.1371/journal.pone.0087357>, and A. Tsimpiris (2012) <doi:10.1016/j.eswa.2012.05.014>, respectively.
We implement a surrogate modeling algorithm to guide simulation-based sample size planning. The method is described in detail in our paper (Zimmer & Debelak (2023) <doi:10.1037/met0000611>). It supports multiple study design parameters and optimization with respect to a cost function. It can find optimal designs that correspond to a desired statistical power or that fulfill a cost constraint. We also provide a tutorial paper (Zimmer et al. (2023) <doi:10.3758/s13428-023-02269-0>).
Estimates the sample size needed to detect microbial contamination in a lot with a user-specified detection probability and user-specified analytical sensitivity. Various patterns of microbial contamination are accounted for: homogeneous (Poisson), heterogeneous (Poisson-Gamma) or localized (Zero-inflated Poisson). Ida Jongenburger et al. (2010) <doi:10.1016/j.foodcont.2012.02.004> "Impact of microbial distributions on food safety". Leroy Simon (1963) <doi:10.1017/S0515036100001975> "Casualty Actuarial Society - The Negative Binomial and Poisson Distributions Compared".
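For the homogeneous (Poisson) case only, the calculation can be sketched as follows. This is a textbook-style approximation under stated assumptions, not the package's implementation: the function `samples_needed`, its parameters, and the treatment of analytical sensitivity as a per-sample probability of a positive test are all hypothetical.

```python
import math


def samples_needed(concentration, sample_mass, detection_probability, sensitivity=1.0):
    """Samples required to detect homogeneous (Poisson) contamination in a lot.

    Assumptions (illustrative, not taken from the package):
      concentration         mean organisms per gram, constant across the lot
      sample_mass           grams analysed per sample
      detection_probability desired probability of at least one positive sample
      sensitivity           probability that a contaminated sample tests positive
    """
    if concentration <= 0 or sample_mass <= 0:
        raise ValueError("concentration and sample_mass must be positive")
    if not 0 < detection_probability < 1:
        raise ValueError("detection_probability must lie strictly between 0 and 1")
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must lie in (0, 1]")
    # Poisson counts: P(sample holds at least one organism) = 1 - exp(-c * m).
    p_contaminated = -math.expm1(-concentration * sample_mass)
    p_positive = sensitivity * p_contaminated
    if p_positive >= 1.0:
        return 1
    # Smallest n with 1 - (1 - p_positive)^n >= detection_probability.
    n = math.log(1.0 - detection_probability) / math.log(1.0 - p_positive)
    return max(1, math.ceil(n))


if __name__ == "__main__":
    # 0.01 organisms per gram, 25 g samples, 95 % detection probability.
    print(samples_needed(0.01, 25.0, 0.95))
    print(samples_needed(0.01, 25.0, 0.95, sensitivity=0.9))
```

Heterogeneous or localized contamination changes the per-sample probability of containing an organism, so the Poisson term above would need to be replaced by the corresponding Poisson-Gamma or zero-inflated expression.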
An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The tokenization and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also offers a vignette with complete example workflows and information about the utilities offered in the package.
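The idea of n-grams and a Markov chain "babbler" can be sketched in a few lines. This is an illustrative Python toy, not the package's C implementation; whitespace tokenization and the function names `ngrams`, `build_chain`, and `babble` are assumptions made for the example.

```python
import random
from collections import defaultdict


def ngrams(text, n):
    """Return the word n-grams of text, in order, as tuples (whitespace tokens)."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_chain(text, n):
    """Map each (n-1)-word context to the list of words observed after it."""
    chain = defaultdict(list)
    for gram in ngrams(text, n):
        chain[gram[:-1]].append(gram[-1])
    return chain


def babble(text, n, length, seed=None):
    """Generate up to `length` words by a random walk over the n-gram chain."""
    if n < 2:
        raise ValueError("n must be at least 2 so that each word has a context")
    rng = random.Random(seed)
    chain = build_chain(text, n)
    if not chain:
        return []
    output = list(rng.choice(sorted(chain)))  # random starting context
    while len(output) < length:
        followers = chain.get(tuple(output[-(n - 1):]))
        if not followers:
            break  # dead end: this context only occurs at the end of the text
        # Repeated followers are kept, so frequent continuations are more likely.
        output.append(rng.choice(followers))
    return output[:length]


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ran to the door"
    print(ngrams(corpus, 3)[:2])
    print(" ".join(babble(corpus, 2, 8, seed=42)))
```

Keeping duplicate followers in each list is what makes the walk sample continuations in proportion to how often they occur in the text.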
In causal mediation analysis with multiple causally ordered mediators, a set of path-specific effects is identified under standard ignorability assumptions. This package implements an imputation approach to estimating these effects along with a set of bias formulas for conducting sensitivity analysis (Zhou and Yamamoto <doi:10.31235/osf.io/2rx6p>). It contains two main functions: paths() for estimating path-specific effects and sens() for conducting sensitivity analysis. Estimation uncertainty is quantified using the nonparametric bootstrap.
This package provides tools for penalised maximum likelihood estimation of hidden semi-Markov models (HSMMs) with flexible state dwell-time distributions. These include functions for model fitting, model checking and state-decoding. The package considers HSMMs for univariate time series with state-dependent gamma, normal, Poisson or Bernoulli distributions. For details, see Pohle, J., Adam, T. and Beumer, L.T. (2021): Flexible estimation of the state dwell-time distribution in hidden semi-Markov models. <arXiv:2101.09197>.
Computes noncompartmental pharmacokinetic parameters for drug concentration profiles. For each profile, data imputations and adjustments are made as necessary and basic parameters are estimated. Supports single dose, multi-dose, and multi-subject data. Supports steady-state calculations and various routes of drug administration. See ?qpNCA and vignettes. Methodology follows Rowland and Tozer (2011, ISBN:978-0-683-07404-8), Gabrielsson and Weiner (1997, ISBN:978-91-9765-100-4), and Gibaldi and Perrier (1982, ISBN:978-0824710422).
The highest posterior model is widely accepted as a good model among the available models. In variable selection, the highest posterior model is often the true model. Our stochastic search process SAHPM, based on the simulated annealing maximization method, tries to find the highest posterior model by maximizing the posterior probability over the model space. This package currently contains the SAHPM method only for linear models. Code for GLMs will be added in the future.
This package provides easy-to-use tools to support time series analysis courses. In particular, it incorporates a technique called the Generalized Method of Wavelet Moments (GMWM), as well as its robust implementation, for fast and robust parameter estimation of time series models, which is described, for example, in Guerrier et al. (2013) <doi:10.1080/01621459.2013.799920>. More details can also be found in the paper linked to via the URL below.
Supports the calculation of meteorological characteristics in evapotranspiration research and of reference crop evapotranspiration, and offers three models to simulate crop evapotranspiration and soil water balance in the field: the single crop coefficient model, the dual crop coefficient model, and the Shuttleworth-Wallace model. These calculations mainly refer to Allen et al. (1998, ISBN:92-5-104219-5), Teh (2006, ISBN:1-58-112-998-X), and Liu et al. (2006) <doi:10.1016/j.agwat.2006.01.018>.
Holds functions developed by the University of Ottawa's SAiVE (Spatio-temporal Analysis of isotope Variations in the Environment) research group with the intention of facilitating the re-use of code, fostering good code-writing practices, and allowing others to benefit from the work done by the SAiVE group. Contributions are welcome via the GitHub repository <https://github.com/UO-SAiVE/SAiVE> from group members as well as non-members.
UCell is a package for evaluating gene signatures in single-cell datasets. UCell signature scores, based on the Mann-Whitney U statistic, are robust to dataset size and heterogeneity, and their calculation demands less computing time and memory than other available methods, enabling the processing of large datasets in a few minutes even on machines with limited computing power. UCell can be applied to any single-cell data matrix, and includes functions to directly interact with SingleCellExperiment and Seurat objects.
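To make the rank-based idea concrete, here is a generic Mann-Whitney U signature score for a single cell. It is a sketch only and does not reproduce UCell's exact normalization or parameters; the function `signature_score` and the dictionary input format are assumptions for illustration.

```python
def signature_score(expression, signature):
    """Generic Mann-Whitney U enrichment score of a gene signature in one cell.

    expression: dict of gene name -> expression value for a single cell
    signature:  iterable of gene names (genes absent from `expression` are ignored)
    Returns a value in [0, 1]; 1 means the signature genes are the top-ranked genes.
    Illustrative only: this is not UCell's exact formula.
    """
    ordered = sorted(expression.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        average_rank = (i + j + 2) / 2  # mean of the 1-based ranks i+1 .. j+1
        for gene, _ in ordered[i:j + 1]:
            ranks[gene] = average_rank
        i = j + 1
    present = [gene for gene in set(signature) if gene in ranks]
    n, total = len(present), len(ranks)
    if n == 0 or n == total:
        raise ValueError("signature must cover some, but not all, measured genes")
    # U counts how many (signature, other) gene pairs are ordered unfavourably.
    u = sum(ranks[gene] for gene in present) - n * (n + 1) / 2
    return 1.0 - u / (n * (total - n))


if __name__ == "__main__":
    cell = {"CD3E": 5.0, "CD2": 4.0, "MS4A1": 0.0, "CD79A": 0.5}
    print(signature_score(cell, ["CD3E", "CD2"]))      # T-cell-like genes on top
    print(signature_score(cell, ["MS4A1", "CD79A"]))   # B-cell-like genes at bottom
```

Because the score depends only on within-cell ranks, it does not change with the number or composition of the other cells in the dataset, which is the property the description refers to as robustness to dataset size and heterogeneity.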
AS (alternative splicing) is a common mechanism of post-transcriptional gene regulation in eukaryotic organisms that expands the functional and regulatory diversity of a single gene by generating multiple mRNA isoforms that encode structurally and functionally distinct proteins. ASpli is an integrative pipeline and user-friendly R package that facilitates the analysis of changes in both annotated and novel AS events. ASpli integrates several independent signals in order to deal with the complexity that might arise in splicing patterns.
Imputes classical HLA alleles using GWAS SNP data, relying on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique that improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.
The goal of `tpSVG` is to detect and visualize spatial variation in gene expression for spatially resolved transcriptomics (SRT) data analysis. Specifically, `tpSVG` introduces a family of count-based models with generalizable parametric assumptions, such as the Poisson distribution or the negative binomial distribution. In addition, compared with currently available count-based models for spatially resolved data analysis, the `tpSVG` models reduce computation time, and hence greatly improve the applicability of count-based models in SRT data analysis.