This is a C++ mutual information (MI) library based on the k-nearest neighbor (KNN) algorithm. There are three functions provided for computing MI for continuous values, mixed continuous and discrete values, and conditional MI for continuous values. They are based on algorithms by A. Kraskov, et. al. (2004) <doi:10.1103/PhysRevE.69.066138>, BC Ross (2014)<doi:10.1371/journal.pone.0087357>, and A. Tsimpiris (2012) <doi:10.1016/j.eswa.2012.05.014>, respectively.
Estimates the sample size needed to detect microbial contamination in a lot with a user-specified detection probability and user-specified analytical sensitivity. Various patterns of microbial contamination are accounted for: homogeneous (Poisson), heterogeneous (Poisson-Gamma) or localized(Zero-inflated Poisson). Ida Jongenburger et al. (2010) <doi:10.1016/j.foodcont.2012.02.004> "Impact of microbial distributions on food safety". Leroy Simon (1963) <doi:10.1017/S0515036100001975> "Casualty Actuarial Society - The Negative Binomial and Poisson Distributions Compared".
We implement a surrogate modeling algorithm to guide simulation-based sample size planning. The method is described in detail in our paper (Zimmer & Debelak (2023) <doi:10.1037/met0000611>). It supports multiple study design parameters and optimization with respect to a cost function. It can find optimal designs that correspond to a desired statistical power or that fulfill a cost constraint. We also provide a tutorial paper (Zimmer et al. (2023) <doi:10.3758/s13428-023-02269-0>).
An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The tokenization and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also offers a vignette with complete example workflows and information about the utilities offered in the package.
In causal mediation analysis with multiple causally ordered mediators, a set of path-specific effects are identified under standard ignorability assumptions. This package implements an imputation approach to estimating these effects along with a set of bias formulas for conducting sensitivity analysis (Zhou and Yamamoto <doi:10.31235/osf.io/2rx6p>). It contains two main functions: paths() for estimating path-specific effects and sens() for conducting sensitivity analysis. Estimation uncertainty is quantified using the nonparametric bootstrap.
This package provides tools for penalised maximum likelihood estimation of hidden semi-Markov models (HSMMs) with flexible state dwell-time distributions. These include functions for model fitting, model checking and state-decoding. The package considers HSMMs for univariate time series with state-dependent gamma, normal, Poisson or Bernoulli distributions. For details, see Pohle, J., Adam, T. and Beumer, L.T. (2021): Flexible estimation of the state dwell-time distribution in hidden semi-Markov models. <arXiv:2101.09197>.
Computes noncompartmental pharmacokinetic parameters for drug concentration profiles. For each profile, data imputations and adjustments are made as necessary and basic parameters are estimated. Supports single dose, multi-dose, and multi-subject data. Supports steady-state calculations and various routes of drug administration. See ?qpNCA and vignettes. Methodology follows Rowland and Tozer (2011, ISBN:978-0-683-07404-8), Gabrielsson and Weiner (1997, ISBN:978-91-9765-100-4), and Gibaldi and Perrier (1982, ISBN:978-0824710422).
This package provides a system contains easy-to-use tools as a support for time series analysis courses. In particular, it incorporates a technique called Generalized Method of Wavelet Moments (GMWM) as well as its robust implementation for fast and robust parameter estimation of time series models which is described, for example, in Guerrier et al. (2013) <doi: 10.1080/01621459.2013.799920>. More details can also be found in the paper linked to via the URL below.
Holds functions developed by the University of Ottawa's SAiVE (Spatio-temporal Analysis of isotope Variations in the Environment) research group with the intention of facilitating the re-use of code, foster good code writing practices, and to allow others to benefit from the work done by the SAiVE group. Contributions are welcome via the GitHub repository <https://github.com/UO-SAiVE/SAiVE> by group members as well as non-members.
Highest posterior model is widely accepted as a good model among available models. In terms of variable selection highest posterior model is often the true model. Our stochastic search process SAHPM based on simulated annealing maximization method tries to find the highest posterior model by maximizing the model space with respect to the posterior probabilities of the models. This package currently contains the SAHPM method only for linear models. The codes for GLM will be added in future.
Supports the calculation of meteorological characteristics in evapotranspiration research and reference crop evapotranspiration, and offers three models to simulate crop evapotranspiration and soil water balance in the field, including single crop coefficient and dual crop coefficient, as well as the Shuttleworth-Wallace model. These calculations main refer to Allen et al.(1998, ISBN:92-5-104219-5), Teh (2006, ISBN:1-58-112-998-X), and Liu et al.(2006) <doi:10.1016/j.agwat.2006.01.018>.
An extension for NetSurfP-2.0 (Klausen et al. (2019) <doi:10.1002/prot.25674>) which is specifically designed to analyze the results of bottom-up-proteomics that is primarily analyzed with MaxQuant (Cox, J., Mann, M. (2008) <doi:10.1038/nbt.1511>). This tool is designed to process a large number of yeast peptides that produced as a results of whole yeast cell-proteome digestion and provide a coherent picture of secondary structure of proteins.
AS (alternative splicing) is a common mechanism of post-transcriptional gene regulation in eukaryotic organisms that expands the functional and regulatory diversity of a single gene by generating multiple mRNA isoforms that encode structurally and functionally distinct proteins. ASpli is an integrative pipeline and user-friendly R package that facilitates the analysis of changes in both annotated and novel AS events. ASpli integrates several independent signals in order to deal with the complexity that might arise in splicing patterns.
UCell is a package for evaluating gene signatures in single-cell datasets. UCell signature scores, based on the Mann-Whitney U statistic, are robust to dataset size and heterogeneity, and their calculation demands less computing time and memory than other available methods, enabling the processing of large datasets in a few minutes even on machines with limited computing power. UCell can be applied to any single-cell data matrix, and includes functions to directly interact with SingleCellExperiment and Seurat objects.
Imputes HLA classical alleles using GWAS SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.
The goal of `tpSVG` is to detect and visualize spatial variation in the gene expression for spatially resolved transcriptomics data analysis. Specifically, `tpSVG` introduces a family of count-based models, with generalizable parametric assumptions such as Poisson distribution or negative binomial distribution. In addition, comparing to currently available count-based model for spatially resolved data analysis, the `tpSVG` models improves computational time, and hence greatly improves the applicability of count-based models in SRT data analysis.
This package provides functions to create simulated time series of environmental exposures (e.g., temperature, air pollution) and health outcomes for use in power analysis and simulation studies in environmental epidemiology. This package also provides functions to evaluate the results of simulation studies based on these simulated time series. This work was supported by a grant from the National Institute of Environmental Health Sciences (R00ES022631) and a fellowship from the Colorado State University Programs for Research and Scholarly Excellence.
This package provides a fold change rank based method is presented to search for genes with changing expression and to detect recurrent chromosomal copy number aberrations. This method may be useful for high-throughput biological data (micro-array, sequencing, ...). Probabilities are associated with genes or probes in the data set and there is no problem of multiple tests when using this method. For array-based comparative genomic hybridization data, segmentation results are obtained by merging the significant probes detected.
This package provides tools for studying genotype-phenotype maps for bi-allelic loci underlying quantitative phenotypes. The 0.1 version is released in connection with the publication of Gjuvsland et al (2013) and implements basic line plots and the monotonicity measures for GP maps presented in the paper. Reference: Gjuvsland AB, Wang Y, Plahte E and Omholt SW (2013) Monotonicity is a key feature of genotype-phenotype maps. Frontier in Genetics 4:216 <doi:10.3389/fgene.2013.00216>.
Fits linear regression, logistic and multinomial regression models, Poisson regression, Cox model via Global Adaptive Generative Adjustment Algorithm. For more detailed information, see Bin Wang, Xiaofei Wang and Jianhua Guo (2022) <arXiv:1911.00658>. This paper provides the theoretical properties of Gaga linear model when the load matrix is orthogonal. Further study is going on for the nonorthogonal cases and generalized linear models. These works are in part supported by the National Natural Foundation of China (No.12171076).
Genomic biology is not limited to the confines of the canonical B-forming DNA duplex, but includes over ten different types of other secondary structures that are collectively termed non-B DNA structures. Of these non-B DNA structures, the G-quadruplexes are highly stable four-stranded structures that are recognized by distinct subsets of nuclear factors. This package provide functions for predicting intramolecular G quadruplexes. In addition, functions for predicting other intramolecular nonB DNA structures are included.
This package provides tools to adjust estimates of learning for guessing-related bias in educational and survey research. Implements standard guessing correction methods and a sophisticated latent class model that leverages informative pre-post test transitions to account for guessing behavior. The package helps researchers obtain more accurate estimates of actual learning when respondents may guess on closed-ended knowledge items. For theoretical background and empirical validation, see Cor and Sood (2018) <https://gsood.com/research/papers/guess.pdf>.
This package implements the multivariate adaptive shrinkage (mash) method of Urbut et al (2019) <DOI:10.1038/s41588-018-0268-8> for estimating and testing large numbers of effects in many conditions (or many outcomes). Mash takes an empirical Bayes approach to testing and effect estimation; it estimates patterns of similarity among conditions, then exploits these patterns to improve accuracy of the effect estimates. The core linear algebra is implemented in C++ for fast model fitting and posterior computation.
This package provides Scilab n1qn1'. This takes more memory than traditional L-BFGS. The n1qn1 routine is useful since it allows prespecification of a Hessian. If the Hessian is near enough the truth in optimization it can speed up the optimization problem. The algorithm is described in the Scilab optimization documentation located at <https://www.scilab.org/sites/default/files/optimization_in_scilab.pdf>. This version uses manually modified code from f2c to make this a C only binary.