This package contains procedures for depth-based supervised learning, which are entirely non-parametric, in particular the DDalpha-procedure (Lange, Mosler and Mozharovskyi, 2014). The training data sample is transformed by a statistical depth function to a compact low-dimensional space, where the final classification is done. It also offers an extension to functional data and routines for calculating certain notions of statistical depth functions. 50 multivariate and 5 functional classification problems are included.
This package helps identify mRNAs that are overexpressed in subsets of tumors relative to normal tissue. Ideal inputs would be paired tumor-normal data from the same tissue from many patients (>15 pairs). This unsupervised approach relies on the observation that oncogenes are characteristically overexpressed in only a subset of tumors in the population, and may help identify oncogene candidates purely based on differences in mRNA expression between previously unknown subtypes.
ScreenR is a package for hit identification in loss-of-function high-throughput biological screenings performed with barcoded shRNA-based libraries. ScreenR combines the computing power of software such as edgeR with the ease of use of the Tidyverse metapackage. ScreenR executes a pipeline that finds candidate hits from barcode counts, and integrates a wide range of visualization modes for each step of the analysis.
Offers functions for plotting split (or implicit) networks (unrooted, undirected) and explicit networks (rooted, directed) with reticulations, extending ggtree and using functions from ape and phangorn. It extends the ggtree package [@Yu2017] to allow the visualization of phylogenetic networks using the ggplot2 syntax. It offers an alternative to the plot functions already available in ape, Paradis and Schliep (2019) <doi:10.1093/bioinformatics/bty633>, and phangorn, Schliep (2011) <doi:10.1093/bioinformatics/btq706>.
This package provides tools for defining recurrence rules and recurrence sets. Recurrence rules are a programmatic way to define a recurring event, like the first Monday of December. Multiple recurrence rules can be combined into larger recurrence sets. A full holiday and calendar interface is also provided that can generate holidays within a particular year, can detect if a date is a holiday, can respect holiday observance rules, and allows for custom holidays.
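The idea of a recurrence rule and a recurrence set can be sketched in a few lines. The following is a minimal Python illustration using only the standard library, not the package's own interface; the function names `nth_weekday_of_month` and `recurrence_set` are hypothetical and chosen for this example only.

```python
from datetime import date, timedelta


def nth_weekday_of_month(year, month, weekday, n=1):
    """Return the n-th given weekday of a month (Monday=0 ... Sunday=6)."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first requested weekday.
    offset = (weekday - first.weekday()) % 7
    result = first + timedelta(days=offset + 7 * (n - 1))
    if result.month != month:
        raise ValueError("the month has no such occurrence")
    return result


def recurrence_set(rules, start_year, end_year):
    """Combine several rules (callables year -> date) into one sorted set."""
    events = {rule(year) for rule in rules
              for year in range(start_year, end_year + 1)}
    return sorted(events)


# Rule: the first Monday of December.
def first_monday_of_december(year):
    return nth_weekday_of_month(year, 12, 0)


# Rule: the last day of the year (a second, illustrative rule).
def new_years_eve(year):
    return date(year, 12, 31)


events = recurrence_set([first_monday_of_december, new_years_eve], 2023, 2024)
for event in events:
    print(event.isoformat())
```

Representing each rule as a function from a year to a date keeps the rules independent, so that a recurrence set is simply the union of their outputs.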
Because heavy-tailed error distributions and outliers in the response variable are widespread, models that are robust to data contamination are in high demand. Here, we develop a novel robust Bayesian variable selection method with an elastic net penalty. In particular, spike-and-slab priors have been incorporated to impose sparsity. An efficient Gibbs sampler has been developed to facilitate computation. The core modules of the package have been developed in C++ and R.
Estimate population average treatment effects from a primary data source with borrowing from supplemental sources. Causal estimation is done with either a Bayesian linear model or with Bayesian additive regression trees (BART) to adjust for confounding. Borrowing is done with multisource exchangeability models (MEMs). For information on BART, see Chipman, George, & McCulloch (2010) <doi:10.1214/09-AOAS285>. For information on MEMs, see Kaizer, Koopmeiners, & Hobbs (2018) <doi:10.1093/biostatistics/kxx031>.
This package provides a Bayesian meta-analysis method for studying cross-phenotype genetic associations. It uses summary-level data across multiple phenotypes to simultaneously measure the evidence of aggregate-level pleiotropic association and estimate an optimal subset of traits associated with the risk locus. CPBayes is based on a spike and slab prior. The methodology is available from: A Majumdar, T Haldar, S Bhattacharya, JS Witte (2018) <doi:10.1371/journal.pgen.1007139>.
The CalMaTe method calibrates preprocessed allele-specific copy number estimates (ASCNs) from DNA microarrays by controlling for single-nucleotide polymorphism-specific allelic crosstalk. The resulting ASCNs are on average more accurate, which increases the power of segmentation methods for detecting changes between copy number states in tumor studies including copy neutral loss of heterozygosity. CalMaTe applies to any ASCNs regardless of preprocessing method and microarray technology, e.g. Affymetrix and Illumina.
An open, multi-algorithmic pipeline for easy, fast and efficient analysis of cellular sub-populations and the molecular signatures that characterize them. The pipeline consists of four successive steps: data pre-processing, cellular clustering with pseudo-temporal ordering, identification of differentially expressed genes, and biomarker identification. More details are given in Ghannoum et al. (2021) <doi:10.3390/ijms22031399>. This package implements extensions of the work published by Ghannoum et al. (2019) <doi:10.1101/700989>.
Makes the Genepop software available in R. This software implements a mixture of traditional population genetic methods and some more focused developments: it computes exact tests for Hardy-Weinberg equilibrium, for population differentiation and for genotypic disequilibrium among pairs of loci; it computes estimates of F-statistics, null allele frequencies, allele size-based statistics for microsatellites, etc.; and it performs analyses of isolation by distance from pairwise comparisons of individuals or population samples.
After fitting a Generalized Additive (Mixed) Model, the next step is often to obtain predicted values for certain combinations of predictors in order to visualize the estimated effects in the model. This involves constructing a new data frame, adding predicted values, and finally making a (contour) plot. This package is intended to facilitate these steps. The underlying modeling methodology is described in Wood (2017, ISBN:9781498728331).
Performs high-dimensional feature selection in the presence of a survival outcome. Combining a feature selection method with different survival analyses, it obtains the best markers with optimal threshold levels according to their effect on disease progression and produces the most consistent level according to those threshold values. The methodology is based on Sonabend et al. (2021) <doi:10.1093/bioinformatics/btab039> and Bhattacharjee et al. (2021) <arXiv:2012.02102>.
This package provides a minimal-dependency client for Large Language Model chat APIs. Supports OpenAI <https://openai.com/>, Anthropic Claude <https://claude.com/>, Moonshot Kimi <https://www.moonshot.ai/>, Ollama <https://ollama.com/>, and other OpenAI-compatible endpoints. Includes an agent loop with tool use and a Model Context Protocol client <https://modelcontextprotocol.io/>. API design is derived from the ellmer package, reimplemented with only base R, curl, and jsonlite.
This package provides functions for actuarial risk modeling, including survival models, life annuities, multiple-decrement models, and mortality improvement projections. The package is designed to align with standard actuarial notation and supports teaching, exam preparation, and reproducible actuarial analysis. The methods are based on standard actuarial references including Camilli, Duncan and London (2014, ISBN:9781625423474) "Models for Quantifying Risk" and Dickson, Hardy and Waters (2020, ISBN:9781108478083) "Actuarial Mathematics for Life Contingent Risks".
Provides the mean partition for ensemble clustering by optimal transport alignment (OTA), uncertainty measures for both partition-wise and cluster-wise assessment, and multiple visualization functions to show uncertainty, for instance a membership heat map and a plot of the covering point set. A partition refers to an overall clustering result. See Jia Li, Beomseok Seo, and Lin Lin (2019) <doi:10.1002/sam.11418> and Lixiang Zhang, Lin Lin, and Jia Li (2020) <doi:10.1093/bioinformatics/btaa165>.
An environment to simulate the development of annual plant populations with regard to population dynamics and genetics, especially herbicide resistance. It combines genetics on the individual level (Renton et al. 2011) with a stochastic development on the population level (Daedlow, 2015). References: Renton, M, Diggle, A, Manalil, S and Powles, S (2011) <doi:10.1016/j.jtbi.2011.05.010>; Daedlow, Daniel (2015, doctoral dissertation, University of Rostock, Faculty of Agriculture and Environmental Sciences).
This package provides functions for estimating ploidy levels and detecting aneuploidy in individuals using allele intensities or allele count data from high-throughput genotyping platforms, including single nucleotide polymorphism (SNP) arrays and sequencing-based technologies. Implements an extended version of the PennCNV signal standardization method by Wang et al. (2007) <doi:10.1101/gr.6861907> for higher ploidy levels. Computes B-allele frequencies (BAF), z-scores, and identifies copy number variation patterns.
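As a rough illustration of how B-allele frequencies relate to ploidy, the following Python sketch computes a BAF from allele counts and matches it to the nearest expected dosage level. It assumes the simple count-based definition BAF = B / (A + B) and idealized dosage levels k / ploidy; it does not reproduce the package's standardization method, and all function names are hypothetical.

```python
def b_allele_frequency(a_count, b_count):
    """Fraction of observations supporting the B allele (None if no data)."""
    total = a_count + b_count
    if total == 0:
        return None
    return b_count / total


def expected_baf_levels(ploidy):
    """Idealized BAF values for 0..ploidy copies of the B allele."""
    return [k / ploidy for k in range(ploidy + 1)]


def nearest_dosage(baf, ploidy):
    """Number of B-allele copies whose idealized BAF is closest to baf."""
    levels = expected_baf_levels(ploidy)
    return min(range(ploidy + 1), key=lambda k: abs(baf - levels[k]))


# Illustrative counts only: 30 reads for allele A, 10 for allele B.
baf = b_allele_frequency(30, 10)
print(baf, expected_baf_levels(4), nearest_dosage(baf, 4))
```

Under these assumptions, a diploid individual shows BAF clusters near 0, 0.5 and 1, while a tetraploid adds clusters near 0.25 and 0.75, which is what makes the BAF distribution informative about ploidy.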
This package provides a collection of functions to deal with spatial and spatiotemporal autoregressive conditional heteroscedasticity (spatial ARCH and GARCH models) by Otto, Schmid, Garthoff (2018, Spatial Statistics) <doi:10.1016/j.spasta.2018.07.005>: simulation of spatial ARCH-type processes (spARCH, log/exponential-spARCH, complex-spARCH); quasi-maximum-likelihood estimation of the parameters of spARCH models and spatial autoregressive models with spARCH disturbances, diagnostic checks, visualizations.
This package provides some code to run simulations of state-space models, and then use these in the Approximate Bayesian Computation Sequential Monte Carlo (ABC-SMC) algorithm of Toni et al. (2009) <doi:10.1098/rsif.2008.0172> and a bootstrap particle filter based particle Markov chain Monte Carlo (PMCMC) algorithm (Andrieu et al., 2010 <doi:10.1111/j.1467-9868.2009.00736.x>). Also provides functions to plot and summarise the outputs.
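To make the bootstrap particle filter concrete, here is a minimal Python sketch for a toy state-space model. The model (an AR(1) state observed with Gaussian noise), the parameter names and the function name are all assumptions made for this illustration and are not taken from the package. The returned log-likelihood estimate is the quantity that a PMCMC sampler would use in its acceptance step.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500, phi=0.9,
                              sigma_x=1.0, sigma_y=1.0, seed=0):
    """Estimate the log-likelihood of a toy model with a bootstrap filter.

    Assumed model (hypothetical, for illustration only):
        x_t = phi * x_{t-1} + Normal(0, sigma_x^2)
        y_t = x_t + Normal(0, sigma_y^2)
    Requires |phi| < 1 so that the state has a stationary distribution.
    """
    rng = random.Random(seed)
    # Draw initial particles from the stationary distribution of the state.
    sd0 = sigma_x / math.sqrt(1.0 - phi ** 2)
    particles = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    norm_const = sigma_y * math.sqrt(2.0 * math.pi)
    log_lik = 0.0
    for y in observations:
        # Propagate each particle through the state transition.
        particles = [phi * x + rng.gauss(0.0, sigma_x) for x in particles]
        # Weight each particle by the observation density p(y | x).
        weights = [math.exp(-0.5 * ((y - x) / sigma_y) ** 2) / norm_const
                   for x in particles]
        mean_weight = sum(weights) / n_particles
        if mean_weight == 0.0:
            raise ValueError("all particle weights underflowed to zero")
        # The mean weight estimates p(y_t | y_1, ..., y_{t-1}).
        log_lik += math.log(mean_weight)
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return log_lik, particles


log_lik, final_particles = bootstrap_particle_filter([0.1, -0.3, 0.5])
print(log_lik)
```

Resampling at every step is the simplest choice; it keeps the sketch short at the cost of some extra Monte Carlo variance compared with adaptive resampling schemes.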
Newly developed methods for the estimation of several probabilities in an illness-death model. The package can be used to obtain nonparametric and semiparametric estimates for: transition probabilities, occupation probabilities, cumulative incidence function and the sojourn time distributions. Additionally, it is possible to fit proportional hazards regression models in each transition of the Illness-Death Model. Several auxiliary functions are also provided which can be used for marginal estimation of the survival functions.
Utility functions for scale-dependent and alternative hyperpriors. The distribution parameters may capture location, scale, shape, etc., and every parameter may depend on complex additive terms (fixed, random, smooth, spatial, etc.) similar to a generalized additive model. Hyperpriors for all effects can be elicited within the package. This includes complex tensor product interaction terms and variable selection priors. The basic model is explained in Klein and Kneib (2016) <doi:10.1214/15-BA983>.
This package provides functions to create and manage research compendiums for data analysis. Research compendiums are a standard and intuitive folder structure for organizing the digital materials of a research project, which can significantly improve reproducibility. The package offers several compendium structure options that fit different research projects, as well as the ability to duplicate the folder structure of existing projects or implement custom structures. It also simplifies the use of version control.
Fit, summarize, and predict for a variety of spatial statistical models applied to point-referenced and areal (lattice) data. Parameters are estimated using various methods. Additional modeling features include anisotropy, non-spatial random effects, partition factors, big data approaches, and more. Model-fit statistics are used to summarize, visualize, and compare models. Predictions at unobserved locations are readily obtainable. For additional details, see Dumelle et al. (2023) <doi:10.1371/journal.pone.0282524>.