This package provides modular tools for jointly simulating phylogenetic trees and species traits. Trees can be simulated using modular birth-death parameters (e.g. changing starting parameters or algorithm rules). Traits can be simulated in any way designed by the user. The growth of the tree and the traits can influence each other through modifier objects that provide rules for how each affects the other. Finally, events can be created to modify both the tree and the traits under specific conditions (Guillerme, 2024 <DOI:10.1111/2041-210X.14306>).
The CINdex package addresses an important area of high-throughput genomic analysis. It allows the automated processing and analysis of experimental DNA copy number data generated by Affymetrix SNP 6.0 arrays or similar high-throughput technologies. It calculates the chromosome instability (CIN) index, which quantitatively characterizes genome-wide DNA copy number alterations as a measure of chromosomal instability. This package calculates not only overall genomic instability, but also instability in terms of copy number gains and losses separately at the chromosome and cytoband level.
Indole-3-acetaldoxime (IAOx) represents an early intermediate of the biosynthesis of a variety of indolic secondary metabolites including the phytoanticipin indol-3-ylmethyl glucosinolate and the phytoalexin camalexin (3-thiazol-2'-yl-indole). Arabidopsis thaliana cyp79B2 cyp79B3 double knockout plants are completely impaired in the conversion of tryptophan to indole-3-acetaldoxime and do not accumulate IAOx-derived metabolites any longer. Consequently, comparative analysis of wild-type and cyp79B2 cyp79B3 plant lines has the potential to explore the complete range of IAOx-derived indolic secondary metabolites.
This package provides an implementation of multilayered visualizations for enhanced graphical representation of functional analysis data. It combines and integrates omics data derived from expression and functional annotation enrichment analyses. Its plotting functions have been developed with a hierarchical structure in mind: starting from a general overview to identify the most enriched categories (modified bar plot, bubble plot) and moving to a more detailed view that displays different types of relevant information for the molecules in a given set of categories (circle plot, chord plot, cluster plot, Venn diagram, heatmap).
This package contains miscellaneous functions used to interpret and translate, factorize and negate Sum of Products expressions, for both binary and multi-value crisp sets, and to extract information (set names, set values) from those expressions. Other functions check whether a vector is possibly numeric (even if all numbers reside in a character vector) and coerce it to numeric, or check whether the numbers are whole. It also offers, among many others, a highly flexible recoding routine and a more flexible alternative to the base function with().
This package implements wavelet-based approaches for describing population admixture. Principal Components Analysis (PCA) is used to define the population structure and produce a localized admixture signal for each individual. Wavelet summaries of the PCA output describe variation present in the data and can be related to population-level demographic processes. For more details, see J Sanderson, H Sudoyo, TM Karafet, MF Hammer and MP Cox. 2015. Reconstructing past admixture processes from local genomic ancestry using wavelet transformation. Genetics 200:469-481 <doi:10.1534/genetics.115.176842>.
Gibbs sampling for Bayesian spatial blind source separation (BSP-BSS). BSP-BSS is designed for spatially dependent signals in high dimensional and large-scale data, such as neuroimaging. The method assumes the expectation of the observed images as a linear mixture of multiple sparse and piece-wise smooth latent source signals, and constructs a Bayesian nonparametric prior by thresholding Gaussian processes. Details can be found in our paper: Wu et al. (2022+) "Bayesian Spatial Blind Source Separation via the Thresholded Gaussian Process" <doi:10.1080/01621459.2022.2123336>.
Calculates the B-value and the empirical equivalence bound. The B-value is defined as the maximum magnitude of a confidence interval, and the empirical equivalence bound is the minimum B-value at a certain level. A new two-stage procedure for hypothesis testing is proposed, where the first stage is conventional hypothesis testing and the second is an equivalence testing procedure using the introduced empirical equivalence bound. See Zhao et al. (2019) "B-Value and Empirical Equivalence Bound: A New Procedure of Hypothesis Testing" <arXiv:1912.13084> for details.
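The B-value definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `b_value` is hypothetical, and a normal-approximation confidence interval is assumed, which may differ from the interval the package itself uses.

```python
from statistics import NormalDist


def b_value(estimate, std_error, conf_level=0.95):
    """Largest absolute endpoint of a confidence interval (hypothetical helper).

    Assumes a normal-approximation interval: estimate +/- z * std_error.
    """
    z = NormalDist().inv_cdf(0.5 + conf_level / 2.0)
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    # The B-value is the maximum magnitude over the two interval endpoints.
    return max(abs(lower), abs(upper))


# Example: an estimated mean difference of -0.4 with standard error 0.25.
print(b_value(-0.4, 0.25))
```

Because the B-value takes the larger absolute endpoint, an interval centred far from zero or a wide interval both yield a large value.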
Various layers of B.C., including administrative boundaries, natural resource management boundaries, census boundaries etc. All layers are available in the BC Albers (<https://spatialreference.org/ref/epsg/3005/>) equal-area projection, which is the B.C. government standard. The layers are sourced from the British Columbia and Canadian governments under open licenses, including the B.C. Data Catalogue (<https://data.gov.bc.ca>), the Government of Canada Open Data Portal (<https://open.canada.ca/en/using-open-data>), and Statistics Canada (<https://www.statcan.gc.ca/en/reference/licence>).
Calculates equitable overload compensation for college instructors based on institutional policies, enrollment thresholds, and regular teaching load limits. Compensation is awarded only for credit hours that exceed the regular load and meet minimum enrollment criteria. When enrollment is below a specified threshold, pay is prorated accordingly. The package prioritizes compensation from high-enrollment courses, or optionally from low-enrollment courses for fairness, depending on a user-defined strategy. It includes tools for flexible policy settings and instructor filtering, and produces clean, audit-ready summary tables suitable for payroll and administrative reporting.
This package is designed to streamline the process of calculating complete annual growth rates with user-friendly functions and robust algorithms. It enables researchers and analysts to generate precise growth rate estimates for their data. For method details, see Sharma, M.K. (2013) <https://www.indianjournals.com/ijor.aspx?target=ijor:jfl&volume=26&issue=1and2&article=018>. It offers a comprehensive suite of functions and customisable parameters, and is equipped to handle varying complexities in data structures. It helps users uncover growth dynamics and make informed decisions.
This package implements drifting Markov models (DMM), which are non-homogeneous Markov models designed for modeling the heterogeneities of sequences in a more flexible way than homogeneous Markov chains or even hidden Markov models. It is dedicated to the estimation, simulation and exact computation of the associated reliability of drifting Markov models. The implemented methods are described in Vergne, N. (2008) <doi:10.2202/1544-6115.1326> and Barbu, V.S., Vergne, N. (2019) <doi:10.1007/s11009-018-9682-8>.
Testing for and dating periods of explosive dynamics (exuberance) in time series using the univariate and panel recursive unit root tests proposed by Phillips et al. (2015) <doi:10.1111/iere.12132> and Pavlidis et al. (2016) <doi:10.1007/s11146-015-9531-2>. The recursive least-squares algorithm utilizes the matrix inversion lemma to avoid matrix inversion, which results in significant speed improvements. Simulation of a variety of periodically-collapsing bubble processes. Details can be found in Vasilopoulos et al. (2022) <doi:10.18637/jss.v103.i10>.
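To illustrate how the matrix inversion lemma avoids explicit inversion in recursive least squares, here is a minimal generic sketch in Python. It is not the package's implementation: the names `rls_update` and `rls_fit` are hypothetical, and the diffuse starting matrix (`prior_scale` times the identity) is an assumption made so that the example needs no initial inversion.

```python
def rls_update(P, beta, x, y):
    """One recursive least-squares step via the Sherman-Morrison identity.

    P approximates (X'X)^{-1} and is assumed symmetric; x is the new regressor
    row and y the new response. No matrix inversion is performed.
    """
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
    # Rank-one correction: P_new = P - (P x)(P x)' / (1 + x' P x).
    P_new = [[P[i][j] - Px[i] * Px[j] / denom for j in range(n)] for i in range(n)]
    err = y - sum(x[i] * beta[i] for i in range(n))
    # The gain vector P x / (1 + x' P x) equals P_new x.
    beta_new = [beta[i] + Px[i] / denom * err for i in range(n)]
    return P_new, beta_new


def rls_fit(rows, targets, prior_scale=1e8):
    """Run the update over all observations from a diffuse start (assumption)."""
    n = len(rows[0])
    P = [[prior_scale if i == j else 0.0 for j in range(n)] for i in range(n)]
    beta = [0.0] * n
    for x, y in zip(rows, targets):
        P, beta = rls_update(P, beta, x, y)
    return beta


# Example: recover intercept 1 and slope 2 from noiseless data.
rows = [[1.0, float(t)] for t in range(10)]
targets = [1.0 + 2.0 * t for t in range(10)]
print(rls_fit(rows, targets))
```

Each update costs on the order of the square of the number of regressors, whereas refitting from scratch with a fresh inversion at every window would cost more, which is the source of the speed-up in recursive procedures.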
Facilitates post-Genome Wide Association Studies (GWAS) and Quantitative Trait Loci (QTL) analysis by identifying candidate genes within a user-defined search window, based on the identified Single Nucleotide Polymorphisms (SNPs), as described by Mazumder AK (2024) <doi:10.1038/s41598-024-66903-3>. It supports candidate gene analysis for wheat and rice. Import your GWAS result as explained in the sample_data file, and the function performs the search that would otherwise be done manually and retrieves candidate genes for you, exporting the results as ready-to-use output.
Immunotherapy has revolutionized cancer treatment, but predicting patient response remains challenging. Here, we present Intelligent Predicting Response to cancer Immunotherapy through Systematic Modeling (iPRISM), a novel network-based model that integrates multiple data types to predict immunotherapy outcomes. It incorporates gene expression, biological functional network, tumor microenvironment characteristics, immune-related pathways, and clinical data to provide a comprehensive view of factors influencing immunotherapy efficacy. By identifying key genetic and immunological factors, it provides insight for more personalized treatment strategies and combination therapies to overcome resistance mechanisms.
An R interface for the Java Machine Learning for Language Toolkit (mallet) <http://mallet.cs.umass.edu/> to estimate probabilistic topic models, such as Latent Dirichlet Allocation. We can use the R package to read textual data into mallet from R objects, run the Java implementation of mallet directly in R, and extract results as R objects. The Mallet toolkit has many functions; this wrapper focuses on the topic modeling sub-package written by David Mimno. The package uses the rJava package to connect to a JVM.
Supports flattening JSON into a long data frame, where the nesting keys are stored as absolute paths. It also provides an easy way to summarize the basic description of a JSON list. The idea of mojson is to transform a JSON object by absolute serialization, which means the early key-value pairs appear in the leading rows of the resulting data frame. mojson also provides an alternative way of comparing two different JSON lists, returning left/inner/right-join style results.
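The idea of flattening nested JSON into long rows keyed by absolute path can be sketched in Python as follows. This is an illustration only: the function name `flatten_json` and the slash-separated path format are assumptions and are not taken from mojson.

```python
import json


def flatten_json(obj, path=""):
    """Return (absolute_path, value) rows for every leaf of a parsed JSON value.

    The "/key/index" path notation is a hypothetical choice for this sketch.
    Empty objects and arrays contribute no rows.
    """
    rows = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            rows.extend(flatten_json(value, f"{path}/{key}"))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            rows.extend(flatten_json(value, f"{path}/{index}"))
    else:
        rows.append((path, obj))
    return rows


doc = json.loads('{"a": {"b": 1, "c": [2, 3]}, "d": "x"}')
for row in flatten_json(doc):
    print(row)
```

Rows come out in document order, so earlier key-value pairs land in earlier rows, and two flattened documents can then be compared by joining on the path column.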
Unit tests for Monte Carlo methods, particularly Markov Chain Monte Carlo (MCMC) methods, are implemented as extensions of the testthat package. The MCMC tests check whether the MCMC chain has the correct invariant distribution. They do not check other properties of successful samplers, such as whether the chain can reach all points, i.e. whether it is recurrent. The tests require the ability to sample from the prior and to run steps of the MCMC chain. The methodology is described in Gandy and Scott (2020) <arXiv:2001.06465>.
Power calculations are a critical component of any research study to determine the minimum sample size necessary to detect differences between multiple groups. Here we present an R package, PASSED, that performs power and sample size calculations for the test of two-sample means or ratios with data following beta, gamma (Chang et al. (2011), <doi:10.1007/s00180-010-0209-1>), normal, Poisson (Gu et al. (2008), <doi:10.1002/bimj.200710403>), binomial, geometric, and negative binomial (Zhu and Lakkis (2014), <doi:10.1002/sim.5947>) distributions.
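As a rough illustration of a two-sample power calculation for normally distributed data, the following Python sketch uses the textbook z-test approximation with a known common standard deviation and equal group sizes. The function name `power_two_sample_z` and these simplifying assumptions are ours; they do not reproduce the package's methods for any of the distributions listed above.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (hypothetical helper).

    Assumes equal group sizes and a known common standard deviation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Standardized mean difference under the alternative hypothesis.
    shift = abs(delta) / (sd * sqrt(2.0 / n_per_group))
    # Probability of rejecting in either tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Example: a difference of half a standard deviation with 64 subjects per group.
print(power_two_sample_z(delta=0.5, sd=1.0, n_per_group=64))
```

A sample size calculation can be built on top of this by increasing `n_per_group` until the returned power reaches the desired target.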
Assessment of the prevalence of plastic debris in bird nests based on bootstrap replicates. The package allows for calculating bootstrapped 95% confidence intervals for the estimated prevalence of debris. Combined with a Bayesian approach, the resampling simulations can also be used to define appropriate sample sizes to detect the prevalence of plastics. The method has wide application, and can also be applied to estimate confidence intervals and define sample sizes for the prevalence of plastics ingested by any other organisms. The method is described in Tavares et al. (Submitted).
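A bootstrapped confidence interval for a prevalence can be sketched in Python as below. This is a generic percentile bootstrap written under our own assumptions (the function name `bootstrap_prevalence_ci`, the 0/1 encoding of nests, and the percentile rule are illustrative) and may differ from the procedure in Tavares et al.

```python
import random


def bootstrap_prevalence_ci(has_debris, n_boot=2000, conf_level=0.95, seed=1):
    """Percentile-bootstrap interval for a proportion (hypothetical helper).

    has_debris is a list of 0/1 flags, one per sampled nest.
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(has_debris)
    # Resample nests with replacement and record the prevalence each time.
    estimates = sorted(
        sum(rng.choices(has_debris, k=n)) / n for _ in range(n_boot)
    )
    alpha = 1.0 - conf_level
    lower = estimates[int((alpha / 2.0) * (n_boot - 1))]
    upper = estimates[int((1.0 - alpha / 2.0) * (n_boot - 1))]
    return sum(has_debris) / n, lower, upper


# Example: 30 of 100 sampled nests contain debris.
nests = [1] * 30 + [0] * 70
print(bootstrap_prevalence_ci(nests))
```

Repeating the calculation for several candidate sample sizes shows how the interval narrows as more nests are sampled, which is one simple way to reason about an adequate sample size.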
Estimate Bayesian nested mixture models via Markov Chain Monte Carlo methods. Specifically, the package implements the common atoms model (Denti et al., 2023), its finite version (D'Angelo et al., 2023), and a hybrid finite-infinite model. All models use Gaussian mixtures with a normal-inverse-gamma prior distribution on the parameters. Additional functions are provided to help analyze the results of the fitting procedure. References: Denti, Camerlenghi, Guindani, Mira (2023) <doi:10.1080/01621459.2021.1933499>, D'Angelo, Canale, Yu, Guindani (2023) <doi:10.1111/biom.13626>.
This package provides a collection of functions for preparing data and fitting Bayesian count spatial regression models, with a specific focus on the Gamma-Count (GC) model. The GC model is well-suited for modeling dispersed count data, including under-dispersed or over-dispersed counts, or counts with equivalent dispersion, using Integrated Nested Laplace Approximations (INLA). The package includes functions for generating data from the GC model, as well as spatially correlated versions of the model. See Nadifar, Baghishani, Fallah (2023) <doi:10.1007/s13253-023-00550-5>.
This package provides the framework proposed in Jenul et al. (2022) <doi:10.1007/s10994-022-06221-9>, together with an interactive Shiny dashboard. UBayFS is an ensemble feature selection technique embedded in a Bayesian statistical framework. The method combines data and user knowledge, where the former is extracted via data-driven ensemble feature selection. The user can control the feature selection by assigning prior weights to features and penalizing specific feature combinations. UBayFS can be used for common feature selection as well as block feature selection.
CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate, selective and fast scRNA-seq classifier. Classification is guided by a reference dataset, preferably also a scRNA-seq dataset. By hierarchical clustering of the reference data, CHETAH creates a classification tree that enables a step-wise, top-to-bottom classification. Using a novel stopping rule, CHETAH classifies the input cells to the cell types of the references and to "intermediate types": more general classifications that ended in an intermediate node of the tree.