This package implements wavelet-based approaches for describing population admixture. Principal Components Analysis (PCA) is used to define the population structure and produce a localized admixture signal for each individual. Wavelet summaries of the PCA output describe variation present in the data and can be related to population-level demographic processes. For more details, see J Sanderson, H Sudoyo, TM Karafet, MF Hammer and MP Cox. 2015. Reconstructing past admixture processes from local genomic ancestry using wavelet transformation. Genetics 200:469-481 <doi:10.1534/genetics.115.176842>.
Calculates the B-value and the empirical equivalence bound. The B-value is defined as the maximum magnitude of a confidence interval, and the empirical equivalence bound is the minimum B-value at a certain level. A new two-stage procedure for hypothesis testing is proposed, where the first stage is conventional hypothesis testing and the second is an equivalence testing procedure using the introduced empirical equivalence bound. See Zhao et al. (2019) "B-Value and Empirical Equivalence Bound: A New Procedure of Hypothesis Testing" <arXiv:1912.13084> for details.
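As a rough illustration of the B-value definition above, the sketch below takes the larger absolute endpoint of a confidence interval. The normal-approximation interval for a mean and the names `mean_confidence_interval` and `b_value` are assumptions made for this example; they are not the package's API.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for a mean (an assumption of this sketch)."""
    center = mean(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(sample) / sqrt(len(sample))
    return center - half_width, center + half_width


def b_value(lower, upper):
    """Maximum magnitude of a confidence interval: its larger absolute endpoint."""
    return max(abs(lower), abs(upper))


# Hypothetical paired differences between two treatments.
differences = [0.4, -0.1, 0.3, 0.2, 0.0, 0.5, -0.2, 0.1]
lower, upper = mean_confidence_interval(differences)
print(b_value(lower, upper))
```

The empirical equivalence bound is then described as the minimum B-value at a chosen level; this sketch does not attempt that second step.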
Time series analysis, (dis)aggregation and manipulation, e.g. time series extension, merge, projection, lag, lead, delta, moving and cumulative average and product, selection by index, date and year-period, conversion to daily, monthly, quarterly, (semi)annually. Simultaneous equation models definition, estimation, simulation and forecasting with coefficient restrictions, error autocorrelation, exogenization, add-factors, impact and interim multipliers analysis, conditional equation evaluation, rational expectations, endogenous targeting and model renormalization, structural stability, stochastic simulation and forecast, optimal control, by A. Luciani (2022) <doi:10.13140/RG.2.2.31160.83202>.
Various layers of B.C., including administrative boundaries, natural resource management boundaries, census boundaries etc. All layers are available in BC Albers (<https://spatialreference.org/ref/epsg/3005/>) equal-area projection, which is the B.C. government standard. The layers are sourced from the British Columbia and Canadian governments under open licenses, including the B.C. Data Catalogue (<https://data.gov.bc.ca>), the Government of Canada Open Data Portal (<https://open.canada.ca/en/using-open-data>), and Statistics Canada (<https://www.statcan.gc.ca/en/reference/licence>).
Calculates equitable overload compensation for college instructors based on institutional policies, enrollment thresholds, and regular teaching load limits. Compensation is awarded only for credit hours that exceed the regular load and meet minimum enrollment criteria. When enrollment is below a specified threshold, pay is prorated accordingly. The package prioritizes compensation from high-enrollment courses, or optionally from low-enrollment courses for fairness, depending on user-defined strategy. Includes tools for flexible policy settings, instructor filtering, and produces clean, audit-ready summary tables suitable for payroll and administrative reporting.
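One possible reading of the rules above can be sketched as follows. Every specific here is an assumption for illustration: the function name `overload_pay`, the parameter names, the default thresholds, the flat hourly rate, and the linear proration rule are hypothetical and do not come from the package.

```python
def overload_pay(courses, regular_load=12, rate_per_hour=1000.0,
                 min_enrollment=4, full_pay_enrollment=10, favor_high=True):
    """Total overload pay for one instructor.

    courses is a list of (credit_hours, enrollment) pairs.
    """
    # Only credit hours beyond the regular load are eligible for overload pay.
    overload_hours = sum(hours for hours, _ in courses) - regular_load
    if overload_hours <= 0:
        return 0.0
    # Strategy: draw overload hours from high-enrollment courses first,
    # or from low-enrollment courses when favor_high is False.
    ordered = sorted(courses, key=lambda c: c[1], reverse=favor_high)
    pay = 0.0
    for hours, enrollment in ordered:
        if overload_hours <= 0:
            break
        paid_hours = min(hours, overload_hours)
        overload_hours -= paid_hours
        if enrollment < min_enrollment:
            continue  # hours use up overload but fail the minimum enrollment rule
        # Assumed proration: linear in enrollment up to the full-pay threshold.
        factor = min(1.0, enrollment / full_pay_enrollment)
        pay += paid_hours * rate_per_hour * factor
    return pay


schedule = [(3, 25), (3, 18), (3, 12), (3, 8), (3, 5)]
print(overload_pay(schedule))                    # overload taken from the 25-student course
print(overload_pay(schedule, favor_high=False))  # overload taken from the 5-student course
```

With 15 scheduled hours against a regular load of 12, three hours are overload. Which course they are drawn from decides whether they are paid in full or prorated.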
Streamlines the calculation of complete annual growth rates with user-friendly functions and robust algorithms, so that researchers and analysts can generate growth rate estimates for their data. For method details, see Sharma, M.K. (2013) <https://www.indianjournals.com/ijor.aspx?target=ijor:jfl&volume=26&issue=1and2&article=018>. It offers a suite of functions with customisable parameters, handles data structures of varying complexity, and helps users examine growth dynamics to support informed decisions.
This package fits drifting Markov models (DMM), which are non-homogeneous Markov models designed to model the heterogeneities of sequences more flexibly than homogeneous Markov chains or even hidden Markov models. It is dedicated to the estimation, simulation and exact computation of the associated reliability of drifting Markov models. The implemented methods are described in Vergne, N. (2008) <doi:10.2202/1544-6115.1326> and Barbu, V.S., Vergne, N. (2019) <doi:10.1007/s11009-018-9682-8>.
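To make the idea of a drifting transition law concrete, the sketch below assumes the simplest case, a linear (degree-1) drift in which the transition matrix moves from a start matrix to an end matrix along the sequence. The function names and the two example matrices are hypothetical, and the package's own estimation routines are not reproduced here.

```python
import random


def drifting_matrix(pi0, pi1, t, n):
    """Transition matrix at position t of n under an assumed linear drift from pi0 to pi1."""
    w = t / n
    return [[(1 - w) * a + w * b for a, b in zip(row0, row1)]
            for row0, row1 in zip(pi0, pi1)]


def simulate(pi0, pi1, n, start=0, seed=1):
    """Draw n + 1 states whose transition probabilities drift from pi0 to pi1."""
    rng = random.Random(seed)
    states = [start]
    for t in range(1, n + 1):
        row = drifting_matrix(pi0, pi1, t, n)[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


pi_start = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical transition law at the start
pi_end = [[0.3, 0.7], [0.6, 0.4]]    # hypothetical transition law at the end
print(drifting_matrix(pi_start, pi_end, 5, 10))
print(simulate(pi_start, pi_end, 10))
```

Because each interpolated row is a convex combination of two probability vectors, every intermediate matrix is again a valid transition matrix.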
Testing for and dating periods of explosive dynamics (exuberance) in time series using the univariate and panel recursive unit root tests proposed by Phillips et al. (2015) <doi:10.1111/iere.12132> and Pavlidis et al. (2016) <doi:10.1007/s11146-015-9531-2>. The recursive least-squares algorithm uses the matrix inversion lemma to avoid explicit matrix inversion, which results in significant speed improvements. Simulation of a variety of periodically-collapsing bubble processes. Details can be found in Vasilopoulos et al. (2022) <doi:10.18637/jss.v103.i10>.
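The matrix inversion lemma mentioned above can be illustrated with a generic recursive least-squares update for a line with intercept. This is a textbook sketch, not the package's implementation; the function name `rls_fit` and the diffuse starting value `delta` are assumptions, and the large starting value means the result matches ordinary least squares only approximately.

```python
def rls_fit(xs, ys, delta=1e6):
    """Recursive least squares for y = a + b * x, one observation at a time."""
    # P tracks the inverse cross-product matrix; a large diagonal start
    # behaves like a very weak ridge penalty (an assumption of this sketch).
    P = [[delta, 0.0], [0.0, delta]]
    beta = [0.0, 0.0]
    for x, y in zip(xs, ys):
        z = (1.0, x)  # regressors: intercept and slope term
        Pz = [P[0][0] * z[0] + P[0][1] * z[1],
              P[1][0] * z[0] + P[1][1] * z[1]]
        denom = 1.0 + z[0] * Pz[0] + z[1] * Pz[1]
        gain = [Pz[0] / denom, Pz[1] / denom]
        error = y - (beta[0] * z[0] + beta[1] * z[1])
        beta = [beta[0] + gain[0] * error, beta[1] + gain[1] * error]
        # Rank-one (Sherman-Morrison) update of P: no matrix is inverted.
        P = [[P[i][j] - gain[i] * Pz[j] for j in range(2)] for i in range(2)]
    return beta


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [5.1, 7.9, 11.2, 13.8, 17.1, 20.0]
intercept, slope = rls_fit(xs, ys)
print(intercept, slope)
```

Each new observation costs a fixed number of multiplications, so refitting over an expanding window does not require solving the normal equations again.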
This package provides a guarded resampling workflow for training and evaluating machine-learning models. When the guarded resampling path is used, preprocessing and model fitting are re-estimated within each resampling split to reduce leakage risk. Supports multiple resampling schemes, integrates with established engines in the tidymodels ecosystem, and aims to improve evaluation reliability by coordinating preprocessing, fitting, and evaluation within supported workflows. Offers a lightweight AutoML-style workflow by automating model training, resampling, and tuning across multiple algorithms, while keeping evaluation design explicit and user-controlled.
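The leakage-avoidance idea described above can be sketched without any modelling library: scaling parameters and the model are re-estimated on each training split and only then applied to the held-out rows. The function names, the contiguous fold layout, and the one-variable linear model are assumptions chosen to keep the example small; they do not reflect the package's interface.

```python
from statistics import mean, pstdev


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def guarded_cv_mse(xs, ys, k=3):
    """Cross-validated MSE where scaling and fitting see only the training rows."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Preprocessing parameters are estimated from the training split alone.
        center = mean(xs[i] for i in train_idx)
        scale = pstdev([xs[i] for i in train_idx]) or 1.0
        scaled = [(x - center) / scale for x in xs]
        # Fit a least-squares line on the training split.
        y_bar = mean(ys[i] for i in train_idx)
        sxx = sum(scaled[i] ** 2 for i in train_idx)
        sxy = sum(scaled[i] * (ys[i] - y_bar) for i in train_idx)
        slope = sxy / sxx if sxx else 0.0
        # Score the held-out rows with the training-derived parameters.
        errors = [(ys[i] - (y_bar + slope * scaled[i])) ** 2 for i in test_idx]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [2.9, 5.2, 7.1, 8.8, 11.3, 12.9, 15.2, 16.8, 19.1]
print(guarded_cv_mse(xs, ys))
```

Computing `center` and `scale` once on the full data before splitting would let held-out rows influence preprocessing, which is the leakage this kind of workflow is meant to prevent.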
Facilitates post-Genome Wide Association Studies (GWAS) and Quantitative Trait Loci (QTL) analysis by identifying candidate genes within a user-defined search window around the identified Single Nucleotide Polymorphisms (SNPs), as given by Mazumder AK (2024) <doi:10.1038/s41598-024-66903-3>. It supports candidate gene analysis for wheat and rice. Import your GWAS result as explained in the sample_data file, and the function performs the search that would otherwise be done manually, retrieves the candidate genes, and exports the results as ready-to-use output.
This is a companion to the Henry-Stewart talk by Zhao (2026, <doi:10.69645/FRFQ9519>), which gathers information, metadata and scripts to showcase modern genetic analysis -- ranging from testing of polymorphic variant(s) for Hardy-Weinberg equilibrium, association with traits using genetic and statistical models, Bayesian implementation, power calculation in study design, and genetic annotation. It also covers R integration with the Linux environment, GitHub, package creation and web applications. The earlier version by Zhao (2009, <doi:10.69645/DCRY5578>) provides a brief introduction to these topics.
Immunotherapy has revolutionized cancer treatment, but predicting patient response remains challenging. Here, we present Intelligent Predicting Response to cancer Immunotherapy through Systematic Modeling (iPRISM), a novel network-based model that integrates multiple data types to predict immunotherapy outcomes. It incorporates gene expression, biological functional networks, tumor microenvironment characteristics, immune-related pathways, and clinical data to provide a comprehensive view of factors influencing immunotherapy efficacy. By identifying key genetic and immunological factors, it offers insight into more personalized treatment strategies and combination therapies to overcome resistance mechanisms.
An R interface for the Java Machine Learning for Language Toolkit (mallet) <http://mallet.cs.umass.edu/> to estimate probabilistic topic models, such as Latent Dirichlet Allocation. We can use the R package to read textual data into mallet from R objects, run the Java implementation of mallet directly in R, and extract results as R objects. The Mallet toolkit has many functions; this wrapper focuses on the topic modeling sub-package written by David Mimno. The package uses the rJava package to connect to a JVM.
Unit tests for Monte Carlo methods, particularly Markov Chain Monte Carlo (MCMC) methods, are implemented as extensions of the testthat package. The MCMC tests check whether the MCMC chain has the correct invariant distribution. They do not check other properties of successful samplers, such as whether the chain can reach all points, i.e. whether it is recurrent. The tests require the ability to sample from the prior and to run steps of the MCMC chain. The methodology is described in Gandy and Scott (2020) <arXiv:2001.06465>.
Supports flattening JSON into a long-format data frame, where the nesting keys are stored as an absolute path. It also provides an easy way to summarize the basic description of a JSON list. The idea of mojson is to transform a JSON object by absolute serialization, which means that earlier key-value pairs appear in the leading rows of the resulting data frame. mojson also provides an alternative way of comparing two different JSON lists, returning left/inner/right-join style results.
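The long-format flattening described above can be sketched as a recursive walk that emits one (path, value) row per leaf, in document order. The function name `flatten_json` and the path syntax (dots for keys, brackets for list positions) are assumptions for illustration and may differ from the separators mojson actually uses.

```python
import json


def flatten_json(obj, path=""):
    """Return (absolute_path, value) rows, one per leaf, in document order."""
    if isinstance(obj, dict):
        children = [(f"{path}.{key}" if path else str(key), value)
                    for key, value in obj.items()]
    elif isinstance(obj, list):
        children = [(f"{path}[{i}]", value) for i, value in enumerate(obj)]
    else:
        return [(path, obj)]  # a leaf: record its full path and value
    rows = []
    for child_path, value in children:
        rows.extend(flatten_json(value, child_path))
    return rows


document = json.loads('{"name": "a", "tags": ["x", "y"], "meta": {"id": 1}}')
for row in flatten_json(document):
    print(row)
```

In this sketch, empty objects and empty lists produce no rows. A fuller version would need to decide how to represent them.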
Power calculations are a critical component of any research study to determine the minimum sample size necessary to detect differences between multiple groups. Here we present an R package, 'PASSED', that performs power and sample size calculations for the test of two-sample means or ratios with data following beta, gamma (Chang et al. (2011), <doi:10.1007/s00180-010-0209-1>), normal, Poisson (Gu et al. (2008), <doi:10.1002/bimj.200710403>), binomial, geometric, and negative binomial (Zhu and Lakkis (2014), <doi:10.1002/sim.5947>) distributions.
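For the normal case only, the kind of calculation described above can be sketched with the textbook two-sided z-test approximation, assuming equal group sizes and a known common standard deviation. The function names are hypothetical, and the package's own formulas for the other distributions are not reproduced here.

```python
from math import ceil, sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / (sigma * sqrt(2 / n_per_group))
    # Probability of rejecting in either tail when the true difference is delta.
    return STANDARD_NORMAL.cdf(shift - z_crit) + STANDARD_NORMAL.cdf(-shift - z_crit)


def sample_size_two_sample(delta, sigma, power=0.8, alpha=0.05):
    """Per-group sample size from the usual normal-approximation formula."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STANDARD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) * sigma / delta) ** 2)


n = sample_size_two_sample(delta=0.5, sigma=1.0)
print(n, power_two_sample(0.5, 1.0, n))
```

A t-based calculation would typically ask for slightly more subjects than this z-based approximation.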
This package provides a collection of functions for preparing data and fitting Bayesian count spatial regression models, with a specific focus on the Gamma-Count (GC) model. The GC model is well-suited for modeling dispersed count data, including under-dispersed or over-dispersed counts, or counts with equivalent dispersion, using Integrated Nested Laplace Approximations (INLA). The package includes functions for generating data from the GC model, as well as spatially correlated versions of the model. See Nadifar, Baghishani, Fallah (2023) <doi:10.1007/s13253-023-00550-5>.
Implements the UBayFS framework proposed in Jenul et al. (2022) <doi:10.1007/s10994-022-06221-9>, together with an interactive Shiny dashboard. UBayFS is an ensemble feature selection technique embedded in a Bayesian statistical framework. The method combines data and user knowledge, where the former is extracted via data-driven ensemble feature selection. The user can control the feature selection by assigning prior weights to features and penalizing specific feature combinations. UBayFS can be used for common feature selection as well as block feature selection.
BEAST is a Bayesian estimator of abrupt change, seasonality, and trend for decomposing univariate time series and 1D sequential data. Interpretation of time series depends on model choice; different models can yield contrasting or contradictory estimates of patterns, trends, and mechanisms. BEAST alleviates this by abandoning the single-best-model paradigm and instead using Bayesian model averaging over many competing decompositions. It detects and characterizes abrupt changes (changepoints, breakpoints, structural breaks, joinpoints), cyclic or seasonal variation, and nonlinear trends. BEAST not only detects when changes occur but also quantifies how likely the detected changes are to be real. It estimates not just piecewise linear trends but also arbitrary nonlinear trends. BEAST is generically applicable to any real-valued time series, such as those from remote sensing, economics, climate science, ecology, hydrology, and other environmental and biological systems. Example applications include identifying regime shifts in ecological data, mapping forest disturbance and land degradation from satellite image time series, detecting market trends in economic indicators, pinpointing anomalies and extreme events in climate records, and analyzing system dynamics in biological time series. Details are given in Zhao et al. (2019) <doi:10.1016/j.rse.2019.04.034>.
Analysis of Ct values from high throughput quantitative real-time PCR (qPCR) assays across multiple conditions or replicates. The input data can be from spatially-defined formats such as ABI TaqMan Low Density Arrays or OpenArray; LightCycler from Roche Applied Science; the CFX plates from Bio-Rad Laboratories; conventional 96- or 384-well plates; or microfluidic devices such as the Dynamic Arrays from Fluidigm Corporation. HTqPCR handles data loading, quality assessment, normalization, visualization and parametric or non-parametric testing for statistical significance in Ct values between features (e.g. genes, microRNAs).
LegATo is a suite of open-source software tools for longitudinal microbiome analysis. It is extendable to several different study designs and is built for ease of use by researchers. Microbiome time-series data present distinct challenges, including complex covariate dependencies and a variety of longitudinal study designs. This toolkit allows researchers to determine which microbial taxa are affected over time by perturbations such as onset of disease or lifestyle choices, and to predict the effects of these perturbations over time, including changes in composition or stability of commensal bacteria.
The goal of the package aldvmm is to fit adjusted limited dependent variable mixture models of health state utilities. Adjusted limited dependent variable mixture models are finite mixtures of normal distributions with an accumulation of density mass at the limits, and a gap between 100% quality of life and the next smaller utility value. The package aldvmm uses the likelihood and expected value functions proposed by Hernandez Alava and Wailoo (2015) <doi:10.1177/1536867X1501500307> using normal component distributions and a multinomial logit model of probabilities of component membership.
This package provides a Bayesian regression model for discrete responses, where the conditional distribution is modelled via a discrete Weibull distribution. It implements Metropolis-Hastings and reversible-jump algorithms to draw samples from the posterior. It covers a wide range of regularizations through any two-parameter prior. Examples are Laplace (Lasso), Gaussian (ridge), Uniform, Cauchy and customized priors like a mixture of priors. An extensive visual toolbox is included to check the validity of the results as well as several measures of goodness-of-fit.
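For readers unfamiliar with the discrete Weibull distribution, the sketch below evaluates its probability mass function in the common type I form, P(Y = y) = q^(y^beta) - q^((y+1)^beta) for y = 0, 1, 2, .... That this is the exact parameterization used by the package is an assumption, and the function name `dw_pmf` is hypothetical.

```python
def dw_pmf(y, q, beta):
    """Type I discrete Weibull pmf with 0 < q < 1 and beta > 0."""
    if y < 0:
        return 0.0
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


# With beta = 1 the distribution reduces to the geometric distribution.
print(dw_pmf(3, 0.6, 1.0), 0.6 ** 3 * (1 - 0.6))

# The probabilities telescope, so their sum over the support approaches 1.
print(sum(dw_pmf(y, 0.7, 1.3) for y in range(200)))
```

In a regression setting, q or beta would be linked to covariates. That step is left out of this sketch.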
This package provides tools for bioinformatics modeling using recursive transformer-inspired architectures, autoencoders, random forests, XGBoost, and stacked ensemble models. Includes utilities for cross-validation, calibration, benchmarking, and threshold optimization in predictive modeling workflows. The methodology builds on ensemble learning (Breiman 2001 <doi:10.1023/A:1010933404324>), gradient boosting (Chen and Guestrin 2016 <doi:10.1145/2939672.2939785>), autoencoders (Hinton and Salakhutdinov 2006 <doi:10.1126/science.1127647>), and recursive transformer efficiency approaches such as Mixture-of-Recursions (Bae et al. 2025 <doi:10.48550/arXiv.2507.10524>).