Analysis of species limits and DNA barcoding data. Included are functions for generating important summary statistics from DNA barcode data, assessing specimen identification efficacy, testing and optimizing divergence threshold limits, assessment of diagnostic nucleotides, and calculation of the probability of reciprocal monophyly. Additionally, a sliding window function offers opportunities to analyse information across a gene, often used for marker design in degraded DNA studies. Further information on the package has been published in Brown et al. (2012) <doi:10.1111/j.1755-0998.2011.03108.x>.
Time series outlier detection with a nonparametric test. This is a new outlier detection methodology (washer): efficient, saving time in elaboration and implementation; adaptable, relying on general assumptions and requiring only very short time series; and reliable and effective, as it involves a robust nonparametric test. Two approaches are available: single time series (a vector) and grouped time series (a data frame). For further information, see Andrea Venturini (2011) Statistica - Universita di Bologna, Vol.71, pp.329-344. For an informal explanation, see R-bloggers on the web.
This package contains miscellaneous functions used to interpret and translate, factorize and negate Sum of Products expressions, for both binary and multi-value crisp sets, and to extract information (set names, set values) from those expressions. Other functions check whether values are possibly numeric (even if all numbers reside in a character vector) and coerce them to numeric, or check whether the numbers are whole. It also offers, among many others, a highly flexible recoding routine and a more flexible alternative to the base function with().
This package provides an implementation of multilayered visualizations for enhanced graphical representation of functional analysis data. It combines and integrates omics data derived from expression and functional annotation enrichment analyses. Its plotting functions have been developed with a hierarchical structure in mind: starting from a general overview to identify the most enriched categories (modified bar plot, bubble plot) to a more detailed one displaying different types of relevant information for the molecules in a given set of categories (circle plot, chord plot, cluster plot, Venn diagram, heatmap).
Automatic normalisation of a data frame to third normal form, with the intention of easing the process of data cleaning. (Usage to design your actual database for you is not advised.) Originally inspired by the AutoNormalize library for Python by Alteryx (<https://github.com/alteryx/autonormalize>), with various changes and improvements. Automatic discovery of functional or approximate dependencies, normalisation based on those, and plotting of the resulting "database" via Graphviz, with options to exclude some attributes at discovery time, or remove discovered dependencies at normalisation time.
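As a minimal sketch of what discovering a functional dependency involves (this is not the package's code or API; the function name `holds` and the sample rows are hypothetical), a dependency X -> Y holds in a table when every distinct combination of X values maps to exactly one Y value:

```python
def holds(rows, determinant, dependent):
    """Return True if the attributes in `determinant` functionally
    determine `dependent` across `rows` (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = row[dependent]
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False  # same determinant values, different dependent value
    return True


# Hypothetical example data
rows = [
    {"city": "Leeds", "postcode": "LS1", "country": "UK"},
    {"city": "Leeds", "postcode": "LS2", "country": "UK"},
    {"city": "Lyon", "postcode": "69001", "country": "FR"},
]

city_to_country = holds(rows, ["city"], "country")
city_to_postcode = holds(rows, ["city"], "postcode")
```

Here `city -> country` holds while `city -> postcode` does not, so a normaliser could move `country` into a separate table keyed by `city`.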
Generates Monte Carlo confidence intervals for standardized regression coefficients (beta) and other effect sizes, including multiple correlation, semipartial correlations, improvement in R-squared, squared partial correlations, and differences in standardized regression coefficients, for models fitted by lm(). betaMC combines ideas from Monte Carlo confidence intervals for the indirect effect (Pesigan and Cheung, 2023 <doi:10.3758/s13428-023-02114-4>) and the sampling covariance matrix of regression coefficients (Dudgeon, 2017 <doi:10.1007/s11336-017-9563-z>) to generate confidence intervals for effect sizes in regression.
This package provides a general framework using mixture Weibull distributions to accurately predict biomarker-guided trial duration, accounting for a heterogeneous population. Extensive simulations are performed to evaluate the impact of a heterogeneous population and the dynamics of biomarker characteristics and disease on the study duration. Several influential parameters including median survival time, enrollment rate, biomarker prevalence and effect size are identified. Efficiency gains of biomarker-guided trials can be quantitatively compared to the traditional all-comers design. For reference, see Zhang et al. (2024) <arXiv:2401.00540>.
This package creates a non-negative low-rank approximate factorization of a sparse counts matrix by maximizing Poisson likelihood with L1/L2 regularization (e.g. for implicit-feedback recommender systems or bag-of-words-based topic modeling) (Cortes (2018) <arXiv:1811.01908>), which usually leads to very sparse user and item factors (over 90% zero-valued). Similar to hierarchical Poisson factorization (HPF), but follows an optimization-based approach with regularization instead of a hierarchical prior, and is fit through gradient-based methods instead of variational inference.
Set the R prompt dynamically, from a function. The package contains some examples to include various useful dynamic information in the prompt: the status of the last command (success or failure); the amount of memory allocated by the current R process; the name of the R package(s) loaded by pkgload and/or devtools; various git information: the name of the active branch, whether it is dirty, and whether it needs pushes or pulls. You can also create your own prompt if you don't like the predefined examples.
Symbolic calculation and evaluation of multivariate polynomials with rational coefficients. This package is strongly inspired by the spray package. It provides a function to compute Gröbner bases (reference <doi:10.1007/978-3-319-16721-3>). It also includes some features for symmetric polynomials, such as the Hall inner product. The header file of the C++ code can be used by other packages. It provides the templated class Qspray that can be used to represent and to deal with multivariate polynomials with another type of coefficients.
This package provides a computational framework for identification of B cell clones from Adaptive Immune Receptor Repertoire sequencing (AIRR-Seq) data. Three main functions are included (identicalClones, hierarchicalClones, and spectralClones) that perform clustering among sequences of BCRs/IGs (B cell receptors/immunoglobulins) which share the same V gene, J gene and junction length. Nouri N and Kleinstein SH (2018) <doi:10.1093/bioinformatics/bty235>. Nouri N and Kleinstein SH (2019) <doi:10.1101/788620>. Gupta NT, et al. (2017) <doi:10.4049/jimmunol.1601850>.
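The shared first step, grouping sequences that have the same V gene, J gene and junction length before any clustering, can be sketched as follows. This is an illustration only, not the package's implementation; the record fields `id`, `v_gene`, `j_gene` and `junction` and the sample values are hypothetical.

```python
from collections import defaultdict


def partition_sequences(records):
    """Group sequence ids by (V gene, J gene, junction length).

    Clustering into clones would then run separately inside each group.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["v_gene"], rec["j_gene"], len(rec["junction"]))
        groups[key].append(rec["id"])
    return dict(groups)


# Hypothetical example records
records = [
    {"id": "s1", "v_gene": "IGHV1-2", "j_gene": "IGHJ4", "junction": "TGTGCGAGA"},
    {"id": "s2", "v_gene": "IGHV1-2", "j_gene": "IGHJ4", "junction": "TGTGCGAAA"},
    {"id": "s3", "v_gene": "IGHV1-2", "j_gene": "IGHJ4", "junction": "TGTGCG"},
    {"id": "s4", "v_gene": "IGHV3-23", "j_gene": "IGHJ6", "junction": "TGTGCGAGA"},
]

groups = partition_sequences(records)
```

Only `s1` and `s2` end up in the same group, so only those two would be compared when assigning clones.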
This is a modular package for simulating phylogenetic trees and species traits jointly. Trees can be simulated using modular birth-death parameters (e.g. changing starting parameters or algorithm rules). Traits can be simulated in any way designed by the user. The growth of the tree and the traits can influence each other through modifier objects that provide rules for how each affects the other. Finally, events can be created to modify both the tree and the traits under specific conditions (Guillerme, 2024 <doi:10.1111/2041-210X.14306>).
The modern database TileDB introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-value stores, cloud storage (S3, GCS, Azure), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.
This package provides tools to compute ordinal statistics and effect sizes as an alternative to mean comparison: Cliff's delta or success rate difference (SRD), Vargha and Delaney's A or the Area Under a Receiver Operating Characteristic Curve (AUC), the discrete type of McGraw & Wong's Common Language Effect Size (CLES) or Grissom & Kim's Probability of Superiority (PS), and the Number needed to treat (NNT) effect size. Moreover, comparisons to Cohen's d are offered based on Huberty & Lowman's Percentage of Group (Non-)Overlap considerations.
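For orientation, two of these statistics can be computed directly from pairwise comparisons. The sketch below uses the standard textbook definitions and is not the package's code; the function names and sample data are hypothetical. Cliff's delta is the proportion of pairs with x > y minus the proportion with x < y, and Vargha and Delaney's A counts ties as half, so that A = (delta + 1) / 2.

```python
def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def vargha_delaney_a(x, y):
    """Vargha and Delaney's A: P(x > y) + 0.5 * P(x == y)."""
    greater = sum(1 for a in x for b in y if a > b)
    ties = sum(1 for a in x for b in y if a == b)
    return (greater + 0.5 * ties) / (len(x) * len(y))


# Hypothetical example groups
x = [3, 4, 5]
y = [1, 2, 3]

delta = cliffs_delta(x, y)      # 8 of 9 pairs favour x, 1 tie, 0 favour y
a_stat = vargha_delaney_a(x, y)
```

Both functions compare every pair, which is quadratic in the group sizes; that is fine for a sketch but slow for large samples.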
This package implements wavelet-based approaches for describing population admixture. Principal Components Analysis (PCA) is used to define the population structure and produce a localized admixture signal for each individual. Wavelet summaries of the PCA output describe variation present in the data and can be related to population-level demographic processes. For more details, see J Sanderson, H Sudoyo, TM Karafet, MF Hammer and MP Cox. 2015. Reconstructing past admixture processes from local genomic ancestry using wavelet transformation. Genetics 200:469-481 <doi:10.1534/genetics.115.176842>.
Calculates B-value and empirical equivalence bound. B-value is defined as the maximum magnitude of a confidence interval; and the empirical equivalence bound is the minimum B-value at a certain level. A new two-stage procedure for hypothesis testing is proposed, where the first stage is conventional hypothesis testing and the second is an equivalence testing procedure using the introduced empirical equivalence bound. See Zhao et al. (2019) "B-Value and Empirical Equivalence Bound: A New Procedure of Hypothesis Testing" <arXiv:1912.13084> for details.
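Reading the definition above literally, the B-value of an interval is the larger of the absolute values of its two endpoints. The sketch below illustrates only that definition; the normal-approximation interval, the function names and the sample data are assumptions made for the example and are not taken from the package or the paper.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def b_value(lower, upper):
    """Maximum magnitude of a confidence interval (lower, upper)."""
    return max(abs(lower), abs(upper))


def normal_ci(sample, level=0.95):
    """Normal-approximation confidence interval for a mean (an assumption
    for this example; other interval constructions are possible)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = mean(sample)
    half_width = z * stdev(sample) / sqrt(len(sample))
    return centre - half_width, centre + half_width


# Hypothetical example: observed differences between two conditions
differences = [0.2, -0.1, 0.4, 0.3, 0.0]
lower, upper = normal_ci(differences)
b = b_value(lower, upper)
```

The empirical equivalence bound, which minimises the B-value at a given level, is not sketched here because its exact construction is not spelled out in the description.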
Various layers of B.C., including administrative boundaries, natural resource management boundaries, census boundaries etc. All layers are available in BC Albers (<https://spatialreference.org/ref/epsg/3005/>) equal-area projection, which is the B.C. government standard. The layers are sourced from the British Columbia and Canadian government under open licenses, including B.C. Data Catalogue (<https://data.gov.bc.ca>), the Government of Canada Open Data Portal (<https://open.canada.ca/en/using-open-data>), and Statistics Canada (<https://www.statcan.gc.ca/en/reference/licence>).
Gibbs sampling for Bayesian spatial blind source separation (BSP-BSS). BSP-BSS is designed for spatially dependent signals in high dimensional and large-scale data, such as neuroimaging. The method assumes the expectation of the observed images as a linear mixture of multiple sparse and piece-wise smooth latent source signals, and constructs a Bayesian nonparametric prior by thresholding Gaussian processes. Details can be found in our paper: Wu et al. (2022+) "Bayesian Spatial Blind Source Separation via the Thresholded Gaussian Process" <doi:10.1080/01621459.2022.2123336>.
Calculates equitable overload compensation for college instructors based on institutional policies, enrollment thresholds, and regular teaching load limits. Compensation is awarded only for credit hours that exceed the regular load and meet minimum enrollment criteria. When enrollment is below a specified threshold, pay is prorated accordingly. The package prioritizes compensation from high-enrollment courses, or optionally from low-enrollment courses for fairness, depending on user-defined strategy. Includes tools for flexible policy settings, instructor filtering, and produces clean, audit-ready summary tables suitable for payroll and administrative reporting.
It is designed to streamline the process of calculating complete annual growth rates with user-friendly functions and robust algorithms. It enables researchers and analysts to generate precise growth rate estimates for their data. For method details, see Sharma, M.K. (2013) <https://www.indianjournals.com/ijor.aspx?target=ijor:jfl&volume=26&issue=1and2&article=018>. It offers a comprehensive suite of functions and customisable parameters, and is equipped to handle varying complexities in data structures. It helps users uncover growth dynamics and make informed decisions.
This package implements drifting Markov models (DMM), which are non-homogeneous Markov models designed for modeling the heterogeneities of sequences in a more flexible way than homogeneous Markov chains or even hidden Markov models. In this context, we developed an R package dedicated to the estimation, simulation and exact computation of the associated reliability of drifting Markov models. The implemented methods are described in Vergne, N. (2008) <doi:10.2202/1544-6115.1326> and Barbu, V.S., Vergne, N. (2019) <doi:10.1007/s11009-018-9682-8>.
Testing for and dating periods of explosive dynamics (exuberance) in time series using the univariate and panel recursive unit root tests proposed by Phillips et al. (2015) <doi:10.1111/iere.12132> and Pavlidis et al. (2016) <doi:10.1007/s11146-015-9531-2>. The recursive least-squares algorithm utilizes the matrix inversion lemma to avoid matrix inversion, which results in significant speed improvements. Simulation of a variety of periodically-collapsing bubble processes is also provided. Details can be found in Vasilopoulos et al. (2022) <doi:10.18637/jss.v103.i10>.
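To illustrate how the matrix inversion lemma removes the need to invert a matrix at each step, here is a generic recursive least-squares update in plain Python. It is a textbook sketch, not the package's implementation; the function name, the diffuse starting matrix and the toy data are assumptions. Given the current inverse P of X'X and a new observation (x, y), the Sherman-Morrison identity gives P_new = P - (P x)(P x)' / (1 + x' P x).

```python
def rls_update(beta, P, x, y):
    """One recursive least-squares step using the Sherman-Morrison identity.

    beta: current coefficients, P: current (symmetric) inverse of X'X,
    x: new regressor vector, y: new response value.
    """
    k = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(k)) for i in range(k)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(k))
    # Rank-one downdate of P; no matrix inversion is performed
    P_new = [[P[i][j] - Px[i] * Px[j] / denom for j in range(k)] for i in range(k)]
    residual = y - sum(x[i] * beta[i] for i in range(k))
    # The gain P_new x simplifies to P x / denom
    beta_new = [beta[i] + Px[i] / denom * residual for i in range(k)]
    return beta_new, P_new


# Toy data generated from y = 1 + 2 t (an assumption for the example)
beta = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large diagonal start, close to an uninformative fit
for t in range(5):
    beta, P = rls_update(beta, P, [1.0, float(t)], 1.0 + 2.0 * t)
```

After the loop `beta` is close to `[1.0, 2.0]`. Each update costs a few vector products, whereas refitting from scratch would solve a new linear system for every window.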
Facilitates the post-Genome Wide Association Studies (GWAS) and Quantitative Trait Loci (QTL) analysis of identifying candidate genes within a user-defined search window, based on the identified Single Nucleotide Polymorphisms (SNPs) as given by Mazumder AK (2024) <doi:10.1038/s41598-024-66903-3>. It supports candidate gene analysis for wheat and rice. Import your GWAS result as explained in the sample_data file, and the function performs the search and retrieves candidate genes for you, exporting the results as ready-to-use output.
Immunotherapy has revolutionized cancer treatment, but predicting patient response remains challenging. Here, we present Intelligent Predicting Response to cancer Immunotherapy through Systematic Modeling (iPRISM), a novel network-based model that integrates multiple data types to predict immunotherapy outcomes. It incorporates gene expression, biological functional network, tumor microenvironment characteristics, immune-related pathways, and clinical data to provide a comprehensive view of factors influencing immunotherapy efficacy. By identifying key genetic and immunological factors, it provides insight for more personalized treatment strategies and combination therapies to overcome resistance mechanisms.