Set of utility functions for viral quasispecies analysis with NGS data. Most functions are equally useful for metagenomic studies. There are three main types: (1) data manipulation and exploration—functions useful for converting reads to haplotypes and frequencies, repairing reads, intersecting strand haplotypes, and visualizing haplotype alignments. (2) diversity indices—functions to compute diversity and entropy, in which incidence, abundance, and functional indices are considered. (3) data simulation—functions useful for generating random viral quasispecies data.
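As a rough illustration of the abundance-based diversity indices mentioned above, the following Python sketch computes Shannon entropy and the Hill number of order 1 from haplotype read counts. The function names and the example counts are hypothetical and are not taken from the package.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (natural log) of haplotype read counts."""
    total = sum(counts)
    # Convert counts to relative frequencies, skipping empty haplotypes.
    freqs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in freqs)

def hill_number_q1(counts):
    """Hill number of order 1: the exponential of Shannon entropy."""
    return math.exp(shannon_entropy(counts))

# Hypothetical read counts for four haplotypes (illustration only).
counts = [25, 25, 25, 25]
print(shannon_entropy(counts))
print(hill_number_q1(counts))
```

With equal abundances the Hill number equals the number of haplotypes, which makes it a convenient "effective number of haplotypes".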
This package provides a collection of functions developed to assist experimenters in modeling chemical degradation kinetic data. The selection of the appropriate degradation model and parameter estimation is carried out automatically as far as possible and is driven by a rigorous statistical interpretation of the results. The package integrates already available goodness-of-fit statistics for nonlinear models. In addition, it allows data fitting with the nonlinear first-order multi-target (FOMT) model.
This package provides the basic functionality to interact with the Collatz conjecture. The parameterisation uses the same (P,a,b) notation as Conway's generalisations. Besides the function and reverse function, there is also functionality to retrieve the hailstone sequence, the "stopping time"/"total stopping time", or tree-graph. The only restriction placed on the parameters is that neither P nor a can be 0. For further reading, see <https://en.wikipedia.org/wiki/Collatz_conjecture>.
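The (P,a,b) parameterisation can be sketched in Python as follows, assuming the usual reading: divide by P when n is a multiple of P, otherwise map n to a*n + b. The function names, defaults, and the `max_steps` cap are assumptions made for illustration and do not reproduce the package's interface.

```python
def collatz(n, P=2, a=3, b=1):
    """One step of the parameterised Collatz function."""
    if P == 0 or a == 0:
        raise ValueError("P and a must be non-zero")
    return n // P if n % P == 0 else a * n + b

def hailstone(n, P=2, a=3, b=1, max_steps=1000):
    """Iterate from n until 1 is reached or max_steps steps have run."""
    seq = [n]
    for _ in range(max_steps):
        if seq[-1] == 1:
            break
        seq.append(collatz(seq[-1], P, a, b))
    return seq

def total_stopping_time(n, P=2, a=3, b=1, max_steps=1000):
    """Number of steps needed to reach 1, or None if the cap was hit."""
    seq = hailstone(n, P, a, b, max_steps)
    return len(seq) - 1 if seq[-1] == 1 else None

print(hailstone(6))
print(total_stopping_time(27))
```

The step cap exists because generalised parameter choices need not reach 1 at all; returning `None` keeps that case distinct from a genuine stopping time.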
This package provides a comprehensive reproducibility framework designed for R and bioinformatics workflows. Automatically captures the entire analysis environment including R session info, package versions, external tool versions ('Samtools', 'STAR', 'BWA', etc.), conda environments, reference genomes, data provenance with smart checksumming for large files, parameter choices, random seeds, and hardware specifications. Generates executable scripts with 'Docker', 'Singularity', and renv configurations. Integrates with workflow managers ('Nextflow', 'Snakemake', 'WDL', 'CWL') to ensure complete reproducibility of computational research workflows.
Computes the Extended Chen-Poisson (ecp) distribution, survival, density, hazard, cumulative hazard and quantile functions. It also allows generating a pseudo-random sample from this distribution. The corresponding graphics are available. Functions to obtain measures of skewness and kurtosis, k-th raw moments, conditional k-th moments and the mean residual life function are also included. For details about the ecp distribution, see Sousa-Ferreira, I., Abreu, A.M. & Rocha, C. (2023). <doi:10.57805/revstat.v21i2.405>.
Analyze functional data and its change points. Includes functionality to store and process data, summarize and validate assumptions, characterize and perform inference of change points, and provide visualizations. Data is stored as discretely collected observations without requiring the selection of basis functions. For more details see chapter 8 of Horvath and Rice (2024) <doi:10.1007/978-3-031-51609-2>. Additional papers are forthcoming. Focused works are also included in the documentation of corresponding functions.
Statistical tests widely utilized in biostatistics, public policy, and law. Along with the well-known tests for equality of means and variances, randomness, and measures of relative variability, the package contains new robust tests of symmetry, omnibus and directional tests of normality, and their graphical counterparts such as robust QQ plot, robust trend tests for variances, etc. All implemented tests and methods are illustrated by simulations and real-life examples from legal statistics, economics, and biostatistics.
This package provides statistical components, tables, and graphs that are useful in Quarto and RMarkdown reports and that produce Quarto elements for special formatting such as tabs and marginal notes and graphs. Some of the functions produce entire report sections with tabs, e.g., the missing data report created by missChk(). Functions for inserting variables and tables inside graphviz and mermaid diagrams are included, and so are special clinical trial graphics for adverse event reporting.
Tool for statistical simulations that have two components. One component generates the data and the other one analyzes the data. The main aims of the package are the reduction of the administrative source code (mainly loops and management code for the results) and a simple applicability of the package that allows the user to quickly learn how to work with it. Parallel computing is also supported. Finally, convenient functions are provided to summarize the simulation results.
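The two-component design described above can be sketched in Python: one function generates data, another analyzes it, and a small driver runs every combination of a parameter grid and summarizes the results. All names here (`generate`, `analyze`, `run_simulation`) and the normal-mean example are hypothetical and only illustrate the idea, not the package's API.

```python
import itertools
import random
import statistics

def generate(n, mean):
    """Data-generating component: n draws from N(mean, 1)."""
    return [random.gauss(mean, 1.0) for _ in range(n)]

def analyze(sample):
    """Analysis component: estimate the mean of the sample."""
    return statistics.mean(sample)

def run_simulation(param_grid, replications, seed=0):
    """Run every grid combination and average the estimates."""
    random.seed(seed)
    results = {}
    for n, mean in itertools.product(param_grid["n"], param_grid["mean"]):
        estimates = [analyze(generate(n, mean)) for _ in range(replications)]
        results[(n, mean)] = statistics.mean(estimates)
    return results

grid = {"n": [10, 100], "mean": [0.0, 1.0]}
for params, avg in run_simulation(grid, replications=200).items():
    print(params, round(avg, 3))
```

Keeping the loops and result bookkeeping in a single driver is what removes the administrative code from each individual study.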
Several statistical test functions as well as a function for exploratory data analysis to investigate classifiers allocating individuals to one of three disjoint and ordered classes. In a single classifier assessment the discriminatory power is compared to classification by chance. In a comparison of two classifiers the null hypothesis corresponds to equal discriminatory power of the two classifiers. See also "ROC Analysis for Classification and Prediction in Practice" by Nakas, Bantis and Gatsonis (2023), ISBN 9781482233704.
This package provides a tidy interface for integrating large language model (LLM) APIs such as 'Claude', 'OpenAI', 'Gemini', 'Mistral' and local models via 'Ollama' into R workflows. The package supports text and media-based interactions, interactive message history, batch request APIs, and a tidy, pipeline-oriented interface for streamlined integration into data workflows. Web services are available at <https://www.anthropic.com>, <https://openai.com>, <https://aistudio.google.com/>, <https://mistral.ai/> and <https://ollama.com>.
An R API providing easy access to a relational database with macroeconomic, financial and development-related time series data for Uganda. Overall, more than 5000 series at varying frequencies (daily, monthly, quarterly, annual in fiscal or calendar years) can be accessed through the API. The data is provided by the Bank of Uganda, the Ugandan Ministry of Finance, Planning and Economic Development, the IMF and the World Bank. The database is updated once a month.
Various semiparametric and nonparametric statistical tools for immune correlates analysis of vaccine clinical trial data. This includes calculation of summary statistics and estimation of risk, vaccine efficacy, controlled effects (controlled risk and controlled vaccine efficacy), and mediation effects (natural direct effect, natural indirect effect, proportion mediated). See Gilbert P, Fong Y, Kenny A, and Carone M (2022) <doi:10.1093/biostatistics/kxac024> and Fay MP and Follmann DA (2023) <doi:10.48550/arXiv.2208.06465>.
Infectious disease surveillance requires early outbreak detection. This package provides statistical tools for analyzing time-series monitoring data through three core methods: (a) EWMA (Exponentially Weighted Moving Average), (b) Modified-CUSUM (Modified Cumulative Sum), and (c) Adjusted-Serfling models. The methodologies are based on Wang et al. (2010) <doi:10.1016/j.jbi.2009.08.003> and Wang et al. (2015) <doi:10.1371/journal.pone.0119923>. Designed for epidemiologists and public health researchers working with disease surveillance systems.
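To make the EWMA idea concrete, here is a minimal Python sketch of the textbook EWMA control chart applied to a series of case counts. It is not the package's implementation: the function name, the smoothing weight `lam`, the limit multiplier `L`, and the assumption that a baseline mean and standard deviation are already known are all illustrative choices.

```python
import math

def ewma_alarms(counts, baseline_mean, baseline_sd, lam=0.3, L=3.0):
    """Return the indices at which the EWMA statistic exceeds its upper limit."""
    z = baseline_mean  # start the smoothed statistic at the baseline level
    alarms = []
    for t, x in enumerate(counts, start=1):
        # Exponentially weighted update: recent counts get weight lam.
        z = lam * x + (1 - lam) * z
        # Time-varying half-width of the control limit for step t.
        width = L * baseline_sd * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        if z > baseline_mean + width:
            alarms.append(t - 1)
    return alarms

# Hypothetical weekly counts with a jump starting at index 4.
weekly_counts = [10, 11, 9, 10, 30, 35, 40]
print(ewma_alarms(weekly_counts, baseline_mean=10, baseline_sd=2))
```

Only the upper limit is checked because surveillance is usually concerned with increases; a smaller `lam` smooths more heavily and reacts more slowly to abrupt jumps.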
This tool enables in-database scoring of XGBoost models built in R, by translating trained model objects into SQL queries. XGBoost <https://github.com/dmlc/xgboost> provides parallel tree boosting (also known as gradient boosting machine, or GBM) algorithms in a highly efficient, flexible and portable way. The GBM algorithm was introduced by Friedman (2001) <doi:10.1214/aos/1013203451>, and more details on XGBoost can be found in Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>.
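The general idea of turning boosted trees into SQL can be sketched in Python: each tree becomes a nested CASE expression, and the ensemble score is the sum of the tree outputs. The nested-dict tree format and the function names below are hypothetical, the sketch ignores missing-value handling, and it returns the raw summed score without any link function, so it should not be read as the tool's actual translation logic.

```python
def tree_to_sql(node):
    """Translate one tree (nested dicts) into a SQL CASE expression."""
    if "leaf" in node:
        return str(node["leaf"])
    cond = f'{node["feature"]} < {node["threshold"]}'
    yes_sql = tree_to_sql(node["yes"])
    no_sql = tree_to_sql(node["no"])
    return f"CASE WHEN {cond} THEN {yes_sql} ELSE {no_sql} END"

def ensemble_to_sql(trees, table):
    """Sum the per-tree expressions into a single scoring query."""
    terms = " + ".join(f"({tree_to_sql(t)})" for t in trees)
    return f"SELECT {terms} AS score FROM {table}"

# Hypothetical one-split tree on a column named "age".
tree = {
    "feature": "age",
    "threshold": 30,
    "yes": {"leaf": 0.5},
    "no": {"leaf": -0.2},
}
print(ensemble_to_sql([tree], "customers"))
```

Because the output is plain SQL, scoring runs where the data lives and no model runtime is needed in the database.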
Fastseg implements a very fast and efficient segmentation algorithm. It can segment data from DNA microarrays and from next-generation sequencing, for example to detect copy number segments. Further, it can segment data from RNA microarrays, such as tiling arrays, to identify transcripts. Most generally, it can segment data given as a matrix or as a vector. Various data formats can be used as input to fastseg, such as expression set objects for microarrays or GRanges for sequencing data.
This package provides a flexible approach to Bayesian optimization / model-based optimization building on the bbotk package. mlr3mbo is a toolbox providing both ready-to-use optimization algorithms and their fundamental building blocks, allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported as well as mixed continuous, categorical and conditional search spaces. Moreover, using mlr3mbo for hyperparameter optimization of machine learning models within the mlr3 ecosystem is straightforward via mlr3tuning.
HuBMAP provides an open, global bio-molecular atlas of the human body at the cellular level. The `datasets()`, `samples()`, `donors()`, `publications()`, and `collections()` functions retrieve the information for each of these entity types. `*_details()` functions are available for individual entries of each entity type. `*_derived()` functions are available for retrieving derived datasets or samples for individual entries of each entity type. Data files can be accessed using `bulk_data_transfer()`.
Create a pie-like plot to visualise whether the aim, or several aims, of a project is achieved or close to being achieved, i.e. the aim is achieved when the point is at the center of the pie plot. Think of it as a dartboard whose center means 100% completeness/achievement. Achievement can also be understood as 100% coverage. The standard levels of completeness marked in the pie plot are 50%, 80% and 100%.
Main function "decode" is used to decode coded key values to plain text. Function "code" can be used to convert plain text back to coded key values if there is a 1:1 relation between the two. The concept relies on keyvalue objects used for translation. There are several keyvalue objects included in the areas of geographical regional codes, administrative health care unit codes, diagnosis codes and more. It is also easy to extend the use by arbitrary code sets.
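The decode/code round trip can be sketched in Python with a plain dictionary standing in for a keyvalue object. The function signatures and the region codes below are hypothetical and only illustrate why the reverse direction needs a 1:1 relation.

```python
def decode(keys, keyvalue):
    """Translate coded keys to plain text; unknown keys become None."""
    return [keyvalue.get(k) for k in keys]

def code(values, keyvalue):
    """Translate plain text back to keys; requires a 1:1 mapping."""
    inverse = {v: k for k, v in keyvalue.items()}
    if len(inverse) != len(keyvalue):
        # Two keys share the same text, so the reverse lookup is ambiguous.
        raise ValueError("keyvalue mapping is not 1:1")
    return [inverse.get(v) for v in values]

# Hypothetical region codes (illustration only).
regions = {"01": "North", "02": "South"}
print(decode(["02", "01", "99"], regions))
print(code(["North", "South"], regions))
```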
Dynamic path analysis with estimation of the corresponding direct, indirect, and total effects, based on Fosen et al. (2006) <doi:10.1007/s10985-006-9004-2>. The main outcome of interest is a counting process from survival analysis (or recurrent events) data. At each time of event, ordinary linear regression is used to estimate the relation between the covariates, while Aalen's additive hazard model is used for the regression of the counting process on the covariates.
Collection of R functions and data sets for the support of spatial ecology analyses with a focus on pre, core and post modelling analyses of species distribution, niche quantification and community assembly. Written by current and former members and collaborators of the ecospat group of Antoine Guisan, Department of Ecology and Evolution (DEE) and Institute of Earth Surface Dynamics (IDYST), University of Lausanne, Switzerland. Read Di Cola et al. (2016) <doi:10.1111/ecog.02671> for details.
Runs EViews (<https://eviews.com>) programs from R, R Markdown and Quarto documents. EViews (Econometric Views) is statistical software for econometric analysis. This package integrates EViews and R and also serves as an EViews Knit-Engine for the knitr package. Write all your EViews commands in R, R Markdown or Quarto documents. For details, please consult our peer-reviewed article Mati S., Civcir I. and Abba S.I. (2023) <doi:10.32614/RJ-2023-045>.
This package provides a consistent, unified and extensible framework for estimation of parameters for probability distributions, including parameter estimation procedures that allow for weighted samples; the distributions currently included are: the standard beta, the four-parameter beta, Burr, gamma, Gumbel, Johnson SB and SU, Laplace, logistic, normal, symmetric truncated normal, truncated normal, symmetric-reflected truncated beta, standard symmetric-reflected truncated beta, triangular, uniform, and Weibull distributions; decision criteria and selections based on these decision criteria.