A Davidian curve defines a seminonparametric density whose shape and flexibility can be tuned by easy-to-estimate parameters. Since the standard normal density is a special case of a Davidian curve, Davidian curves can be used to relax the normality assumption in statistical applications (Zhang & Davidian, 2001) <doi:10.1111/j.0006-341X.2001.00795.x>. This package provides the density function, the gradient of the log-likelihood, and a random generator for Davidian curves.
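As a rough illustration of the idea (not the package's API), an order-1 curve of this kind can be written as a squared polynomial times the standard normal density. The function names and the angle parametrisation below (coefficients `sin(phi)` and `cos(phi)`) are assumptions made for this sketch. With that choice the density integrates to one, and `phi = pi/2` recovers the standard normal.

```python
import math

def std_normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def davidian_pdf_order1(z, phi):
    """Order-1 squared-polynomial density (hypothetical helper).

    The coefficients lie on the unit circle, so a0^2 + a1^2 = 1,
    which makes (a0 + a1*z)^2 * normal_pdf(z) integrate to one.
    """
    a0, a1 = math.sin(phi), math.cos(phi)
    return (a0 + a1 * z) ** 2 * std_normal_pdf(z)

def integrate(f, lo=-12.0, hi=12.0, n=20000):
    """Midpoint-rule integral, used here only as a sanity check."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

if __name__ == "__main__":
    total = integrate(lambda z: davidian_pdf_order1(z, 0.7))
    print(f"integral over the real line (approx.): {total:.6f}")
```

Varying `phi` away from `pi/2` shifts mass away from the normal shape, which is the sense in which the parameter tunes the density.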
This package provides tools to perform hierarchical inference for one or multiple studies / data sets based on high-dimensional multivariate (generalised) linear models. A possible application is to perform hierarchical inference for GWA studies to find significant groups or single SNPs (if the signal is strong) in a data-driven and automated procedure. The method is based on an efficient hierarchical multiple testing correction and controls the FWER. The functions can easily be run in parallel.
Set of utility functions for viral quasispecies analysis with NGS data. Most functions are equally useful for metagenomic studies. There are three main types: (1) data manipulation and exploration—functions useful for converting reads to haplotypes and frequencies, repairing reads, intersecting strand haplotypes, and visualizing haplotype alignments. (2) diversity indices—functions to compute diversity and entropy, in which incidence, abundance, and functional indices are considered. (3) data simulation—functions useful for generating random viral quasispecies data.
This package provides the basic functionality to interact with the Collatz conjecture. The parameterisation uses the same (P,a,b) notation as Conway's generalisations. Besides the function and reverse function, there is also functionality to retrieve the hailstone sequence, the "stopping time"/"total stopping time", or tree-graph. The only restriction placed on parameters is that neither P nor a can be 0. For further reading, see <https://en.wikipedia.org/wiki/Collatz_conjecture>.
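A minimal Python sketch of the (P,a,b) parameterisation may help. The function names are hypothetical and are not the package's interface. The sketch assumes the restriction means that P and a must each be non-zero, and it caps the number of steps because generalised parameters need not reach 1.

```python
def collatz_step(n, P=2, a=3, b=1):
    """One step of the generalised map: n/P if P divides n, else a*n + b."""
    if P == 0 or a == 0:
        raise ValueError("P and a must both be non-zero")
    return n // P if n % P == 0 else a * n + b

def hailstone(n, P=2, a=3, b=1, max_steps=1000):
    """Sequence starting at n, stopping at 1 or after max_steps steps."""
    seq = [n]
    while seq[-1] != 1 and len(seq) <= max_steps:
        seq.append(collatz_step(seq[-1], P, a, b))
    return seq

def total_stopping_time(n, P=2, a=3, b=1, max_steps=1000):
    """Number of steps needed to reach 1, or None if the cap was hit."""
    seq = hailstone(n, P, a, b, max_steps)
    return len(seq) - 1 if seq[-1] == 1 else None

if __name__ == "__main__":
    print(hailstone(6))             # [6, 3, 10, 5, 16, 8, 4, 2, 1]
    print(total_stopping_time(27))  # 111
```

The defaults P=2, a=3, b=1 give the classical 3n+1 map.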
This package implements algorithms for analyzing Cayley graphs of permutation groups, with a focus on the TopSpin puzzle and similar permutation-based combinatorial puzzles. Provides methods for cycle detection, state space exploration, bidirectional BFS pathfinding, and finding optimal operation sequences in permutation groups generated by shift and reverse operations. Includes C++ implementations of core operations via Rcpp for performance. Optional GPU acceleration via ggmlR Vulkan backend for batch distance calculations and parallel state transformations.
This package provides a collection of functions developed to assist experimenters in modeling chemical degradation kinetic data. The selection of the appropriate degradation model and parameter estimation is carried out automatically as far as possible and is driven by a rigorous statistical interpretation of the results. The package integrates already available goodness-of-fit statistics for nonlinear models. In addition, it allows data fitting with the nonlinear first-order multi-target (FOMT) model.
This package provides a comprehensive reproducibility framework designed for R and bioinformatics workflows. Automatically captures the entire analysis environment including R session info, package versions, external tool versions ('Samtools', 'STAR', 'BWA', etc.), conda environments, reference genomes, data provenance with smart checksumming for large files, parameter choices, random seeds, and hardware specifications. Generates executable scripts with 'Docker', 'Singularity', and renv configurations. Integrates with workflow managers ('Nextflow', 'Snakemake', 'WDL', 'CWL') to ensure complete reproducibility of computational research workflows.
Computes the Extended Chen-Poisson (ecp) distribution, survival, density, hazard, cumulative hazard and quantile functions. It can also generate a pseudo-random sample from this distribution. The corresponding graphics are available. Functions to obtain measures of skewness and kurtosis, k-th raw moments, conditional k-th moments and the mean residual life function are also provided. For details about the ecp distribution, see Sousa-Ferreira, I., Abreu, A.M. & Rocha, C. (2023) <doi:10.57805/revstat.v21i2.405>.
Analyze functional data and its change points. Includes functionality to store and process data, summarize and validate assumptions, characterize and perform inference of change points, and provide visualizations. Data is stored as discretely collected observations without requiring the selection of basis functions. For more details see chapter 8 of Horvath and Rice (2024) <doi:10.1007/978-3-031-51609-2>. Additional papers are forthcoming. Focused works are also included in the documentation of corresponding functions.
This package provides browser-native WebGL rendering for R graphics through 'htmlwidgets'. The package supports grammar-style graphics workflows and renderer-ready specifications for dense analytical and scientific scenes, including point, line, trajectory, raster, vector, mesh, and surface layers, shader-driven display modes, timeline controls, structured views, selection metadata, and publication-oriented static export helpers. Rendering stays in the browser, and the core package remains cross-platform without requiring 'CUDA', 'Metal', or OpenCL toolchains.
Statistical tests widely utilized in biostatistics, public policy, and law. Along with the well-known tests for equality of means and variances, randomness, and measures of relative variability, the package contains new robust tests of symmetry, omnibus and directional tests of normality, and their graphical counterparts such as robust QQ plot, robust trend tests for variances, etc. All implemented tests and methods are illustrated by simulations and real-life examples from legal statistics, economics, and biostatistics.
Constructs genetic linkage maps in autopolyploid full-sib populations. Uses pairwise recombination fraction estimation as the first source of information to sequentially position allelic variants in specific homologous chromosomes. For situations where pairwise analysis has limited power, the algorithm relies on the multilocus likelihood obtained through a hidden Markov model (HMM). Methods are described in Mollinari and Garcia (2019) <doi:10.1534/g3.119.400378> and Mollinari et al. (2020) <doi:10.1534/g3.119.400620>.
This package provides statistical components, tables, and graphs that are useful in Quarto and RMarkdown reports and that produce Quarto elements for special formatting such as tabs and marginal notes and graphs. Some of the functions produce entire report sections with tabs, e.g., the missing data report created by missChk(). Functions for inserting variables and tables inside graphviz and mermaid diagrams are included, and so are special clinical trial graphics for adverse event reporting.
Tool for statistical simulations that have two components. One component generates the data and the other one analyzes the data. The main aims of the package are the reduction of the administrative source code (mainly loops and management code for the results) and a simple applicability of the package that allows the user to quickly learn how to work with it. Parallel computing is also supported. Finally, convenient functions are provided to summarize the simulation results.
Several statistical test functions as well as a function for exploratory data analysis to investigate classifiers allocating individuals to one of three disjoint and ordered classes. In a single classifier assessment the discriminatory power is compared to classification by chance. In a comparison of two classifiers the null hypothesis corresponds to equal discriminatory power of the two classifiers. See also "ROC Analysis for Classification and Prediction in Practice" by Nakas, Bantis and Gatsonis (2023), ISBN 9781482233704.
An R API providing easy access to a relational database with macroeconomic, financial and development-related time series data for Uganda. Overall, more than 5000 series at varying frequencies (daily, monthly, quarterly, annual in fiscal or calendar years) can be accessed through the API. The data is provided by the Bank of Uganda, the Ugandan Ministry of Finance, Planning and Economic Development, the IMF and the World Bank. The database is updated once a month.
Various semiparametric and nonparametric statistical tools for immune correlates analysis of vaccine clinical trial data. This includes calculation of summary statistics and estimation of risk, vaccine efficacy, controlled effects (controlled risk and controlled vaccine efficacy), and mediation effects (natural direct effect, natural indirect effect, proportion mediated). See Gilbert P, Fong Y, Kenny A, and Carone, M (2022) <doi:10.1093/biostatistics/kxac024> and Fay MP and Follmann DA (2023) <doi:10.48550/arXiv.2208.06465>.
Infectious disease surveillance requires early outbreak detection. This package provides statistical tools for analyzing time-series monitoring data through three core methods: (a) EWMA (Exponentially Weighted Moving Average), (b) Modified-CUSUM (Modified Cumulative Sum), and (c) Adjusted-Serfling models. Methodologies are based on Wang et al. (2010) <doi:10.1016/j.jbi.2009.08.003> and Wang et al. (2015) <doi:10.1371/journal.pone.0119923>. Designed for epidemiologists and public health researchers working with disease surveillance systems.
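To make the first method concrete, here is a minimal sketch of a textbook one-sided EWMA detector in Python. It is not the package's implementation: the function name, the parameters (`lam`, `L`) and the use of a fixed baseline mean and standard deviation are assumptions for illustration, and the cited papers may calibrate the limits differently.

```python
import math

def ewma_alarms(counts, baseline_mean, baseline_sd, lam=0.3, L=3.0):
    """Return the time indices where the EWMA statistic exceeds its upper limit.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, started at the baseline mean.
    The limit uses the asymptotic EWMA standard deviation,
    baseline_sd * sqrt(lam / (2 - lam)).
    """
    z = baseline_mean
    limit = baseline_mean + L * baseline_sd * math.sqrt(lam / (2.0 - lam))
    alarms = []
    for t, x in enumerate(counts):
        z = lam * x + (1.0 - lam) * z
        if z > limit:
            alarms.append(t)
    return alarms

if __name__ == "__main__":
    weekly_cases = [10, 11, 9, 10, 30, 32, 31]  # made-up counts for the demo
    print(ewma_alarms(weekly_cases, baseline_mean=10, baseline_sd=2))  # [4, 5, 6]
```

A smaller `lam` smooths more heavily and reacts more slowly, while a larger `L` trades sensitivity for fewer false alarms.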
This tool enables in-database scoring of XGBoost models built in R by translating trained model objects into SQL queries. XGBoost <https://github.com/dmlc/xgboost> provides parallel tree boosting (also known as gradient boosting machine, or GBM) algorithms in a highly efficient, flexible and portable way. The GBM algorithm was introduced by Friedman (2001) <doi:10.1214/aos/1013203451>, and more details on XGBoost can be found in Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>.
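The general idea of turning boosted trees into SQL can be sketched as follows. This is an illustration only, not the tool's actual output or interface. The nested-dictionary tree format and the function names are hypothetical, and handling of missing values (which XGBoost routes along a default branch) is deliberately left out.

```python
def tree_to_sql(node):
    """Render one decision tree (nested dicts, hypothetical format) as a SQL CASE."""
    if "leaf" in node:
        return repr(float(node["leaf"]))
    feature = node["feature"]
    threshold = node["threshold"]
    yes_sql = tree_to_sql(node["yes"])  # branch taken when feature < threshold
    no_sql = tree_to_sql(node["no"])
    return f'CASE WHEN "{feature}" < {threshold} THEN {yes_sql} ELSE {no_sql} END'

def model_to_sql(trees, table, base_score=0.0):
    """Sum the per-tree CASE expressions into one scoring query (raw margin)."""
    terms = " + ".join("(" + tree_to_sql(t) + ")" for t in trees)
    return f"SELECT {base_score} + {terms} AS margin FROM {table}"

if __name__ == "__main__":
    stump = {"feature": "age", "threshold": 30,
             "yes": {"leaf": 0.5}, "no": {"leaf": -0.2}}
    print(model_to_sql([stump], "people"))
```

Because the whole model becomes a single SELECT, scoring runs inside the database and no rows need to be exported.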
Fastseg implements a very fast and efficient segmentation algorithm. It can segment data from DNA microarrays and from next generation sequencing, for example to detect copy number segments. Further, it can segment data from RNA microarrays such as tiling arrays to identify transcripts. Most generally, it can segment data given as a matrix or as a vector. Various data formats can be used as input to fastseg, such as expression set objects for microarrays or GRanges for sequencing data.
This package provides a flexible approach to Bayesian optimization / model based optimization building on the bbotk package. mlr3mbo is a toolbox providing both ready-to-use optimization algorithms and their fundamental building blocks, allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported, as are mixed continuous, categorical and conditional search spaces. Moreover, using mlr3mbo for hyperparameter optimization of machine learning models within the mlr3 ecosystem is straightforward via mlr3tuning.
HuBMAP provides an open, global bio-molecular atlas of the human body at the cellular level. The `datasets()`, `samples()`, `donors()`, `publications()`, and `collections()` functions retrieve the information for each of these entity types. `*_details()` functions are available for individual entries of each entity type. `*_derived()` functions are available for retrieving derived datasets or samples for individual entries of each entity type. Data files can be accessed using `bulk_data_transfer()`.
Create a pie-like plot to visualise whether one or several aims of a project are achieved or close to being achieved, i.e., an aim is achieved when its point is at the center of the pie plot. Think of it as a dartboard whose center means 100% completeness/achievement. Achievement can also be understood as 100% coverage. The standard levels of completeness marked in the pie plot are 50%, 80% and 100%.
Dynamic path analysis with estimation of the corresponding direct, indirect, and total effects, based on Fosen et al. (2006) <doi:10.1007/s10985-006-9004-2>. The main outcome of interest is a counting process from survival analysis (or recurrent events) data. At each event time, ordinary linear regression is used to estimate the relation between the covariates, while Aalen's additive hazard model is used for the regression of the counting process on the covariates.