Simultaneous tests and confidence intervals are provided for one-way experimental designs with one or many normally distributed, primary response variables (endpoints). Differences (Hasler and Hothorn, 2011 <doi:10.2202/1557-4679.1258>) or ratios (Hasler and Hothorn, 2012 <doi:10.1080/19466315.2011.633868>) of means can be considered. Various contrasts can be chosen, and unbalanced sample sizes are allowed, as are heterogeneous variances (Hasler and Hothorn, 2008 <doi:10.1002/bimj.200710466>) or covariance matrices (Hasler, 2014 <doi:10.1515/ijb-2012-0015>).
Easily override the default visual choices in ggplot2 to make your time series plots look more like the Wall Street Journal. Specific theme design choices include omitting x-axis grid lines and displaying sparse light grey y-axis grid lines. Additionally, it lets you label the y-axis scale with your units displayed only on the top-most number, while also removing the bottom-most number (unless specifically overridden). The goal is visual simplicity, because who has time to waste looking at a cluttered graph?
This package provides a collection of tools for cancer genomic data clustering analyses, including those for single cell RNA-seq. Cell clustering and feature gene selection analysis employ a Bayesian (and maximum likelihood) non-negative matrix factorization (NMF) algorithm. The input data set consists of an RNA count matrix together with gene and cell barcode annotations. Analysis outputs are factor matrices for multiple ranks and marginal likelihood values for each rank. The package includes utilities for downstream analyses, including meta-gene identification, visualization, and construction of rank-based trees for clusters.
This package manages a file system cache. Regular files can be moved or copied to the cache folder. Sub-folders can be created in order to organize the files. Files can be located inside the cache using a glob function. Text contents can be easily stored in and retrieved from the cache using dedicated functions. It can be used for an application or a package, as a global cache, or as a per-user cache, in which case the standard OS user cache folder will be used.
A scripting and command-line front-end is provided by r (aka littler) as a lightweight binary wrapper around the GNU R language and environment for statistical computing and graphics. While R can be used in batch mode, the r binary adds full support for both shebang-style scripting (i.e. using a hash-mark-exclamation-path expression as the first line in scripts) and command-line use in standard pipelines. In other words, r provides the R language without the environment.
Average population attributable fractions are calculated for a set of risk factors (either binary or ordinal valued). Confidence intervals are found by Monte Carlo simulation. The method can be applied to both prospective and case-control designs, as long as an estimate of disease prevalence is supplied. In addition to an exact calculation of AF, an approximate calculation, based on randomly sampling permutations, has been implemented to ensure the calculation is computationally tractable when the number of risk factors is large.
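The exact versus sampled calculation described above can be illustrated with a minimal sketch. It assumes the usual definition of the average attributable fraction: the sequential attributable fraction of each risk factor, averaged over the orders in which the factors are eliminated. The names average_af and toy_joint_af are hypothetical, and the joint attributable fractions are made-up numbers for illustration; none of this is the package's own interface.

```python
from itertools import permutations
import random


def average_af(factors, joint_af, n_samples=None, seed=0):
    """Average sequential attributable fractions over elimination orders.

    joint_af(frozenset) must return the attributable fraction obtained when
    that set of risk factors is eliminated (hypothetical callback).
    n_samples=None enumerates every order (exact); otherwise that many
    random orders are sampled (approximate).
    """
    factors = list(factors)
    if n_samples is None:
        orders = list(permutations(factors))
    else:
        rng = random.Random(seed)
        orders = [rng.sample(factors, len(factors)) for _ in range(n_samples)]

    totals = {f: 0.0 for f in factors}
    for order in orders:
        removed = frozenset()
        for f in order:
            with_f = removed | {f}
            # Sequential AF: the extra fraction explained by removing f next.
            totals[f] += joint_af(with_f) - joint_af(removed)
            removed = with_f
    return {f: totals[f] / len(orders) for f in factors}


# Made-up joint attributable fractions for two illustrative risk factors.
TOY = {
    frozenset(): 0.0,
    frozenset({"smoking"}): 0.2,
    frozenset({"obesity"}): 0.3,
    frozenset({"smoking", "obesity"}): 0.4,
}


def toy_joint_af(removed):
    return TOY[frozenset(removed)]


exact = average_af(["smoking", "obesity"], toy_joint_af)
approx = average_af(["smoking", "obesity"], toy_joint_af, n_samples=200)
```

Within any single order the increments telescope, so the per-factor averages always add up to the joint attributable fraction of all factors, whether the orders are enumerated or sampled.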
BAYesian inference for MEDical designs in R. Functions for the computation of Bayes factors for common biomedical research designs. Implemented are functions to test the equivalence (equiv_bf), non-inferiority (infer_bf), and superiority (super_bf) of an experimental group compared to a control group on a continuous outcome measure. Bayes factors for these three tests can be computed based on raw data (x, y) or summary statistics (n_x, n_y, mean_x, mean_y, sd_x, sd_y [or ci_margin and ci_level]).
This package implements Bayesian hierarchical models with flexible Gaussian process priors, focusing on Extended Latent Gaussian Models and incorporating various Gaussian process priors for Bayesian smoothing. Computations leverage finite element approximations and adaptive quadrature for efficient inference. Methods are detailed in Zhang, Stringer, Brown, and Stafford (2023) <doi:10.1177/09622802221134172>; Zhang, Stringer, Brown, and Stafford (2024) <doi:10.1080/10618600.2023.2289532>; Zhang, Brown, and Stafford (2023) <doi:10.48550/arXiv.2305.09914>; and Stringer, Brown, and Stafford (2021) <doi:10.1111/biom.13329>.
This package provides tools for estimation and clustering of spherical data, seamlessly integrated with the flexmix package. Includes the necessary M-step implementations for both Poisson Kernel-Based Distribution (PKBD) and spherical Cauchy distribution. Additionally, the package provides random number generators for PKBD and spherical Cauchy distribution. Methods are based on Golzy M., Markatou M. (2020) <doi:10.1080/10618600.2020.1740713>, Kato S., McCullagh P. (2020) <doi:10.3150/20-bej1222> and Sablica L., Hornik K., Leydold J. (2023) <doi:10.1214/23-ejs2149>.
Support in preparing a raw ESM dataset for statistical analysis. Preparation includes handling errors (mostly due to technological reasons) and generating new variables that are necessary and/or helpful in meeting the conditions for statistically analyzing ESM data. The functions in esmprep are meant to lead hierarchically from bottom, i.e. the raw (separated) ESM dataset(s), to top, i.e. a single ESM dataset ready for statistical analysis. This hierarchy evolved out of my personal experience in working with ESM data.
Fit and visualize the results of a Bayesian analysis of networks commonly found in psychology. The package supports cross-sectional network models fitted using the packages BDgraph, bgms and BGGM, as well as network comparisons fitted using bgms and BGGM. The package provides the parameter estimates, posterior inclusion probabilities, inclusion Bayes factors, and the posterior density of the parameters. In addition, for BDgraph and bgms it allows assessing the posterior structure space. Furthermore, the package comes with an extensive suite for visualizing results.
Calculates additive and dominance genetic relationship matrices and their inverses, in matrix and tabular-sparse formats. It includes functions for checking and processing pedigree, calculating inbreeding coefficients (Meuwissen & Luo, 1992 <doi:10.1186/1297-9686-24-4-305>), as well as functions to calculate the matrix of genetic group contributions (Q) and to add those contributions to the genetic merit of animals (Quaas (1988) <doi:10.3168/jds.S0022-0302(88)79691-5>). Calculation of Q is computationally intensive, so computationally optimized functions are provided to calculate it.
The semi-supervised learning algorithm is based on finite Gaussian mixture models and includes a mechanism for handling missing data. It aims to fit a g-class Gaussian mixture model using maximum likelihood. The algorithm treats the labels of unclassified features as missing data, building on the framework introduced by Rubin (1976) <doi:10.2307/2335739> for missing data analysis. By taking into account the dependencies in the missing pattern, the algorithm provides more information for determining the optimal classifier, as specified by Bayes' rule.
This package contains an engine for spatially-explicit eco-evolutionary mechanistic models with a modular implementation and several support functions. It allows exploring the consequences of ecological and macroevolutionary processes across realistic or theoretical spatio-temporal landscapes on biodiversity patterns as a general term. Reference: Oskar Hagen, Benjamin Flueck, Fabian Fopp, Juliano S. Cabral, Florian Hartig, Mikael Pontarp, Thiago F. Rangel, Loic Pellissier (2021) "gen3sis: A general engine for eco-evolutionary simulations of the processes that shape Earth's biodiversity" <doi:10.1371/journal.pbio.3001340>.
This package provides HE plots and other functions for visualizing hypothesis tests in multivariate linear models. HE plots represent sums-of-squares-and-products matrices for linear hypotheses and for error using ellipses (in two dimensions) and ellipsoids (in three dimensions). It also provides other tools for analysis and graphical display of the models, such as robust methods and homogeneity of variance-covariance matrices. The related candisc package provides visualizations in a reduced-rank canonical discriminant space when there are more than a few response variables.
This package provides an efficient implementation of the Isolate-Detect methodology for the consistent estimation of the number and location of multiple change-points in one-dimensional data sequences from the "deterministic + noise" model. For details on the Isolate-Detect methodology, please see Anastasiou and Fryzlewicz (2018) <https://docs.wixstatic.com/ugd/24cdcc_6a0866c574654163b8255e272bc0001b.pdf>. Currently implemented scenarios are: piecewise-constant signal with Gaussian noise, piecewise-constant signal with heavy-tailed noise, continuous piecewise-linear signal with Gaussian noise, continuous piecewise-linear signal with heavy-tailed noise.
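For the piecewise-constant scenario, the basic ingredient of many change-point detectors is a CUSUM contrast evaluated at every candidate split. The sketch below shows only that contrast for a single change-point over a whole sequence; it is not the Isolate-Detect interval scheme itself, and the function name cusum_argmax is hypothetical.

```python
import math


def cusum_argmax(x):
    """Return (m, value): the split with the largest absolute CUSUM contrast.

    m is the number of observations before the candidate change, so the
    change is placed between x[m - 1] and x[m]. Requires len(x) >= 2.
    """
    n = len(x)
    total = sum(x)
    best_m, best_value = None, -1.0
    left = 0.0
    for m in range(1, n):
        left += x[m - 1]          # sum of the first m observations
        right = total - left      # sum of the remaining n - m observations
        contrast = (math.sqrt((n - m) / (n * m)) * left
                    - math.sqrt(m / (n * (n - m))) * right)
        if abs(contrast) > best_value:
            best_m, best_value = m, abs(contrast)
    return best_m, best_value


# Noise-free toy sequence with a jump after the fifth observation.
split, strength = cusum_argmax([0.0] * 5 + [10.0] * 5)
```

In practice the largest contrast would be compared with a threshold that depends on the noise level before a change-point is declared.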
Quantifies ecological memory in long time-series using Random Forest models (Benito, Gil-Romera, and Birks 2019 <doi:10.1111/ecog.04772>) fitted with ranger (Wright and Ziegler 2017 <doi:10.18637/jss.v077.i01>). Ecological memory is assessed by modeling a response variable as a function of lagged predictors, distinguishing endogenous memory (lagged response) from exogenous memory (lagged environmental drivers). Designed for palaeoecological datasets and simulated pollen curves from virtualPollen, but applicable to any long time-series with environmental drivers and a biotic response.
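The lagged-predictor layout mentioned above can be sketched as follows. This is a minimal illustration that assumes one response series and one driver series sampled at the same regular time steps; the function name lagged_rows and the column names are hypothetical, not the package's own.

```python
def lagged_rows(response, driver, lags):
    """Build one modeling row per time step from two aligned series.

    Each row holds the current response plus, for every lag k, the response
    k steps earlier (endogenous term) and the driver k steps earlier
    (exogenous term). The first max(lags) time steps are dropped because
    their lagged values do not exist.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(response)):
        row = {"response": response[t]}
        for k in lags:
            row[f"response_lag{k}"] = response[t - k]  # endogenous memory
            row[f"driver_lag{k}"] = driver[t - k]      # exogenous memory
        rows.append(row)
    return rows


rows = lagged_rows([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], lags=[1, 2])
```

A table of this shape can then be passed to any regression learner, such as a random forest, whose variable importance scores are compared across the lagged columns.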
This package implements a method that builds the coefficients of a polynomial model that performs almost equivalently to a given (densely connected) neural network. This is achieved using Taylor expansion at the activation functions. The obtained polynomial coefficients can be used to explain the importance of features (and their interactions) in the neural network, therefore working as a tool for interpretability or eXplainable Artificial Intelligence (XAI). See Morala et al. 2021 <doi:10.1016/j.neunet.2021.04.036>, and 2023 <doi:10.1109/TNNLS.2023.3330328>.
This package implements the Network meta-Analytic Predictive (NAP) prior framework to accommodate changes in the standard of care (SoC) during ongoing randomized controlled trials (RCTs). The method synthesizes pre- and post-change in-trial data by leveraging external evidence, particularly head-to-head trials comparing the original and new standards of care, to bridge the two evidence periods and enable principled borrowing. The package provides utilities to construct NAP-based priors and perform Bayesian inference for time-to-event endpoints using summarized trial evidence.
Deduplicates datasets by retaining the most complete and informative records. Identifies duplicated entries based on a specified key column, calculates completeness scores for each row, and compares values within groups. When differences between duplicates exceed a user-defined threshold, records are split into unique IDs; otherwise, they are coalesced into a single, most complete entry. Returns a list containing the original duplicates, the split entries, and the final coalesced dataset. Useful for cleaning survey or administrative data where duplicated IDs may reflect minor data entry inconsistencies.
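The split-or-coalesce logic described above can be sketched in plain Python. This is a simplified illustration under stated assumptions: records are dictionaries, None marks a missing value, at least one non-key field exists, and "difference" is measured as the share of fields with conflicting non-missing values. The function names, the default threshold, and the "_2" suffix convention for split IDs are all hypothetical, and the sketch returns one flat list rather than the three-part result the package describes.

```python
from collections import defaultdict


def completeness(record, fields):
    """Share of fields that hold a non-missing value."""
    return sum(record.get(f) is not None for f in fields) / len(fields)


def conflict_share(a, b, fields):
    """Share of fields on which both records are filled in but disagree."""
    clashes = sum(
        a.get(f) is not None and b.get(f) is not None and a[f] != b[f]
        for f in fields
    )
    return clashes / len(fields)


def dedupe(records, key, threshold=0.5):
    fields = sorted({f for r in records for f in r if f != key})
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)

    result = []
    for key_value, group in groups.items():
        # Most complete record first; it becomes the base of the merge.
        group = sorted(group, key=lambda r: completeness(r, fields), reverse=True)
        base = dict(group[0])
        suffix = 1
        for other in group[1:]:
            if conflict_share(base, other, fields) > threshold:
                # Too different: keep it as its own record under a new ID.
                suffix += 1
                split = dict(other)
                split[key] = f"{key_value}_{suffix}"
                result.append(split)
            else:
                # Similar enough: fill the gaps in the base record.
                for f in fields:
                    if base.get(f) is None:
                        base[f] = other.get(f)
        result.append(base)
    return result


records = [
    {"id": 1, "name": "Ann", "age": None, "city": "Oslo"},
    {"id": 1, "name": "Ann", "age": 34, "city": None},
    {"id": 2, "name": "Bob", "age": 51, "city": "Rome"},
    {"id": 2, "name": "Rob", "age": 19, "city": "Bonn"},
]
cleaned = dedupe(records, "id")
```

Here the two records with ID 1 never conflict, so they merge into one complete row, while the two records with ID 2 disagree on every field and are kept apart.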
Estimation of two- and three-way dynamic panel threshold regression models (Di Lascio and Perazzini (2024) <https://repec.unibz.it/bemps104.pdf>; Di Lascio and Perazzini (2022, ISBN:978-88-9193-231-0); Seo and Shin (2016) <doi:10.1016/j.jeconom.2016.03.005>) through the generalized method of moments based on the first difference transformation and the use of instrumental variables. The models can be used to detect a change point in the time series. Random number generation is also implemented.
Calculates a Satorra-Bentler scaled chi-squared difference test between nested models that were estimated using maximum likelihood (ML) with robust standard errors. For such models, the chi-squared difference test cannot be calculated the traditional way. For details see Satorra & Bentler (2001) <doi:10.1007/bf02296192> and Satorra & Bentler (2010) <doi:10.1007/s11336-009-9135-y>. This package may be particularly helpful when used in conjunction with Mplus software, specifically when implementing the complex survey option. In such cases, the model estimator in Mplus defaults to ML with robust standard errors.
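The arithmetic of the 2001 scaled difference test is short enough to sketch directly. The version below assumes each model reports a robust chi-square, a scaling correction factor, and its degrees of freedom; the function name sb_scaled_difference and the input values are hypothetical and are not this package's interface.

```python
def sb_scaled_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler (2001) scaled chi-squared difference.

    t0, c0, d0: robust chi-square, scaling correction factor, and degrees
                of freedom of the nested (more restricted) model.
    t1, c1, d1: the same quantities for the less restricted model.
    Returns (scaled difference statistic, difference in degrees of freedom).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    if cd <= 0:
        # The 2001 formula can break down here; the 2010 paper addresses it.
        raise ValueError("non-positive difference scaling correction")
    # Robust chi-square times its correction gives the uncorrected ML value.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1


# Illustrative inputs only.
stat, df = sb_scaled_difference(100.0, 1.2, 10, 80.0, 1.1, 8)
```

The returned statistic is referred to a chi-squared distribution with the returned degrees of freedom.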
Identifying cell types based on expression profiles is a pillar of single cell analysis. scROSHI identifies cell types from single cell expression profiles by utilizing previously obtained cell type specific gene sets. It takes into account the hierarchical nature of cell type relationships and does not require training or annotated data. A detailed description of the method can be found at: Prummer, Bertolini, Bosshard, Barkmann, Yates, Boeva, The Tumor Profiler Consortium, Stekhoven, and Singer (2022) <doi:10.1101/2022.04.05.487176>.
This package provides tools to convert from specific formats to more general forms of spatial data. Using tables to store the actual entities present in spatial data provides flexibility, and the functions here deliberately minimize the level of interpretation applied, leaving that for specific applications. Includes support for simple features, round-trip for Spatial classes and long-form tables, analogous to ggplot2::fortify. There is also a more normal form representation that decomposes simple features and their kin to tables of objects, parts, and unique coordinates.