Implementations of the quantile slice sampler of Heiner et al. (2024+, in preparation) as well as other popular slice samplers are provided. Helper functions for specifying pseudo-target distributions are included, both for diagnostics and for tuning the quantile slice sampler. Other implemented methods include the generalized elliptical slice sampler of Nishihara et al. (2014) <https://jmlr.org/papers/v15/nishihara14a.html>.
Uses simulation to create prediction intervals for post-policy outcomes in interrupted time series (ITS) designs, following Miratrix (2020) <arXiv:2002.05746>. This package provides methods for fitting ITS models with lagged outcomes and variables to account for temporal dependencies. It then conducts inference via simulation, generating a set of plausible counterfactual post-policy series to compare to the observed post-policy series. The package also provides methods to visualize such data and to incorporate seasonality models, smoothing, and aggregation/summarization. This work was partially funded by Arnold Ventures in collaboration with MDRC.
This package provides functions for the integrated analysis of protein-protein interaction networks and the detection of functional modules. Different datasets can be integrated into the network by assigning p-values of statistical tests to the nodes of the network. For example, p-values obtained from the differential expression of genes on an Affymetrix array can be assigned to the nodes of the network. By fitting a beta-uniform mixture model and calculating scores from the p-values, overall scores of network regions can be calculated, and an integer linear programming algorithm identifies the maximum scoring subnetwork.
This package provides a fast reimplementation of several density-based algorithms of the DBSCAN family. It includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and hierarchical DBSCAN (HDBSCAN), the ordering algorithm ordering points to identify the clustering structure (OPTICS), shared nearest neighbor clustering, and the outlier detection algorithms local outlier factor (LOF) and global-local outlier score from hierarchies (GLOSH). The implementations use the kd-tree data structure for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided.
Many tools for data analysis are not available in R, but are present in public repositories like conda. The Herper package provides a comprehensive set of functions to interact with the conda package management system. With Herper, users can install, manage, and run conda packages from the comfort of their R session. Herper also provides an ad-hoc approach to handling external system requirements for R packages. For people developing packages with Python conda dependencies, we recommend using basilisk (https://bioconductor.org/packages/release/bioc/html/basilisk.html) to internally support these system requirements pre-hoc.
Here we present Link-HD, an approach to integrate heterogeneous datasets, as a generalization of STATIS-ACT (“Structuration des Tableaux A Trois Indices de la Statistique–Analyse Conjointe de Tableaux”), a family of methods to join and compare information from multiple subspaces. STATIS-ACT has some drawbacks: it only allows continuous data, and it is unable to establish relationships between samples and features. To address these constraints, Link-HD incorporates multiple distance options and a linear-regression-based biplot model in order to establish relationships between observations and variables and to perform variable selection.
Covariance measure tests for conditional independence testing against conditional covariance and nonlinear conditional mean alternatives. The package implements versions of the generalised covariance measure test (Shah and Peters, 2020, <doi:10.1214/19-aos1857>) and projected covariance measure test (Lundborg et al., 2023, <doi:10.1214/24-AOS2447>). The tram-GCM test, for censored responses, is implemented including the Cox model and survival forests (Kook et al., 2024, <doi:10.1080/01621459.2024.2395588>). Application examples to variable significance testing and modality selection can be found in Kook and Lundborg (2024, <doi:10.1093/bib/bbae475>).
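The idea behind the generalised covariance measure can be sketched in a few lines: regress X on Z and Y on Z, multiply the two sets of residuals, and compare the normalised mean of the products with a standard normal distribution. The sketch below is illustrative Python, not the package's R interface; the helper names `ols_residuals` and `gcm_test` are hypothetical, Z is assumed to be univariate, and simple least squares is swapped in for the flexible regression methods a real analysis would typically use.

```python
import math
import random

def ols_residuals(y, z):
    """Residuals from a least-squares fit of y on z (with intercept)."""
    n = len(y)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / szz
    return [yi - y_bar - slope * (zi - z_bar) for zi, yi in zip(z, y)]

def gcm_test(x, y, z):
    """Return (statistic, two-sided p-value) for H0: X independent of Y given Z."""
    n = len(x)
    rx = ols_residuals(x, z)
    ry = ols_residuals(y, z)
    prod = [a * b for a, b in zip(rx, ry)]
    mean = sum(prod) / n
    var = sum(p * p for p in prod) / n - mean ** 2
    stat = math.sqrt(n) * mean / math.sqrt(var)
    # Two-sided p-value from the standard normal reference distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value

# Simulated data (assumption: purely for illustration).
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [zi + rng.gauss(0, 1) for zi in z]
y_null = [zi + rng.gauss(0, 1) for zi in z]                   # Y related to X only through Z
y_alt = [zi + xi + rng.gauss(0, 1) for zi, xi in zip(z, x)]   # Y depends on X directly

stat_null, p_null = gcm_test(x, y_null, z)
stat_alt, p_alt = gcm_test(x, y_alt, z)
```

With data like these, the test should reject clearly for `y_alt`, while the p-value for `y_null` is approximately uniformly distributed over repeated simulations.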
This package provides a collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. The package also creates automated reports that support these three tasks.
Conducts sensitivity analyses for unmeasured confounding, selection bias, and measurement error (individually or in combination; VanderWeele & Ding (2017) <doi:10.7326/M16-2607>; Smith & VanderWeele (2019) <doi:10.1097/EDE.0000000000001032>; VanderWeele & Li (2019) <doi:10.1093/aje/kwz133>; Smith, Mathur, & VanderWeele (2021) <doi:10.1097/EDE.0000000000001380>). Also conducts sensitivity analyses for unmeasured confounding in meta-analyses (Mathur & VanderWeele (2020a) <doi:10.1080/01621459.2018.1529598>; Mathur & VanderWeele (2020b) <doi:10.1097/EDE.0000000000001180>) and for additive measures of effect modification (Mathur et al., <doi:10.1093/ije/dyac073>).
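As a small worked example of the quantities involved, the E-value of VanderWeele & Ding (2017) for a point-estimate risk ratio RR greater than 1 is RR + sqrt(RR × (RR − 1)); a ratio below 1 is inverted first. The sketch below is illustrative Python and the function name `e_value` is hypothetical; it is not the package's interface.

```python
import math

def e_value(rr: float) -> float:
    """E-value for a point-estimate risk ratio (VanderWeele & Ding, 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective estimates (RR < 1) are inverted before applying the formula.
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))

# An observed risk ratio of 2 gives 2 + sqrt(2), roughly 3.41.
example = e_value(2.0)
```

A risk ratio of exactly 1 yields an E-value of 1, meaning no unmeasured confounding is needed to explain a null association.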
Forensic applications of pedigree analysis, including likelihood ratios for relationship testing, general relatedness inference, marker simulation, and power analysis. forrel is part of the pedsuite, a collection of packages for pedigree analysis, further described in the book Pedigree Analysis in R (Vigeland, 2021, ISBN:9780128244302). Several functions deal specifically with power analysis in missing person cases, implementing methods described in Vigeland et al. (2020) <doi:10.1016/j.fsigen.2020.102376>. Data import from the Familias software (Egeland et al. (2000) <doi:10.1016/S0379-0738(00)00147-X>) is supported through the pedFamilias package.
An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization. See Sangalli, L. M. (2021) <doi:10.1111/insr.12444> "Spatial Regression With Partial Differential Equation Regularisation" for an overview. Release 1.1-9 requires R (>= 4.2.0) to be installed on Windows machines.
Given a group of genomes and their relationship with each other, the package clusters the genomes and selects the most representative members of each cluster. Additional data can be provided to prioritize certain genomes. The results can be printed out as a list or a new phylogeny, with graphs of the trees and distance distributions also available. For a detailed introduction, see: Thomas H Clarke, Lauren M Brinkac, Granger Sutton, and Derrick E Fouts (2018), GGRaSP: a R-package for selecting representative genomes using Gaussian mixture models, Bioinformatics, bty300, <doi:10.1093/bioinformatics/bty300>.
The risk plot may be one of the most commonly used figures in tumor genetic data analysis. It supports two comparisons. The first compares the model's predictions with the observed survival outcomes, to see whether the survival rate of the high-risk group is lower than that of the low-risk group and whether the survival time of the high-risk group is shorter than that of the low-risk group. The second compares the heat map and scatter plot to see the correlation between the predictors and the outcome.
The number of clusters (k) is needed to start all partitioning clustering algorithms. An optimal value of this input argument is commonly determined by using internal validity indices. Since most of the existing internal indices suggest a k value that is computed from the clustering results after several runs of a clustering algorithm, they are computationally expensive. In contrast, the package kpeaks makes it possible to estimate k before running any clustering algorithm. It is based on a simple novel technique using the descriptive statistics of peak counts of the features in a data set.
Implementation of several phenotype-based family genetic risk scores with unified input data and data preparation functions to facilitate the required data preparation and management. The implemented family genetic risk scores are the extended liability threshold model conditional on family history from Pedersen (2022) <doi:10.1016/j.ajhg.2022.01.009> and Pedersen (2023) <https://www.nature.com/articles/s41467-023-41210-z>, Pearson-Aitken Family Genetic Risk Scores from Krebs (2024) <doi:10.1016/j.ajhg.2024.09.009>, and family genetic risk score from Kendler (2021) <doi:10.1001/jamapsychiatry.2021.0336>.
Estimates exponential-family random graph models for multilevel network data, assuming the multilevel structure is observed. The scope, at present, covers multilevel models where the set of nodes is nested within known blocks. The estimation method uses Monte-Carlo maximum likelihood estimation (MCMLE) methods to estimate a variety of canonical or curved exponential family models for binary random graphs. MCMLE methods for curved exponential-family random graph models can be found in Hunter and Handcock (JCGS, 2006). The package supports parallel computing, and provides methods for assessing goodness-of-fit of models and visualization of networks.
Sharing statistical methods or simulation frameworks through shiny applications often requires workflows for handling data. To save and display simulation results, the postgresUI() and postgresServer() functions in mmints provide persistent data storage using a PostgreSQL database. The mmints package also offers data upload functionality through the csvUploadUI() and csvUploadServer() functions, which allow users to upload data, view variables and their types, and edit variable types before fitting statistical models within the shiny application. These tools aim to enhance efficiency and user interaction in shiny-based statistical and simulation applications.
This package provides a collection of tools to fit and work with trophic Species Distribution Models. Trophic Species Distribution Models combine knowledge of trophic interactions with Bayesian structural equation models that model each species as a function of its prey (or predators) and environmental conditions. It exploits the topological ordering of the known trophic interaction network to predict species distribution in space and/or time, where the prey (or predator) distribution is unavailable. The method implemented by the package is described in Poggiato, Andréoletti, Pollock and Thuiller (2022) <doi:10.22541/au.166853394.45823739/v1>.
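To illustrate why the topological ordering matters, the sketch below orders a small, hypothetical food web so that every species appears after all of its prey; predictions can then be made in that order, feeding predicted prey distributions into the models of their predators. This is illustrative Python using the standard-library `graphlib` module (Python 3.9+), not the package's R interface, and the species names are invented.

```python
from graphlib import TopologicalSorter

# Hypothetical food web: each species maps to the set of prey it feeds on.
prey_of = {
    "grass": set(),
    "rabbit": {"grass"},
    "vole": {"grass"},
    "fox": {"rabbit", "vole"},
}

# static_order() yields every species only after all of its prey.
order = list(TopologicalSorter(prey_of).static_order())

def respects_prey_order(order, prey_of):
    """Check that each species comes after every one of its prey."""
    position = {species: i for i, species in enumerate(order)}
    return all(
        position[prey] < position[species]
        for species, prey_set in prey_of.items()
        for prey in prey_set
    )
```

In this example the basal species ("grass") always comes first and the top predator ("fox") last; the relative order of "rabbit" and "vole" is not determined by the web.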
Covers various approaches to analysis of variance and includes an assumption-testing section with a decision diagram for selecting the most appropriate technique. It provides the classical analysis of variance, the nonparametric Kruskal-Wallis equivalent, and the Bayesian approach. The results are shown in an interactive shiny panel, which allows modifying the arguments of the tests, contains interactive graphics, and presents automatic conclusions depending on the tests in order to contribute to the interpretation of these analyses. AovBay uses Stan and FactorBayes for Bayesian analysis and Highcharts for interactive charts.
Enables quick calibration of radiocarbon dates under various calibration curves (including user-generated ones); age-depth modelling as per the algorithm of Haslett and Parnell (2008) <DOI:10.1111/j.1467-9876.2008.00623.x>; relative sea level rate estimation incorporating time uncertainty in polynomial regression models (Parnell and Gehrels 2015) <DOI:10.1002/9781118452547.ch32>; non-parametric phase modelling via Gaussian mixtures as a means to determine the activity of a site (and as an alternative to the OxCal function SUM(); currently unpublished); and reverse calibration of dates from calibrated into 14C years (also unpublished).
Bayesian MCMC estimation of parameters for Thomas-type cluster point processes with various inhomogeneities. It allows for inhomogeneity in (i) the distribution of parent points, (ii) the mean number of points in a cluster, and (iii) the cluster spread. The package also provides a Bayesian MCMC algorithm for the homogeneous generalized Thomas process. The cluster size is allowed to have a variance that is greater or less than the expected value (cluster sizes are over- or underdispersed). Details are described in Dvořák, Remeš, Beránek & Mrkvička (2022) <doi:10.48550/arXiv.2205.07946>.
This package provides a beginner's toolbox to help those in ecology who want to deepen their understanding of bioacoustics or use it in their work. The package has a number of uses, including calculating frequency from a waveform, performing operations in dB, and determining the acoustic range of recorders. The majority of this package is based on key concepts learned from the K. Lisa Yang Center for Conservation Bioacoustics at Cornell University and its associated course, Introduction to Bioacoustics. More information can be found within the walk-through vignettes at <https://github.com/MattyD797/bioSNR/tree/main/vignettes>.
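As an example of why operations in dB need care, sound levels cannot be added directly: they are converted to linear power, summed, and converted back. The sketch below is illustrative Python with hypothetical function names (`db_sum`, `frequency_from_period`); it does not reproduce the package's own functions and assumes incoherent sources and a known waveform period.

```python
import math

def db_sum(levels_db):
    """Combine incoherent source levels (in dB) by summing their linear powers."""
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)

def frequency_from_period(period_s):
    """Frequency in Hz of a periodic waveform with the given period in seconds."""
    return 1.0 / period_s

# Two equal 60 dB sources combine to about 63 dB, not 120 dB.
combined = db_sum([60.0, 60.0])
```

Doubling the power always adds 10 × log10(2), about 3 dB, regardless of the starting level.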
This package provides functions and data files to help users of the Consumer Expenditure Surveys (CE) Public-Use Microdata (PUMD) calculate annual estimated expenditure means, standard errors, and quantiles according to the methods used by the CE with PUMD. For more information on the CE please visit <https://www.bls.gov/cex>. For further reading on CE estimate calculations please see the CE Calculation section of the U.S. Bureau of Labor Statistics (BLS) Handbook of Methods at <https://www.bls.gov/opub/hom/cex/calculation.htm>. For further information about CE PUMD please visit <https://www.bls.gov/cex/pumd.htm>.
Builds on the RcppArmadillo linear algebra functionalities to fit fast spatial interaction models in the context of urban analytics, geography, and transport modelling. It uses the Newton root search algorithm to determine the optimal cost exponent and can run country-level models with thousands of origins and destinations. It aims to provide an easy approach based on matrices, which can originate from various routing and processing steps earlier in a workflow. Currently, the simplest forms of production, destination, and doubly constrained models are implemented. See Schlosser et al. (2023) <doi:10.48550/arXiv.2309.02112>.
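A doubly constrained model of this kind can be sketched with plain matrices: flows are proportional to origin and destination totals and a cost-based deterrence term, and balancing factors are iterated until row and column sums match the totals. The sketch below is illustrative Python, not the package's interface; the function name `doubly_constrained`, the exponential deterrence function exp(-beta × cost), and the fixed `beta` are assumptions (the package itself searches for the cost exponent), and total origins are assumed to equal total destinations.

```python
import math

def doubly_constrained(origins, destinations, cost, beta, iterations=200):
    """Flows T[i][j] = A_i * O_i * B_j * D_j * exp(-beta * cost[i][j])."""
    n, m = len(origins), len(destinations)
    # Deterrence matrix (assumed exponential form).
    f = [[math.exp(-beta * cost[i][j]) for j in range(m)] for i in range(n)]
    a = [1.0] * n
    b = [1.0] * m
    for _ in range(iterations):
        # Alternately update the balancing factors for origins and destinations.
        a = [1.0 / sum(b[j] * destinations[j] * f[i][j] for j in range(m))
             for i in range(n)]
        b = [1.0 / sum(a[i] * origins[i] * f[i][j] for i in range(n))
             for j in range(m)]
    return [[a[i] * origins[i] * b[j] * destinations[j] * f[i][j]
             for j in range(m)] for i in range(n)]

# Hypothetical two-zone example: totals and travel costs are invented.
origins = [100.0, 200.0]
destinations = [150.0, 150.0]
cost = [[1.0, 2.0], [2.0, 1.0]]
flows = doubly_constrained(origins, destinations, cost, beta=0.5)

row_sums = [sum(row) for row in flows]
col_sums = [sum(flows[i][j] for i in range(2)) for j in range(2)]
```

After convergence, `row_sums` should match `origins` and `col_sums` should match `destinations` up to numerical tolerance.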