The lognormal distribution (Limpert et al. (2001) <doi:10.1641/0006-3568(2001)051%5B0341:lndats%5D2.0.co;2>) can characterize uncertainty that is bounded by zero. This package provides estimation of distribution parameters, computation of moments and other basic statistics, and approximations of the distribution of the sum of several correlated lognormally distributed variables (Lo 2013 <doi:10.12988/ams.2013.39511>) and of the difference of two correlated lognormally distributed variables (Lo 2012 <doi:10.1155/2012/838397>).
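As an illustration of the "computation of moments" mentioned above, the standard closed-form lognormal moments can be sketched in Python. This is not the package's API: the function names `lognormal_moments` and `lognormal_params_from_moments` are hypothetical, and `mu` and `sigma` are assumed to be the mean and standard deviation on the log scale.

```python
import math

def lognormal_moments(mu, sigma):
    """Basic statistics of a lognormal with log-scale mean mu and log-scale sd sigma."""
    mean = math.exp(mu + sigma**2 / 2)
    var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    median = math.exp(mu)
    mode = math.exp(mu - sigma**2)
    return {"mean": mean, "var": var, "median": median, "mode": mode}

def lognormal_params_from_moments(mean, var):
    """Invert the formulas above: recover (mu, sigma) from the mean and variance."""
    sigma2 = math.log(1 + var / mean**2)
    mu = math.log(mean) - sigma2 / 2
    return mu, math.sqrt(sigma2)

stats = lognormal_moments(0.0, 1.0)
mu, sigma = lognormal_params_from_moments(stats["mean"], stats["var"])
```

The round trip from parameters to moments and back is a quick sanity check on the formulas.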
This package provides tools for the calculation of effect sizes (standardised mean difference) and mean difference in pre-post controlled studies, including robust imputation of missing variances (standard deviation of changes) and correlations (Pearson correlation coefficient). The main function metacor_dual() implements several methods for imputing missing standard deviation of changes or Pearson correlation coefficient, and generates transparent imputation reports. Designed for meta-analyses with incomplete summary statistics. For details on the methods, see Higgins et al. (2023) and Fu et al. (2013).
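The relationship between the standard deviation of changes and the pre-post Pearson correlation can be sketched as follows. This is the standard identity for the variance of a difference of two correlated measurements, not a reproduction of `metacor_dual()`; the function names below are hypothetical, and the package's imputation methods may differ in detail.

```python
import math

def sd_change_from_r(sd_pre, sd_post, r):
    """SD of the pre-post change, given both SDs and the pre-post correlation r."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)

def r_from_sd_change(sd_pre, sd_post, sd_change):
    """Pre-post correlation implied by the three standard deviations."""
    return (sd_pre**2 + sd_post**2 - sd_change**2) / (2 * sd_pre * sd_post)

# A study reporting all three SDs yields a correlation that could be
# borrowed for a study in which the SD of changes is missing.
r_known = r_from_sd_change(4.0, 5.0, math.sqrt(21))
sd_imputed = sd_change_from_r(3.5, 4.5, r_known)
```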
This package provides a method that analyzes quality control metrics from multi-sample genomic sequencing studies and nominates poor quality samples for exclusion. Per-sample quality control data are transformed into z-scores and aggregated. The distribution of aggregated z-scores is modelled using parametric distributions. The parameters of the optimal model, selected either by goodness-of-fit statistics or user-designation, are used for outlier nomination. Two implementations of the Cosine Similarity Outlier Detection algorithm are provided with flexible parameters for dataset customization.
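The z-score transformation and aggregation step can be sketched as below. The description does not say how z-scores are aggregated or how metric direction is handled, so this sketch assumes a plain sum per sample and that larger values always indicate worse quality. The function name `aggregate_z_scores` is hypothetical, and the distribution fitting and outlier nomination steps are not shown.

```python
from statistics import mean, stdev

def aggregate_z_scores(metrics):
    """metrics maps a metric name to a list of per-sample values (same sample order).

    Each metric is standardised across samples, then z-scores are summed per sample
    (summation is an assumption, not necessarily the package's aggregation rule).
    """
    n_samples = len(next(iter(metrics.values())))
    totals = [0.0] * n_samples
    for values in metrics.values():
        centre, spread = mean(values), stdev(values)
        for i, value in enumerate(values):
            totals[i] += (value - centre) / spread
    return totals

qc = {
    "duplication_rate": [1.0, 2.0, 3.0],
    "error_rate": [10.0, 10.0, 40.0],
}
scores = aggregate_z_scores(qc)  # the third sample has the largest aggregated score
```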
Displays provenance graphically for provenance collected by the rdt or rdtLite packages, or other tools providing compatible PROV JSON output. The exact format of the JSON created by rdt and rdtLite is described in <https://github.com/End-to-end-provenance/ExtendedProvJson>. More information about rdtLite and associated tools is available at <https://github.com/End-to-end-provenance/> and Barbara Lerner, Emery Boose, and Luis Perez (2018), Using Introspection to Collect Provenance in R, Informatics, <doi:10.3390/informatics5010012>.
The algorithm combines the most predictive variable, such as count of the main International Classification of Diseases (ICD) codes, and other Electronic Health Record (EHR) features (e.g. health utilization and processed clinical note data), to obtain a score for accurate risk prediction and disease classification. In particular, it normalizes the surrogate to resemble a Gaussian mixture and leverages the remaining features through random corruption denoising. Background and details about the method can be found at Yu et al. (2018) <doi:10.1093/jamia/ocx111>.
Cluster-independent method based on the topological structure of the gene co-expression network for identifying feature gene sets, extracting cellular subpopulations, and elucidating intrinsic relationships among these subpopulations. Without prior cell clustering, SifiNet circumvents potential inaccuracies in clustering that may influence subsequent analyses. This method is introduced in Qi Gao, Zhicheng Ji, Liuyang Wang, Kouros Owzar, Qi-Jing Li, Cliburn Chan, Jichun Xie "SifiNet: a robust and accurate method to identify feature gene sets and annotate cells" (2024) <doi:10.1093/nar/gkae307>.
We build a Susceptible-Infectious-Recovered (SIR) model in which the rate of infection is the sum of the household rate and the community rate. We estimate the posterior distribution of the parameters using the Metropolis algorithm. Further details may be found in: F Scott Dahlgren, Ivo M Foppa, Melissa S Stockwell, Celibell Y Vargas, Philip LaRussa, Carrie Reed (2021) "Household transmission of influenza A and B within a prospective cohort during the 2013-2014 and 2014-2015 seasons" <doi:10.1002/sim.9181>.
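For readers unfamiliar with the Metropolis algorithm, a generic random-walk version for a single parameter can be sketched as follows. This is not the package's sampler: the household and community likelihood is not reproduced, the function name `metropolis` and its arguments are hypothetical, and a standard normal log density stands in as a toy target.

```python
import math
import random

def metropolis(log_post, start, n_iter, step, seed=0):
    """Random-walk Metropolis for one parameter with Gaussian proposals.

    log_post returns the log posterior density up to an additive constant.
    """
    rng = random.Random(seed)
    current = start
    current_lp = log_post(current)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio); otherwise keep the current value.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

# Toy target (an assumption for illustration): standard normal log density.
draws = metropolis(lambda x: -0.5 * x * x, start=0.0, n_iter=20000, step=1.0, seed=1)
posterior_mean = sum(draws) / len(draws)
```

In a model like the one described, the log posterior would combine the SIR likelihood, with infection rate equal to the household rate plus the community rate, and the priors on those rates.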
This package provides functions to compute compositional turnover using zeta-diversity, the number of species shared by multiple assemblages. The package includes functions to compute zeta-diversity for a specific number of assemblages and to compute zeta-diversity for a range of numbers of assemblages. It also includes functions to explain how zeta-diversity varies with distance and with differences in environmental variables between assemblages, using generalised linear models, linear models with negative constraints, generalised additive models, shape constrained additive models, and I-splines.
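The core quantity can be sketched in a few lines. The sketch assumes the usual convention that zeta-diversity of order k is the mean number of species shared across all combinations of k assemblages. The function names are hypothetical and are not the package's API, and exhaustive enumeration of combinations is only practical for small numbers of assemblages.

```python
from itertools import combinations

def zeta_diversity(assemblages, order):
    """Mean number of species shared by every combination of `order` assemblages."""
    shared_counts = [
        len(set.intersection(*map(set, combo)))
        for combo in combinations(assemblages, order)
    ]
    return sum(shared_counts) / len(shared_counts)

def zeta_decline(assemblages, max_order):
    """Zeta-diversity for orders 1..max_order."""
    return [zeta_diversity(assemblages, k) for k in range(1, max_order + 1)]

sites = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "d", "e"}]
decline = zeta_decline(sites, 3)
```

Order 1 is the mean species richness per assemblage, and higher orders can only stay equal or decrease.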
BANDITS is a Bayesian hierarchical model for detecting differential splicing of genes and transcripts, via DTU (differential transcript usage), between two or more conditions. The method uses a Bayesian hierarchical framework, which allows for sample specific proportions in a Dirichlet-Multinomial model, and samples the allocation of fragments to the transcripts. Parameters are inferred via MCMC (Markov chain Monte Carlo) techniques and a DTU test is performed via a multivariate Wald test on the posterior densities for the average relative abundance of transcripts.
This package provides support for the foreach looping construct. foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn't require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.
This package implements beta regression for modeling beta-distributed dependent variables on the open unit interval (0, 1), e.g., rates and proportions, see Cribari-Neto and Zeileis (2010) <doi:10.18637/jss.v034.i02>. Moreover, extended-support beta regression models can accommodate dependent variables with boundary observations at 0 and/or 1. For the classical beta regression model, alternative specifications are provided: bias-corrected and bias-reduced estimation, finite mixture models, and recursive partitioning for beta regression, see <doi:10.18637/jss.v048.i11>.
Collection of procedures to perform Bayesian analysis on a variety of factor models. Currently, it includes: "Bayesian Exploratory Factor Analysis" (befa) from G. Conti, S. Frühwirth-Schnatter, J.J. Heckman, R. Piatek (2014) <doi:10.1016/j.jeconom.2014.06.008>, an approach to dedicated factor analysis with stochastic search on the structure of the factor loading matrix. The number of latent factors, as well as the allocation of the manifest variables to the factors, are not fixed a priori but determined during MCMC sampling.
Easy access to data from Brazil's population censuses. The package provides a simple and efficient way to download and read the data sets and the documentation of all the population censuses taken in and after 1960 in the country. The package is built on top of the Arrow platform <https://arrow.apache.org/docs/r/>, which allows users to work with larger-than-memory census data using familiar dplyr functions <https://arrow.apache.org/docs/r/articles/arrow.html#analyzing-arrow-data-with-dplyr>.
Implemented are three Wald-type statistics and their respective permuted versions for null hypotheses formulated in terms of cumulative hazard rate functions, medians and the concordance measure, respectively, in the general framework of survival factorial designs with possibly heterogeneous survival and/or censoring distributions, for crossed designs with an arbitrary number of factors and nested designs with up to three factors. See Ditzhaus, Dobler and Pauly (2020) <doi:10.1177/0962280220980784>, Ditzhaus, Janssen, Pauly (2020) <arXiv:2004.10818v2>, and Dobler and Pauly (2019) <doi:10.1177/0962280219831316>.
This package provides facilities to read, write and validate geographic metadata defined with ISO TC211 / OGC ISO geographic information metadata standards, and encoded using the ISO 19139 and ISO 19115-3 (XML) standard technical specifications. This includes ISO 19110 (Feature cataloguing), 19115 (dataset metadata), 19119 (service metadata) and 19136 (GML). Other interoperable schemas from the OGC are progressively supported as well, such as the Sensor Web Enablement (SWE) Common Data Model, the OGC GML Coverage Implementation Schema (GMLCOV), or the OGC GML Referenceable Grid (GMLRGRID).
Studies that report shifts in species distributions may be biased by the shape of the study area. The main functionality of this package is to calculate the Latitudinal Bias Index (LBI) for any given shape. The LBI is bounded between +1 (100% probability to exclusively record latitudinal shifts, i.e., range shifts data sampled along a perfectly South-North oriented straight line) and -1 (100% probability to exclusively record longitudinal shifts, i.e., range shifts data sampled along a perfectly East-West oriented straight line).
Nonparametric Failure Time (NFT) Bayesian Additive Regression Trees (BART): Time-to-event Machine Learning with Heteroskedastic Bayesian Additive Regression Trees (HBART) and Low Information Omnibus (LIO) Dirichlet Process Mixtures (DPM). An NFT BART model is of the form Y = mu + f(x) + sd(x) E where functions f and sd have BART and HBART priors, respectively, while E is a nonparametric error distribution due to a DPM LIO prior hierarchy. See <doi:10.1111/biom.13857> for a description of the model.
This package provides tools for calculating disclosure risk measures for microdata, including record-level and file-level measures. The record-level disclosure risk is estimated primarily using exhaustive tabulation. The file-level disclosure risk is estimated by fitting loglinear models on the observed sample counts in cells formed by key variables and their interactions. Funded by the National Center for Education Statistics. See Skinner and Shlomo (2008) <doi:10.1198/016214507000001328> for a description of the file-level risk measures and the loglinear model approach.
Simultaneous tests and confidence intervals are provided for one-way experimental designs with one or many normally distributed, primary response variables (endpoints). Differences (Hasler and Hothorn, 2011 <doi:10.2202/1557-4679.1258>) or ratios (Hasler and Hothorn, 2012 <doi:10.1080/19466315.2011.633868>) of means can be considered. Various contrasts can be chosen, unbalanced sample sizes are allowed as well as heterogeneous variances (Hasler and Hothorn, 2008 <doi:10.1002/bimj.200710466>) or covariance matrices (Hasler, 2014 <doi:10.1515/ijb-2012-0015>).
Easily override the default visual choices in ggplot2 to make your time series plots look more like the Wall Street Journal. Specific theme design choices include omitting x-axis grid lines and displaying sparse light grey y-axis grid lines. Additionally, this allows you to label the y-axis scales with your units displayed only on the top-most number, while also removing the bottom-most number (unless specifically overridden). The goal is visual simplicity, because who has time to waste looking at a cluttered graph?
This package provides connections to the epiviz web app (http://epiviz.cbcb.umd.edu) for interactive visualization of genomic data. Objects in R/bioc interactive sessions can be displayed in genome browser tracks or plots to be explored by navigation through genomic regions. Fundamental Bioconductor data structures are supported (e.g., GenomicRanges and RangedSummarizedExperiment objects), while providing an easy mechanism to support other data structures (through package epivizrData). Visualizations (using d3.js) can be easily added to the web app as well.
Average population attributable fractions are calculated for a set of risk factors (either binary or ordinal valued) for both prospective and case-control designs. Confidence intervals are found by Monte Carlo simulation. The method can be applied to either prospective or case-control designs, provided an estimate of disease prevalence is available. In addition to an exact calculation of AF, an approximate calculation, based on randomly sampling permutations, has been implemented to ensure the calculation is computationally tractable when the number of risk factors is large.
This package implements Bayesian hierarchical models with flexible Gaussian process priors, focusing on Extended Latent Gaussian Models and incorporating various Gaussian process priors for Bayesian smoothing. Computations leverage finite element approximations and adaptive quadrature for efficient inference. Methods are detailed in Zhang, Stringer, Brown, and Stafford (2023) <doi:10.1177/09622802221134172>; Zhang, Stringer, Brown, and Stafford (2024) <doi:10.1080/10618600.2023.2289532>; Zhang, Brown, and Stafford (2023) <doi:10.48550/arXiv.2305.09914>; and Stringer, Brown, and Stafford (2021) <doi:10.1111/biom.13329>.
BAYesian inference for MEDical designs in R. Functions for the computation of Bayes factors for common biomedical research designs. Implemented are functions to test the equivalence (equiv_bf), non-inferiority (infer_bf), and superiority (super_bf) of an experimental group compared to a control group on a continuous outcome measure. Bayes factors for these three tests can be computed based on raw data (x, y) or summary statistics (n_x, n_y, mean_x, mean_y, sd_x, sd_y [or ci_margin and ci_level]).