Graphically displays provenance collected by the rdt or rdtLite packages, or by other tools producing compatible PROV JSON output. The exact format of the JSON created by rdt and rdtLite is described at <https://github.com/End-to-end-provenance/ExtendedProvJson>. More information about rdtLite and associated tools is available at <https://github.com/End-to-end-provenance/> and in Barbara Lerner, Emery Boose, and Luis Perez (2018), "Using Introspection to Collect Provenance in R", Informatics, <doi:10.3390/informatics5010012>.
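A minimal usage sketch, assuming rdtLite's prov.run() collects provenance for a script and a viewer entry point like prov.visualize() displays the resulting graph (the latter name is an assumption; check the package documentation):

```r
# Hedged sketch: prov.run() is rdtLite's collector; prov.visualize() is an
# assumed name for this package's graph viewer.
library(rdtLite)
prov.run("analysis.R")   # run the script while recording provenance
prov.visualize()         # display the provenance graph of the last run (assumed)
```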
The algorithm combines the most predictive variable, such as the count of the main International Classification of Diseases (ICD) codes, with other Electronic Health Record (EHR) features (e.g. health utilization and processed clinical note data) to obtain a score for accurate risk prediction and disease classification. In particular, it normalizes the surrogate to resemble a Gaussian mixture and leverages the remaining features through random-corruption denoising. Background and details about the method can be found in Yu et al. (2018) <doi:10.1093/jamia/ocx111>.
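The surrogate-normalization idea can be illustrated generically with a two-component Gaussian mixture; this sketch uses the mclust package and simulated ICD counts, and is not this package's own API:

```r
# Illustrative only: log-transform the main ICD code count and fit a
# two-component normal mixture; the posterior probability of the
# higher-mean component acts as a crude risk score.
library(mclust)
set.seed(1)
icd_count <- rpois(500, lambda = sample(c(0.5, 6), 500, replace = TRUE))
surrogate <- log(icd_count + 1)
fit <- Mclust(surrogate, G = 2)
risk_score <- fit$z[, which.max(fit$parameters$mean)]
```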
We build a Susceptible-Infectious-Recovered (SIR) model where the rate of infection is the sum of the household rate and the community rate. We estimate the posterior distribution of the parameters using the Metropolis algorithm. Further details may be found in F. Scott Dahlgren, Ivo M. Foppa, Melissa S. Stockwell, Celibell Y. Vargas, Philip LaRussa, and Carrie Reed (2021), "Household transmission of influenza A and B within a prospective cohort during the 2013-2014 and 2014-2015 seasons", <doi:10.1002/sim.9181>.
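A generic random-walk Metropolis sketch (illustrative only; the package's actual log-posterior encodes the household and community infection rates via the SIR likelihood):

```r
# Toy two-parameter log-posterior standing in for (household rate, community rate).
log_post <- function(theta) {
  if (any(theta <= 0)) return(-Inf)                   # rates must be positive
  sum(dgamma(theta, shape = 2, rate = 20, log = TRUE))
}
metropolis <- function(n_iter, init, step) {
  draws <- matrix(NA_real_, n_iter, length(init))
  cur <- init; cur_lp <- log_post(cur)
  for (i in seq_len(n_iter)) {
    prop <- cur + rnorm(length(cur), sd = step)       # symmetric proposal
    prop_lp <- log_post(prop)
    if (log(runif(1)) < prop_lp - cur_lp) { cur <- prop; cur_lp <- prop_lp }
    draws[i, ] <- cur
  }
  draws
}
post <- metropolis(5000, init = c(0.1, 0.05), step = 0.01)
colMeans(post)   # posterior means of the two rates
```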
A cluster-independent method based on the topology of the gene co-expression network for identifying feature gene sets, extracting cellular subpopulations, and elucidating intrinsic relationships among these subpopulations. Because it requires no prior cell clustering, SifiNet circumvents potential clustering inaccuracies that may influence subsequent analyses. The method is introduced in Qi Gao, Zhicheng Ji, Liuyang Wang, Kouros Owzar, Qi-Jing Li, Cliburn Chan, and Jichun Xie, "SifiNet: a robust and accurate method to identify feature gene sets and annotate cells" (2024) <doi:10.1093/nar/gkae307>.
This package provides functions to compute compositional turnover using zeta-diversity, the number of species shared by multiple assemblages. It includes functions to compute zeta-diversity for a specific number of assemblages and for a range of numbers of assemblages, as well as functions to explain how zeta-diversity varies with distance and with differences in environmental variables between assemblages, using generalised linear models, linear models with negative constraints, generalised additive models, shape-constrained additive models, and I-splines.
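The zeta-diversity concept itself is easy to compute directly; this standalone sketch (not the package's API) estimates zeta of order i from a presence/absence matrix:

```r
# Mean number of species shared by random sets of i assemblages
# (rows = assemblages, columns = species).
zeta_order <- function(pa, i, nsamp = 100) {
  mean(replicate(nsamp, {
    rows <- sample(nrow(pa), i)
    sum(colSums(pa[rows, , drop = FALSE]) == i)  # species present in all i
  }))
}
set.seed(1)
pa <- matrix(rbinom(200, 1, 0.4), nrow = 10)     # 10 assemblages, 20 species
sapply(1:5, function(i) zeta_order(pa, i))       # zeta decline with order
```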
BANDITS is a Bayesian hierarchical model for detecting differential splicing of genes and transcripts via differential transcript usage (DTU) between two or more conditions. The hierarchical framework allows for sample-specific proportions in a Dirichlet-Multinomial model and samples the allocation of fragments to transcripts. Parameters are inferred via Markov chain Monte Carlo (MCMC) techniques, and a DTU test is performed via a multivariate Wald test on the posterior densities of the average relative abundance of transcripts.
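A toy illustration of the Dirichlet-Multinomial layer (not the BANDITS API): sample-specific transcript proportions are drawn from a Dirichlet prior, and fragment counts from a multinomial.

```r
# Illustrative only; uses MCMCpack's rdirichlet.
library(MCMCpack)
set.seed(1)
pi_s <- as.vector(rdirichlet(1, alpha = c(4, 2, 1)))  # relative transcript usage
counts <- rmultinom(1, size = 1000, prob = pi_s)      # fragments per transcript
```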
This package implements beta regression for modeling beta-distributed dependent variables on the open unit interval (0, 1), e.g. rates and proportions; see Cribari-Neto and Zeileis (2010) <doi:10.18637/jss.v034.i02>. Moreover, extended-support beta regression models can accommodate dependent variables with boundary observations at 0 and/or 1. For the classical beta regression model, alternative specifications are provided: bias-corrected and bias-reduced estimation, finite mixture models, and recursive partitioning for beta regression; see <doi:10.18637/jss.v048.i11>.
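A minimal usage sketch with the package's bundled GasolineYield example data:

```r
library(betareg)
data("GasolineYield", package = "betareg")
# Model the proportion of crude oil converted to gasoline.
m <- betareg(yield ~ batch + temp, data = GasolineYield)
summary(m)
```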
This package provides support for the foreach looping construct. foreach is an idiom that allows for iterating over elements in a collection without the use of an explicit loop counter. This package in particular is intended to be used for its return value rather than for its side effects. In that sense, it is similar to the standard lapply function, but it doesn't require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.
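Basic usage, including parallel execution with a registered backend (here doParallel):

```r
library(foreach)
# foreach is used for its return value; .combine = c collapses the results.
foreach(i = 1:3, .combine = c) %do% sqrt(i)

# The same loop runs in parallel once a backend is registered.
library(doParallel)
registerDoParallel(cores = 2)
foreach(i = 1:3, .combine = c) %dopar% sqrt(i)
```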
Collection of procedures to perform Bayesian analysis on a variety of factor models. Currently, it includes: "Bayesian Exploratory Factor Analysis" (befa) from G. Conti, S. Frühwirth-Schnatter, J.J. Heckman, R. Piatek (2014) <doi:10.1016/j.jeconom.2014.06.008>, an approach to dedicated factor analysis with stochastic search on the structure of the factor loading matrix. The number of latent factors, as well as the allocation of the manifest variables to the factors, are not fixed a priori but determined during MCMC sampling.
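A hedged sketch of calling befa(); the arguments shown (a matrix of manifest variables, a maximum number of factors, MCMC iterations) reflect typical usage, not a verified signature:

```r
# Assumed call pattern; consult the package documentation for the exact
# arguments. Y is a matrix of manifest variables.
fit <- befa(Y, Kmax = 5, iter = 10000)
summary(fit)
```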
Easy access to data from Brazil's population censuses. The package provides a simple and efficient way to download and read the data sets and documentation of all population censuses taken in the country in and after 1960. It is built on top of the Arrow platform <https://arrow.apache.org/docs/r/>, which allows users to work with larger-than-memory census data using familiar dplyr functions; see <https://arrow.apache.org/docs/r/articles/arrow.html#analyzing-arrow-data-with-dplyr>.
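A hedged sketch of the intended dplyr workflow, assuming a reader function such as read_population() and an illustrative column name:

```r
library(dplyr)
# read_population() and the grouping column are assumptions; see the package
# reference for exact names. The result is an Arrow Dataset, so dplyr verbs
# are evaluated lazily until collect().
pop <- read_population(year = 2010)
pop |>
  group_by(abbrev_state) |>   # hypothetical column name
  summarise(n = n()) |>
  collect()
```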
Implemented are three Wald-type statistics and their respective permuted versions for null hypotheses formulated in terms of cumulative hazard rate functions, medians, and the concordance measure, respectively, in the general framework of survival factorial designs with possibly heterogeneous survival and/or censoring distributions, for crossed designs with an arbitrary number of factors and nested designs with up to three factors. See Ditzhaus, Dobler and Pauly (2020) <doi:10.1177/0962280220980784>, Ditzhaus, Janssen and Pauly (2020) <arXiv:2004.10818v2>, and Dobler and Pauly (2019) <doi:10.1177/0962280219831316>.
This package provides facilities to read, write and validate geographic metadata defined with ISO TC211 / OGC ISO geographic information metadata standards, and encoded using the ISO 19139 and ISO 19115-3 (XML) standard technical specifications. This includes ISO 19110 (Feature cataloguing), 19115 (dataset metadata), 19119 (service metadata) and 19136 (GML). Other interoperable schemas from the OGC are progressively supported as well, such as the Sensor Web Enablement (SWE) Common Data Model, the OGC GML Coverage Implementation Schema (GMLCOV), or the OGC GML Referenceable Grid (GMLRGRID).
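A hedged sketch of the R6-style workflow; the class name follows the ISO 19115 naming the package builds on, but the setter shown is an assumption:

```r
library(geometa)
md <- ISOMetadata$new()                 # ISO 19115 dataset metadata object
md$setFileIdentifier("my-dataset-id")   # assumed setter name
xml <- md$encode()                      # serialize to ISO 19139 XML
```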
This package provides tools for the calculation of effect sizes (standardised mean difference) and mean difference in pre-post controlled studies, including robust imputation of missing variances (standard deviation of changes) and correlations (Pearson correlation coefficient). The main function metacor_dual() implements several methods for imputing missing standard deviation of changes or Pearson correlation coefficient, and generates transparent imputation reports. Designed for meta-analyses with incomplete summary statistics. For more details on the methods, see Higgins et al. (2023) and Fu et al. (2013).
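A hedged call sketch: metacor_dual() is the entry point named above, but the argument is a hypothetical placeholder for a data frame of study-level pre-post summary statistics, not the verified signature:

```r
# Hypothetical input; consult the package documentation for the real arguments.
res <- metacor_dual(data = prepost_summaries)
```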
This package provides tools for calculating disclosure risk measures for microdata, including record-level and file-level measures. The record-level disclosure risk is estimated primarily using exhaustive tabulation. The file-level disclosure risk is estimated by fitting loglinear models on the observed sample counts in cells formed by key variables and their interactions. Funded by the National Center for Education Statistics. See Skinner and Shlomo (2008) <doi:10.1198/016214507000001328> for a description of the file-level risk measures and the loglinear model approach.
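The file-level approach can be illustrated generically (this is not the package's API): tabulate the key variables and fit a Poisson loglinear model to the observed cell counts.

```r
set.seed(1)
key1 <- factor(sample(1:4, 300, replace = TRUE))
key2 <- factor(sample(1:3, 300, replace = TRUE))
cells <- as.data.frame(table(key1, key2))      # cells formed by key variables
fit <- glm(Freq ~ key1 + key2, family = poisson, data = cells)
sum(cells$Freq == 1)   # sample uniques, which drive record-level risk
```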
Simultaneous tests and confidence intervals are provided for one-way experimental designs with one or many normally distributed, primary response variables (endpoints). Differences (Hasler and Hothorn, 2011 <doi:10.2202/1557-4679.1258>) or ratios (Hasler and Hothorn, 2012 <doi:10.1080/19466315.2011.633868>) of means can be considered. Various contrasts can be chosen, unbalanced sample sizes are allowed as well as heterogeneous variances (Hasler and Hothorn, 2008 <doi:10.1002/bimj.200710466>) or covariance matrices (Hasler, 2014 <doi:10.1515/ijb-2012-0015>).
Easily override the default visual choices in ggplot2 to make your time series plots look more like the Wall Street Journal. Specific theme design choices include omitting x-axis grid lines and displaying sparse light grey y-axis grid lines. Additionally, the theme lets you label the y-axis scales with the units displayed only on the top-most number, while also removing the bottom-most number (unless specifically overridden). The goal is visual simplicity, because who has time to waste looking at a cluttered graph?
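An illustrative ggplot2 theme in the same spirit (not this package's own theme function):

```r
library(ggplot2)
# Omit x-axis grid lines; keep sparse light grey y-axis grid lines.
theme_wsj_like <- function() {
  theme_minimal() +
    theme(panel.grid.major.x = element_blank(),
          panel.grid.minor   = element_blank(),
          panel.grid.major.y = element_line(colour = "grey85"))
}
ggplot(economics, aes(date, unemploy)) + geom_line() + theme_wsj_like()
```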
Average population attributable fractions are calculated for a set of risk factors (either binary or ordinal valued) for both prospective and case-control designs, provided an estimate of disease prevalence is supplied. Confidence intervals are found by Monte Carlo simulation. In addition to an exact calculation of the attributable fraction, an approximate calculation based on randomly sampling permutations is implemented, to keep the calculation computationally tractable when the number of risk factors is large.
BAYesian inference for MEDical designs in R. Functions for the computation of Bayes factors for common biomedical research designs. Implemented are functions to test the equivalence (equiv_bf), non-inferiority (infer_bf), and superiority (super_bf) of an experimental group compared to a control group on a continuous outcome measure. Bayes factors for these three tests can be computed based on raw data (x, y) or summary statistics (n_x, n_y, mean_x, mean_y, sd_x, sd_y [or ci_margin and ci_level]).
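A sketch using the function and argument names given above (exact signatures should be checked against the package documentation):

```r
# Raw-data interface for a superiority test.
set.seed(1)
x <- rnorm(50, mean = 100, sd = 15)   # experimental group
y <- rnorm(50, mean = 95, sd = 15)    # control group
super_bf(x = x, y = y)

# Summary-statistics interface for an equivalence test.
equiv_bf(n_x = 50, n_y = 50, mean_x = 100, mean_y = 95, sd_x = 15, sd_y = 15)
```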
This package implements Bayesian hierarchical models with flexible Gaussian process priors, focusing on Extended Latent Gaussian Models and incorporating various Gaussian process priors for Bayesian smoothing. Computations leverage finite element approximations and adaptive quadrature for efficient inference. Methods are detailed in Zhang, Stringer, Brown, and Stafford (2023) <doi:10.1177/09622802221134172>; Zhang, Stringer, Brown, and Stafford (2024) <doi:10.1080/10618600.2023.2289532>; Zhang, Brown, and Stafford (2023) <doi:10.48550/arXiv.2305.09914>; and Stringer, Brown, and Stafford (2021) <doi:10.1111/biom.13329>.
This package provides tools for estimation and clustering of spherical data, seamlessly integrated with the flexmix package. It includes the necessary M-step implementations for both the Poisson Kernel-Based Distribution (PKBD) and the spherical Cauchy distribution, and additionally provides random number generators for both distributions. Methods are based on Golzy M. and Markatou M. (2020) <doi:10.1080/10618600.2020.1740713>, Kato S. and McCullagh P. (2020) <doi:10.3150/20-bej1222>, and Sablica L., Hornik K. and Leydold J. (2023) <doi:10.1214/23-ejs2149>.
Visualize contact tracing data using a shiny app and estimate the incubation or latency time of an infectious disease, respecting the following characteristics in the analysis: (i) doubly interval censoring with (partly) overlapping or distinct windows; (ii) an infection risk corresponding to exponential growth; (iii) right truncation allowing for individual truncation times; (iv) different choices concerning the family of the distribution. For our earlier work, we refer to Arntzen et al. (2023) <doi:10.1002/sim.9726>. A paper describing our approach in detail will follow.
Support in preparing a raw ESM dataset for statistical analysis. Preparation includes the handling of errors (mostly due to technological reasons) and the generation of new variables that are necessary and/or helpful in meeting the conditions for statistically analyzing ESM data. The functions in esmprep are meant to lead hierarchically from the bottom, i.e. the raw (separate) ESM dataset(s), to the top, i.e. a single ESM dataset ready for statistical analysis. This hierarchy evolved out of my personal experience in working with ESM data.
This semi-supervised learning algorithm is based on finite Gaussian mixture models and includes a mechanism for handling missing data. It aims to fit a g-class Gaussian mixture model using maximum likelihood, treating the labels of unclassified features as missing data and building on the framework introduced by Rubin (1976) <doi:10.2307/2335739> for missing-data analysis. By taking the dependencies in the missingness pattern into account, the algorithm provides more information for determining the optimal classifier, as specified by the Bayes rule.
Calculates additive and dominance genetic relationship matrices and their inverses, in matrix and tabular-sparse formats. It includes functions for checking and processing pedigrees, calculating inbreeding coefficients (Meuwissen and Luo, 1992 <doi:10.1186/1297-9686-24-4-305>), calculating the matrix of genetic group contributions (Q), and adding those contributions to the genetic merit of animals (Quaas, 1988 <doi:10.3168/jds.S0022-0302(88)79691-5>). Because the calculation of Q is computationally intensive, computationally optimized functions are provided for it.
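As a generic illustration (not this package's API), the additive relationship matrix A can be built with the tabular method; diag(A) - 1 then gives the inbreeding coefficients:

```r
# ped rows: animal, sire, dam (0 = unknown), ordered so parents precede offspring.
tabularA <- function(ped) {
  n <- nrow(ped); A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    s <- ped[i, 2]; d <- ped[i, 3]
    A[i, i] <- 1 + if (s > 0 && d > 0) 0.5 * A[s, d] else 0
    for (j in seq_len(i - 1)) {
      aij <- 0
      if (s > 0) aij <- aij + 0.5 * A[j, s]
      if (d > 0) aij <- aij + 0.5 * A[j, d]
      A[i, j] <- A[j, i] <- aij
    }
  }
  A
}
ped <- rbind(c(1, 0, 0), c(2, 0, 0), c(3, 1, 2), c(4, 1, 3))
tabularA(ped)   # diag(A) - 1 gives the inbreeding coefficients
```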