Import, create and assemble data needed to fit spatial-statistical stream-network models using the SSN2 package for R'. Streams, observations, and prediction locations are represented as simple features and specific tools provided to define topological relationships between features; calculate the hydrologic distances (with flow-direction preserved) and the spatial additive function used to weight converging stream segments; and export the topological, spatial, and attribute information to an `SSN` (spatial stream network) object, which can be efficiently stored, accessed and analysed in R'. A detailed description of methods used to calculate and format the spatial data can be found in Peterson, E.E. and Ver Hoef, J.M., (2014) <doi:10.18637/jss.v056.i02>.
Compute the coordinates to produce a tendril plot. In the tendril plot, each tendril (branch) represents a type of events, and the direction of the tendril is dictated by on which treatment arm the event is occurring. If an event is occurring on the first of the two specified treatment arms, the tendril bends in a clockwise direction. If an event is occurring on the second of the treatment arms, the tendril bends in an anti-clockwise direction. Ref: Karpefors, M and Weatherall, J., "The Tendril Plot - a novel visual summary of the incidence, significance and temporal aspects of adverse events in clinical trials" - JAMIA 2018; 25(8): 1069-1073 <doi:10.1093/jamia/ocy016>.
This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner "Titanic", summarized according to economic status (class), sex, age and survival. Whereas the base R Titanic data found by calling data("Titanic") is an array resulting from cross-tabulating 2201 observations, these data sets are the individual non-aggregated observations and formatted in a machine learning context with a training sample, a testing sample, and two additional data sets that can be used for deeper machine learning analysis. These data sets are also the data sets downloaded from the Kaggle competition and thus lowers the barrier to entry for users new to R or machine learing.
Unlock the power of large-scale geospatial analysis, quickly generate high-resolution kernel density visualizations, supporting advanced analysis tasks such as bandwidth-tuning and spatiotemporal analysis. Regardless of the size of your dataset, our library delivers efficient and accurate results. Tsz Nam Chan, Leong Hou U, Byron Choi, Jianliang Xu, Reynold Cheng (2023) <doi:10.1145/3555041.3589401>. Tsz Nam Chan, Rui Zang, Pak Lon Ip, Leong Hou U, Jianliang Xu (2023) <doi:10.1145/3555041.3589711>. Tsz Nam Chan, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.1145/3514221.3517823>. Tsz Nam Chan, Pak Lon Ip, Kaiyan Zhao, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3554821.3554855>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3503585.3503591>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Byron Choi, Jianliang Xu (2022) <doi:10.14778/3494124.3494135>. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Weng Hou Tong, Shivansh Mittal, Ye Li, Reynold Cheng (2021) <doi:10.14778/3476311.3476312>. Tsz Nam Chan, Zhe Li, Leong Hou U, Jianliang Xu, Reynold Cheng (2021) <doi:10.14778/3461535.3461540>. Tsz Nam Chan, Reynold Cheng, Man Lung Yiu (2020) <doi:10.1145/3318464.3380561>. Tsz Nam Chan, Leong Hou U, Reynold Cheng, Man Lung Yiu, Shivansh Mittal (2020) <doi:10.1109/TKDE.2020.3018376>. Tsz Nam Chan, Man Lung Yiu, Leong Hou U (2019) <doi:10.1109/ICDE.2019.00055>.
Fit Bayesian multivariate GARCH models using Stan for full Bayesian inference. Generate (weighted) forecasts for means, variances (volatility) and correlations. Currently DCC(P,Q), CCC(P,Q), pdBEKK(P,Q
), and BEKK(P,Q) parameterizations are implemented, based either on a multivariate gaussian normal or student-t distribution. DCC and CCC models are based on Engle (2002) <doi:10.1198/073500102288618487> and Bollerslev (1990). The BEKK parameterization follows Engle and Kroner (1995) <doi:10.1017/S0266466600009063> while the pdBEKK
as well as the estimation approach for this package is described in Rast et al. (2020) <doi:10.31234/osf.io/j57pk>. The fitted models contain rstan objects and can be examined with rstan functions.
Diagnostic and prognostic models are typically evaluated with measures of accuracy that do not address clinical consequences. Decision-analytic techniques allow assessment of clinical outcomes, but often require collection of additional information may be cumbersome to apply to models that yield a continuous result. Decision curve analysis is a method for evaluating and comparing prediction models that incorporates clinical consequences, requires only the data set on which the models are tested, and can be applied to models that have either continuous or dichotomous results. See the following references for details on the methods: Vickers (2006) <doi:10.1177/0272989X06295361>, Vickers (2008) <doi:10.1186/1472-6947-8-53>, and Pfeiffer (2020) <doi:10.1002/bimj.201800240>.
Open source data allows for reproducible research and helps advance our knowledge. The purpose of this package is to collate open source ophthalmic data sets curated for direct use. This is real life data of people with intravitreal injections with anti-vascular endothelial growth factor (anti-VEGF), due to age-related macular degeneration or diabetic macular edema. Associated publications of the data sets: Fu et al. (2020) <doi:10.1001/jamaophthalmol.2020.5044>, Moraes et al (2020) <doi:10.1016/j.ophtha.2020.09.025>, Fasler et al. (2019) <doi:10.1136/bmjopen-2018-027441>, Arpa et al. (2020) <doi:10.1136/bjophthalmol-2020-317161>, Kern et al. 2020, <doi:10.1038/s41433-020-1048-0>.
Genotyping assays for bi-allelic markers (e.g. SNPs) produce signal intensities for the two alleles. fitPoly
assigns genotypes (allele dosages) to a collection of polyploid samples based on these signal intensities. fitPoly
replaces the older package fitTetra
that was limited (a.o.) to only tetraploid populations whereas fitPoly
accepts any ploidy level. Reference: Voorrips RE, Gort G, Vosman B (2011) <doi:10.1186/1471-2105-12-172>. New functions added on conversion of data from SNP array software formats, drawing of XY-scatterplots with or without genotype colors, checking against expected F1 segregation patterns, comparing results from two different assays (probes) for the same SNP, recovery from a saveMarkerModels()
crash.
Network meta-analysis for survival outcome data often involves several studies only involve dichotomized outcomes (e.g., the numbers of event and sample sizes of individual arms). To combine these different outcome data, Woods et al. (2010) <doi:10.1186/1471-2288-10-54> proposed a Bayesian approach using complicated hierarchical models. Besides, frequentist approaches have been alternative standard methods for the statistical analyses of network meta-analysis, and the methodology has been well established. We proposed an easy-to-implement method for the network meta-analysis based on the frequentist framework in Noma and Maruo (2025) <doi:10.1101/2025.01.23.25321051>. This package involves some convenient functions to implement the simple synthesis method.
Set of functions that improves the graphical presentations of the functions: wave.correlation and spin.correlation (waveslim package, Whitcher 2012) and the wave.multiple.correlation and wave.multiple.cross.correlation (wavemulcor package, Fernandez-Macho 2012b). The plot outputs (heatmaps) can be displayed in the screen or can be saved as PNG or JPG images or as PDF or EPS formats. The W2CWM2C package also helps to handle the (input data) multivariate time series easily as a list of N elements (times series) and provides a multivariate data set (dataexample) to exemplify its use. A description of the package was published in a scientific paper: Polanco-Martinez and Fernandez-Macho (2014), <doi:10.1109/MCSE.2014.96>.
treekoR
is a novel framework that aims to utilise the hierarchical nature of single cell cytometry data to find robust and interpretable associations between cell subsets and patient clinical end points. These associations are aimed to recapitulate the nested proportions prevalent in workflows inovlving manual gating, which are often overlooked in workflows using automatic clustering to identify cell populations. We developed treekoR
to: Derive a hierarchical tree structure of cell clusters; quantify a cell types as a proportion relative to all cells in a sample (%total), and, as the proportion relative to a parent population (%parent); perform significance testing using the calculated proportions; and provide an interactive html visualisation to help highlight key results.
This package provides functions are designed to facilitate access to and utility with large scale, publicly available environmental data in R. The package contains functions for downloading raw data files from web URLs (download_data()
), processing the raw data files into clean spatial objects (process_covariates()
), and extracting values from the spatial data objects at point and polygon locations (calculate_covariates()
). These functions call a series of source-specific functions which are tailored to each data sources/datasets particular URL structure, data format, and spatial/temporal resolution. The functions are tested, versioned, and open source and open access. For sum_edc()
method details, see Messier, Akita, and Serre (2012) <doi:10.1021/es203152a>.
Chemical analysis of proteins based on their amino acid compositions. Amino acid compositions can be read from FASTA files and used to calculate chemical metrics including carbon oxidation state and stoichiometric hydration state, as described in Dick et al. (2020) <doi:10.5194/bg-17-6145-2020>. Other properties that can be calculated include protein length, grand average of hydropathy (GRAVY), isoelectric point (pI
), molecular weight (MW), standard molal volume (V0), and metabolic costs (Akashi and Gojobori, 2002 <doi:10.1073/pnas.062526999>; Wagner, 2005 <doi:10.1093/molbev/msi126>; Zhang et al., 2018 <doi:10.1038/s41467-018-06461-1>). A database of amino acid compositions of human proteins derived from UniProt
is provided.
Given a multivariate dataset and some knowledge about the dependencies between its features, it is customary to fit a statistical model to the features to infer parameters of interest. Such a procedure implicitly assumes that the sample is exchangeable. This package provides a flexible non-parametric test of this exchangeability assumption, allowing the user to specify the feature dependencies by hand as long as features can be grouped into disjoint independent sets. This package also allows users to test a dual hypothesis, which is, given that the sample is exchangeable, does a proposed grouping of the features into disjoint sets also produce statistically independent sets of features? See Aw, Spence and Song (2023) for the accompanying paper.
Analyzes the function calls in an R package and creates a hive plot of the calls, dividing them among functions that only make outgoing calls (sources), functions that have only incoming calls (sinks), and those that have both incoming calls and make outgoing calls (managers). Function calls can be mapped by their absolute numbers, their normalized absolute numbers, or their rank. FuncMap
should be useful for comparing packages at a high level for their overall design. Plus, it's just plain fun. The hive plot concept was developed by Martin Krzywinski (www.hiveplot.com) and inspired this package. Note: this package is maintained for historical reasons. HiveR
is a full package for creating hive plots.
This is a method for Allele-specific DNA Copy Number profiling for whole-Exome sequencing data. Given the allele-specific coverage and site biases at the variant loci, this program segments the genome into regions of homogeneous allele-specific copy number. It requires, as input, the read counts for each variant allele in a pair of case and control samples, as well as the site biases. For detection of somatic mutations, the case and control samples can be the tumor and normal sample from the same individual. The implemented method is based on the paper: Chen, H., Jiang, Y., Maxwell, K., Nathanson, K. and Zhang, N. (under review). Allele-specific copy number estimation by whole Exome sequencing.
Highly efficient functions for estimating various rank (centrality) measures of nodes in bipartite graphs (two-mode networks). Includes methods for estimating HITS, CoHITS
, BGRM, and BiRank
with implementation primarily inspired by He et al. (2016) <doi:10.1109/TKDE.2016.2611584>. Also provides easy-to-use tools for efficiently estimating PageRank
in one-mode graphs, incorporating or removing edge-weights during rank estimation, projecting two-mode graphs to one-mode, and for converting edgelists and matrices to sparseMatrix
format. Best of all, the package's rank estimators can work directly with common formats of network data including edgelists (class data.frame, data.table, or tbl_df) and adjacency matrices (class matrix or dgCMatrix
).
Emissions are the mass of pollutants released into the atmosphere. Air quality models need emissions data, with spatial and temporal distribution, to represent air pollutant concentrations. This package, eixport, creates inputs for the air quality models WRF-Chem Grell et al (2005) <doi:10.1016/j.atmosenv.2005.04.027>, MUNICH Kim et al (2018) <doi:10.5194/gmd-11-611-2018> , BRAMS-SPM Freitas et al (2005) <doi:10.1016/j.atmosenv.2005.07.017> and RLINE Snyder et al (2013) <doi:10.1016/j.atmosenv.2013.05.074>. See the eixport website (<https://atmoschem.github.io/eixport/>) for more information, documentations and examples. More details in Ibarra-Espinosa et al (2018) <doi:10.21105/joss.00607>.
Fit occupancy models in Stan via brms'. The full variety of brms formula-based effects structures are available to use in multiple classes of occupancy model, including single-season models, models with data augmentation for never-observed species, dynamic (multiseason) models with explicit colonization and extinction processes, and dynamic models with autologistic occupancy dynamics. Formulas can be specified for all relevant distributional terms, including detection and one or more of occupancy, colonization, extinction, and autologistic depending on the model type. Several important forms of model post-processing are provided. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>; Socolar & Mills (2023) <doi:10.1101/2023.10.26.564080>.
Instrumental variables (IVs) are a popular and powerful tool for estimating causal effects in the presence of unobserved confounding. However, classical methods rely on strong assumptions such as the exclusion criterion, which states that instrumental effects must be entirely mediated by treatments. In the so-called "leaky" IV setting, candidate instruments are allowed to have some direct influence on outcomes, rendering the average treatment effect (ATE) unidentifiable. But with limits on the amount of information leakage, we may still recover sharp bounds on the ATE, providing partial identification. This package implements methods for ATE bounding in the leaky IV setting with linear structural equations. For details, see Watson et al. (2024) <doi:10.48550/arXiv.2404.04446>
.
Split an untargeted metabolomics data set into a set of likely true metabolites and a set of likely measurement artifacts. This process involves comparing missing rates of pooled plasma samples and biological samples. The functions assume a fixed injection order of samples where biological samples are randomized and processed between intermittent pooled plasma samples. By comparing patterns of missing data across injection order, metabolites that appear in blocks and are likely artifacts can be separated from metabolites that seem to have random dispersion of missing data. The two main metrics used are: 1. the number of consecutive blocks of samples with present data and 2. the correlation of missing rates between biological samples and flanking pooled plasma samples.
Monte Carlo confidence intervals for free and defined parameters in models fitted in the structural equation modeling package lavaan can be generated using the semmcci package. semmcci has three main functions, namely, MC()
, MCMI()
, and MCStd()
. The output of lavaan is passed as the first argument to the MC()
function or the MCMI()
function to generate Monte Carlo confidence intervals. Monte Carlo confidence intervals for the standardized estimates can also be generated by passing the output of the MC()
function or the MCMI()
function to the MCStd()
function. A description of the package and code examples are presented in Pesigan and Cheung (2023) <doi:10.3758/s13428-023-02114-4>.
Interactions between proteins occur in many, if not most, biological processes. Most proteins perform their functions in networks associated with other proteins and other biomolecules. This fact has motivated the development of a variety of experimental methods for the identification of protein interactions. This variety has in turn ushered in the development of numerous different computational approaches for modeling and predicting protein interactions. Sometimes an experiment is aimed at identifying proteins closely related to some interesting proteins. A network based statistical learning method is used to infer the putative functions of proteins from the known functions of its neighboring proteins on a PPI network. This package identifies such proteins often involved in the same or similar biological functions.
This package provides users with its associated functions for pedagogical purposes in visually learning Bayesian networks and Markov chain Monte Carlo (MCMC) computations. It enables users to: a) Create and examine the (starting) graphical structure of Bayesian networks; b) Create random Bayesian networks using a dataset with customized constraints; c) Generate Stan code for structures of Bayesian networks for sampling the data and learning parameters; d) Plot the network graphs; e) Perform Markov chain Monte Carlo computations and produce graphs for posteriors checks. The package refers to one reference item, which describes the methods and algorithms: Vuong, Quan-Hoang and La, Viet-Phuong (2019) <doi:10.31219/osf.io/w5dx6> The bayesvl R package. Open Science Framework (May 18).