MetaPhOR was developed to enable users to assess metabolic dysregulation using transcriptomic-level data (RNA-sequencing and microarray data) and to produce publication-quality figures. MetaPhOR takes as input a list of differentially expressed genes (DEGs) from DESeq2 or limma, including fold changes and p-values, together with the sample size, and produces a data frame of scores for each KEGG pathway. These scores represent the magnitude and direction of transcriptional change within the pathway, along with estimated p-values. MetaPhOR then uses these scores to visualize metabolic profiles within and between samples through a variety of mechanisms, including bubble plots, heatmaps, and pathway models.
Poisson-lognormal generalized linear mixed model analysis of multivariate counts data using MCMC, aiming to infer the changes in relative proportions of individual variables. The package was originally designed for sequence-based analysis of microbial communities ("metabarcoding", variables = operational taxonomic units, OTUs), but can be used for other types of multivariate counts, such as in ecological applications (variables = species). The results are summarized and plotted using ggplot2 functions. Includes functions to remove sample and variable outliers and reformat counts into normalized log-transformed values for correlation and principal component/coordinate analysis. Walkthrough and examples: <http://www.bio.utexas.edu/research/matz_lab/matzlab/Methods_files/walkthroughExample_mcmcOTU_R.txt>.
Simulate complex traits given a SNP genotype matrix and model parameters (the desired heritability, number of causal loci, and either the true ancestral allele frequencies used to generate the genotypes or the mean kinship for a real dataset). Emphasis on avoiding common biases due to the use of estimated allele frequencies. The code selects random loci to be causal, constructs coefficients for these loci and random independent non-genetic effects, and can optionally generate random group effects. Traits can follow three models: random coefficients, fixed effect sizes, and infinitesimal (multivariate normal). GWAS method benchmarking functions are also provided. Described in Yao and Ochoa (2022) <doi:10.1101/2022.03.25.485885>.
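To make the heritability scaling concrete, here is a minimal Python sketch of the random-coefficients case, assuming genotypes are coded as 0/1/2 allele counts and the true ancestral allele frequencies are known. The function `simulate_trait` and its arguments are hypothetical names for illustration, not the package's R API, and the sketch omits the fixed-effect-size and infinitesimal models, group effects, and the kinship-based path for real data. Coefficients are rescaled so that the genetic variance they imply under the ancestral frequencies, the sum of 2 p_j (1 - p_j) beta_j^2 over causal loci, equals the target heritability, and independent noise with variance 1 - h^2 is added.

```python
import math
import random


def simulate_trait(genotypes, p_anc, herit, m_causal, seed=0):
    """Simulate a quantitative trait (hypothetical helper, random-coefficients model).

    genotypes: list of individuals, each a list of 0/1/2 allele counts per locus.
    p_anc: true ancestral allele frequency per locus, each strictly in (0, 1).
    herit: target heritability h^2 in [0, 1].
    m_causal: number of loci chosen at random to be causal.
    """
    rng = random.Random(seed)
    causal = rng.sample(range(len(p_anc)), m_causal)
    # Draw raw coefficients for the causal loci.
    beta = {j: rng.gauss(0.0, 1.0) for j in causal}
    # Genetic variance implied by these coefficients under the ancestral frequencies.
    var_g = sum(b * b * 2 * p_anc[j] * (1 - p_anc[j]) for j, b in beta.items())
    # Rescale so the implied genetic variance equals herit exactly.
    scale = math.sqrt(herit / var_g)
    beta = {j: b * scale for j, b in beta.items()}
    sd_env = math.sqrt(1 - herit)
    trait = []
    for x in genotypes:
        # Center genotypes with the ancestral (not estimated) frequencies.
        g = sum(b * (x[j] - 2 * p_anc[j]) for j, b in beta.items())
        trait.append(g + rng.gauss(0.0, sd_env))
    return trait, causal, beta
```

Centering with the ancestral frequencies rather than sample estimates is the choice that keeps the target variance exact in this sketch, which echoes the description's emphasis on avoiding biases from estimated allele frequencies.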
Converts dates to different SAS date formats. In SAS, dates are a special case of numeric values: each day is assigned a specific numeric value, counted from January 1, 1960, which has the date value 0. The next day has the date value 1, and so on, while the days before it are represented by -1, -2, and so on. With this approach, SAS can represent any date in the future or the past. SAS provides many formats for displaying dates and date-times, and this package provides functions that convert dates to these different SAS date formats.
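The day-count convention above is easy to reproduce. The minimal Python sketch below (function names are hypothetical, not the package's API) converts calendar dates to SAS date values and renders two common SAS display formats, DATE9. and DDMMYY10.; the `%b` month abbreviation assumes Python's default C/English locale.

```python
from datetime import date, timedelta

# SAS date value 0 corresponds to January 1, 1960.
SAS_EPOCH = date(1960, 1, 1)


def to_sas_date(d: date) -> int:
    """Days since 1960-01-01; earlier dates are negative."""
    return (d - SAS_EPOCH).days


def from_sas_date(value: int) -> date:
    """Convert a SAS date value back to a calendar date."""
    return SAS_EPOCH + timedelta(days=value)


def format_date9(d: date) -> str:
    """Mimic the SAS DATE9. display format, e.g. 01JAN1960 (assumes English month names)."""
    return d.strftime("%d%b%Y").upper()


def format_ddmmyy10(d: date) -> str:
    """Mimic the SAS DDMMYY10. display format, e.g. 01/01/1960."""
    return d.strftime("%d/%m/%Y")


print(to_sas_date(date(1960, 1, 2)))       # 1
print(to_sas_date(date(1959, 12, 31)))     # -1
print(format_date9(from_sas_date(14610)))  # 01JAN2000
```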
Computes Bayesian wavelet shrinkage credible intervals for nonparametric regression. The method uses cumulants to derive Bayesian credible intervals for wavelet regression estimates. The first four cumulants of the posterior distribution of the estimates are expressed in terms of the observed data and integer powers of the mother wavelet functions. These powers are closely approximated by linear combinations of wavelet scaling functions at an appropriate finer scale. Hence, a suitable modification of the discrete wavelet transform allows the posterior cumulants to be found efficiently for any data set. Johnson transformations then yield the credible intervals themselves. Barber, S., Nason, G.P. and Silverman, B.W. (2002) <doi:10.1111/1467-9868.00332>.
When taking online surveys, participants sometimes respond to items without regard to their content. These types of responses, referred to as careless or insufficient effort responding, constitute significant problems for data quality, leading to distortions in data analysis and hypothesis testing, such as spurious correlations. The R package careless provides solutions designed to detect such careless / insufficient effort responses by allowing easy calculation of indices proposed in the literature. It currently supports the calculation of longstring, even-odd consistency, psychometric synonyms/antonyms, Mahalanobis distance, and intra-individual response variability (also termed inter-item standard deviation). For a review of these methods, see Curran (2016) <doi:10.1016/j.jesp.2015.07.006>.
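Two of the simpler indices can be sketched in a few lines of Python. `longstring` returns the longest run of identical consecutive answers, and `irv` returns the standard deviation of one respondent's answers across items. These are illustrative re-implementations, not the package's R functions; using the sample standard deviation (n - 1 denominator) is an assumption worth checking against the package documentation.

```python
from statistics import stdev


def longstring(responses):
    """Length of the longest run of identical consecutive answers."""
    if not responses:
        return 0
    best = run = 1
    for prev, cur in zip(responses, responses[1:]):
        # Extend the current run on a repeat, otherwise start a new one.
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def irv(responses):
    """Intra-individual response variability: sample SD of one respondent's answers."""
    return stdev(responses)


answers = [3, 3, 3, 3, 2, 4, 4]
print(longstring(answers))  # 4
```

A long run or a near-zero IRV flags a respondent who may be straight-lining; thresholds for "too long" or "too low" are a separate analytic choice.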
This package provides data for Kaya identity variables (population, gross domestic product, primary energy consumption, and energy-related CO2 emissions) for the world and for individual nations, and utility functions for looking up data, plotting trends of Kaya variables, and plotting the fuel mix for a given country or region. The Kaya identity (Yoichi Kaya and Keiichi Yokobori, "Environment, Energy, and Economy: Strategies for Sustainability" (United Nations University Press, 1998) and <https://en.wikipedia.org/wiki/Kaya_identity>) expresses a nation's or region's greenhouse gas emissions in terms of its population, per-capita Gross Domestic Product, the energy intensity of its economy, and the carbon-intensity of its energy supply.
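The identity itself is a product of four factors, F = P × (G/P) × (E/G) × (F/E), where P is population, G is GDP, E is primary energy consumption and F is CO2 emissions. A small Python sketch, with hypothetical function names and made-up numbers rather than data shipped with the package, shows that the decomposition multiplies back to the emissions total:

```python
def kaya_factors(population, gdp, energy, emissions):
    """Split emissions into the four Kaya factors.

    Returns (P, G/P, E/G, F/E): population, GDP per capita,
    energy intensity of GDP, and carbon intensity of energy.
    """
    return (population, gdp / population, energy / gdp, emissions / energy)


def kaya_emissions(p, gdp_per_capita, energy_intensity, carbon_intensity):
    """Recombine the factors: F = P * (G/P) * (E/G) * (F/E)."""
    return p * gdp_per_capita * energy_intensity * carbon_intensity


# Made-up illustrative values in arbitrary but consistent units.
factors = kaya_factors(population=2.0, gdp=10.0, energy=5.0, emissions=4.0)
print(factors)  # (2.0, 5.0, 0.5, 0.8)
print(kaya_emissions(*factors))
```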
This package contains functions for Bayesian inference in meta-analytic and network meta-analytic models using Markov chain Monte Carlo (MCMC) algorithms. Currently, the package implements the methods of Hui Yao, Sungduk Kim, Ming-Hui Chen, Joseph G. Ibrahim, Arvind K. Shah, and Jianxin Lin (2015) <doi:10.1080/01621459.2015.1006065> and Hao Li, Daeyoung Lim, Ming-Hui Chen, Joseph G. Ibrahim, Sungduk Kim, Arvind K. Shah, and Jianxin Lin (2021) <doi:10.1002/sim.8983>. The MCMC samplers for each model are written in C++ and fine-tuned for maximal computational efficiency. This software has been developed under the auspices of the National Institutes of Health and Merck & Co., Inc., Kenilworth, NJ, USA.
Designed for performing impact analysis of opinions in a digital text document (DTD). The package allows a user to assess the extent to which a theme or subject within a document impacts the overall opinion expressed in the document. The package can be applied to a wide range of opinion-based DTDs, including commentaries on social media platforms (such as Facebook, Twitter and YouTube), online product reviews, and so on. The utility of opitools was originally demonstrated in Adepeju and Jimoh (2021) <doi:10.31235/osf.io/c32qh> in the assessment of COVID-19 impacts on neighbourhood policing using Twitter data. Further examples can be found in the package vignette.
This package provides a high-level API and a wide range of options for creating publication-quality plots with little effort. It is built upon ggplot2 and other plotting packages, and is designed to be easy to use and to work seamlessly with ggplot2 objects. It is particularly useful for creating complex plots with multiple layers, facets, and annotations. It also provides functions for plotting specific types of data, such as Venn diagrams, alluvial diagrams, and phylogenetic trees. The package is designed to be flexible and customizable. The API reference can be found at <https://pwwang.github.io/plotthis/reference/index.html>.
GARFIELD is a non-parametric functional enrichment analysis approach described in the paper "GARFIELD: GWAS analysis of regulatory or functional information enrichment with LD correction". Briefly, it leverages GWAS findings together with regulatory or functional annotations (primarily from ENCODE and Roadmap Epigenomics data) to find features relevant to a phenotype of interest. It performs greedy pruning of GWAS SNPs (LD r2 > 0.1) and then annotates them based on overlap with functional information. Next, it quantifies fold enrichment (FE) at various GWAS significance cutoffs and assesses it by permutation testing, while matching for minor allele frequency, distance to the nearest transcription start site and number of LD proxies (r2 > 0.8).
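As a rough illustration of the enrichment statistic, the Python sketch below computes a fold enrichment as the fraction of significant SNPs that carry an annotation divided by the fraction of all pruned SNPs that do, and derives an empirical p-value from permutations in which random SNP sets are drawn from the same bins as the significant SNPs. The `bins` labels stand in for the MAF, TSS-distance and LD-proxy matching; this is a generic matched-permutation enrichment test under those assumptions, and GARFIELD's exact definitions and sampling scheme may differ.

```python
import random
from collections import Counter, defaultdict


def fold_enrichment(annotated, significant):
    """FE = P(annotated | significant) / P(annotated) over the pruned SNP set."""
    n = len(annotated)
    n_sig = sum(significant)
    n_ann = sum(annotated)
    n_both = sum(1 for a, s in zip(annotated, significant) if a and s)
    if n_sig == 0 or n_ann == 0:
        return float("nan")
    return (n_both / n_sig) / (n_ann / n)


def matched_permutation_test(annotated, significant, bins, n_perm=1000, seed=0):
    """Return (observed FE, empirical p-value).

    Each permutation draws, from every bin, as many SNPs as there are
    significant SNPs in that bin, so the null sets match on the binned covariates.
    """
    rng = random.Random(seed)
    observed = fold_enrichment(annotated, significant)
    snps_by_bin = defaultdict(list)
    for idx, b in enumerate(bins):
        snps_by_bin[b].append(idx)
    needed = Counter(b for b, s in zip(bins, significant) if s)
    hits = 0
    for _ in range(n_perm):
        null_sig = [False] * len(bins)
        for b, k in needed.items():
            for idx in rng.sample(snps_by_bin[b], k):
                null_sig[idx] = True
        if fold_enrichment(annotated, null_sig) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Repeating the test with `significant` recomputed at each p-value threshold gives the FE-per-cutoff profile the description mentions.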
ProteoMM is a statistical method for model-based peptide-level differential expression analysis of single or multiple datasets. For multiple datasets, ProteoMM produces a single fold change and p-value for each protein across all datasets. ProteoMM provides functionality for normalization, missing value imputation and differential expression. The model-based peptide-level imputation and differential expression analysis component of the package follows the analysis described in "A statistical framework for protein quantitation in bottom-up MS based proteomics" (Karpievitch et al., Bioinformatics 2009). EigenMS normalization is implemented as described in "Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition" (Karpievitch et al., Bioinformatics 2009).
This package provides an R scripting interface to the open-source SAGA-GIS (System for Automated Geoscientific Analyses Geographical Information System) software. Rsagacmd dynamically generates R functions for every SAGA-GIS geoprocessing tool based on the user's currently installed SAGA-GIS version. These functions are contained within an S3 object and are accessed as a named list of libraries and tools. This structure makes scripting easier by organizing the large number of SAGA-GIS geoprocessing tools (>700) by their respective library. Interactive scripting can take full advantage of code autocompletion tools (e.g. in RStudio), allowing each tool's syntax to be quickly recognized. Furthermore, the most common types of spatial data (via the terra, sp, and sf packages), along with non-spatial data, are automatically passed from R to the SAGA-GIS command line tool for geoprocessing operations, and the results are loaded as the appropriate R object. Outputs from individual SAGA-GIS tools can also be chained using pipes from the magrittr and dplyr packages to combine complex geoprocessing operations in a single statement. SAGA-GIS is available under a GPLv2 / LGPLv2 licence from <https://sourceforge.net/projects/saga-gis/>, including Windows x86/x64 and macOS binaries. SAGA-GIS is also included in the Debian/Ubuntu default software repositories. Rsagacmd has currently been tested on SAGA-GIS versions from 2.3.1 to 9.5.1 on Windows, Linux and macOS.
The functions in this package evaluate the process of measuring the chemical components of water, numerically or graphically. The TSSS(), ICHS and datacheck() functions are useful for controlling the quality of measurements of the chemical components of a water sample. If one or more measurements include an error, the generated graph indicates it by placing the point that represents the sample outside the confidence interval. The CI() function evaluates the possibility that a water sample was contaminated after being collected. Validation() calculates the quality parameters of a technique for measuring a chemical component.
Designed to support the visualization, numerical computation, qualitative analysis, model-data fusion, and stochastic simulation of autonomous systems of differential equations. Euler and Runge-Kutta methods are implemented, along with tools to visualize the two-dimensional phase plane. Likelihood surfaces and a simple Markov chain Monte Carlo parameter estimator can be used for model-data fusion of differential equations and empirical models. The Euler-Maruyama method is provided for simulating stochastic differential equations. The package was originally written for internal use to support teaching by Zobitz, and was refined to support the text "Exploring modeling with data and differential equations using R" by John Zobitz (2021) <https://jmzobitz.github.io/ModelingWithR/index.html>.
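The two deterministic solvers mentioned above can be sketched for an autonomous system dx/dt = f(x) as follows. This Python version is an illustration, not the package's R interface; states are plain lists so the same code handles one-dimensional models and the two-dimensional systems drawn in a phase plane.

```python
import math


def euler_step(f, x, h):
    """One explicit Euler step: x + h * f(x)."""
    return [xi + h * di for xi, di in zip(x, f(x))]


def rk4_step(f, x, h):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f([xi + h / 2 * k for xi, k in zip(x, k1)])
    k3 = f([xi + h / 2 * k for xi, k in zip(x, k2)])
    k4 = f([xi + h * k for xi, k in zip(x, k3)])
    return [
        xi + h / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def solve(f, x0, h, n_steps, step=rk4_step):
    """Return the trajectory [x0, x1, ..., x_n] of an autonomous system."""
    traj = [list(x0)]
    for _ in range(n_steps):
        traj.append(step(f, traj[-1], h))
    return traj


def decay(x):
    """Exponential decay dx/dt = -x."""
    return [-x[0]]


# Integrate from x(0) = 1 to t = 1 with step size 0.1.
print(solve(decay, [1.0], 0.1, 10, step=euler_step)[-1][0])  # about 0.3487
print(solve(decay, [1.0], 0.1, 10)[-1][0], math.exp(-1))     # RK4 is much closer to e^-1
```

With the same step size, Euler's result is 0.9^10 while RK4 agrees with e^-1 to about six digits, which is the usual motivation for offering both.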
This package provides a collection of fast and flexible functions for analyzing omics data in observational studies. Multiple different approaches for integrating multiple environmental/genetic factors, omics data, and/or phenotype data are implemented. This includes functions for performing omics wide association studies with one or more variables of interest as the exposure or outcome; a function for performing a meet in the middle analysis for linking exposures, omics, and outcomes (as described by Chadeau-Hyam et al., (2010) <doi:10.3109/1354750X.2010.533285>); and a function for performing a mixtures analysis across all omics features using quantile-based g-Computation (as described by Keil et al., (2019) <doi:10.1289/EHP5838>).
IsoSpec is a fine structure calculator used to obtain the most probable masses of a chemical compound given the frequencies of the composing isotopes and their masses. It finds the smallest set of isotopologues with a given total probability. The probability is assumed to be that of the product of multinomial distributions, each corresponding to one particular element and parametrized by the frequencies of finding that element's isotopes in nature. These numbers are supplied by IUPAC (the International Union of Pure and Applied Chemistry). See Lacki, Valkenborg, Startek (2020) <DOI:10.1021/acs.analchem.0c00959> and Lacki, Startek, Valkenborg, Gambin (2017) <DOI:10.1021/acs.analchem.6b01459> for a description of the algorithms used.
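For very small molecules, the smallest set of isotopologues reaching a given probability can be found by brute force, which makes the idea concrete. The Python sketch below enumerates every isotopologue, scores it with the product of per-element multinomial probabilities, sorts by probability and keeps peaks until the requested coverage is reached. This is not IsoSpec's algorithm, which avoids full enumeration and scales to large molecules; the function names and the isotope masses and abundances in the example are illustrative inputs.

```python
from itertools import product
from math import factorial, prod


def compositions(n, k):
    """All ways to distribute n atoms over k isotopes (count tuples summing to n)."""
    if k == 1:
        yield (n,)
        return
    for first in range(n, -1, -1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_prob(counts, abundances):
    """Multinomial probability of one isotope composition of a single element."""
    coef = factorial(sum(counts)) // prod(factorial(c) for c in counts)
    return coef * prod(p ** c for p, c in zip(abundances, counts))


def smallest_isotopologue_set(formula, coverage):
    """formula: {element: (n_atoms, [(isotope_mass, abundance), ...])}.

    Returns (peaks, total_probability), peaks sorted by decreasing probability.
    Brute force: enumerates every isotopologue, so only usable for small molecules.
    """
    per_element = []
    for n_atoms, isotopes in formula.values():
        masses = [m for m, _ in isotopes]
        abundances = [a for _, a in isotopes]
        per_element.append([
            (sum(c * m for c, m in zip(counts, masses)), multinomial_prob(counts, abundances))
            for counts in compositions(n_atoms, len(isotopes))
        ])
    peaks = []
    for combo in product(*per_element):
        # Elements are independent, so masses add and probabilities multiply.
        peaks.append((sum(m for m, _ in combo), prod(p for _, p in combo)))
    peaks.sort(key=lambda peak: peak[1], reverse=True)
    chosen, total = [], 0.0
    for mass, p in peaks:
        if total >= coverage:
            break
        chosen.append((mass, p))
        total += p
    return chosen, total


# Illustrative isotope masses (u) and abundances for carbon monoxide, CO.
co = {
    "C": (1, [(12.0, 0.9893), (13.0033548, 0.0107)]),
    "O": (1, [(15.9949146, 0.99757), (16.9991317, 0.00038), (17.9991596, 0.00205)]),
}
peaks, total = smallest_isotopologue_set(co, 0.99)
```

Taking peaks in decreasing order of probability is what makes the resulting set the smallest one that reaches the coverage.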
This package provides a stochastic, spatially-explicit, demo-genetic model simulating the spread and evolution of a plant pathogen in a heterogeneous landscape to assess resistance deployment strategies. It is based on a spatial geometry for describing the landscape and allocation of different cultivars, a dispersal kernel for the dissemination of the pathogen, and a SEIR (Susceptible-Exposed-Infectious-Removed) structure with a discrete time step. It provides a useful tool to assess the performance of a wide range of deployment options with respect to their epidemiological, evolutionary and economic outcomes. Loup Rimbaud, Julien Papaïx, Jean-François Rey, Luke G Barrett, Peter H Thrall (2018) <doi:10.1371/journal.pcbi.1006067>.
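The discrete-time SEIR structure can be illustrated with a non-spatial chain-binomial sketch in Python. It tracks one host population, draws the number of transitions out of each compartment per time step, and conserves the population total. The parameter names (`beta`, `sigma`, `gamma`) and default values are hypothetical, and the sketch leaves out the landscape, dispersal kernel, cultivars and pathogen evolution that the package models.

```python
import math
import random


def binomial(rng, n, p):
    """Draw Binomial(n, p) as a sum of Bernoulli trials (fine for small n)."""
    return sum(rng.random() < p for _ in range(n))


def seir_step(state, beta, sigma, gamma, rng):
    """Advance (S, E, I, R) by one discrete time step with chain-binomial transitions."""
    s, e, i, r = state
    n = s + e + i + r
    # Per-individual probabilities of changing compartment during one step.
    p_exposure = 1 - math.exp(-beta * i / n)
    p_onset = 1 - math.exp(-sigma)
    p_removal = 1 - math.exp(-gamma)
    new_e = binomial(rng, s, p_exposure)
    new_i = binomial(rng, e, p_onset)
    new_r = binomial(rng, i, p_removal)
    return (s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r)


def simulate(state, n_steps, beta=0.5, sigma=0.3, gamma=0.1, seed=1):
    """Run n_steps of the stochastic SEIR model from an initial (S, E, I, R) state."""
    rng = random.Random(seed)
    history = [state]
    for _ in range(n_steps):
        history.append(seir_step(history[-1], beta, sigma, gamma, rng))
    return history
```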
The new.dist package provides the probability (density) function, distribution function, quantile function and random number generation function for discrete and continuous distributions that have recently been proposed in the literature. It implements the following distributions: the Power Muth Distribution, a Bimodal Weibull Distribution, the Discrete Lindley Distribution, the Gamma-Lomax Distribution, the Weighted Geometric Distribution, a Power Log-Dagum Distribution, the Kumaraswamy Distribution, the Lindley Distribution, the Unit-Inverse Gaussian Distribution, the EP Distribution, the Akash Distribution, the Ishita Distribution, the Maxwell Distribution, the Standard Omega Distribution, the Slashed Generalized Rayleigh Distribution, the Two-Parameter Rayleigh Distribution, the Muth Distribution, the Uniform-Geometric Distribution, and the Discrete Weibull Distribution.
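As an example of the four functions provided for each distribution, here is a Python sketch for the Lindley distribution with parameter θ > 0, whose density is f(x) = θ²/(θ + 1) (1 + x) e^(-θx) for x > 0. The names `dlindley`, `plindley`, `qlindley` and `rlindley` only mirror R's d/p/q/r convention and are not new.dist's actual function names or signatures. The quantile is found by bisection because the distribution function has no elementary inverse, and random draws use the known representation of the Lindley distribution as a mixture of an Exponential(θ) and a Gamma(2, θ) with weights θ/(θ + 1) and 1/(θ + 1).

```python
import math
import random


def dlindley(x, theta):
    """Density: theta^2 / (theta + 1) * (1 + x) * exp(-theta * x) for x > 0."""
    if x <= 0:
        return 0.0
    return theta ** 2 / (theta + 1) * (1 + x) * math.exp(-theta * x)


def plindley(x, theta):
    """Distribution function: 1 - (1 + theta + theta * x) / (1 + theta) * exp(-theta * x)."""
    if x <= 0:
        return 0.0
    return 1 - (1 + theta + theta * x) / (1 + theta) * math.exp(-theta * x)


def qlindley(p, theta, tol=1e-12):
    """Quantile by bisection on plindley (no elementary closed-form inverse)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    lo, hi = 0.0, 1.0
    while plindley(hi, theta) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if plindley(mid, theta) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rlindley(n, theta, seed=None):
    """Sample via the mixture: Exp(theta) w.p. theta/(theta+1), else Gamma(shape 2, rate theta)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        if rng.random() < theta / (theta + 1):
            out.append(rng.expovariate(theta))
        else:
            out.append(rng.gammavariate(2.0, 1.0 / theta))  # gammavariate takes a scale
    return out
```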
Classify OpenStreetMap (OSM) features into meaningful functional or analytical categories. Designed for OSM PBF files, e.g. from <https://download.geofabrik.de/>, imported as spatial data frames. A classification consists of a list of categories that are related to certain OSM tags and values. Given a layer from an OSM PBF file and a classification, the main osm_classify() function returns a classification data table giving, for each feature, the primary and alternative categories assigned (if there is overlap), and the tag(s) and value(s) matched on. The package also contains a classification of OSM features by economic function/significance, following Krantz (2023) <https://www.ssrn.com/abstract=4537867>.
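The tag-matching idea can be sketched in Python with a classification written as a mapping from category to `{tag: allowed values}`, where `None` means any value of that tag. The rule that the first matching category in the classification is the primary one and later matches are alternatives is an assumption for illustration; the categories, tags and the `classify` function are hypothetical and are not osm_classify()'s actual arguments or output columns.

```python
def classify(features, classification):
    """Assign categories to features (dicts of OSM tags) via tag/value rules.

    classification: {category: {tag: set_of_values or None}}; None matches any value.
    The first matching category (in classification order) is taken as primary.
    """
    rows = []
    for tags in features:
        matched_categories = []
        matched_tags = []
        for category, rules in classification.items():
            hits = [
                (tag, tags[tag])
                for tag, values in rules.items()
                if tag in tags and (values is None or tags[tag] in values)
            ]
            if hits:
                matched_categories.append(category)
                matched_tags.extend(hits)
        rows.append({
            "primary": matched_categories[0] if matched_categories else None,
            "alternatives": matched_categories[1:],
            # dict.fromkeys drops duplicate (tag, value) pairs but keeps their order.
            "matched": list(dict.fromkeys(matched_tags)),
        })
    return rows


# Hypothetical two-category classification.
rules = {
    "food": {"amenity": {"restaurant", "cafe"}, "shop": {"bakery"}},
    "retail": {"shop": None},
}
print(classify([{"shop": "bakery"}, {"amenity": "bench"}], rules))
```

Here a bakery matches both categories, so it gets "food" as primary and "retail" as an alternative, which is the kind of overlap the description refers to.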
Powerful graphical displays and statistical tools for structured problem solving and diagnosis. The functions of the sherlock package are especially useful for applying the process of elimination as a problem diagnosis technique. The sherlock package was designed to work seamlessly with the tidyverse set of packages and provides a collection of graphical displays built on top of the ggplot and plotly packages, such as different kinds of small multiple plots, as well as helper functions for adding reference lines, normalizing observations, reading in data, and saving analysis results to an Excel file. References: David Hartshorne (2019, ISBN: 978-1-5272-5139-7). Stefan H. Steiner, R. Jock MacKay (2005, ISBN: 0873896467).
Diagnostics for fixed effects linear and general linear regression models fitted with survey data. Extensions of standard diagnostics to complex survey data are included: standardized residuals, leverages, Cook's D, dfbetas, dffits, condition indexes, and variance inflation factors as found in Li and Valliant (Surv. Meth., 2009, 35(1), pp. 15-24; Jnl. of Off. Stat., 2011, 27(1), pp. 99-119; Jnl. of Off. Stat., 2015, 31(1), pp. 61-75); Liao and Valliant (Surv. Meth., 2012, 38(1), pp. 53-62; Surv. Meth., 2012, 38(2), pp. 189-202). Variance inflation factors and condition indexes are also computed for some general linear models as described in Liao (U. Maryland thesis, 2010).
An Electronic Data Capture (EDC) system and data standard agnostic solution that enables the pharmaceutical programming community to develop Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) datasets in R. The reusable algorithms concept in sdtm.oak provides a framework for modular programming and can potentially automate the conversion of raw clinical data to SDTM through standardized SDTM specifications. SDTM is one of the required standards for data submission to the Food and Drug Administration (FDA) in the United States and the Pharmaceuticals and Medical Devices Agency (PMDA) in Japan. SDTM standards are implemented following the SDTM Implementation Guide as defined by CDISC <https://www.cdisc.org/standards/foundational/sdtmig>.
This package provides functions to generate K-fold cross-validation (CV) folds and CV test error estimates that take into account how a survey dataset's sampling design was constructed (SRS, clustering, stratification, and/or unequal sampling weights). You can input linear and logistic regression models, along with data and a type of survey design, to get output that can help you determine which model best fits the data using K-fold cross-validation. Our paper "K-Fold Cross-Validation for Complex Sample Surveys" by Wieczorek, Guerin, and McMahon (2022) <doi:10.1002/sta4.454> explains why it is useful to form folds differently depending on the survey design.
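The key design choice is that folds respect the sampling design: whole clusters go to the same fold, clusters within each stratum are spread across folds, and test error is averaged with the sampling weights. A minimal Python sketch of those three ideas follows; the function names are hypothetical, and the package's actual fold assignment and error estimator may differ in detail. For simple random sampling, give every row its own cluster id.

```python
import random
from collections import defaultdict


def survey_folds(cluster_ids, k, strata=None, seed=1):
    """Assign a fold (0..k-1) to each row so that clusters are never split.

    Within each stratum, that stratum's clusters are shuffled and dealt round-robin
    across the k folds. For simple random sampling, give every row its own cluster id.
    """
    rng = random.Random(seed)
    if strata is None:
        strata = [None] * len(cluster_ids)
    clusters = defaultdict(dict)  # stratum -> insertion-ordered set of its cluster ids
    for c, s in zip(cluster_ids, strata):
        clusters[s][c] = None
    fold_of = {}
    for s, members in clusters.items():
        ids = list(members)
        rng.shuffle(ids)
        for j, c in enumerate(ids):
            fold_of[(s, c)] = j % k
    return [fold_of[(s, c)] for c, s in zip(cluster_ids, strata)]


def weighted_mse(y, y_hat, weights):
    """Design-weighted mean squared error for one test fold."""
    num = sum(w * (a - b) ** 2 for a, b, w in zip(y, y_hat, weights))
    return num / sum(weights)
```

Keeping a cluster's rows together prevents correlated observations from appearing in both the training and test sets, which would otherwise make the CV error look optimistically low.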