Contemporary software commonly used to design stated preference experiments is expensive and closed source. This is a free software package with an easy-to-use interface for making flexible stated preference experimental designs using state-of-the-art methods. For an overview of stated choice experimental design theory, see e.g., Rose, J. M. & Bliemer, M. C. J. (2014) in Hess, S. & Daly, A. <doi:10.4337/9781781003152>. The package website can be accessed at <https://spdesign.edsandorf.me>. We acknowledge funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant INSPiRE (Grant agreement ID: 793163).
This package provides a suite of methods for powerful and robust microbiome data analysis, including data normalization, data simulation, community-level association testing and differential abundance analysis. It implements generalized UniFrac distances, Geometric Mean of Pairwise Ratios (GMPR) normalization, a semiparametric data simulator, distance-based statistical methods, and feature-based statistical methods. The distance-based statistical methods include three extensions of PERMANOVA:
PERMANOVA using the Freedman-Lane permutation scheme,
PERMANOVA omnibus test using multiple matrices, and
an analytical approach to approximating the PERMANOVA p-value.
Feature-based statistical methods include linear model-based methods for differential abundance analysis of zero-inflated high-dimensional compositional data.
A common misconception is that the Hochberg procedure provides adequate overall type I error control whenever test statistics are positively correlated. However, unless the test statistics follow some standard distributions, the Hochberg procedure requires a more stringent positive dependence assumption, beyond mere positive correlation, to ensure valid overall type I error control. To fill this gap, this package provides statistical tests grounded in rank correlation coefficients to check whether the positive dependence through stochastic ordering (PDS) condition holds. See Gou, J., Wu, K. and Chen, O. Y. (2024). Rank correlation coefficient based tests on positive dependence through stochastic ordering with application in cancer studies, Technical Report.
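For readers unfamiliar with the procedure under discussion, the following is a minimal sketch of the standard Hochberg step-up rule itself, not of the package's PDS tests. The function name `hochberg_reject` and its arguments are hypothetical and chosen only for illustration.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Standard Hochberg step-up rule (illustrative helper, not a package API).

    Returns a list of booleans, True where the corresponding hypothesis
    is rejected at overall level alpha.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step up from the largest p-value: find the largest rank k (1-based)
    # such that p_(k) <= alpha / (m - k + 1).
    cutoff = 0
    for k in range(m, 0, -1):
        if p_values[order[k - 1]] <= alpha / (m - k + 1):
            cutoff = k
            break

    # Reject the hypotheses with the `cutoff` smallest p-values.
    rejected = [False] * m
    for rank in range(cutoff):
        rejected[order[rank]] = True
    return rejected


# Both p-values are below alpha, so the step-up rule rejects both.
both = hochberg_reject([0.04, 0.045], alpha=0.05)
# Only the smallest p-value passes its threshold of alpha / 4.
one = hochberg_reject([0.010, 0.040, 0.030, 0.20], alpha=0.05)
```

The first call shows why the dependence assumption matters: a step-down rule such as Holm would reject neither hypothesis here, so the validity of the more liberal Hochberg decision rests on the dependence condition described above.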
Enables the automation of actions across the pipeline, including initial steps of transforming binocular data and gap repair to event-based processing such as fixations, saccades, and entry/duration in Areas of Interest (AOIs). It also offers visualisation of eye movement and AOI entries. These tools take relatively raw (trial, time, x, and y form) data and can be used to return fixations, saccades, and AOI entries and time spent in AOIs. As the tools rely on this basic data format, the functions can work with data from any eye tracking device. Implements fixation and saccade detection using methods proposed by Salvucci and Goldberg (2000) <doi:10.1145/355017.355028>.
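As an illustration of the kind of fixation detection described by Salvucci and Goldberg (2000), here is a minimal sketch of a dispersion-threshold (I-DT) pass over data in (time, x, y) form. It is not the package's implementation; the function name `detect_fixations_idt`, the parameters `max_dispersion` and `min_samples`, and the sample values are all assumptions made for this example.

```python
def detect_fixations_idt(samples, max_dispersion, min_samples):
    """Dispersion-threshold fixation detection (illustrative sketch).

    samples: list of (time, x, y) tuples for one trial, ordered by time.
    max_dispersion: largest allowed (x range + y range) within a fixation.
    min_samples: minimum number of consecutive samples in a fixation.
    """
    def dispersion(window):
        xs = [s[1] for s in window]
        ys = [s[2] for s in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations = []
    n = len(samples)
    start = 0
    while start + min_samples <= n:
        end = start + min_samples  # exclusive end of the current window
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the dispersion stays under the threshold.
            while end < n and dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            window = samples[start:end]
            fixations.append({
                "start": window[0][0],
                "end": window[-1][0],
                "x": sum(s[1] for s in window) / len(window),
                "y": sum(s[2] for s in window) / len(window),
            })
            start = end  # continue after the detected fixation
        else:
            start += 1  # slide the window forward by one sample
    return fixations


# Hypothetical data: five samples near (100, 100), then five near (300, 300).
gaze = [(0, 100, 100), (1, 101, 100), (2, 100, 101), (3, 101, 101), (4, 100, 100),
        (5, 300, 300), (6, 301, 300), (7, 300, 301), (8, 301, 301), (9, 300, 300)]
fixations = detect_fixations_idt(gaze, max_dispersion=5, min_samples=3)
```

With these inputs the sketch returns two fixations, one per cluster; the jump between them would be treated as a saccade by a companion velocity-based step.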
An English language syllable counter, plus readability score measure-er. For readability, we support Flesch Reading Ease and Flesch-Kincaid Grade Level (Kincaid et al. 1975) <https://stars.library.ucf.edu/cgi/viewcontent.cgi?article=1055&context=istlibrary>, Automated Readability Index (Senter and Smith 1967) <https://apps.dtic.mil/sti/citations/AD0667273>, Simple Measure of Gobbledygook (McLaughlin 1969), and Coleman-Liau (Coleman and Liau 1975) <doi:10.1037/h0076540>. The package has been carefully optimized and should be very efficient, both in terms of run time performance and memory consumption. The main methods are vectorized by document, and scores for multiple documents are computed in parallel via OpenMP.
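To make three of the readability scores above concrete, the sketch below evaluates the commonly published formulas for Flesch Reading Ease, Flesch-Kincaid Grade Level, and the Automated Readability Index from pre-computed counts. The function names and the example counts are hypothetical, and counting words, sentences, and syllables (the hard part that the package handles) is assumed to have been done already.

```python
def flesch_reading_ease(words, sentences, syllables):
    # Higher scores indicate text that is easier to read.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    # Approximate US school grade level.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(characters, words, sentences):
    # Uses characters per word instead of syllables per word.
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


# Hypothetical counts for one document.
words, sentences, syllables, characters = 100, 5, 150, 450

fre = flesch_reading_ease(words, sentences, syllables)
fk = flesch_kincaid_grade(words, sentences, syllables)
ari = automated_readability_index(characters, words, sentences)
```

All three scores are linear in a small number of per-document ratios, which is why they vectorize naturally across documents.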
This package provides interactive plotting for mathematical models of infectious disease spread. Users can choose from a variety of common built-in ordinary differential equation (ODE) models (such as the SIR, SIRS, and SIS models), or create their own. This latter flexibility allows shinySIR to be applied to simple ODEs from any discipline. The package is a useful teaching tool as students can visualize how changing different parameters can impact model dynamics, with minimal knowledge of coding in R. The built-in models are inspired by those featured in Keeling and Rohani (2008) <doi:10.2307/j.ctvcm4gk0> and Bjornstad (2018) <doi:10.1007/978-3-319-97487-3>.
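As a rough illustration of the simplest built-in model type mentioned above, the following sketch integrates a basic SIR system (in population proportions) with a fixed-step Euler scheme. It does not use or reproduce shinySIR; the function name `simulate_sir`, the parameter names `beta` and `gamma`, and the parameter values are assumptions chosen for the example, and Euler stepping was picked only for brevity.

```python
def simulate_sir(beta, gamma, s0, i0, r0, dt, steps):
    """Fixed-step Euler integration of dS/dt = -beta*S*I,
    dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I (illustrative sketch)."""
    s, i, r = s0, i0, r0
    trajectory = [(0.0, s, i, r)]
    for step in range(1, steps + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


# Hypothetical parameters with beta / gamma = 3, so an outbreak occurs.
traj = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, r0=0.0,
                    dt=0.1, steps=2000)
peak_infectious = max(point[2] for point in traj)
final_time, final_s, final_i, final_r = traj[-1]
```

Re-running the sketch with different `beta` and `gamma` values shows the kind of parameter sensitivity that the interactive plots are meant to make visible.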
MetaPhOR was developed to enable users to assess metabolic dysregulation using transcriptomic-level data (RNA-sequencing and microarray data) and produce publication-quality figures. A list of differentially expressed genes (DEGs) from DESeq2 or limma, including fold change and p-value, can be used as input to MetaPhOR together with the sample size, and will produce a data frame of scores for each KEGG pathway. These scores represent the magnitude and direction of transcriptional change within the pathway, along with estimated p-values. MetaPhOR then uses these scores to visualize metabolic profiles within and between samples through a variety of mechanisms, including bubble plots, heatmaps, and pathway models.
Implementation of LT-FH++, an extension of the liability threshold family history (LT-FH) model. LT-FH++ uses a Gibbs sampler for sampling from the truncated multivariate normal distribution and allows for flexible family structures. LT-FH++ was first described in Pedersen, Emil M., et al. (2022) <doi:10.1016/j.ajhg.2022.01.009> as an extension to LT-FH with more flexible family structures, and again as the age-dependent liability threshold (ADuLT) model in Pedersen, Emil M., et al. (2023) <https://www.nature.com/articles/s41467-023-41210-z> as an alternative to traditional time-to-event genome-wide association studies, where family history was not considered.
Poisson-lognormal generalized linear mixed model analysis of multivariate counts data using MCMC, aiming to infer the changes in relative proportions of individual variables. The package was originally designed for sequence-based analysis of microbial communities ("metabarcoding", variables = operational taxonomic units, OTUs), but can be used for other types of multivariate counts, such as in ecological applications (variables = species). The results are summarized and plotted using ggplot2 functions. Includes functions to remove sample and variable outliers and reformat counts into normalized log-transformed values for correlation and principal component/coordinate analysis. Walkthrough and examples: <http://www.bio.utexas.edu/research/matz_lab/matzlab/Methods_files/walkthroughExample_mcmcOTU_R.txt>.
Simulate complex traits given a SNP genotype matrix and model parameters (the desired heritability, number of causal loci, and either the true ancestral allele frequencies used to generate the genotypes or the mean kinship for a real dataset). Emphasis on avoiding common biases due to the use of estimated allele frequencies. The code selects random loci to be causal, constructs coefficients for these loci and random independent non-genetic effects, and can optionally generate random group effects. Traits can follow three models: random coefficients, fixed effect sizes, and infinitesimal (multivariate normal). GWAS method benchmarking functions are also provided. Described in Yao and Ochoa (2022) <doi:10.1101/2022.03.25.485885>.
Converts dates to different SAS date formats. In SAS, dates are a special case of numeric values. Each day is assigned a specific numeric value, counted from January 1, 1960. That date is assigned the value 0, the next day has the value 1, and so on. Days before this date are represented by -1, -2, and so on. With this approach, SAS can represent any date in the future or in the past. SAS uses many formats to represent dates and date-times; the functions here convert a date to different SAS date formats.
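The day-count convention described above can be sketched in a few lines. This is an illustration of the arithmetic only, not the package's functions: the helper names `to_sas_date`, `from_sas_date`, and `format_date9` are hypothetical, and the last one imitates the layout of the SAS DATE9. format (for example 01JAN1960) as an assumed target.

```python
from datetime import date, timedelta

# SAS date value 0 corresponds to January 1, 1960.
SAS_EPOCH = date(1960, 1, 1)
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def to_sas_date(d):
    """Number of days between d and the SAS epoch (negative before 1960)."""
    return (d - SAS_EPOCH).days


def from_sas_date(n):
    """Calendar date for a SAS date value n."""
    return SAS_EPOCH + timedelta(days=n)


def format_date9(d):
    """Render a date in a DATE9.-style layout: ddMONyyyy."""
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year:04d}"


epoch_value = to_sas_date(date(1960, 1, 1))        # 0
day_before = to_sas_date(date(1959, 12, 31))       # -1
unix_epoch_value = to_sas_date(date(1970, 1, 1))   # 3653
label = format_date9(from_sas_date(0))             # "01JAN1960"
```

Keeping the stored value as a plain integer and applying the format only when displaying mirrors the separation between date values and date formats described above.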
Computes Bayesian wavelet shrinkage credible intervals for nonparametric regression. The method uses cumulants to derive Bayesian credible intervals for wavelet regression estimates. The first four cumulants of the posterior distribution of the estimates are expressed in terms of the observed data and integer powers of the mother wavelet functions. These powers are closely approximated by linear combinations of wavelet scaling functions at an appropriate finer scale. Hence, a suitable modification of the discrete wavelet transform allows the posterior cumulants to be found efficiently for any data set. Johnson transformations then yield the credible intervals themselves. Barber, S., Nason, G.P. and Silverman, B.W. (2002) <doi:10.1111/1467-9868.00332>.
When taking online surveys, participants sometimes respond to items without regard to their content. These types of responses, referred to as careless or insufficient effort responding, constitute significant problems for data quality, leading to distortions in data analysis and hypothesis testing, such as spurious correlations. The R package careless provides solutions designed to detect such careless / insufficient effort responses by allowing easy calculation of indices proposed in the literature. It currently supports the calculation of longstring, even-odd consistency, psychometric synonyms/antonyms, Mahalanobis distance, and intra-individual response variability (also termed inter-item standard deviation). For a review of these methods, see Curran (2016) <doi:10.1016/j.jesp.2015.07.006>.
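Two of the indices listed above are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the careless package's API: `longstring` returns the longest run of identical consecutive responses, and `irv` returns the standard deviation of one respondent's answers, where the use of the sample standard deviation is an assumption.

```python
from statistics import stdev


def longstring(responses):
    """Length of the longest run of identical consecutive responses."""
    if not responses:
        return 0
    longest = current = 1
    for previous, response in zip(responses, responses[1:]):
        current = current + 1 if response == previous else 1
        longest = max(longest, current)
    return longest


def irv(responses):
    """Intra-individual response variability: standard deviation of one
    respondent's answers across items (sample standard deviation assumed)."""
    return stdev(responses)


# Hypothetical 5-point Likert responses from two respondents.
straightliner = [3, 3, 3, 3, 3, 3, 4, 3]
attentive = [1, 4, 2, 5, 3, 2, 4, 1]

run_straightliner = longstring(straightliner)
run_attentive = longstring(attentive)
```

A long run combined with a low `irv` value is the typical signature of straightlining, although any cutoff for flagging a respondent is a judgment call that depends on the questionnaire.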
This package provides data for Kaya identity variables (population, gross domestic product, primary energy consumption, and energy-related CO2 emissions) for the world and for individual nations, and utility functions for looking up data, plotting trends of Kaya variables, and plotting the fuel mix for a given country or region. The Kaya identity (Yoichi Kaya and Keiichi Yokobori, "Environment, Energy, and Economy: Strategies for Sustainability" (United Nations University Press, 1998) and <https://en.wikipedia.org/wiki/Kaya_identity>) expresses a nation's or region's greenhouse gas emissions in terms of its population, per-capita Gross Domestic Product, the energy intensity of its economy, and the carbon-intensity of its energy supply.
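The decomposition described above can be written as emissions = population × (GDP / population) × (energy / GDP) × (emissions / energy). The sketch below computes the four factors and checks that their product recovers the emissions total. The function names and the input figures are hypothetical and are not taken from the package's data.

```python
def kaya_factors(population, gdp, energy, emissions):
    """Split emissions into the four Kaya identity factors."""
    return {
        "population": population,
        "gdp_per_capita": gdp / population,
        "energy_intensity": energy / gdp,
        "carbon_intensity": emissions / energy,
    }


def kaya_emissions(factors):
    """Multiply the four factors back together."""
    return (factors["population"]
            * factors["gdp_per_capita"]
            * factors["energy_intensity"]
            * factors["carbon_intensity"])


# Made-up figures in arbitrary but consistent units.
factors = kaya_factors(population=1.0e6, gdp=5.0e10,
                       energy=2.0e8, emissions=1.2e7)
recovered = kaya_emissions(factors)
```

Because the identity is multiplicative, the growth rate of emissions is approximately the sum of the growth rates of the four factors, which is what makes trend plots of the individual Kaya variables informative.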
This package contains functions performing Bayesian inference for meta-analytic and network meta-analytic models through Markov chain Monte Carlo algorithms. Currently, the package implements the methods of Hui Yao, Sungduk Kim, Ming-Hui Chen, Joseph G. Ibrahim, Arvind K. Shah, and Jianxin Lin (2015) <doi:10.1080/01621459.2015.1006065> and Hao Li, Daeyoung Lim, Ming-Hui Chen, Joseph G. Ibrahim, Sungduk Kim, Arvind K. Shah, and Jianxin Lin (2021) <doi:10.1002/sim.8983>. For maximal computational efficiency, the Markov chain Monte Carlo samplers for each model are written in C++ and fine-tuned. This software has been developed under the auspices of the National Institutes of Health and Merck & Co., Inc., Kenilworth, NJ, USA.
Designed for performing impact analysis of opinions in a digital text document (DTD). The package allows a user to assess the extent to which a theme or subject within a document impacts the overall opinion expressed in the document. The package can be applied to a wide range of opinion-based DTD, including commentaries on social media platforms (such as Facebook, Twitter and YouTube), online product reviews, and so on. The utility of opitools was originally demonstrated in Adepeju and Jimoh (2021) <doi:10.31235/osf.io/c32qh> in the assessment of COVID-19 impacts on neighbourhood policing using Twitter data. Further examples can be found in the vignette of the package.
This package provides a high-level API and a wide range of options to create stunning, publication-quality plots effortlessly. It is built upon ggplot2 and other plotting packages, and is designed to be easy to use and to work seamlessly with ggplot2 objects. It is particularly useful for creating complex plots with multiple layers, facets, and annotations. It also provides a set of functions to create plots for specific types of data, such as Venn diagrams, alluvial diagrams, and phylogenetic trees. The package is designed to be flexible and customizable, and to work well with the ggplot2 ecosystem. The API can be found at <https://pwwang.github.io/plotthis/reference/index.html>.
GARFIELD is a non-parametric functional enrichment analysis approach described in the paper GARFIELD: GWAS analysis of regulatory or functional information enrichment with LD correction. Briefly, it is a method that leverages GWAS findings with regulatory or functional annotations (primarily from ENCODE and Roadmap epigenomics data) to find features relevant to a phenotype of interest. It performs greedy pruning of GWAS SNPs (LD r2 > 0.1) and then annotates them based on functional information overlap. Next, it quantifies Fold Enrichment (FE) at various GWAS significance cutoffs and assesses them by permutation testing, while matching for minor allele frequency, distance to nearest transcription start site and number of LD proxies (r2 > 0.8).
ProteoMM is a statistical method to perform model-based peptide-level differential expression analysis of single or multiple datasets. For multiple datasets, ProteoMM produces a single fold change and p-value for each protein across multiple datasets. ProteoMM provides functionality for normalization, missing value imputation and differential expression. The model-based peptide-level imputation and differential expression analysis component of the package follows the analysis described in "A statistical framework for protein quantitation in bottom-up MS based proteomics" (Karpievitch et al. Bioinformatics 2009). EigenMS normalisation is implemented as described in "Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition" (Karpievitch et al. Bioinformatics 2009).
The functions in this package allow the user to evaluate, numerically or graphically, the process of measuring the chemical components of water. The TSSS(), ICHS and datacheck() functions are useful for controlling the quality of measurements of the chemical components of a water sample. If one or more measurements include an error, the generated graph will indicate it by placing the point that represents the sample outside the confidence interval. The function CI() allows the user to evaluate the possibility that a water sample was contaminated after being obtained. Validation() is a function that calculates the quality parameters of a technique for measuring a chemical component.
Designed to support the visualization, numerical computation, qualitative analysis, model-data fusion, and stochastic simulation for autonomous systems of differential equations. Euler and Runge-Kutta methods are implemented, along with tools to visualize the two-dimensional phase plane. Likelihood surfaces and a simple Markov Chain Monte Carlo parameter estimator can be used for model-data fusion of differential equations and empirical models. The Euler-Maruyama method is provided for simulation of stochastic differential equations. The package was originally written for internal use to support teaching by Zobitz, and refined to support the text "Exploring modeling with data and differential equations using R" by John Zobitz (2021) <https://jmzobitz.github.io/ModelingWithR/index.html>.
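To show how the two deterministic solver families named above differ, the following sketch applies one-step Euler and classical fourth-order Runge-Kutta updates to a scalar autonomous equation dx/dt = f(x). It is an independent illustration rather than the package's code; the function names and the test equation dx/dt = -x are assumptions chosen because the exact solution is known.

```python
import math


def euler_step(f, x, dt):
    """One forward Euler step for dx/dt = f(x)."""
    return x + dt * f(x)


def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(step, f, x0, dt, n_steps):
    """Apply the chosen one-step method n_steps times from x0."""
    x = x0
    for _ in range(n_steps):
        x = step(f, x, dt)
    return x


def decay(x):
    # Test equation dx/dt = -x, whose exact solution is x0 * exp(-t).
    return -x


exact = math.exp(-1.0)
euler_value = integrate(euler_step, decay, x0=1.0, dt=0.1, n_steps=10)
rk4_value = integrate(rk4_step, decay, x0=1.0, dt=0.1, n_steps=10)
euler_error = abs(euler_value - exact)
rk4_error = abs(rk4_value - exact)
```

At the same step size the Runge-Kutta error is several orders of magnitude smaller than the Euler error, at the cost of four evaluations of `f` per step instead of one.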
This package provides a collection of fast and flexible functions for analyzing omics data in observational studies. Multiple approaches for integrating multiple environmental/genetic factors, omics data, and/or phenotype data are implemented. This includes functions for performing omics-wide association studies with one or more variables of interest as the exposure or outcome; a function for performing a meet-in-the-middle analysis for linking exposures, omics, and outcomes (as described by Chadeau-Hyam et al. (2010) <doi:10.3109/1354750X.2010.533285>); and a function for performing a mixtures analysis across all omics features using quantile-based g-computation (as described by Keil et al. (2019) <doi:10.1289/EHP5838>).
IsoSpec is a fine structure calculator used for obtaining the most probable masses of a chemical compound given the frequencies of the composing isotopes and their masses. It finds the smallest set of isotopologues with a given probability. The probability is assumed to be that of the product of multinomial distributions, each corresponding to one particular element and parametrized by the frequencies of finding these elements in nature. These numbers are supplied by IUPAC, the International Union of Pure and Applied Chemistry. See: Lacki, Valkenborg, Startek (2020) <DOI:10.1021/acs.analchem.0c00959> and Lacki, Startek, Valkenborg, Gambin (2017) <DOI:10.1021/acs.analchem.6b01459> for the description of the algorithms used.
This package provides a stochastic, spatially-explicit, demo-genetic model simulating the spread and evolution of a plant pathogen in a heterogeneous landscape to assess resistance deployment strategies. It is based on a spatial geometry for describing the landscape and allocation of different cultivars, a dispersal kernel for the dissemination of the pathogen, and a SEIR ('Susceptible-Exposed-Infectious-Removed') structure with a discrete time step. It provides a useful tool to assess the performance of a wide range of deployment options with respect to their epidemiological, evolutionary and economic outcomes. Loup Rimbaud, Julien Papaïx, Jean-François Rey, Luke G Barrett, Peter H Thrall (2018) <doi:10.1371/journal.pcbi.1006067>.