Implementation of LT-FH++, an extension of the liability threshold family history (LT-FH) model. LT-FH++ uses a Gibbs sampler to sample from the truncated multivariate normal distribution and allows for flexible family structures. LT-FH++ was first described in Pedersen, Emil M., et al. (2022) <doi:10.1016/j.ajhg.2022.01.009> as an extension of LT-FH with more flexible family structures, and later as the age-dependent liability threshold (ADuLT) model in Pedersen, Emil M., et al. (2023) <https://www.nature.com/articles/s41467-023-41210-z>, an alternative to traditional time-to-event genome-wide association studies that do not consider family history.
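The core computational step is Gibbs sampling from a truncated multivariate normal. A minimal sketch of that general technique (illustrative only, not the package's internal API), for zero-mean liabilities with covariance Sigma truncated below at lower:

    # Gibbs sampler for a truncated multivariate normal (toy version)
    rtmvn_gibbs <- function(n, Sigma, lower, n_burn = 100) {
      d <- ncol(Sigma)
      x <- pmax(lower, 0) + 0.1             # start inside the truncation region
      out <- matrix(NA_real_, n, d)
      for (i in seq_len(n + n_burn)) {
        for (j in seq_len(d)) {
          # conditional mean/sd of x[j] given the other coordinates
          s12 <- Sigma[j, -j, drop = FALSE]
          w   <- s12 %*% solve(Sigma[-j, -j, drop = FALSE])
          mu  <- as.numeric(w %*% x[-j])
          sd_ <- sqrt(Sigma[j, j] - as.numeric(w %*% t(s12)))
          # inverse-CDF draw from the normal truncated below at lower[j]
          p_lo <- pnorm(lower[j], mu, sd_)
          x[j] <- qnorm(p_lo + runif(1) * (1 - p_lo), mu, sd_)
        }
        if (i > n_burn) out[i - n_burn, ] <- x
      }
      out
    }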
Simulate complex traits given a SNP genotype matrix and model parameters (the desired heritability, number of causal loci, and either the true ancestral allele frequencies used to generate the genotypes or the mean kinship for a real dataset). Emphasis on avoiding common biases due to the use of estimated allele frequencies. The code selects random loci to be causal, constructs coefficients for these loci and random independent non-genetic effects, and can optionally generate random group effects. Traits can follow three models: random coefficients, fixed effect sizes, and infinitesimal (multivariate normal). GWAS method benchmarking functions are also provided. Described in Yao and Ochoa (2022) <doi:10.1101/2022.03.25.485885>.
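A back-of-the-envelope version of the random-coefficients model described above (a sketch under simplified assumptions, not simtrait's API), where X is an individuals-by-loci dosage matrix and p holds the true allele frequencies:

    simulate_trait <- function(X, p, m_causal = 100, h2 = 0.5) {
      causal <- sample(ncol(X), m_causal)          # random causal loci
      beta   <- rnorm(m_causal)                    # random coefficients
      Xc <- sweep(X[, causal], 2, 2 * p[causal])   # center with true frequencies
      g  <- as.numeric(Xc %*% beta)
      g  <- g * sqrt(h2 / var(g))                  # set genetic variance to h2
      g + rnorm(nrow(X), sd = sqrt(1 - h2))        # independent non-genetic effects
    }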
Converts dates to different SAS date formats. In SAS, dates are a special case of numeric values. Each day is assigned a specific numeric value, starting from January 1, 1960. That day has the date value 0, the next day has the date value 1, and so on; days before it are represented by -1, -2, and so on. With this approach, SAS can represent any date in the past or future. SAS uses many date formats to represent date-times, and this package provides functions that convert dates to these different SAS date formats.
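The convention is easy to reproduce in base R for intuition (base R arithmetic, not this package's formatting functions):

    sas_date_value <- function(d) as.numeric(as.Date(d) - as.Date("1960-01-01"))
    sas_date_value("1960-01-01")  # 0
    sas_date_value("1960-01-02")  # 1
    sas_date_value("1959-12-31")  # -1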
Computes Bayesian wavelet shrinkage credible intervals for nonparametric regression. The method uses cumulants to derive Bayesian credible intervals for wavelet regression estimates. The first four cumulants of the posterior distribution of the estimates are expressed in terms of the observed data and integer powers of the mother wavelet functions. These powers are closely approximated by linear combinations of wavelet scaling functions at an appropriate finer scale. Hence, a suitable modification of the discrete wavelet transform allows the posterior cumulants to be found efficiently for any data set. Johnson transformations then yield the credible intervals themselves. Barber, S., Nason, G.P. and Silverman, B.W. (2002) <doi:10.1111/1467-9868.00332>.
This package provides an R scripting interface to the open-source SAGA-GIS (System for Automated Geoscientific Analyses Geographical Information System) software. Rsagacmd dynamically generates R functions for every SAGA-GIS geoprocessing tool based on the user's currently installed SAGA-GIS version. These functions are contained within an S3 object and are accessed as a named list of libraries and tools. This structure facilitates an easier scripting experience by organizing the large number of SAGA-GIS geoprocessing tools (>700) by their respective library. Interactive scripting can take full advantage of code autocompletion tools (e.g. in 'RStudio'), allowing each tool's syntax to be quickly recognized. Furthermore, the most common types of spatial data (via the 'terra', 'sp', and 'sf' packages) along with non-spatial data are automatically passed from R to the SAGA-GIS command line tool for geoprocessing operations, and the results are loaded as the appropriate R object. Outputs from individual SAGA-GIS tools can also be chained using pipes from the 'magrittr' and 'dplyr' packages to combine complex geoprocessing operations in a single statement. SAGA-GIS is available under a GPLv2 / LGPLv2 licence from <https://sourceforge.net/projects/saga-gis/>, including Windows x86/x64 and macOS binaries, and is also included in the default Debian/Ubuntu software repositories. Rsagacmd has been tested on SAGA-GIS versions 2.3.1 to 9.5.1 on Windows, Linux and macOS.
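A hedged usage sketch: saga_gis() is the package's entry point, but the tool and argument names below are illustrative and depend on the installed SAGA-GIS version:

    library(Rsagacmd)
    saga <- saga_gis()                    # link to the local SAGA-GIS install
    dem  <- terra::rast("dem.tif")        # hypothetical elevation raster
    # tools are nested by library, so autocompletion can guide the call
    slope <- saga$ta_morphometry$slope_aspect_curvature(elevation = dem)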
GARFIELD is a non-parametric functional enrichment analysis approach described in the paper GARFIELD: GWAS analysis of regulatory or functional information enrichment with LD correction. Briefly, it is a method that leverages GWAS findings with regulatory or functional annotations (primarily from ENCODE and Roadmap epigenomics data) to find features relevant to a phenotype of interest. It performs greedy pruning of GWAS SNPs (LD r2 > 0.1) and then annotates them based on overlap with functional information. Next, it quantifies Fold Enrichment (FE) at various GWAS significance cutoffs and assesses its significance by permutation testing, while matching for minor allele frequency, distance to the nearest transcription start site and number of LD proxies (r2 > 0.8).
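In miniature, the statistic being tested looks like this (a toy sketch that ignores the MAF, TSS-distance and LD-proxy matching the real method performs; annotated is a logical vector over pruned SNPs):

    fold_enrichment <- function(pvals, annotated, cutoff = 1e-8) {
      mean(annotated[pvals < cutoff]) / mean(annotated)   # FE at one cutoff
    }
    perm_pvalue <- function(pvals, annotated, cutoff = 1e-8, B = 1000) {
      obs  <- fold_enrichment(pvals, annotated, cutoff)
      null <- replicate(B, fold_enrichment(pvals, sample(annotated), cutoff))
      mean(null >= obs)                                   # permutation p-value
    }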
ProteoMM is a statistical method for model-based peptide-level differential expression analysis of single or multiple datasets. For multiple datasets, ProteoMM produces a single fold change and p-value for each protein across the datasets. ProteoMM provides functionality for normalization, missing value imputation and differential expression. The model-based peptide-level imputation and differential expression analysis components of the package follow the analysis described in "A statistical framework for protein quantitation in bottom-up MS-based proteomics" (Karpievitch et al. Bioinformatics 2009). EigenMS normalisation is implemented as described in "Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition" (Karpievitch et al. Bioinformatics 2009).
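A hedged sketch of the two-step EigenMS call (argument names as I recall them from the package vignette; the data objects are illustrative):

    library(ProteoMM)
    # m_logInts: peptide-by-sample matrix of log intensities; grps: treatment
    # groups; prot_info: peptide-to-protein mapping (all illustrative names)
    norm_info <- eig_norm1(m = m_logInts, treatment = grps, prot.info = prot_info)
    normed    <- eig_norm2(rv = norm_info)   # remove the estimated bias trends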
When taking online surveys, participants sometimes respond to items without regard to their content. These types of responses, referred to as careless or insufficient effort responding, constitute significant problems for data quality, leading to distortions in data analysis and hypothesis testing, such as spurious correlations. The R package careless provides solutions designed to detect such careless / insufficient effort responses by allowing easy calculation of indices proposed in the literature. It currently supports the calculation of longstring, even-odd consistency, psychometric synonyms/antonyms, Mahalanobis distance, and intra-individual response variability (also termed inter-item standard deviation). For a review of these methods, see Curran (2016) <doi:10.1016/j.jesp.2015.07.006>.
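A usage sketch of those indices (survey_df is a hypothetical respondents-by-items data frame of, here, three 5-item scales):

    library(careless)
    ls_idx  <- longstring(survey_df)                     # longest identical-answer run
    eo_idx  <- evenodd(survey_df, factors = c(5, 5, 5))  # even-odd consistency per scale
    md_idx  <- mahad(survey_df)                          # Mahalanobis distance
    irv_idx <- irv(survey_df)                            # intra-individual response variability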
This package provides data for Kaya identity variables (population, gross domestic product, primary energy consumption, and energy-related CO2 emissions) for the world and for individual nations, and utility functions for looking up data, plotting trends of Kaya variables, and plotting the fuel mix for a given country or region. The Kaya identity (Yoichi Kaya and Keiichi Yokobori, "Environment, Energy, and Economy: Strategies for Sustainability" (United Nations University Press, 1998) and <https://en.wikipedia.org/wiki/Kaya_identity>) expresses a nation's or region's greenhouse gas emissions in terms of its population, per-capita Gross Domestic Product, the energy intensity of its economy, and the carbon-intensity of its energy supply.
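The identity itself is an exact factorization, easy to sanity-check in plain R (the numbers below are made up for illustration):

    P <- 8.0e9      # population
    g <- 1.2e4      # per-capita GDP (G/P)
    e <- 5.0e-8     # energy intensity of the economy (E/G)
    f <- 60         # carbon intensity of energy (F/E)
    F_emissions <- P * g * e * f   # emissions = P * (G/P) * (E/G) * (F/E)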
Designed for performing impact analysis of opinions in a digital text document (DTD). The package allows a user to assess the extent to which a theme or subject within a document impacts the overall opinion expressed in the document. The package can be applied to a wide range of opinion-based DTDs, including commentaries on social media platforms (such as 'Facebook', 'Twitter' and 'YouTube'), online product reviews, and so on. The utility of opitools was originally demonstrated in Adepeju and Jimoh (2021) <doi:10.31235/osf.io/c32qh> in the assessment of COVID-19 impacts on neighbourhood policing using Twitter data. Further examples can be found in the package vignette.
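The underlying idea can be sketched generically (not opitools' API; the theme keywords and the sentimentr scorer are stand-ins): score the corpus with and without the theme-related documents and compare.

    theme_hit <- grepl("police|patrol", docs, ignore.case = TRUE)  # hypothetical theme terms
    score  <- function(x) mean(sentimentr::sentiment_by(x)$ave_sentiment)
    impact <- score(docs) - score(docs[!theme_hit])  # opinion shift attributable to the theme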
This package provides a high-level API and a wide range of options to create stunning, publication-quality plots effortlessly. It is built upon ggplot2 and other plotting packages, and is designed to be easy to use and to work seamlessly with ggplot2 objects. It is particularly useful for creating complex plots with multiple layers, facets, and annotations. It also provides a set of functions to create plots for specific types of data, such as Venn diagrams, alluvial diagrams, and phylogenetic trees. The package is designed to be flexible and customizable, and to work well with the ggplot2 ecosystem. The API can be found at <https://pwwang.github.io/plotthis/reference/index.html>.
Implementation of the SIC epsilon-telescope method, using either single-parameter or distributional (multiparameter) regression. Includes classical regression with normally distributed errors and robust regression, where the errors follow the Laplace distribution. The "smooth generalized normal distribution" is used, where the estimation of an additional shape parameter allows the user to move smoothly between both types of regression. See O'Neill and Burke (2022), "Robust Distributional Regression with Automatic Variable Selection", <doi:10.48550/arXiv.2212.07317> for more details. This package also contains the data analyses from O'Neill and Burke (2023), "Variable selection using a smooth information criterion for distributional regression models", <doi:10.1007/s11222-023-10204-8>.
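To make the epsilon-telescope concrete, here is a toy version for normal linear regression (a sketch of the idea, not smoothic's implementation): a smooth approximation of the L0 penalty is tightened gradually, warm-starting each fit from the previous one.

    smooth_sic_fit <- function(X, y) {
      n <- nrow(X)
      beta <- rep(0, ncol(X))
      for (eps in 10^seq(0, -5, length.out = 20)) {     # the epsilon telescope
        obj <- function(b) {
          n / 2 * log(sum((y - X %*% b)^2) / n) +       # profile log-likelihood
            log(n) / 2 * sum(b^2 / (b^2 + eps^2))       # smooth BIC/SIC-type penalty
        }
        beta <- optim(beta, obj, method = "BFGS")$par   # warm start
      }
      beta
    }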
This package provides a specialized selection algorithm designed to align simulated fire perimeters with specific fire size distribution scenarios. The foundation of this approach lies in generating a vast collection of plausible simulated fires across a wide range of conditions, assuming a random pattern of ignition. The algorithm then assembles individual fire perimeters based on their specific probabilities of occurrence, e.g., determined by (i) the likelihood of ignition and (ii) the probability of particular fire-weather scenarios, including wind speed and direction. Implements the method presented in Rodrigues (2025a) <doi:10.5194/egusphere-egu25-8974>. Demo data and code examples can be found in Rodrigues (2025b) <doi:10.5281/zenodo.15282605>.
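The selection step amounts to probability-weighted sampling of the simulated perimeters (a toy sketch; all variable names are illustrative):

    w <- p_ignition * p_weather        # per-perimeter probability of occurrence
    pick <- sample(seq_along(w), size = n_target, prob = w)
    selected_perimeters <- perimeters[pick]  # assembled to match the target size distribution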
An implementation of tidy speaker vowel normalization. This includes generic functions for defining new normalization methods for points, formant tracks, and Discrete Cosine Transform coefficients, as well as convenience functions implementing established normalization methods. References for the implemented methods are: Johnson, Keith (2020) <doi:10.5334/labphon.196> Lobanov, Boris (1971) <doi:10.1121/1.1912396> Nearey, Terrance M. (1978) <https://sites.ualberta.ca/~tnearey/Nearey1978_compressed.pdf> Syrdal, Ann K., and Gopal, H. S. (1986) <doi:10.1121/1.393381> Watt, Dominic, and Fabricius, Anne (2002) <https://www.latl.leeds.ac.uk/article/evaluation-of-a-technique-for-improving-the-mapping-of-multiple-speakers-vowel-spaces-in-the-f1-f2-plane/>.
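For orientation, Lobanov (1971) normalisation is per-speaker z-scoring of each formant; a minimal dplyr sketch of that one method (not tidynorm's interface; vowels is a hypothetical data frame with speaker, F1 and F2 columns):

    library(dplyr)
    vowels_norm <- vowels |>
      group_by(speaker) |>
      mutate(F1_z = (F1 - mean(F1)) / sd(F1),
             F2_z = (F2 - mean(F2)) / sd(F2)) |>
      ungroup()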
The functions in this package allow the user to evaluate the process of measuring the chemical components of water, numerically or graphically. The TSSS(), ICHS() and datacheck() functions are useful for quality control of measurements of the chemical components of a water sample. If one or more measurements include an error, the generated graph will indicate it by placing the point representing the sample outside the confidence interval. The CI() function evaluates the possibility that a water sample was contaminated after being obtained. Validation() calculates the quality parameters of a technique for measuring a chemical component.
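One standard numerical check in water-chemistry quality control is the ion charge balance; a generic illustration of that kind of computation (not necessarily how this package implements its checks), with concentrations in meq/L:

    charge_balance_error <- function(cations, anions) {
      100 * (sum(cations) - sum(anions)) / (sum(cations) + sum(anions))
    }
    charge_balance_error(c(Ca = 2.5, Mg = 1.0, Na = 1.2),
                         c(HCO3 = 2.8, Cl = 1.1, SO4 = 0.7))  # percent error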
Designed to support the visualization, numerical computation, qualitative analysis, model-data fusion, and stochastic simulation of autonomous systems of differential equations. Euler and Runge-Kutta methods are implemented, along with tools to visualize the two-dimensional phase plane. Likelihood surfaces and a simple Markov Chain Monte Carlo parameter estimator can be used for model-data fusion of differential equations and empirical models. The Euler-Maruyama method is provided for simulation of stochastic differential equations. The package was originally written for internal use to support teaching by Zobitz, and refined to support the text "Exploring modeling with data and differential equations using R" by John Zobitz (2021) <https://jmzobitz.github.io/ModelingWithR/index.html>.
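As a reminder of the numerics involved, the Euler method for a single autonomous equation fits in a few lines (illustrative; the package supplies its own solvers and plotting helpers):

    euler <- function(f, y0, dt = 0.01, n_steps = 1000) {
      y <- numeric(n_steps + 1); y[1] <- y0
      for (i in seq_len(n_steps)) y[i + 1] <- y[i] + dt * f(y[i])
      y
    }
    logistic <- function(y) 0.8 * y * (1 - y / 10)   # an autonomous ODE
    trajectory <- euler(logistic, y0 = 0.5)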
This package provides a collection of fast and flexible functions for analyzing omics data in observational studies. Multiple different approaches for integrating multiple environmental/genetic factors, omics data, and/or phenotype data are implemented. This includes functions for performing omics wide association studies with one or more variables of interest as the exposure or outcome; a function for performing a meet in the middle analysis for linking exposures, omics, and outcomes (as described by Chadeau-Hyam et al., (2010) <doi:10.3109/1354750X.2010.533285>); and a function for performing a mixtures analysis across all omics features using quantile-based g-Computation (as described by Keil et al., (2019) <doi:10.1289/EHP5838>).
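The omics-wide association scan has a simple generic skeleton (not this package's functions; omics is a samples-by-features matrix and exposure a numeric vector):

    owas <- function(omics, exposure) {
      res <- t(apply(omics, 2, function(feature) {
        fit <- summary(lm(feature ~ exposure))
        fit$coefficients["exposure", c("Estimate", "Pr(>|t|)")]
      }))
      colnames(res) <- c("estimate", "p_value")
      res   # one row per omics feature
    }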
IsoSpec is a fine structure calculator used for obtaining the most probable masses of a chemical compound given the frequencies of the composing isotopes and their masses. It finds the smallest set of isotopologues with a given probability. The probability is assumed to be that of the product of multinomial distributions, each corresponding to one particular element and parametrized by the frequencies of finding these elements in nature. These numbers are supplied by IUPAC - the International Union of Pure and Applied Chemistry. See: Lacki, Valkenborg, Startek (2020) <DOI:10.1021/acs.analchem.0c00959> and Lacki, Startek, Valkenborg, Gambin (2017) <DOI:10.1021/acs.analchem.6b01459> for the description of the algorithms used.
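The probability model in miniature: each element contributes one multinomial term. For example, the first few carbon isotopologues of a C100 molecule, using the IUPAC 12C/13C abundances (plain base R, not the IsoSpec API):

    p_carbon <- c(0.9893, 0.0107)   # IUPAC 12C / 13C frequencies
    p_k <- function(k) dmultinom(c(100 - k, k), prob = p_carbon)
    sapply(0:5, p_k)   # 0..5 heavy carbons; a small set already covers most probability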
This package provides a stochastic, spatially-explicit, demo-genetic model simulating the spread and evolution of a plant pathogen in a heterogeneous landscape to assess resistance deployment strategies. It is based on a spatial geometry for describing the landscape and allocation of different cultivars, a dispersal kernel for the dissemination of the pathogen, and a SEIR ('Susceptible-Exposed-Infectious-Removed') structure with a discrete time step. It provides a useful tool to assess the performance of a wide range of deployment options with respect to their epidemiological, evolutionary and economic outcomes. Loup Rimbaud, Julien Papaïx, Jean-François Rey, Luke G Barrett, Peter H Thrall (2018) <doi:10.1371/journal.pcbi.1006067>.
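The discrete-time SEIR bookkeeping at the core of such simulators, stripped of space and genetics (a toy sketch, not the package's model):

    seir_step <- function(S, E, I, R, beta, sigma, gamma) {
      N <- S + E + I + R
      new_E <- beta * S * I / N   # new infections (exposed)
      new_I <- sigma * E          # latent hosts becoming infectious
      new_R <- gamma * I          # infectious hosts removed
      c(S = S - new_E, E = E + new_E - new_I,
        I = I + new_I - new_R, R = R + new_R)
    }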
The new.dist package provides the probability (density) function, the distribution function, the quantile function and the associated random number generation function for discrete and continuous distributions that have recently been proposed in the literature. The implemented distributions are: the Power Muth distribution, a Bimodal Weibull distribution, the Discrete Lindley distribution, the Gamma-Lomax distribution, the Weighted Geometric distribution, a Power Log-Dagum distribution, the Kumaraswamy distribution, the Lindley distribution, the Unit-Inverse Gaussian distribution, the EP distribution, the Akash distribution, the Ishita distribution, the Maxwell distribution, the Standard Omega distribution, the Slashed Generalized Rayleigh distribution, the Two-Parameter Rayleigh distribution, the Muth distribution, the Uniform-Geometric distribution, and the Discrete Weibull distribution.
Classify Open Street Map (OSM) features into meaningful functional or analytical categories. Designed for OSM PBF files, e.g. from <https://download.geofabrik.de/> imported as spatial data frames. A classification consists of a list of categories that are related to certain OSM tags and values. Given a layer from an OSM PBF file and a classification, the main osm_classify() function returns a classification data table giving, for each feature, the primary and alternative categories (if there is overlap) assigned, and the tag(s) and value(s) matched on. The package also contains a classification of OSM features by economic function/significance, following Krantz (2023) <https://www.ssrn.com/abstract=4537867>.
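A hedged sketch of the workflow (osm_classify() is the documented entry point, but the classification structure shown here is illustrative):

    library(osmclass)
    classification <- list(   # hypothetical two-category classification
      transport = list(highway = "bus_stop", railway = "station"),
      retail    = list(shop = NA, amenity = "marketplace")   # NA = match any value
    )
    res <- osm_classify(points_layer, classification)  # points_layer: a layer read from an OSM PBF file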
An Electronic Data Capture system (EDC) and Data Standard agnostic solution that enables the pharmaceutical programming community to develop Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) datasets in R. The reusable algorithms concept in sdtm.oak provides a framework for modular programming and can potentially automate the conversion of raw clinical data to SDTM through standardized SDTM specifications. SDTM is one of the required standards for data submission to the Food and Drug Administration (FDA) in the United States and Pharmaceuticals and Medical Devices Agency (PMDA) in Japan. SDTM standards are implemented following the SDTM Implementation Guide as defined by CDISC <https://www.cdisc.org/standards/foundational/sdtmig>.
Powerful graphical displays and statistical tools for structured problem solving and diagnosis. The functions of the sherlock package are especially useful for applying the process of elimination as a problem diagnosis technique. The sherlock package was designed to work seamlessly with the tidyverse set of packages and provides a collection of graphical displays built on top of the ggplot2 and plotly packages, such as different kinds of small-multiple plots, as well as helper functions for adding reference lines, normalizing observations, reading in data, or saving analysis results in an Excel file. References: David Hartshorne (2019, ISBN: 978-1-5272-5139-7). Stefan H. Steiner, R. Jock MacKay (2005, ISBN: 0873896467).
This package provides functions to generate K-fold cross-validation (CV) folds and CV test error estimates that take into account how a survey dataset's sampling design was constructed (SRS, clustering, stratification, and/or unequal sampling weights). You can input linear and logistic regression models, along with data and a type of survey design, to obtain output that helps determine which model best fits the data under K-fold cross-validation. The paper "K-Fold Cross-Validation for Complex Sample Surveys" by Wieczorek, Guerin, and McMahon (2022) <doi:10.1002/sta4.454> explains why tailoring the creation of folds to the survey design is useful.
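The central idea is that folds must respect the design; for a cluster sample, that means assigning whole clusters to folds (a generic sketch, not the package's API):

    cluster_folds <- function(cluster_ids, K = 5) {
      clusters <- unique(cluster_ids)
      fold_of  <- sample(rep_len(seq_len(K), length(clusters)))  # clusters -> folds
      fold_of[match(cluster_ids, clusters)]   # fold label for each observation
    }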