Access chemical, hazard, bioactivity, and exposure data from the Computational Toxicology and Exposure ('CTX') APIs <https://www.epa.gov/comptox-tools/computational-toxicology-and-exposure-apis>. ctxR was developed to streamline the process of accessing the information available through the CTX APIs without requiring prior knowledge of how to use APIs. Most data is also available on the CompTox Chemical Dashboard ('CCD') <https://comptox.epa.gov/dashboard/> and other resources found at the EPA Computational Toxicology and Exposure Online Resources <https://www.epa.gov/comptox-tools>.
Compares distributions with one another in terms of their fit to each sample in a dataset that contains multiple samples, as described in Joo, Aguinis, and Bradley (in press). Users can examine the fit of seven distributions per sample: pure power law, lognormal, exponential, power law with an exponential cutoff, normal, Poisson, and Weibull. Automation features allow the user to compare all distributions for all samples with a single command line, which creates a separate row containing results for each sample until the entire dataset has been analyzed.
This package provides a dynamic programming algorithm for the fast segmentation of univariate signals into piecewise constant profiles. The fpop package is a wrapper to a C++ implementation of the fpop (Functional Pruning Optimal Partitioning) algorithm described in Maidstone et al. 2017 <doi:10.1007/s11222-016-9636-3>. The problem of detecting changepoints in a univariate sequence is formulated in terms of minimizing the mean squared error over segmentations. The fpop algorithm exactly minimizes the mean squared error for a penalty linear in the number of changepoints.
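The penalized least-squares formulation above can be illustrated with a minimal sketch. It is not the fpop algorithm itself: it uses plain optimal partitioning, which solves the same criterion exactly in quadratic time without functional pruning. The names `segment_cost` and `optimal_partitioning` are hypothetical and are not part of the fpop package.

```python
def segment_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the mean for signal[start:end]."""
    n = end - start
    total = prefix[end] - prefix[start]
    total_sq = prefix_sq[end] - prefix_sq[start]
    return total_sq - total * total / n


def optimal_partitioning(signal, penalty):
    """Return changepoints as indices where a new segment starts.

    Minimizes (sum of squared errors) + penalty * (number of changepoints)
    with an O(n^2) dynamic program (hypothetical helper, not fpop's API).
    """
    n = len(signal)
    # Prefix sums give each segment cost in constant time.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        prefix[i + 1] = prefix[i] + x
        prefix_sq[i + 1] = prefix_sq[i] + x * x

    best = [0.0] * (n + 1)  # best[t]: minimal penalized cost of signal[:t]
    best[0] = -penalty      # the first segment carries no penalty
    last = [0] * (n + 1)    # last[t]: start of the final segment of signal[:t]
    for t in range(1, n + 1):
        best_cost, best_start = float("inf"), 0
        for s in range(t):
            cost = best[s] + segment_cost(prefix, prefix_sq, s, t) + penalty
            if cost < best_cost:
                best_cost, best_start = cost, s
        best[t], last[t] = best_cost, best_start

    # Backtrack through the stored segment starts.
    changepoints = []
    t = n
    while t > 0:
        t = last[t]
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)
```

A larger `penalty` yields fewer segments; functional pruning reaches the same optimum while discarding candidate segment starts that can never become optimal.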
This package provides a convenient interface with the OpenAI ChatGPT API <https://openai.com/api>. gptr allows you to interact with ChatGPT, a powerful language model, for various natural language processing tasks. The gptr R package makes it easy to talk to ChatGPT from R. It helps researchers and data analysts by simplifying routine steps such as asking questions and retrieving answers. With gptr, you can use ChatGPT in R without any hassle, making language tasks simpler for everyone.
This package provides functions for specifying and fitting marginal models for contingency tables proposed by Bergsma and Rudas (2002) <doi:10.1214/aos/1015362188>, here called hierarchical multinomial marginal models (hmmm), and their extensions presented by Bartolucci, Colombi and Forcina (2007) <https://www.jstor.org/stable/24307737>. It also covers multinomial Poisson homogeneous (mph) models and homogeneous linear predictor (hlp) models for contingency tables proposed by Lang (2004) <doi:10.1214/aos/1079120140> and Lang (2005) <doi:10.1198/016214504000001042>. Inequality constraints on the parameters are allowed and can be tested.
Fast and accurate inference of gene-environment associations (GEA) in genome-wide studies (Caye et al., 2019, <doi:10.1093/molbev/msz008>). We developed a least-squares estimation approach for estimating confounders and effect sizes that provides a unique framework for several categories of genomic data, not restricted to genotypes. The new algorithm is several times faster than existing GEA approaches and than our previous version of the LFMM program present in the LEA package (Frichot and Francois, 2015, <doi:10.1111/2041-210X.12382>).
This package implements the MRPC algorithm (a PC algorithm with the principle of Mendelian randomization) to infer causal graphs. It also contains functions to simulate data under a certain topology, to visualize a graph in different ways, and to compare graphs and quantify the differences. See Badsha and Fu (2019) <doi:10.3389/fgene.2019.00460>, Badsha, Martin and Fu (2021) <doi:10.3389/fgene.2021.651812>, and Kvamme and Badsha, et al. (2025) <doi:10.1093/genetics/iyaf064>.
This package provides a collection of functions and data sets that support teaching a quantitative finance MS level course on Portfolio Construction and Risk Analysis, and the writing of a textbook for such a course. The package is unique in providing several real-world data sets that may be used for problem assignments and student projects. The data sets include cross-sections of stock data from the Center for Research in Security Prices, LLC (CRSP), corresponding factor exposures data from S&P Global, and several SP500 data sets.
Various self-controlled case series models used to investigate associations between time-varying exposures such as vaccines or other drugs or non-drug exposures and an adverse event can be fitted. Detailed information on the self-controlled case series method and its extensions with more examples can be found in Farrington, P., Whitaker, H., and Ghebremichael Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A Modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press (ISBN: 978-1-4987-8159-6) and at <https://sccs-studies.info/index.html>.
Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.
This package facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R, a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file. It also may be converted into other popular R objects. This package provides a link between VCF data and familiar R software.
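The step of extracting matrices of data from parsed VCF records can be sketched as follows. This is an illustration under stated assumptions, not the package's interface: the helper `extract_format_matrix` is hypothetical, it handles only uncompressed tab-delimited VCF text, and it relies on the standard layout in which column 9 (FORMAT) lists colon-separated keys that are matched by the per-sample columns that follow.

```python
def extract_format_matrix(vcf_text, field="GT"):
    """Return (samples, variant_ids, matrix) for one FORMAT field.

    matrix[i][j] holds the value of `field` for variant i and sample j,
    or None when the field is absent (hypothetical helper).
    """
    samples, variant_ids, matrix = [], [], []
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        columns = line.split("\t")
        if line.startswith("#CHROM"):
            samples = columns[9:]  # sample names follow the FORMAT column
            continue
        keys = columns[8].split(":")
        index = keys.index(field) if field in keys else None
        row = []
        for entry in columns[9:]:
            values = entry.split(":")
            if index is not None and index < len(values):
                row.append(values[index])
            else:
                row.append(None)
        variant_ids.append(f"{columns[0]}_{columns[1]}")  # CHROM_POS label
        matrix.append(row)
    return samples, variant_ids, matrix


# Tiny two-variant, two-sample example built in memory.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "s1", "s2"]
records = [
    ["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT:DP", "0/1:12", "1/1:8"],
    ["chr1", "200", ".", "C", "T", "40", "PASS", ".", "GT:DP", "0/0:10", "0/1:7"],
]
example_vcf = "\n".join(
    ["##fileformat=VCFv4.2", "\t".join(header)]
    + ["\t".join(record) for record in records]
)
samples, variant_ids, genotypes = extract_format_matrix(example_vcf, "GT")
```

The resulting variant-by-sample matrix can then be filtered for quality control, for example by extracting `DP` instead of `GT`.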
Computes approximated adjusted fractional Bayes factors for equality, inequality, and about equality constrained hypotheses. For a tutorial on this method, see Hoijtink, Mulder, van Lissa, & Gu (2019) <doi:10.1037/met0000201>. For applications in structural equation modeling, see Van Lissa, Gu, Mulder, Rosseel, Van Zundert, & Hoijtink (2021) <doi:10.1080/10705511.2020.1745644>. For the statistical underpinnings, see Gu, Mulder, & Hoijtink (2018) <doi:10.1111/bmsp.12110>; Hoijtink, Gu, & Mulder (2019) <doi:10.1111/bmsp.12145>; Hoijtink, Gu, Mulder, & Rosseel (2019) <doi:10.31234/osf.io/q6h5w>.
This package implements the regression approach of Zuber and Strimmer (2011) "High-dimensional regression and variable selection using CAR scores" SAGMB 10: 34, <doi:10.2202/1544-6115.1730>. CAR scores measure the correlation between the response and the Mahalanobis-decorrelated predictors. The squared CAR score is a natural measure of variable importance and provides a canonical ordering of variables. This package provides functions for estimating CAR scores, for variable selection using CAR scores, and for estimating corresponding regression coefficients. Both shrinkage and empirical estimators are available.
Some EM-type algorithms to estimate parameters for the well-known Heckman selection model are provided in the package. The algorithms are as follows: ECM (Expectation/Conditional Maximization), ECM(NR) (the Newton-Raphson method is adapted to the ECM) and ECME (Expectation/Conditional Maximization Either). Since the algorithms are based on the EM algorithm, they also have EM's main advantages, namely, stability and ease of implementation. Further details and explanations of the algorithms can be found in Zhao et al. (2020) <doi:10.1016/j.csda.2020.106930>.
An implementation of the generalized graded unfolding model (GGUM) in R, see Roberts, Donoghue, and Laughlin (2000) <doi:10.1177/01466216000241001>. It allows simulating data sets based on the GGUM. It fits the GGUM and the GUM, and it retrieves item and person parameter estimates. Several plotting functions are available (item and test information functions; item and test characteristic curves; item category response curves). Additionally, there are some functions that facilitate the communication between R and GGUM2004. Finally, a model-fit checking utility, MODFIT(), is also available.
This package implements Cumulative Sum (CUSUM) control charts specifically designed for monitoring processes following a Gamma distribution. Provides functions to estimate distribution parameters, simulate control limits, and apply cautious learning schemes for adaptive thresholding. It supports upward and downward monitoring with guaranteed performance evaluated via Monte Carlo simulations. It is useful for quality control applications in industries where data follows a Gamma distribution. Methods are based on Madrid-Alvarez et al. (2024) <doi:10.1002/qre.3464> and Madrid-Alvarez et al. (2024) <doi:10.1080/08982112.2024.2440368>.
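The upward and downward monitoring mentioned above rests on the CUSUM recursion, sketched here in its textbook one-sided form. This is an illustration only: it does not reproduce the package's Gamma-specific statistic, its parameter estimation, its simulated control limits, or its cautious learning scheme. The function names, the `reference` value, and the `limit` are hypothetical and chosen by hand.

```python
def cusum_path(observations, reference, direction="up"):
    """Return the one-sided CUSUM statistic after each observation.

    Upward:   C_t = max(0, C_{t-1} + (x_t - reference))
    Downward: C_t = max(0, C_{t-1} + (reference - x_t))
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    statistic, path = 0.0, []
    for x in observations:
        step = x - reference if direction == "up" else reference - x
        statistic = max(0.0, statistic + step)  # reset at zero, never negative
        path.append(statistic)
    return path


def first_alarm(path, limit):
    """Return the index of the first statistic above `limit`, or None."""
    for t, value in enumerate(path):
        if value > limit:
            return t
    return None


# Hypothetical process whose level shifts upward from the fourth observation.
upward = cusum_path([1, 1, 1, 3, 3, 3], reference=2.0)
alarm = first_alarm(upward, limit=1.5)
```

In practice the control limit would be calibrated, for instance by Monte Carlo simulation, to achieve a target in-control run length rather than set by hand as here.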
This package provides a lightweight, dependency-free, and simplified implementation of the Pseudo-Expectation Gauss-Seidel (PEGS) algorithm. It fits the multivariate ridge regression model for genomic prediction described in Xavier and Habier (2022) <doi:10.1186/s12711-022-00730-w> and Xavier et al. (2025) <doi:10.1093/genetics/iyae179>, providing heritability estimates, genetic correlations, breeding values, and regression coefficient estimates for prediction. This package provides an alternative to the bWGR package by Xavier et al. (2019) <doi:10.1093/bioinformatics/btz794> by using LAPACK for its algebraic operations.
The goal of rFIA is to increase the accessibility and use of the United States Forest Service (USFS) Forest Inventory and Analysis (FIA) Database by providing a user-friendly, open source toolkit to easily query and analyze FIA Data. Designed to accommodate a wide range of potential user objectives, rFIA simplifies the estimation of forest variables from the FIA Database and allows all R users (experts and newcomers alike) to unlock the flexibility inherent to the Enhanced FIA design. Specifically, rFIA improves accessibility to the spatial-temporal estimation capacity of the FIA Database by producing space-time indexed summaries of forest variables within user-defined population boundaries. Direct integration with other popular R packages (e.g., dplyr, tidyr, and sf) facilitates efficient space-time query and data summary, and supports common data representations and API design. The package implements design-based estimation procedures outlined by Bechtold & Patterson (2005) <doi:10.2737/SRS-GTR-80>, and has been validated against estimates and sampling errors produced by FIA EVALIDator. Current development is focused on the implementation of spatially-enabled model-assisted and model-based estimators to improve population, change, and ratio estimates.
TSIS is used for detecting transcript isoform switches in time-series data. Transcript isoform switches occur when a pair of alternatively spliced isoforms reverse the order of their relative expression levels. TSIS characterizes the transcript switch by defining the isoform switch time-points for any pair of transcript isoforms within a gene. In addition, this tool describes the switch using five different features or metrics. It also filters the results according to the user’s specifications and visualizes the results using different plots for the user to examine further details of the switches.
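The definition of a switch as a reversal in the order of two isoforms' expression levels can be sketched as a simple crossing search. This is a simplified illustration, not the TSIS procedure: it works directly on raw per-time-point values, ignores replicates and curve fitting, and computes none of the five switch metrics. The function name `switch_intervals` is hypothetical.

```python
def switch_intervals(expr_a, expr_b):
    """Return (i, j) index pairs between which the order of a and b reverses.

    Time points where the two isoforms are tied are skipped, so a switch is
    reported between the last and next time points with a strict ordering.
    """
    if len(expr_a) != len(expr_b):
        raise ValueError("both isoforms need the same number of time points")
    switches = []
    previous_sign, previous_index = 0, None
    for i, (a, b) in enumerate(zip(expr_a, expr_b)):
        sign = (a > b) - (a < b)  # +1 if a is higher, -1 if b is higher
        if sign == 0:
            continue  # tie: the ordering is undefined at this time point
        if previous_sign != 0 and sign != previous_sign:
            switches.append((previous_index, i))
        previous_sign, previous_index = sign, i
    return switches


# Hypothetical expression levels of two isoforms over four time points.
isoform_1 = [5.0, 4.0, 3.0, 2.0]
isoform_2 = [1.0, 2.0, 4.0, 5.0]
found = switch_intervals(isoform_1, isoform_2)
```

Each reported interval brackets a candidate switch time-point; scoring and filtering such candidates is where the tool's metrics would come in.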
Statistical methods to evaluate variations of differential expression (DE) between multiple biological conditions. It takes into account the fold-changes and p-values from previous differential expression (DE) results that use large-scale data (e.g., microarray and RNA-seq) and evaluates which genes would react in response to the distinct experiments. This evaluation involves a unique pipeline of statistical methods, including weighted summarization, quantile detection, cluster analysis, and ANOVA tests, in order to classify a subset of relevant genes whose DE is similar across or dependent on certain biological factors.
‘idpr’ aims to integrate tools for the computational analysis of intrinsically disordered proteins (IDPs) within R. This package is used to identify known characteristics of IDPs for a sequence of interest with easily reported and dynamic results. Additionally, this package includes tools for IDP-based sequence analysis to be used in conjunction with other R packages. Described in McFadden WM & Yanowitz JL (2022). "idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R." PLoS ONE, 17(4), e0266929. <https://doi.org/10.1371/journal.pone.0266929>.
Two partially supervised mixture modeling methods, soft-label and belief-based modeling, are implemented. For completeness, the package also provides unsupervised, semi-supervised, and fully supervised mixture modeling. The package can also be used to select the best-fitting model from a set of models with different component numbers or constraints on their structures. For a detailed introduction, see Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software <doi:10.18637/jss.v047.i03>.
This package provides a tool that imports, subsets, visualizes, and exports the Correlates of State Policy Project dataset assembled by Marty P. Jordan and Matt Grossmann (2020) <http://ippsr.msu.edu/public-policy/correlates-state-policy>. The Correlates data contains over 2000 variables across more than 100 years that pertain to state politics and policy in the United States. Users with only a basic understanding of R can subset this data across multiple dimensions, export their search results, create map visualizations, export the citations associated with their searches, and more.
This package provides methods for quantifying the entropy-based local indicator of spatial association (ELSA), which can be used for both continuous and categorical data. In addition, this package offers other methods to measure local indicators of spatial association (LISA). Furthermore, global spatial structure can be measured using a variogram-like diagram called an entrogram. For more information, see Naimi, B., Hamm, N. A., Groen, T. A., Skidmore, A. K., Toxopeus, A. G., & Alibakhshi, S. (2019) <doi:10.1016/j.spasta.2018.10.001>.