This package provides a convenient interface with the OpenAI ChatGPT API <https://openai.com/api>. gptr allows you to interact with ChatGPT', a powerful language model, for various natural language processing tasks. The gptr R package makes talking to ChatGPT in R super easy. It helps researchers and data folks by simplifying the complicated stuff, like asking questions and getting answers. With gptr', you can use ChatGPT in R without any hassle, making it simpler for everyone to do cool things with language!
This package provides functions for specifying and fitting marginal models for contingency tables proposed by Bergsma and Rudas (2002) <doi:10.1214/aos/1015362188> here called hierarchical multinomial marginal models (hmmm) and their extensions presented by Bartolucci, Colombi and Forcina (2007) <https://www.jstor.org/stable/24307737>; multinomial Poisson homogeneous (mph) models and homogeneous linear predictor (hlp) models for contingency tables proposed by Lang (2004) <doi:10.1214/aos/1079120140> and Lang (2005) <doi:10.1198/016214504000001042>. Inequality constraints on the parameters are allowed and can be tested.
Fast and accurate inference of gene-environment associations (GEA) in genome-wide studies (Caye et al., 2019, <doi:10.1093/molbev/msz008>). We developed a least-squares estimation approach for confounder and effect sizes estimation that provides a unique framework for several categories of genomic data, not restricted to genotypes. The speed of the new algorithm is several times faster than the existing GEA approaches, then our previous version of the LFMM program present in the LEA package (Frichot and Francois, 2015, <doi:10.1111/2041-210X.12382>).
This package provides a PC Algorithm with the Principle of Mendelian Randomization. This package implements the MRPC (PC with the principle of Mendelian randomization) algorithm to infer causal graphs. It also contains functions to simulate data under a certain topology, to visualize a graph in different ways, and to compare graphs and quantify the differences. See Badsha and Fu (2019) <doi:10.3389/fgene.2019.00460>, Badsha, Martin and Fu (2021) <doi:10.3389/fgene.2021.651812>, Kvamme and Badsha, et al. (2025) <doi:10.1093/genetics/iyaf064>.
Analysis of multivariate functional spatial data, including spectral multivariate functional principal component analysis and related statistical procedures (Si-Ahmed, Idris, et al. "Principal component analysis of multivariate spatial functional data." Big Data Research 39 (2025) 100504). (Kuenzer, T., Hörmann, S., & Kokoszka, P. (2021). "Principal component analysis of spatially indexed functions." Journal of the American Statistical Association, 116(535), 1444-1456.) (Happ, C., & Greven, S. (2018). "Multivariate functional principal component analysis for data observed on different (dimensional) domains." Journal of the American Statistical Association, 113(522), 649-659.).
Various self-controlled case series models used to investigate associations between time-varying exposures such as vaccines or other drugs or non drug exposures and an adverse event can be fitted. Detailed information on the self-controlled case series method and its extensions with more examples can be found in Farrington, P., Whitaker, H., and Ghebremichael Weldeselassie, Y. (2018, ISBN: 978-1-4987-8159-6. Self-controlled Case Series studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press) and <https://sccs-studies.info/index.html>.
Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of a expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.
This package facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R, a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file. It also may be converted into other popular R objects. This package provides a link between VCF data and familiar R software.
Computes approximated adjusted fractional Bayes factors for equality, inequality, and about equality constrained hypotheses. For a tutorial on this method, see Hoijtink, Mulder, van Lissa, & Gu, (2019) <doi:10.1037/met0000201>. For applications in structural equation modeling, see: Van Lissa, Gu, Mulder, Rosseel, Van Zundert, & Hoijtink, (2021) <doi:10.1080/10705511.2020.1745644>. For the statistical underpinnings, see Gu, Mulder, and Hoijtink (2018) <doi:10.1111/bmsp.12110>; Hoijtink, Gu, & Mulder, J. (2019) <doi:10.1111/bmsp.12145>; Hoijtink, Gu, Mulder, & Rosseel, (2019) <doi:10.31234/osf.io/q6h5w>.
This package implements the regression approach of Zuber and Strimmer (2011) "High-dimensional regression and variable selection using CAR scores" SAGMB 10: 34, <DOI:10.2202/1544-6115.1730>. CAR scores measure the correlation between the response and the Mahalanobis-decorrelated predictors. The squared CAR score is a natural measure of variable importance and provides a canonical ordering of variables. This package provides functions for estimating CAR scores, for variable selection using CAR scores, and for estimating corresponding regression coefficients. Both shrinkage as well as empirical estimators are available.
Some EM-type algorithms to estimate parameters for the well-known Heckman selection model are provided in the package. Such algorithms are as follow: ECM(Expectation/Conditional Maximization), ECM(NR)(the Newton-Raphson method is adapted to the ECM) and ECME(Expectation/Conditional Maximization Either). Since the algorithms are based on the EM algorithm, they also have EMâ s main advantages, namely, stability and ease of implementation. Further details and explanations of the algorithms can be found in Zhao et al. (2020) <doi: 10.1016/j.csda.2020.106930>.
An implementation of the generalized graded unfolding model (GGUM) in R, see Roberts, Donoghue, and Laughlin (2000) <doi:10.1177/01466216000241001>). It allows to simulate data sets based on the GGUM. It fits the GGUM and the GUM, and it retrieves item and person parameter estimates. Several plotting functions are available (item and test information functions; item and test characteristic curves; item category response curves). Additionally, there are some functions that facilitate the communication between R and GGUM2004'. Finally, a model-fit checking utility, MODFIT(), is also available.
This package implements Cumulative Sum (CUSUM) control charts specifically designed for monitoring processes following a Gamma distribution. Provides functions to estimate distribution parameters, simulate control limits, and apply cautious learning schemes for adaptive thresholding. It supports upward and downward monitoring with guaranteed performance evaluated via Monte Carlo simulations. It is useful for quality control applications in industries where data follows a Gamma distribution. Methods are based on Madrid-Alvarez et al. (2024) <doi:10.1002/qre.3464> and Madrid-Alvarez et al. (2024) <doi:10.1080/08982112.2024.2440368>.
This package provides a lightweight, dependency-free, and simplified implementation of the Pseudo-Expectation Gauss-Seidel (PEGS) algorithm. It fits the multivariate ridge regression model for genomic prediction Xavier and Habier (2022) <doi:10.1186/s12711-022-00730-w> and Xavier et al. (2025) <doi:10.1093/genetics/iyae179>, providing heritability estimates, genetic correlations, breeding values, and regression coefficient estimates for prediction. This package provides an alternative to the bWGR package by Xavier et al. (2019) <doi:10.1093/bioinformatics/btz794> by using LAPACK for its algebraic operations.
Calculate Predictive Moran's Eigenvector Maps (pMEM) for spatially-explicit prediction of environmental variables, as defined by Guénard and Legendre (2024) <doi:10.1111/2041-210X.14413>. pMEM extends classical MEM by enabling interpolation and prediction at unsampled locations using spatial weighting functions parameterized by range (and optionally shape). The package implements multiple pMEM types (e.g., exponential, Gaussian, linear) and features a modular architecture that allows programmers to define custom weighting functions. Designed for ecologists, geographers, and spatial analysts working with spatially-structured data.
The goal of rFIA is to increase the accessibility and use of the United States Forest Services (USFS) Forest Inventory and Analysis (FIA) Database by providing a user-friendly, open source toolkit to easily query and analyze FIA Data. Designed to accommodate a wide range of potential user objectives, rFIA simplifies the estimation of forest variables from the FIA Database and allows all R users (experts and newcomers alike) to unlock the flexibility inherent to the Enhanced FIA design. Specifically, rFIA improves accessibility to the spatial-temporal estimation capacity of the FIA Database by producing space-time indexed summaries of forest variables within user-defined population boundaries. Direct integration with other popular R packages (e.g., dplyr', tidyr', and sf') facilitates efficient space-time query and data summary, and supports common data representations and API design. The package implements design-based estimation procedures outlined by Bechtold & Patterson (2005) <doi:10.2737/SRS-GTR-80>, and has been validated against estimates and sampling errors produced by FIA EVALIDator'. Current development is focused on the implementation of spatially-enabled model-assisted and model-based estimators to improve population, change, and ratio estimates.
TSIS is used for detecting transcript isoform switches in time-series data. Transcript isoform switches occur when a pair of alternatively spliced isoforms reverse the order of their relative expression levels. TSIS characterizes the transcript switch by defining the isoform switch time-points for any pair of transcript isoforms within a gene. In addition, this tool describes the switch using five different features or metrics. Also it filters the results with user’s specifications and visualizes the results using different plots for the user to examine further details of the switches.
Statistic methods to evaluate variations of differential expression (DE) between multiple biological conditions. It takes into account the fold-changes and p-values from previous differential expression (DE) results that use large-scale data (*e.g.*, microarray and RNA-seq) and evaluates which genes would react in response to the distinct experiments. This evaluation involves an unique pipeline of statistical methods, including weighted summarization, quantile detection, cluster analysis, and ANOVA tests, in order to classify a subset of relevant genes whose DE is similar or dependent to certain biological factors.
‘idpr’ aims to integrate tools for the computational analysis of intrinsically disordered proteins (IDPs) within R. This package is used to identify known characteristics of IDPs for a sequence of interest with easily reported and dynamic results. Additionally, this package includes tools for IDP-based sequence analysis to be used in conjunction with other R packages. Described in McFadden WM & Yanowitz JL (2022). "idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R." PloS one, 17(4), e0266929. <https://doi.org/10.1371/journal.pone.0266929>.
This package provides a tool that imports, subsets, visualizes, and exports the Correlates of State Policy Project dataset assembled by Marty P. Jordan and Matt Grossmann (2020) <http://ippsr.msu.edu/public-policy/correlates-state-policy>. The Correlates data contains over 2000 variables across more than 100 years that pertain to state politics and policy in the United States. Users with only a basic understanding of R can subset this data across multiple dimensions, export their search results, create map visualizations, export the citations associated with their searches, and more.
This package provides a framework that provides the methods for quantifying entropy-based local indicator of spatial association (ELSA) that can be used for both continuous and categorical data. In addition, this package offers other methods to measure local indicators of spatial associations (LISA). Furthermore, global spatial structure can be measured using a variogram-like diagram, called entrogram. For more information, please check that paper: Naimi, B., Hamm, N. A., Groen, T. A., Skidmore, A. K., Toxopeus, A. G., & Alibakhshi, S. (2019) <doi:10.1016/j.spasta.2018.10.001>.
An implementation of the sandwich smoother proposed in Fast Bivariate Penalized Splines by Xiao et al. (2012) <doi:10.1111/rssb.12007>. A hero is a specific type of sandwich. Dictionary.com (2018) <https://www.dictionary.com> describes a hero as: a large sandwich, usually consisting of a small loaf of bread or long roll cut in half lengthwise and containing a variety of ingredients, as meat, cheese, lettuce, and tomatoes. Also implements the spatio-temporal sandwich smoother of French and Kokoszka (2021) <doi:10.1016/j.spasta.2020.100413>.
Multivariable fractional polynomial algorithm simultaneously selects variables and functional forms in both generalized linear models and Cox proportional hazard models. Key references are Royston and Altman (1994) <doi:10.2307/2986270> and Royston and Sauerbrei (2008, ISBN:978-0-470-02842-1). In addition, it can model a sigmoid relationship between variable x and an outcome variable y using the approximate cumulative distribution transformation proposed by Royston (2014) <doi:10.1177/1536867X1401400206>. This feature distinguishes it from a standard fractional polynomial function, which lacks the ability to achieve such modeling.
Bivariate additive categorical regression via penalized maximum likelihood. Under a multinomial framework, the method fits bivariate models where both responses are nominal, ordinal, or a mix of the two. Partial proportional odds models are supported, with flexible (non-)uniform association structures. Various logit types and parametrizations can be specified for both marginals and the association, including Daleâ s model. The association structure can be regularized using polynomial-type penalty terms. Additive effects are modeled using P-splines. Standard methods such as summary(), residuals(), and predict() are available.