An R interface to the MinIO Client. The MinIO Client ('mc') provides a modern alternative to UNIX commands like 'ls', 'cat', 'cp', 'mirror', 'diff', 'find' etc. It supports filesystems and Amazon "S3" compatible cloud storage services ("AWS" Signature v2 and v4). This package provides convenience functions for installing the MinIO client and running any operations, as described in the official documentation, <https://min.io/docs/minio/linux/reference/minio-mc.html?ref=docs-redirect>. This package provides a flexible and high-performance alternative to 'aws.s3'.
Evaluate the predictive performance of an existing (i.e. previously developed) prediction/prognostic model given relevant information about the existing prediction model (e.g. coefficients) and a new dataset. Provides a range of model updating methods that help tailor the existing model to the new dataset; see Su et al. (2018) <doi:10.1177/0962280215626466>. Techniques to aggregate multiple existing prediction models on the new data are also provided; see Debray et al. (2014) <doi:10.1002/sim.6080> and Martin et al. (2018) <doi:10.1002/sim.7586>.
This package provides methods for inference using stacked multiple imputations augmented with weights. The vignette provides example R code for implementation in general multiple imputation settings. For additional details about the estimation algorithm, we refer the reader to Beesley, Lauren J and Taylor, Jeremy M G (2020) "A stacked approach for chained equations multiple imputation incorporating the substantive model" <doi:10.1111/biom.13372>, and Beesley, Lauren J and Taylor, Jeremy M G (2021) "Accounting for not-at-random missingness through imputation stacking" <arXiv:2101.07954>.
This is a tidy implementation for heatmaps. At the moment it is based on the (great) package 'ComplexHeatmap'. The goal of this package is to interface a tidy data frame with this powerful tool. Some of the advantages are: Row and/or column colour annotations are easy to integrate by specifying just one parameter (column names). Custom grouping of rows is easy to specify by providing a grouped tbl. For example: df %>% group_by(...). Label size is adjusted to the total number of rows and columns. Brewer and Viridis palettes are used by default.
Imports variables from ReaderBench (Dascalu et al., 2018) <doi:10.1007/978-3-319-66610-5_48>, Coh-Metrix (McNamara et al., 2014) <doi:10.1017/CBO9780511894664>, and/or GAMET (Crossley et al., 2019) <doi:10.17239/jowr-2019.11.02.01> output files; downloads predictive scoring models described in Mercer & Cannon (2022) <doi:10.31244/jero.2022.01.03> and Mercer et al. (2021) <doi:10.1177/0829573520987753>; and generates predicted writing quality and curriculum-based measurement (McMaster & Espin, 2007) <doi:10.1177/00224669070410020301> scores.
This package provides two functions, 'frameableWidget()' and 'frameWidget()'. The 'frameableWidget()' function adds extra code to a htmlwidget which allows it to be rendered correctly inside a responsive 'iframe'. The 'frameWidget()' function is a htmlwidget which displays the content of another htmlwidget inside a responsive 'iframe'. These functions allow for easier embedding of htmlwidgets in content management systems such as 'wordpress', 'blogger' etc. They also allow for separation of widget content from main HTML content where CSS of the main HTML could interfere with the widget.
The web version WebGestalt <https://www.webgestalt.org> supports 12 organisms, 354 gene identifiers and 321,251 function categories. Users can upload the data and functional categories with their own gene identifiers. In addition to the Over-Representation Analysis, WebGestalt also supports Gene Set Enrichment Analysis and Network Topology Analysis. The user-friendly output report allows interactive and efficient exploration of enrichment results. The WebGestaltR package supports all of the above functions and can also be integrated into other pipelines or used to analyze multiple gene lists simultaneously.
This package provides a suite of functions for analyzing sequences of events. Users can generate and code sequences based on predefined rules, with a special focus on the identification of sequences coded as 'ABA' (when one element appears, followed by a different one, and then followed by the first). Additionally, the package offers the ability to calculate the length of consecutive 'ABA'-coded sequences sharing common elements. The methods implemented in this package are based on the work by Ziembowicz, K., Rychwalska, A., & Nowak, A. (2022) <doi:10.1177/10464964221118674>.
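The ABA coding rule described above can be sketched in a few lines of Python. This is an illustration only, not the package's implementation: the function names `find_aba` and `aba_run_lengths` are hypothetical, and treating "consecutive" ABA sequences as triples that start at adjacent positions is an assumption about the intended meaning.

```python
def find_aba(seq):
    """Return start indices i where seq[i:i+3] has the form A, B, A with A != B."""
    return [
        i
        for i in range(len(seq) - 2)
        if seq[i] == seq[i + 2] and seq[i] != seq[i + 1]
    ]


def aba_run_lengths(seq):
    """Count how many ABA triples occur back to back.

    Assumption: two triples are 'consecutive' when their start indices
    differ by exactly one, so they share two elements (e.g. A B A B).
    """
    runs = []
    starts = find_aba(seq)
    for pos, start in enumerate(starts):
        if pos > 0 and start == starts[pos - 1] + 1:
            runs[-1] += 1  # extends the current run of overlapping triples
        else:
            runs.append(1)  # begins a new run
    return runs


events = list("ABABCAC")
aba_starts = find_aba(events)
aba_runs = aba_run_lengths(events)
```

For the sequence A B A B C A C, triples start at positions 0, 1 and 4, which gives one run of two overlapping triples and one isolated triple.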
Bond Pricing and Fixed-Income Valuation of Selected Securities included here serve as a quick reference of Quantitative Methods for undergraduate courses on Fixed-Income and CFA Level I Readings on Fixed-Income Valuation, Risk and Return. CFA Institute ("CFA Program Curriculum 2020 Level I Volumes 1-6. (Vol. 5, pp. 107-151, pp. 237-299)", 2019, ISBN: 9781119593577). Barbara S. Petitt ("Fixed Income Analysis", 2019, ISBN: 9781119628132). Frank J. Fabozzi ("Handbook of Finance: Financial Markets and Instruments", 2008, ISBN: 9780470078143). Frank J. Fabozzi ("Fixed Income Analysis", 2007, ISBN: 9780470052211).
This package provides functionality for clustering origin-destination (OD) pairs, representing desire lines (or flows). This includes creating distance matrices between OD pairs and passing distance matrices to a clustering algorithm. See the academic paper Tao and Thill (2016) <doi:10.1111/gean.12100> for more details on spatial clustering of flows. See the paper on delineating demand-responsive operating areas by Mahfouz et al. (2025) <doi:10.1016/j.urbmob.2025.100135> for an example of how this package can be used to cluster flows for applied transportation research.
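As a rough sketch of the distance-matrix step, the Python snippet below computes pairwise distances between flows by combining origin-to-origin and destination-to-destination Euclidean distances. The weighted form sqrt(alpha * dO^2 + beta * dD^2) is one common flow distance; whether the package uses exactly this formula, and the names `flow_distance`, `flow_distance_matrix`, `alpha` and `beta`, are assumptions made for illustration.

```python
import math


def flow_distance(f1, f2, alpha=1.0, beta=1.0):
    """Distance between two flows, each given as ((ox, oy), (dx, dy)).

    Assumed form: sqrt(alpha * d_origin^2 + beta * d_destination^2).
    """
    (o1, d1), (o2, d2) = f1, f2
    d_origin = math.dist(o1, o2)
    d_dest = math.dist(d1, d2)
    return math.sqrt(alpha * d_origin ** 2 + beta * d_dest ** 2)


def flow_distance_matrix(flows, alpha=1.0, beta=1.0):
    """Symmetric matrix of pairwise flow distances (list of lists)."""
    n = len(flows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = flow_distance(flows[i], flows[j], alpha, beta)
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical flows in projected coordinates (same units on both axes).
flows = [
    ((0.0, 0.0), (10.0, 0.0)),
    ((3.0, 4.0), (10.0, 0.0)),
    ((0.0, 0.0), (10.0, 6.0)),
]
dist = flow_distance_matrix(flows)
```

A matrix like `dist` could then be handed to any clustering routine that accepts precomputed distances.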
William S. Cleveland's book 'Visualizing Data' is a classic piece of literature on Exploratory Data Analysis. Although it was written several decades ago, its content is still relevant as it proposes several tools which are useful to discover patterns and relationships among the data under study, and also to assess the goodness of fit of a model. This package provides functions to produce the 'ggplot2' versions of the visualization tools described in this book and is intended for use in courses on Exploratory Data Analysis.
This package implements iterative conditional expectation (ICE) estimators of the plug-in g-formula (Wen, Young, Robins, and Hernán (2020) <doi:10.1111/biom.13321>). Both singly robust and doubly robust ICE estimators based on parametric models are available. The package can be used to estimate survival curves under sustained treatment strategies (interventions) using longitudinal data with time-varying treatments, time-varying confounders, censoring, and competing events. The interventions can be static or dynamic, and deterministic or stochastic (including threshold interventions). Both prespecified and user-defined interventions are available.
This package provides tools for applying Sklar's Omega (Hughes, 2022) <doi:10.1007/s11222-022-10105-2> methodology to nominal scores, ordinal scores, percentages, counts, amounts (i.e., non-negative real numbers), and balances (i.e., any real number). The framework can accommodate any number of units, any number of coders, and missingness; and can be used to measure agreement with a gold standard, intra-coder agreement, and/or inter-coder agreement. Frequentist inference is supported for all levels of measurement. Bayesian inference is supported for continuous scores only.
This package provides functions to run statistical analyses on surface-based neuroimaging data, computing measures including cortical thickness and surface area of the whole brain and of the hippocampi. It can make use of 'FreeSurfer', 'fMRIprep', 'XCP-D', 'HCP' and 'CAT12' preprocessed datasets, 'HippUnfold' hippocampal outputs and 'SubCortexMesh' subcortical outputs for a given sample by restructuring the data values into a single file. The single file can then be used by the package for analyses independently of its base dataset, without needing access to it.
Calculates marginal effects based on logistic model objects such as 'glm' or 'speedglm' at the average (default) or at given values using finite differences. It also returns confidence intervals for said marginal effects and the p-values, which can easily be used as input in 'stargazer'. The function only returns the essentials and is therefore much faster but not as detailed as other functions available to calculate marginal effects. As a result, it is highly suitable for large datasets for which other packages may require too much time or computing power.
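To make the finite-difference idea concrete, here is a minimal Python sketch that evaluates marginal effects at the covariate means of a logistic model. It is not the package's code: the coefficients are hard-coded rather than taken from a fitted model, the central-difference scheme and step size `h` are assumptions, and the function names are hypothetical. Confidence intervals and p-values are not covered.

```python
import math


def predict_prob(coefs, x):
    """Logistic prediction; coefs[0] is the intercept, coefs[1:] match x."""
    eta = coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def marginal_effects_at_mean(coefs, rows, h=1e-5):
    """Central finite-difference slope of the predicted probability
    with respect to each covariate, evaluated at the column means."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]
    effects = []
    for j in range(len(means)):
        up = list(means)
        down = list(means)
        up[j] += h
        down[j] -= h
        slope = (predict_prob(coefs, up) - predict_prob(coefs, down)) / (2 * h)
        effects.append(slope)
    return effects


# Hypothetical coefficients (intercept, b1, b2) and a tiny design matrix.
coefs = [-1.0, 0.5, 2.0]
rows = [[1.0, 0.0], [3.0, 1.0]]
effects = marginal_effects_at_mean(coefs, rows)
```

For a logistic model the result can be checked against the closed form b_j * p * (1 - p), where p is the predicted probability at the means.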
Perform fuzzy joins on data frames using approximate string matching. Implements inner, left, right, full, semi, and anti joins with string distance metrics from the stringdist package, including Optimal String Alignment, Levenshtein, Damerau-Levenshtein, Jaro-Winkler, q-gram, cosine, Jaccard, and Soundex. Uses a data.table backend plus compiled C++ result assembly to reduce overhead in large joins, while adaptive candidate planning avoids unnecessary distance evaluations in single-column string joins. Suitable for reconciling misspellings, inconsistent labels, and other near-match identifiers while optionally returning the computed distance for each match.
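The core idea of a fuzzy inner join can be illustrated with a small, dependency-free Python sketch using Levenshtein distance. This is a naive nested-loop version for illustration only; it does not reproduce the candidate planning or compiled backend described above, and the names `levenshtein`, `fuzzy_inner_join` and `max_dist` are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # deletion
                    cur[j - 1] + 1,            # insertion
                    prev[j - 1] + (ca != cb),  # substitution (free if equal)
                )
            )
        prev = cur
    return prev[-1]


def fuzzy_inner_join(left, right, key, max_dist=1):
    """Pair every left/right row whose key strings are within max_dist edits.

    Returns (left_row, right_row, distance) tuples.
    """
    matches = []
    for l_row in left:
        for r_row in right:
            d = levenshtein(l_row[key], r_row[key])
            if d <= max_dist:
                matches.append((l_row, r_row, d))
    return matches


left = [{"name": "apple", "qty": 3}, {"name": "banana", "qty": 5}]
right = [{"name": "aple", "price": 1.2}, {"name": "bananna", "price": 0.5},
         {"name": "cherry", "price": 4.0}]
joined = fuzzy_inner_join(left, right, key="name", max_dist=1)
```

Each misspelled key on the right is matched to its nearest counterpart on the left, and the distance is returned alongside the matched rows.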
Implementation of two-sample comparison procedures based on median-based statistical tests for functional data, introduced in Smida et al. (2022) <doi:10.1080/10485252.2022.2064997>. Other competitive state-of-the-art approaches proposed by Chakraborty and Chaudhuri (2015) <doi:10.1093/biomet/asu072>, Horvath et al. (2013) <doi:10.1111/j.1467-9868.2012.01032.x> or Cuevas et al. (2004) <doi:10.1016/j.csda.2003.10.021> are also included in the package, as well as procedures to run test result comparisons and power analysis using simulations.
This package provides functions connecting to the Salesforce Platform APIs (REST, SOAP, Bulk 1.0, Bulk 2.0, Metadata, Reports and Dashboards) <https://trailhead.salesforce.com/content/learn/modules/api_basics/api_basics_overview>. "API" is an acronym for "application programming interface". Almost all calls from these APIs are supported as they use CSV, XML or JSON data that can be parsed into R data structures. For more information, documentation, and examples, see the Salesforce API documentation and this package's website <https://stevenmmortimer.github.io/salesforcer/>.
Affords researchers the ability to draw stratified samples from the U.S. Department of Veterans Affairs/Department of Defense Identity Repository (VADIR) database according to a variety of population characteristics. The VADIR database contains information for all veterans who were separated from the military after 1980. The central utility of the present package is to integrate data cleaning and formatting for the VADIR database with the stratification methods described by Mahto (2019) <https://CRAN.R-project.org/package=splitstackshape>. Data from VADIR are not provided as part of this package.
This package provides a pilot matching design to automatically stratify and match large datasets. The manual_stratify() function allows users to manually stratify a dataset based on categorical variables of interest, while the auto_stratify() function does so automatically by allocating a held-aside (pilot) data set, fitting a prognostic score (see Hansen (2008) <doi:10.1093/biomet/asn004>) on the pilot set, and stratifying the data set based on prognostic score quantiles. The strata_match() function then performs optimal matching of the data set in parallel within strata.
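The quantile-stratification step can be sketched as follows in Python. The sketch assumes prognostic scores have already been estimated (the values below are made up), and it assigns strata by rank so that each stratum has roughly equal size; the function name `stratify_by_quantile` and this tie-handling rule are assumptions, not the package's behaviour.

```python
def stratify_by_quantile(scores, n_strata):
    """Assign each score to one of n_strata rank-based quantile bins.

    Stratum 0 holds the lowest scores. Ties are broken by input order
    (Python's sort is stable), which is an illustrative choice.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    strata = [0] * n
    for rank, idx in enumerate(order):
        strata[idx] = rank * n_strata // n
    return strata


# Hypothetical prognostic scores for six units.
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
strata = stratify_by_quantile(scores, n_strata=3)
```

Matching would then be carried out separately within each stratum, which is what allows the work to be parallelised.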
The SALTSampler package facilitates Markov chain Monte Carlo (MCMC) sampling of random variables on a simplex. A Self-Adjusting Logit Transform (SALT) proposal is used so that sampling is still efficient even in difficult cases, such as those in high dimensions or with parameters that differ by orders of magnitude. Special care is also taken to maintain accuracy even when some coordinates approach 0 or 1 numerically. Diagnostic and graphic functions are included in the package, enabling easy assessment of the convergence and mixing of the chain within the constrained space.
Selection of spatially balanced samples. In particular, the implemented sampling designs allow the selection of probability samples well spread over the population of interest, in any dimension and using any distance function (e.g. Euclidean distance, Manhattan distance). For more details, see Pantalone F, Benedetti R, and Piersimoni F (2022) <doi:10.18637/jss.v103.c02>, Benedetti R and Piersimoni F (2017) <doi:10.1002/bimj.201600194>, and Benedetti R and Piersimoni F (2017) <arXiv:1710.09116>. The implementation has been done in C++ through the use of 'Rcpp' and 'RcppArmadillo'.
This package provides a suite of helper functions to support Bayesian Kernel Machine Regression (BKMR) analyses in environmental health research. It enables the simulation of realistic multivariate exposure data using Multivariate Skewed Gamma distributions, estimation of distributional parameters by subgroup, and application of adaptive, data-driven thresholds for feature selection via Posterior Inclusion Probabilities (PIPs). It is especially suited for handling skewed exposure data and enhancing the interpretability of BKMR results through principled variable selection. The methodology is shown in Hasan et al. (2025) <doi:10.1101/2025.04.14.25325822>.
This package provides a tool for computing network representations of attitudes, extracted from tabular data such as sociological surveys. Development of surveygraph software and training materials was initially funded by the European Union under the ERC Proof-of-concept programme (ERC, Attitude-Maps-4-All, project number: 101069264). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.