Builds and interprets multi-response machine learning models using tidymodels syntax. Users can supply a tidy model, and mrIML automates the process of fitting multiple response models to multivariate data and applying interpretable machine learning techniques across them. For more details see Fountain-Jones (2021) <doi:10.1111/1755-0998.13495> and Fountain-Jones et al. (2024) <doi:10.22541/au.172676147.77148600/v1>.
It includes four methods: DCOL-based K-profiles clustering, non-linear network reconstruction, non-linear hierarchical clustering, and variable selection for generalized additive model. References: Tianwei Yu (2018)<DOI: 10.1002/sam.11381>; Haodong Liu and others (2016)<DOI: 10.1371/journal.pone.0158247>; Kai Wang and others (2015)<DOI: 10.1155/2015/918954>; Tianwei Yu and others (2010)<DOI: 10.1109/TCBB.2010.73>.
This package contains functions useful for debugging, set operations on vectors, and UTC date and time functionality. It adds a few vector manipulation verbs to purrr and dplyr packages. It can also generate an R file to install and update packages to simplify deployment into production. The functions were developed at the data science firm Numeract LLC and are used in several packages and projects.
Extracts and summarizes metadata from data frames, including variable names, labels, types, and missing values. Computes compact descriptive statistics, frequency tables, and cross-tabulations to assist with efficient data exploration. Includes an interactive and exportable codebook generator for documenting variable metadata. Facilitates the identification of missing data patterns and structural issues in datasets. Designed to streamline initial data management and exploratory analysis workflows within R'.
Different estimators are provided to solve the blind source separation problem for multivariate time series with stochastic volatility and supervised dimension reduction problem for multivariate time series. Different functions based on AMUSE and SOBI are also provided for estimating the dimension of the white noise subspace. The package is fully described in Nordhausen, Matilainen, Miettinen, Virta and Taskinen (2021) <doi:10.18637/jss.v098.i15>.
This package provides functions to compute Wasserstein barycenters of subset posteriors using the swapping algorithm developed by Puccetti, Rüschendorf and Vanduffel (2020) <doi:10.1016/j.jmaa.2017.02.003>. The Wasserstein barycenter is a geometric approach for combining subset posteriors. It allows for parallel and distributed computation of the posterior in case of complex models and/or big datasets, thereby increasing computational speed tremendously.
Latent variable modeling with Principal Component Analysis (PCA) and Partial Least Squares (PLS) are powerful methods for visualization, regression, classification, and feature selection of omics data where the number of variables exceeds the number of samples and with multicollinearity among variables. Orthogonal Partial Least Squares (OPLS) enables to separately model the variation correlated (predictive) to the factor of interest and the uncorrelated (orthogonal) variation. While performing similarly to PLS, OPLS facilitates interpretation.
This package provides imlementations of PCA, PLS, and OPLS for multivariate analysis and feature selection of omics data. In addition to scores, loadings and weights plots, the package provides metrics and graphics to determine the optimal number of components (e.g. with the R2 and Q2 coefficients), check the validity of the model by permutation testing, detect outliers, and perform feature selection (e.g. with Variable Importance in Projection or regression coefficients).
Fits linear models with endogenous regressor using latent instrumental variable approaches. The methods included in the package are Lewbel's (1997) <doi:10.2307/2171884> higher moments approach as well as Lewbel's (2012) <doi:10.1080/07350015.2012.643126> heteroscedasticity approach, Park and Gupta's (2012) <doi:10.1287/mksc.1120.0718> joint estimation method that uses Gaussian copula and Kim and Frees's (2007) <doi:10.1007/s11336-007-9008-1> multilevel generalized method of moment approach that deals with endogeneity in a multilevel setting. These are statistical techniques to address the endogeneity problem where no external instrumental variables are needed. See the publication related to this package in the Journal of Statistical Software for more details: <doi:10.18637/jss.v107.i03>. Note that with version 2.0.0 sweeping changes were introduced which greatly improve functionality and usability but break backwards compatibility.
The MBECS provides a set of functions to evaluate and mitigate unwated noise due to processing in batches. To that end it incorporates a host of batch correcting algorithms (BECA) from various packages. In addition it offers a correction and reporting pipeline that provides a preliminary look at the characteristics of a data-set before and after correcting for batch effects.
ravanan is a CWL implementation that is powered by GNU Guix and provides strong reproducibility guarantees. ravanan provides strong caching of intermediate results so the same step of a workflow is never run twice. ravanan captures logs from every step of the workflow for easy tracing back in case of job failures. ravanan currently runs on single machines and on slurm via its API.
This package provides simple functions to compute and plot two types (sample-size- and coverage-based) rarefaction and extrapolation curves for species diversity (Hill numbers) based on individual-based abundance data or sampling-unit- based incidence data; see Chao and others (2014, Ecological Monographs) for pertinent theory and methodologies, and Hsieh, Ma and Chao (2016, Methods in Ecology and Evolution) for an introduction of the R package.
There are a number of binary files associated with the Webdriver/Selenium project (see http://www.seleniumhq.org/download/, https://sites.google.com/a/chromium.org/chromedriver/, https://github.com/mozilla/geckodriver, http://phantomjs.org/download.html, and https://github.com/SeleniumHQ/selenium/wiki/InternetExplorerDriver for more information). This package provides functions to download these binaries and to manage processes involving them.
The Connectivity Map (CMap) is a massive resource of perturbational gene expression profiles built by researchers at the Broad Institute and funded by the NIH Library of Integrated Network-Based Cellular Signatures (LINCS) program. Please visit https://clue.io for more information. The cmapR package implements methods to parse, manipulate, and write common CMap data objects, such as annotated matrices and collections of gene sets.
This package provides tools to compute and visualize overlaps between gene sets or genomic regions. Venn diagrams with proportional areas are provided, while UpSet plots are recommended for larger numbers of sets. The package supports GRanges and GRangesList inputs, and integrates with analysis workflows for ChIP-seq, ATAC-seq, and other genomic interval data. It generates clean, interpretable, and publication-ready figures.
sosta (Spatial Omics STructure Analysis) is a package for analyzing spatial omics data to explore tissue organization at the anatomical structure level. It reconstructs anatomically relevant structures based on molecular features or cell types. It further calculates a range of metrics at the structure level to quantitatively describe tissue architecture. The package is designed to integrate with other packages for the analysis of spatial omics data.
Align-GVGD ('A-GVGD') is a method to predict the impact of missense substitutions based on the properties of amino acid side chains and protein multiple sequence alignments <doi:10.1136/jmg.2005.033878>. A-GVGD is an extension of the original Grantham distance to multiple sequence alignments. This package provides an alternative R implementation to the web version found on <http://agvgd.hci.utah.edu/>.
This package provides functions for Posterior estimates of Accelerated Failure Time(AFT) model with MCMC and Maximum likelihood estimates of AFT model without MCMC for univariate and multivariate analysis in high dimensional gene expression data are available in this afthd package. AFT model with Bayesian framework for multivariate in high dimensional data has been proposed by Prabhash et al.(2016) <doi:10.21307/stattrans-2016-046>.
Boldness-recalibration maximally spreads out probability predictions while maintaining a user specified level of calibration, facilitated the brcal() function. Supporting functions to assess calibration via Bayesian and Frequentist approaches, Maximum Likelihood Estimator (MLE) recalibration, Linear in Log Odds (LLO)-adjust via any specified parameters, and visualize results are also provided. Methodological details can be found in Guthrie & Franck (2024) <doi:10.1080/00031305.2024.2339266>.
Estimation of Markov generator matrices from discrete-time observations. The implemented approaches comprise diagonal and weighted adjustment of matrix logarithm based candidate solutions as in Israel (2001) <doi:10.1111/1467-9965.00114> as well as a quasi-optimization approach. Moreover, the expectation-maximization algorithm and the Gibbs sampling approach of Bladt and Sorensen (2005) <doi:10.1111/j.1467-9868.2005.00508.x> are included.
Compute covariate-adjusted specificity at controlled sensitivity level, or covariate-adjusted sensitivity at controlled specificity level, or covariate-adjust receiver operating characteristic curve, or covariate-adjusted thresholds at controlled sensitivity/specificity level. All statistics could also be computed for specific sub-populations given their covariate values. Methods are described in Ziyi Li, Yijian Huang, Datta Patil, Martin G. Sanda (2021+) "Covariate adjustment in continuous biomarker assessment".
Fit Cox proportional hazards models containing both fixed and random effects. The random effects can have a general form, of which familial interactions (a "kinship" matrix) is a particular special case. Note that the simplest case of a mixed effects Cox model, i.e. a single random per-group intercept, is also called a "frailty" model. The approach is based on Ripatti and Palmgren, Biometrics 2002.
This package provides methods for reading, displaying, processing and writing files originally arranged for the DSSAT-CSM fixed width format. The DSSAT-CSM cropping system model is described at J.W. Jones, G. Hoogenboomb, C.H. Porter, K.J. Boote, W.D. Batchelor, L.A. Hunt, P.W. Wilkens, U. Singh, A.J. Gijsman, J.T. Ritchie (2003) <doi:10.1016/S1161-0301(02)00107-7>.
Various Expectation-Maximization (EM) algorithms are implemented for item response theory (IRT) models. The package includes IRT models for binary and ordinal responses, along with dynamic and hierarchical IRT models with binary responses. The latter two models are fitted using variational EM. The package also includes variational network and text scaling models. The algorithms are described in Imai, Lo, and Olmsted (2016) <DOI:10.1017/S000305541600037X>.
Flow of funds are financial accounts that are provided by Federal Reserve quarterly. The package contains all datasets <https://www.federalreserve.gov/datadownload/Choose.aspx?rel=z1>, tables <https://www.federalreserve.gov/apps/fof/FOFTables.aspx> and descriptions <https://www.federalreserve.gov/apps/fof/Guide/z1_tables_description.pdf> with functions to understand series <https://www.federalreserve.gov/apps/fof/SeriesStructure.aspx> and explore them.