Extension of the mgcv package, providing visual tools for Generalized Additive Models that exploit the additive structure of such models, scale to large data sets and can be used in conjunction with a wide range of response distributions. The focus is providing visual methods for better understanding the model output and for aiding model checking and development beyond simple exponential family regression. The graphical framework is based on the layering system provided by ggplot2'.
Solve scalar-on-function linear models, including generalized linear mixed effect model and quantile linear regression model, and bias correction estimation methods due to measurement error. Details about the measurement error bias correction methods, see Luan et al. (2023) <doi:10.48550/arXiv.2305.12624>, Tekwe et al. (2022) <doi:10.1093/biostatistics/kxac017>, Zhang et al. (2023) <doi:10.5705/ss.202021.0246>, Tekwe et al. (2019) <doi:10.1002/sim.8179>.
This package provides a suite of diagnostic tools for univariate point processes. This includes tools for simulating and fitting both common and more complex temporal point processes. We also include functions to visualise these point processes and collect existing diagnostic tools of Brown et al. (2002) <doi:10.1162/08997660252741149> and Wu et al. (2021) <doi:10.1002/9781119821588.ch7>, which can be used to assess the fit of a chosen point process model.
Evaluation of the pdf and the cdf of the univariate, noncentral, p-generalized normal distribution. Sampling from the univariate, noncentral, p-generalized normal distribution using either the p-generalized polar method, the p-generalized rejecting polar method, the Monty Python method, the Ziggurat method or the method of Nardon and Pianca. The package also includes routines for the simulation of the bivariate, p-generalized uniform distribution and the simulation of the corresponding angular distribution.
This package provides a robust and powerful empirical Bayesian approach is developed for replicability analysis of two large-scale experimental studies. The method controls the false discovery rate by using the joint local false discovery rate based on the replicability null as the test statistic. An EM algorithm combined with a shape constraint nonparametric method is used to estimate unknown parameters and functions. [Li, Y. et al., (2024), <doi:10.1371/journal.pgen.1011423>].
This package creates classifier for binary outcomes using Adaptive Boosting (AdaBoost) algorithm on decision stumps with a fast C++ implementation. For a description of AdaBoost, see Freund and Schapire (1997) <doi:10.1006/jcss.1997.1504>. This type of classifier is nonlinear, but easy to interpret and visualize. Feature vectors may be a combination of continuous (numeric) and categorical (string, factor) elements. Methods for classifier assessment, predictions, and cross-validation also included.
Separate a data frame in two based on key columns. The function unjoin() provides an inside-out version of a nested data frame. This is used to identify duplication and normalize it (in the database sense) by linking two tables with the redundancy removed. This is a basic requirement for detecting topology within spatial structures that has motivated the need for this package as a building block for workflows within more applied projects.
This package provides a toolbox of common robust statistical tests, including robust descriptives, robust t-tests, and robust ANOVA. It is also available as a module for jamovi (see <https://www.jamovi.org> for more information). Walrus is based on the WRS2 package by Patrick Mair, which is in turn based on the scripts and work of Rand Wilcox. These analyses are described in depth in the book Introduction to Robust Estimation & Hypothesis Testing'.
Allows to provide live interpretations and explanations of statistical functions in R. These interpretations and explanations are shown when the explained function is called by the user. They can interact with the values of the explained function's actual results to offer relevant, meaningful insights. The xplain interpretations and explanations are based on an easy-to-use XML format that allows to include R code to interact with the returns of the explained function.
HiCDOC normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experiment conditions. It provides a collection of functions assembled into a pipeline to filter and normalize the data, predict the compartments and visualize the results. It accepts several type of data: tabular `.tsv` files, Cooler `.cool` or `.mcool` files, Juicer `.hic` files or HiC-Pro `.matrix` and `.bed` files.
This package represents a collection of plotting and table output functions for data visualization. Results of various statistical analyses (that are commonly used in social sciences) can be visualized using this package, including simple and cross tabulated frequencies, histograms, box plots, (generalized) linear models, mixed effects models, principal component analysis and correlation matrices, cluster analyses, scatter plots, stacked scales, effects plots of regression models (including interaction terms) and much more. This package supports labelled data.
This package provides bindings to ImageMagick, a comprehensive image processing library. It supports many common formats (PNG, JPEG, TIFF, PDF, etc.) and manipulations (rotate, scale, crop, trim, flip, blur, etc). All operations are vectorized via the Magick++ STL meaning they operate either on a single frame or a series of frames for working with layers, collages, or animation. In RStudio, images are automatically previewed when printed to the console, resulting in an interactive editing environment.
The goal is to print an "aperçu", a short view of a vector, a matrix, a data.frame, a list or an array. By default, it prints the first 5 elements of each dimension. By default, the number of columns is equal to the number of lines. If you want to control the selection of the elements, you can pass a list, with each element being a vector giving the selection for each dimension.
Searches for, accesses, and retrieves Statistics Canada data tables, as well as individual vectors, as tidy data frames. This package enriches the tables with metadata, deals with encoding issues, allows for bilingual English or French language data retrieval, and bundles convenience functions to make it easier to work with retrieved table data. For more efficient data access the package allows for caching data in a local database and database level filtering, data manipulation and summarizing.
This package contains methods for observed-score linking and equating under the single-group, equivalent-groups, and nonequivalent-groups with anchor test(s) designs. Equating types include identity, mean, linear, general linear, equipercentile, circle-arc, and composites of these. Equating methods include synthetic, nominal weights, Tucker, Levine observed score, Levine true score, Braun/Holland, frequency estimation, and chained equating. Plotting and summary methods, and methods for multivariate presmoothing and bootstrap error estimation are also provided.
The summation notation suggested by Einstein (1916) <doi:10.1002/andp.19163540702> is a concise mathematical notation that implicitly sums over repeated indices of n-dimensional arrays. Many ordinary matrix operations (e.g. transpose, matrix multiplication, scalar product, diag()', trace etc.) can be written using Einstein notation. The notation is particularly convenient for expressing operations on arrays with more than two dimensions because the respective operators ('tensor products') might not have a standardized name.
This package provides a function that uses a genetic algorithm to search for a subset of size k from the integers 1:n, such that a user-supplied objective function is minimized at that subset. The selection step is done by tournament selection based on ranks, and elitism may be used to retain a portion of the best solutions from one generation to the next. Population objective function values may optionally be evaluated in parallel.
L1 estimation for linear regression using Barrodale and Roberts method <doi:10.1145/355616.361024> and the EM algorithm <doi:10.1023/A:1020759012226>. Estimation of mean and covariance matrix using the multivariate Laplace distribution, density, distribution function, quantile function and random number generation for univariate and multivariate Laplace distribution <doi:10.1080/03610929808832115>. Implementation of Naik and Plungpongpun <doi:10.1007/0-8176-4487-3_7> for the Generalized spatial median estimator is included.
This package provides a set of functions for some multivariate analyses utilizing a structural equation modeling (SEM) approach through the OpenMx package. These analyses include canonical correlation analysis (CANCORR), redundancy analysis (RDA), and multivariate principal component regression (MPCR). It implements procedures discussed in Gu and Cheung (2023) <doi:10.1111/bmsp.12301>, Gu, Yung, and Cheung (2019) <doi:10.1080/00273171.2018.1512847>, and Gu et al. (2023) <doi:10.1080/00273171.2022.2141675>.
This package implements variable selection procedures for low to moderate size generalized linear regressions models. It includes the STOPES functions for linear regression (Capanu M, Giurcanu M, Begg C, Gonen M, Optimized variable selection via repeated data splitting, Statistics in Medicine, 2020, 19(6):2167-2184) as well as subsampling based optimization methods for generalized linear regression models (Marinela Capanu, Mihai Giurcanu, Colin B Begg, Mithat Gonen, Subsampling based variable selection for generalized linear models).
Using any importation code designed for SAS users to read ASCII files into sas7bdat files, this package parses through the INPUT block of a .sas syntax file to design the parameters needed for a read.fwf() function call. This allows the user to specify the location of the ASCII (often a .dat') file and the location of the SAS syntax file, and then load the data frame directly into R in just one step.
This package provides functionality for image processing and shape analysis in the context of reconstructed medical images generated by deep learning-based methods or standard image processing algorithms and produced from different medical imaging types, such as X-ray, Computational Tomography (CT), Magnetic Resonance Imaging (MRI), and pathology imaging. Specifically, offers tools to segment regions of interest and to extract quantitative shape descriptors for applications in signal processing, statistical analysis and modeling, and machine learning.
The ta-test is a modified two-sample or two-group t-test of Gosset (1908). In small samples with less than 15 replicates,the ta-test significantly reduces type I error rate but has almost the same power with the t-test and hence can greatly enhance reliability or reproducibility of discoveries in biology and medicine. The ta-test can test single null hypothesis or multiple null hypotheses without needing to correct p-values.
ASSIGN is a computational tool to evaluate the pathway deregulation/activation status in individual patient samples. ASSIGN employs a flexible Bayesian factor analysis approach that adapts predetermined pathway signatures derived either from knowledge-based literature or from perturbation experiments to the cell-/tissue-specific pathway signatures. The deregulation/activation level of each context-specific pathway is quantified to a score, which represents the extent to which a patient sample encompasses the pathway deregulation/activation signature.