Separate a data frame in two based on key columns. The function unjoin()
provides an inside-out version of a nested data frame. This is used to identify duplication and normalize it (in the database sense) by linking two tables with the redundancy removed. This is a basic requirement for detecting topology within spatial structures that has motivated the need for this package as a building block for workflows within more applied projects.
This package provides a toolbox of common robust statistical tests, including robust descriptives, robust t-tests, and robust ANOVA. It is also available as a module for jamovi (see <https://www.jamovi.org> for more information). Walrus is based on the WRS2 package by Patrick Mair, which is in turn based on the scripts and work of Rand Wilcox. These analyses are described in depth in the book Introduction to Robust Estimation & Hypothesis Testing'.
Allows to provide live interpretations and explanations of statistical functions in R. These interpretations and explanations are shown when the explained function is called by the user. They can interact with the values of the explained function's actual results to offer relevant, meaningful insights. The xplain interpretations and explanations are based on an easy-to-use XML format that allows to include R code to interact with the returns of the explained function.
HiCDOC
normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experiment conditions. It provides a collection of functions assembled into a pipeline to filter and normalize the data, predict the compartments and visualize the results. It accepts several type of data: tabular `.tsv` files, Cooler `.cool` or `.mcool` files, Juicer `.hic` files or HiC-Pro
`.matrix` and `.bed` files.
BSeq-sc is a bioinformatics analysis pipeline that leverages single-cell sequencing data to estimate cell type proportion and cell type-specific gene expression differences from RNA-seq data from bulk tissue samples. This is a companion package to the publication "A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure." Baron et al. Cell Systems (2016) https://www.ncbi.nlm.nih.gov/pubmed/27667365.
This package provides an implementation of sparse linear discriminant analysis, which is a supervised classification method for multiple classes. Various novel optimization approaches to this problem are implemented including alternating direction method of multipliers (ADMM), proximal gradient (PG) and accelerated proximal gradient (APG). Functions for performing cross validation are also supplied along with basic prediction and plotting functions. Sparse zero variance discriminant (SZVD) analysis is also included in the package.
This package provides e-statistics (energy) tests and statistics for multivariate and univariate inference, including distance correlation, one-sample, two-sample, and multi-sample tests for comparing multivariate distributions, are implemented. Measuring and testing multivariate independence based on distance correlation, partial distance correlation, multivariate goodness-of-fit tests, clustering based on energy distance, testing for multivariate normality, distance components (disco) for non-parametric analysis of structured data, and other energy statistics/methods are implemented.
This a package containing diverse spatial datasets for demonstrating, benchmarking and teaching spatial data analysis. It includes R data of class sf
, Spatial
, and nb
. It also contains data stored in a range of file formats including GeoJSON, ESRI Shapefile and GeoPackage. Some of the datasets are designed to illustrate specific analysis techniques. cycle_hire()
and cycle_hire_osm()
, for example, are designed to illustrate point pattern analysis techniques.
Multivariate regression methodologies including classical reduced-rank regression (RRR) studied by Anderson (1951) <doi:10.1214/aoms/1177729580> and Reinsel and Velu (1998) <doi:10.1007/978-1-4757-2853-8>, reduced-rank regression via adaptive nuclear norm penalization proposed by Chen et al. (2013) <doi:10.1093/biomet/ast036> and Mukherjee et al. (2015) <doi:10.1093/biomet/asx080>, robust reduced-rank regression (R4) proposed by She and Chen (2017) <doi:10.1093/biomet/asx032>, generalized/mixed-response reduced-rank regression (mRRR
) proposed by Luo et al. (2018) <doi:10.1016/j.jmva.2018.04.011>, row-sparse reduced-rank regression (SRRR) proposed by Chen and Huang (2012) <doi:10.1080/01621459.2012.734178>, reduced-rank regression with a sparse singular value decomposition (RSSVD) proposed by Chen et al. (2012) <doi:10.1111/j.1467-9868.2011.01002.x> and sparse and orthogonal factor regression (SOFAR) proposed by Uematsu et al. (2019) <doi:10.1109/TIT.2019.2909889>.
The goal is to print an "aperçu", a short view of a vector, a matrix, a data.frame, a list or an array. By default, it prints the first 5 elements of each dimension. By default, the number of columns is equal to the number of lines. If you want to control the selection of the elements, you can pass a list, with each element being a vector giving the selection for each dimension.
Searches for, accesses, and retrieves Statistics Canada data tables, as well as individual vectors, as tidy data frames. This package enriches the tables with metadata, deals with encoding issues, allows for bilingual English or French language data retrieval, and bundles convenience functions to make it easier to work with retrieved table data. For more efficient data access the package allows for caching data in a local database and database level filtering, data manipulation and summarizing.
Fits a Causal Effect Random Forest of Interaction Tress (CERFIT) which is a modification of the Random Forest algorithm where each split is chosen to maximize subgroup treatment heterogeneity. Doing this allows it to estimate the individualized treatment effect for each observation in either randomized controlled trial (RCT) or observational data. For more information see X. Su, A. T. Peña, L. Liu, and R. A. Levine (2018) <doi:10.48550/arXiv.1709.04862>
.
This package contains methods for observed-score linking and equating under the single-group, equivalent-groups, and nonequivalent-groups with anchor test(s) designs. Equating types include identity, mean, linear, general linear, equipercentile, circle-arc, and composites of these. Equating methods include synthetic, nominal weights, Tucker, Levine observed score, Levine true score, Braun/Holland, frequency estimation, and chained equating. Plotting and summary methods, and methods for multivariate presmoothing and bootstrap error estimation are also provided.
The summation notation suggested by Einstein (1916) <doi:10.1002/andp.19163540702> is a concise mathematical notation that implicitly sums over repeated indices of n-dimensional arrays. Many ordinary matrix operations (e.g. transpose, matrix multiplication, scalar product, diag()
', trace etc.) can be written using Einstein notation. The notation is particularly convenient for expressing operations on arrays with more than two dimensions because the respective operators ('tensor products') might not have a standardized name.
This package provides a function that uses a genetic algorithm to search for a subset of size k from the integers 1:n, such that a user-supplied objective function is minimized at that subset. The selection step is done by tournament selection based on ranks, and elitism may be used to retain a portion of the best solutions from one generation to the next. Population objective function values may optionally be evaluated in parallel.
L1 estimation for linear regression using Barrodale and Roberts method <doi:10.1145/355616.361024> and the EM algorithm <doi:10.1023/A:1020759012226>. Estimation of mean and covariance matrix using the multivariate Laplace distribution, density, distribution function, quantile function and random number generation for univariate and multivariate Laplace distribution <doi:10.1080/03610929808832115>. Implementation of Naik and Plungpongpun <doi:10.1007/0-8176-4487-3_7> for the Generalized spatial median estimator is included.
This package provides a set of functions for some multivariate analyses utilizing a structural equation modeling (SEM) approach through the OpenMx
package. These analyses include canonical correlation analysis (CANCORR), redundancy analysis (RDA), and multivariate principal component regression (MPCR). It implements procedures discussed in Gu and Cheung (2023) <doi:10.1111/bmsp.12301>, Gu, Yung, and Cheung (2019) <doi:10.1080/00273171.2018.1512847>, and Gu et al. (2023) <doi:10.1080/00273171.2022.2141675>.
Estimation of multivariate differences between two groups (e.g., multivariate sex differences) with regularized regression methods and predictive approach. See Lönnqvist & Ilmarinen (2021) <doi:10.1007/s11109-021-09681-2> and Ilmarinen et al. (2023) <doi:10.1177/08902070221088155>. Includes tools that help in understanding difference score reliability, predictions of difference score variables, conditional intra-class correlations, and heterogeneity of variance estimates. Package development was supported by the Academy of Finland research grant 338891.
This package provides functionality for image processing and shape analysis in the context of reconstructed medical images generated by deep learning-based methods or standard image processing algorithms and produced from different medical imaging types, such as X-ray, Computational Tomography (CT), Magnetic Resonance Imaging (MRI), and pathology imaging. Specifically, offers tools to segment regions of interest and to extract quantitative shape descriptors for applications in signal processing, statistical analysis and modeling, and machine learning.
Using any importation code designed for SAS users to read ASCII files into sas7bdat files, this package parses through the INPUT block of a .sas syntax file to design the parameters needed for a read.fwf()
function call. This allows the user to specify the location of the ASCII (often a .dat') file and the location of the SAS syntax file, and then load the data frame directly into R in just one step.
This package implements variable selection procedures for low to moderate size generalized linear regressions models. It includes the STOPES functions for linear regression (Capanu M, Giurcanu M, Begg C, Gonen M, Optimized variable selection via repeated data splitting, Statistics in Medicine, 2020, 19(6):2167-2184) as well as subsampling based optimization methods for generalized linear regression models (Marinela Capanu, Mihai Giurcanu, Colin B Begg, Mithat Gonen, Subsampling based variable selection for generalized linear models).
The ta-test is a modified two-sample or two-group t-test of Gosset (1908). In small samples with less than 15 replicates,the ta-test significantly reduces type I error rate but has almost the same power with the t-test and hence can greatly enhance reliability or reproducibility of discoveries in biology and medicine. The ta-test can test single null hypothesis or multiple null hypotheses without needing to correct p-values.
ASSIGN is a computational tool to evaluate the pathway deregulation/activation status in individual patient samples. ASSIGN employs a flexible Bayesian factor analysis approach that adapts predetermined pathway signatures derived either from knowledge-based literature or from perturbation experiments to the cell-/tissue-specific pathway signatures. The deregulation/activation level of each context-specific pathway is quantified to a score, which represents the extent to which a patient sample encompasses the pathway deregulation/activation signature.
Banksy is an R package that incorporates spatial information to cluster cells in a feature space (e.g. gene expression). To incorporate spatial information, BANKSY computes the mean neighborhood expression and azimuthal Gabor filters that capture gene expression gradients. These features are combined with the cell's own expression to embed cells in a neighbor-augmented product space which can then be clustered, allowing for accurate and spatially-aware cell typing and tissue domain segmentation.