Evaluates whether the relationship between two vectors is linear or nonlinear. Performs a test to determine how well a linear model fits the data compared to higher-order polynomial models. See Jhang et al. (2004) <doi:10.1043/1543-2165(2004)128%3C44:EOLITC%3E2.0.CO;2>.
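As a rough illustration of comparing a linear fit with a higher-order polynomial fit, the sketch below fits both models by least squares and forms an F statistic from their residual sums of squares. It is a generic nested-model comparison written in Python, not the package's API and not necessarily the exact procedure of Jhang et al. (2004); the function names `polyfit_rss` and `lack_of_linearity_f` are hypothetical.

```python
def polyfit_rss(x, y, degree):
    """Least-squares polynomial fit; returns the residual sum of squares."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution for the coefficients (lowest power first).
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    fitted = [sum(c * xi ** p for p, c in enumerate(coef)) for xi in x]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def lack_of_linearity_f(x, y, degree=2):
    """F statistic comparing a linear fit with a polynomial fit of `degree`.

    Large values suggest the higher-order terms explain variation that the
    straight line misses. Assumes len(x) > degree + 1 and noisy data, so
    the polynomial residual sum of squares is not zero.
    """
    rss_linear = polyfit_rss(x, y, 1)
    rss_poly = polyfit_rss(x, y, degree)
    df_extra = degree - 1             # extra parameters in the larger model
    df_resid = len(x) - (degree + 1)  # residual degrees of freedom
    return ((rss_linear - rss_poly) / df_extra) / (rss_poly / df_resid)
```

The statistic would be compared against an F distribution with `df_extra` and `df_resid` degrees of freedom to obtain a p-value; that step is omitted here to keep the sketch free of external dependencies.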
Estimates the sufficient dimension reduction space using sparse sliced inverse regression via Lasso (Lasso-SIR), introduced in Lin, Zhao, and Liu (2019) <doi:10.1080/01621459.2018.1520115>. Lasso-SIR is consistent and achieves the optimal convergence rate under certain sparsity conditions for multiple index models.
Set of support functions for the book "Laboratorio di Statistica con R", Iacus-Masarotto, McGraw-Hill Italia, 2006. Function names and documentation are in Italian.
Network meta-analysis and network meta-regression models for aggregate data, individual patient data, and mixtures of both individual and aggregate data using multilevel network meta-regression as described by Phillippo et al. (2020) <doi:10.1111/rssa.12579>. Models are estimated in a Bayesian framework using 'Stan'.
We introduce a high-dimensional multi-study robust factor model, which learns latent features and accounts for the heterogeneity among sources. It can be used for analyzing heterogeneous RNA sequencing data. For more details, see Jiang et al. (2025) <doi:10.48550/arXiv.2506.18478>.
Collection of functions for computing within-study covariances for different effect sizes, visualizing data, and performing single and multiple imputation of missing data. Effect sizes include correlation (r), mean difference (MD), standardized mean difference (SMD), log odds ratio (logOR), log risk ratio (logRR), and risk difference (RD).
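To make the effect-size abbreviations concrete, the following Python sketch computes logOR, logRR, and RD from a 2x2 table and an SMD from group summaries, using the standard textbook definitions. The function names are hypothetical and this is not the package's interface; the within-study covariance computations it provides are not shown.

```python
import math


def binary_effect_sizes(a, b, c, d):
    """Effect sizes from a 2x2 table.

    a, b: events / non-events in the treatment group
    c, d: events / non-events in the control group
    Assumes all four cells are positive.
    """
    risk_t = a / (a + b)
    risk_c = c / (c + d)
    return {
        "logOR": math.log((a * d) / (b * c)),
        "logRR": math.log(risk_t / risk_c),
        "RD": risk_t - risk_c,
    }


def standardized_mean_difference(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Mean difference divided by the pooled standard deviation (Cohen's d)."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)
```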
This package performs nonparametric estimation in mixture cure models when the cure status is partially known. For details, see Safari et al. (2021) <doi:10.1002/bimj.202100156>, Safari et al. (2022) <doi:10.1177/09622802221115880>, and Safari et al. (2023) <doi:10.1007/s10985-023-09591-x>.
Conduct power analyses and inference of marginal effects. Uses plug-in estimation and influence functions to perform robust inference, optionally leveraging historical data to increase precision with prognostic covariate adjustment. The methods are described in Højbjerre-Frandsen et al. (2025) <doi:10.48550/arXiv.2503.22284>.
Estimate penalized synthetic control models and perform hold-out validation to determine their penalty parameter. This method is based on the work by Abadie & L'Hour (2021) <doi:10.1080/01621459.2021.1971535>. Penalized synthetic controls smoothly interpolate between one-to-one matching and the synthetic control method.
Different methods for PLS analysis of one or two data tables such as Tucker's Inter-Battery, NIPALS, SIMPLS, SIMPLS-CA, PLS Regression, and PLS Canonical Analysis. The main reference for this software is the awesome book (in French) La Regression PLS: Theorie et Pratique by Michel Tenenhaus.
This package performs tuning of clustering models, methods and algorithms including the problem of determining an appropriate number of clusters. Validation of cluster analysis results is performed via quadratic scoring using resampling methods, as in Coraggio, L. and Coretto, P. (2023) <doi:10.1016/j.jmva.2023.105181>.
Median-of-means is a generic yet powerful framework for scalable and robust estimation. Its Bayesian counterpart, the M-posterior, estimates a median of subset posterior measures. For a general exposition of the topic, see the paper by Minsker (2015) <doi:10.3150/14-BEJ645>.
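The scalar median-of-means idea can be sketched in a few lines of Python: split the sample into blocks, average each block, and take the median of the block averages. This is a minimal illustration under the assumption of a one-dimensional sample; the function name `median_of_means` is hypothetical, and the M-posterior itself (a median of posterior measures) is not implemented here.

```python
import random
import statistics


def median_of_means(values, n_blocks, seed=0):
    """Median of block means for a one-dimensional sample.

    The sample is shuffled and dealt into `n_blocks` roughly equal blocks;
    a few gross outliers can corrupt only a few block means, so the median
    of the block means stays close to the bulk of the data.
    """
    values = list(values)
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    rng.shuffle(values)
    blocks = [values[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)
```

With a single block the estimator reduces to the ordinary mean; more blocks trade some statistical efficiency for robustness to outliers.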
Suite of helper functions for data wrangling and visualization. The only theme for these functions is that they tend to be simple, short, and narrowly scoped. These functions are built for tasks that often recur but are not large enough in scope to warrant an ecosystem of interdependent functions.
Exploratory analysis of any input data, describing the structure and the relationships present in the data. The package automatically selects the variables and computes the related descriptive statistics. Analyses of information value, weight of evidence, custom tables, summary statistics, and graphical techniques are performed for both numeric and categorical predictors.
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a dplyr compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Allows R users to interact with the Canvas Learning Management System (LMS) API (see <https://canvas.instructure.com/doc/api/all_resources.html> for details). It provides a set of functions to access and manipulate course data, assignments, grades, users, and other resources available through the Canvas API.
This package performs clustering of quantitative variables, assuming that clusters lie in low-dimensional subspaces. The segmentation of variables, the number of clusters, and their dimensions are selected based on BIC. Candidate models are identified from many runs of the K-means algorithm with different random initializations of cluster centers.
We perform linear, logistic, and Cox regression using the base functions lm() and glm() in R and coxph() from the survival package. Likewise, we can use ols(), lrm(), and cph() from the rms package for the same functionality. Each of these two sets of commands has a different focus. In many cases, we need both sets in the same analysis, e.g. to filter the full subset model using AIC and to build a visualization for the final model. The base.rms package helps you switch between the two sets of commands easily.
Three robust marginal integration procedures for additive models based on local polynomial kernel smoothers. As a preliminary estimator of the multivariate function for the marginal integration procedure, a first approach uses local constant M-estimators, a second one uses local polynomials of order 1 over all the components of covariates, and the third one uses M-estimators based on local polynomials but only in the direction of interest. For this last approach, estimators of the derivatives of the additive functions can be obtained. All three procedures can compute predictions for points outside the training set if desired. See Boente and Martinez (2017) <doi:10.1007/s11749-016-0508-0> for details.
Model-based simulation of dynamic networks under tie-oriented (Butts, C., 2008, <doi:10.1111/j.1467-9531.2008.00203.x>) and actor-oriented (Stadtfeld, C., & Block, P., 2017, <doi:10.15195/v4.a14>) relational event models. Supports simulation from a variety of relational event model extensions, including temporal variability in effects, heterogeneity through dyadic latent class relational event models (DLC-REM), random effects, blockmodels, and memory decay in relational event models (Lakdawala, R., 2024, <doi:10.48550/arXiv.2403.19329>). The development of this package was supported by a Vidi Grant (452-17-006) awarded by the Netherlands Organization for Scientific Research (NWO) and an ERC Starting Grant (758791).
This package installs a self-contained Conda instance that is managed by the R/Bioconductor installation machinery. This aims to provide a consistent Python environment that can be used reliably by Bioconductor packages. Functions are also provided to enable smooth interoperability of multiple Python environments in a single R session.
The genome-level Trellis graph visualizes genomic data conditioned by genomic categories (e.g. chromosomes). For each genomic category, multidimensional data, represented as tracks, describe different features from different aspects. This package provides high flexibility to arrange genomic categories and to add self-defined graphics in the plot.
Deciding what resolution to use can be a difficult question when approaching a clustering analysis. One way to approach this problem is to look at how samples move as the number of clusters increases. This package allows you to produce clustering trees, a visualization for interrogating clusterings as resolution increases.
This package provides fundamental abstractions for doing asynchronous programming in R using promises. Asynchronous programming is useful for allowing a single R process to orchestrate multiple tasks in the background while also attending to something else. Semantics are similar to JavaScript promises, but with a syntax that is idiomatic R.