Conduct power analyses and inference of marginal effects. Uses plug-in estimation and influence functions to perform robust inference, optionally leveraging historical data to increase precision with prognostic covariate adjustment. The methods are described in Højbjerre-Frandsen et al. (2025) <doi:10.48550/arXiv.2503.22284>.
This package performs tuning of clustering models, methods and algorithms including the problem of determining an appropriate number of clusters. Validation of cluster analysis results is performed via quadratic scoring using resampling methods, as in Coraggio, L. and Coretto, P. (2023) <doi:10.1016/j.jmva.2023.105181>.
Suite of helper functions for data wrangling and visualization. The only theme for these functions is that they tend to be simple, short, and narrowly scoped. These functions are built for tasks that often recur but are not large enough in scope to warrant an ecosystem of interdependent functions.
Exploratory analysis on any input data, describing the structure and the relationships present in the data. The package automatically selects the variables and computes related descriptive statistics. Information value, weight of evidence, custom tables, summary statistics, and graphical techniques are provided for both numeric and categorical predictors.
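As an illustration of the weight of evidence and information value mentioned above, here is a minimal Python sketch for one categorical predictor and a binary target. It is not taken from the package: the function name `woe_iv` is hypothetical, and the sign convention (log of the non-event share over the event share) is an assumption, since conventions differ between tools.

```python
import math
from collections import Counter


def woe_iv(categories, targets):
    """Return (WoE per category, information value) for a binary target.

    Assumed convention: WoE = ln(share of non-events / share of events).
    Hypothetical helper for illustration only.
    """
    events = Counter(c for c, t in zip(categories, targets) if t == 1)
    non_events = Counter(c for c, t in zip(categories, targets) if t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    if total_events == 0 or total_non_events == 0:
        raise ValueError("target must contain both events and non-events")

    woe = {}
    iv = 0.0
    for category in sorted(set(categories)):
        p_event = events[category] / total_events
        p_non_event = non_events[category] / total_non_events
        if p_event == 0 or p_non_event == 0:
            # WoE is undefined for a category with an empty class.
            raise ValueError(f"category {category!r} has an empty class")
        woe[category] = math.log(p_non_event / p_event)
        # Each category contributes (difference in shares) * WoE to the IV.
        iv += (p_non_event - p_event) * woe[category]
    return woe, iv


categories = ["a"] * 4 + ["b"] * 4
targets = [1, 0, 0, 0, 1, 1, 1, 0]
woe, iv = woe_iv(categories, targets)
```

In practice, categories with an empty class are usually merged or smoothed rather than rejected; the sketch raises an error to keep the arithmetic explicit.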
Median-of-means is a generic yet powerful framework for scalable and robust estimation. A corresponding framework for Bayesian analysis, called M-posterior, estimates a median of subset posterior measures. For a general exposition of the topic, see the paper by Minsker (2015) <doi:10.3150/14-BEJ645>.
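The basic median-of-means idea can be sketched in Python for a scalar mean: split the data into blocks, average each block, and take the median of the block averages. This is only the classical scalar estimator, not the M-posterior and not the package's interface. The function name `median_of_means`, the shuffling step, and the `seed` parameter are assumptions made for illustration.

```python
import random
import statistics


def median_of_means(values, n_blocks, seed=0):
    """Estimate the mean as the median of the means of disjoint blocks."""
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)  # random assignment of observations to blocks
    # Strided slices give n_blocks disjoint, non-empty, near-equal blocks.
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(sum(block) / len(block) for block in blocks)


# One gross outlier dominates the plain mean but affects only one block mean.
data = [1.0] * 99 + [1e6]
robust_estimate = median_of_means(data, n_blocks=5)
plain_mean = sum(data) / len(data)
```

Here the outlier can contaminate at most one of the five block means, so the median of the block means stays at the bulk of the data while the plain mean does not.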
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a dplyr compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
This package provides a lightweight toolkit of functions for printing tables from input data in the R console or terminal with customizable formatting. Supported outputs include American Psychological Association (APA)-style tables (American Psychological Association, 2020, ISBN:9781433832178), correlation matrices, contingency tables, and two-column summary tables.
Multiple ways to bin numeric columns with a tidy output. Wraps a variety of existing binning methods into one function, and includes a new method for binning by equal value, which is useful for sales data. Provides a function to automatically summarize the properties of the binned columns.
Allows R users to interact with the Canvas Learning Management System (LMS) API (see <https://canvas.instructure.com/doc/api/all_resources.html> for details). It provides a set of functions to access and manipulate course data, assignments, grades, users, and other resources available through the Canvas API.
This package performs clustering of quantitative variables, assuming that clusters lie in low-dimensional subspaces. Segmentation of variables, number of clusters and their dimensions are selected based on BIC. Candidate models are identified based on many runs of the K-means algorithm with different random initializations of cluster centers.
Linear, logistic, and Cox regression can be fitted with lm(), glm(), and coxph() from base R and the survival package. The same models can be fitted with ols(), lrm(), and cph() from the rms package. Each of these two sets of functions has a different focus, and both are often needed in the same analysis, e.g. to filter full-subset models using AIC and to build a visualization of the final model. The base.rms package helps you switch between the two sets of functions easily.
Three robust marginal integration procedures for additive models based on local polynomial kernel smoothers. As a preliminary estimator of the multivariate function for the marginal integration procedure, a first approach uses local constant M-estimators, a second one uses local polynomials of order 1 over all the components of covariates, and the third one uses M-estimators based on local polynomials but only in the direction of interest. For this last approach, estimators of the derivatives of the additive functions can be obtained. All three procedures can compute predictions for points outside the training set if desired. See Boente and Martinez (2017) <doi:10.1007/s11749-016-0508-0> for details.
Model-based simulation of dynamic networks under tie-oriented (Butts, C., 2008, <doi:10.1111/j.1467-9531.2008.00203.x>) and actor-oriented (Stadtfeld, C., & Block, P., 2017, <doi:10.15195/v4.a14>) relational event models. Supports simulation from a variety of relational event model extensions, including temporal variability in effects, heterogeneity through dyadic latent class relational event models (DLC-REM), random effects, blockmodels, and memory decay in relational event models (Lakdawala, R., 2024, <doi:10.48550/arXiv.2403.19329>). The development of this package was supported by a Vidi Grant (452-17-006) awarded by the Netherlands Organization for Scientific Research (NWO) and an ERC Starting Grant (758791).
This package installs a self-contained Conda instance that is managed by the R/Bioconductor installation machinery. This aims to provide a consistent Python environment that can be used reliably by Bioconductor packages. Functions are also provided to enable smooth interoperability of multiple Python environments in a single R session.
Genome-level Trellis graphs visualize genomic data conditioned by genomic categories (e.g. chromosomes). For each genomic category, multi-dimensional data represented as tracks describe different features from different aspects. This package provides high flexibility to arrange genomic categories and to add self-defined graphics in the plot.
Deciding what resolution to use can be a difficult question when approaching a clustering analysis. One way to approach this problem is to look at how samples move as the number of clusters increases. This package allows you to produce clustering trees, a visualization for interrogating clusterings as resolution increases.
This package provides fundamental abstractions for doing asynchronous programming in R using promises. Asynchronous programming is useful for allowing a single R process to orchestrate multiple tasks in the background while also attending to something else. Semantics are similar to JavaScript promises, but with a syntax that is idiomatic R.
This package simplifies the creation of Excel .xlsx files by providing a high level interface to writing, styling and editing worksheets. Through the use of Rcpp, read/write times are comparable to the xlsx and XLConnect packages with the added benefit of removing the dependency on Java.
mlr3misc provides frequently used helper functions and assertions used in mlr3 and its companion packages. It comes with helper functions for functional programming, for printing, and for working with data.table, as well as some generally useful R6 classes. This package also supersedes the package BBmisc.
clevRvis provides a set of visualization techniques for clonal evolution. These include shark plots, dolphin plots and plaice plots. Algorithms for time point interpolation as well as therapy effect estimation are provided. Phylogeny-aware color coding is implemented. A shiny-app for generating plots interactively is additionally provided.
Implementation of the Interval-Wise Testing (IWT) for omics data. This inferential procedure tests for differences in "Omics" data between two groups of genomic regions (or between a group of genomic regions and a reference center of symmetry), and does not require fixing location and scale at the outset.
VCFArray extends DelayedArray to represent VCF data entries as array-like objects with an on-disk or remote VCF file as the backend. Data entries from VCF files, including info fields, FORMAT fields, and the fixed columns (REF, ALT, QUAL, FILTER), can be converted into VCFArray instances with different dimensions.
This package provides functions for interacting directly with the ALTADATA API. With this R package, developers can build applications around the ALTADATA API without having to deal with accessing and managing requests and responses. ALTADATA is a curated data marketplace; for more information, go to <https://www.altadata.io>.
Fit Generalized Additive Models (GAM) using mgcv with parsnip/tidymodels via additive <doi:10.5281/zenodo.4784245>. tidymodels is a collection of packages for machine learning; see Kuhn and Wickham (2020) <https://www.tidymodels.org>. The technical details of mgcv are described in Wood (2017) <doi:10.1201/9781315370279>.