This package performs nonparametric estimation in mixture cure models when the cure status is partially known. For details, see Safari et al (2021) <doi:10.1002/bimj.202100156>, Safari et al (2022) <doi:10.1177/09622802221115880> and Safari et al (2023) <doi:10.1007/s10985-023-09591-x>.
Different methods for PLS analysis of one or two data tables such as Tucker's Inter-Battery, NIPALS, SIMPLS, SIMPLS-CA, PLS Regression, and PLS Canonical Analysis. The main reference for this software is the awesome book (in French) La Regression PLS: Theorie et Pratique by Michel Tenenhaus.
Conduct power analyses and inference of marginal effects. Uses plug-in estimation and influence functions to perform robust inference, optionally leveraging historical data to increase precision with prognostic covariate adjustment. The methods are described in Højbjerre-Frandsen et al. (2025) <doi:10.48550/arXiv.2503.22284>.
This package provides partial least squares regression for (weighted) beta regression models (Bertrand 2013, <http://journal-sfds.fr/article/view/215>) and k-fold cross-validation of such models using various criteria. It allows for missing data in the explanatory variables. Bootstrap confidence intervals are also available.
Estimate penalized synthetic control models and perform hold-out validation to determine their penalty parameter. This method is based on the work by Abadie & L'Hour (2021) <doi:10.1080/01621459.2021.1971535>. Penalized synthetic controls smoothly interpolate between one-to-one matching and the synthetic control method.
This package performs tuning of clustering models, methods and algorithms including the problem of determining an appropriate number of clusters. Validation of cluster analysis results is performed via quadratic scoring using resampling methods, as in Coraggio, L. and Coretto, P. (2023) <doi:10.1016/j.jmva.2023.105181>.
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a dplyr compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Exploratory analysis on any input data describing the structure and the relationships present in the data. The package automatically selects the variables and computes the related descriptive statistics. Information value, weight of evidence, custom tables, summary statistics, and graphical techniques are applied to both numeric and categorical predictors.
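As a rough illustration of two of the statistics named above, the sketch below computes weight of evidence (WoE) and information value (IV) for one categorical predictor against a binary target. It is written in Python rather than R and is not the package's implementation. The function name `woe_iv` and the additive smoothing constant are hypothetical, and the sign convention ln(share of non-events / share of events) is an assumption that may differ from the package's own.

```python
import math
from collections import Counter


def woe_iv(categories, targets, smoothing=0.5):
    """Return ({level: WoE}, IV) for a binary target (1 = event, 0 = non-event)."""
    events = Counter(c for c, t in zip(categories, targets) if t == 1)
    non_events = Counter(c for c, t in zip(categories, targets) if t == 0)
    levels = sorted(set(categories))

    # Additive smoothing (an assumption of this sketch) avoids log(0)
    # when a level has no events or no non-events.
    total_events = sum(events.values()) + smoothing * len(levels)
    total_non_events = sum(non_events.values()) + smoothing * len(levels)

    woe, iv = {}, 0.0
    for level in levels:
        p_event = (events[level] + smoothing) / total_events
        p_non_event = (non_events[level] + smoothing) / total_non_events
        woe[level] = math.log(p_non_event / p_event)
        # Each term is non-negative, so IV >= 0.
        iv += (p_non_event - p_event) * woe[level]
    return woe, iv


# Toy data: level "a" is mostly events, level "b" mostly non-events.
cats = ["a"] * 4 + ["b"] * 4
ys = [1, 1, 1, 0, 0, 0, 0, 1]
woe_by_level, info_value = woe_iv(cats, ys)
print(woe_by_level, info_value)
```

Under this convention a negative WoE marks a level that is over-represented among events, and a larger IV suggests a more informative predictor.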
Median-of-means is a generic yet powerful framework for scalable and robust estimation. A framework for Bayesian analysis is called M-posterior, which estimates a median of subset posterior measures. For a general exposition of the topic, see the paper by Minsker (2015) <doi:10.3150/14-BEJ645>.
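The basic median-of-means idea for a scalar mean can be sketched as follows: split the data into blocks, average each block, and take the median of the block averages. This is a minimal Python illustration, not the package's code. The function name `median_of_means` is hypothetical, and it does not cover the M-posterior, which takes a median of posterior measures rather than of numbers.

```python
import random
import statistics


def median_of_means(values, n_blocks, seed=0):
    """Median-of-means estimate of the mean of `values`."""
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled observations round-robin into near-equal blocks.
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1000.0]
print(statistics.fmean(data))              # plain mean, dominated by the outlier
print(median_of_means(data, n_blocks=3))   # the outlier affects only one block
```

With three blocks, a single gross outlier can distort at most one block mean, so the median of the block means stays within the range of the clean observations.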
Suite of helper functions for data wrangling and visualization. The only theme for these functions is that they tend to be simple, short, and narrowly scoped. These functions are built for tasks that often recur but are not large enough in scope to warrant an ecosystem of interdependent functions.
Multiple ways to bin numeric columns with a tidy output. Wraps a variety of existing binning methods into one function, and includes a new method for binning by equal value, which is useful for sales data. Provides a function to automatically summarize the properties of the binned columns.
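The description does not spell out what binning by equal value means. One plausible reading, assumed here, is that each bin should hold roughly the same total (for example, the same share of sales) rather than the same number of rows. The Python sketch below illustrates that reading only. The function name `bin_equal_value` and the midpoint rule are hypothetical and not taken from the package.

```python
def bin_equal_value(values, n_bins):
    """Assign a 0-based bin to each value so that bin totals are roughly equal.

    Values are processed in ascending order; a value's bin is decided by the
    cumulative share of the grand total reached at that value's midpoint.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive total")

    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    running = 0.0
    for i in order:
        share = (running + values[i] / 2) / total
        bins[i] = min(int(share * n_bins), n_bins - 1)
        running += values[i]
    return bins


sales = [4, 1, 2, 1, 4, 1, 2, 1]
labels = bin_equal_value(sales, n_bins=2)
totals = [sum(s for s, b in zip(sales, labels) if b == k) for k in range(2)]
print(labels, totals)
```

In this toy example the six smallest sales and the two largest sales each add up to half of the total, which is the kind of split an equal-count binning would not produce.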
Improves the predictive performance of ridge and lasso regression exploiting one or more sources of prior information on the importance and direction of effects (Rauschenberger and others 2023, <doi:10.1093/bioinformatics/btad680>). For running the vignette (optional), install fwelnet from GitHub <https://github.com/kjytay/fwelnet>.
Allows R users to interact with the Canvas Learning Management System (LMS) API (see <https://canvas.instructure.com/doc/api/all_resources.html> for details). It provides a set of functions to access and manipulate course data, assignments, grades, users, and other resources available through the Canvas API.
This package performs clustering of quantitative variables, assuming that clusters lie in low-dimensional subspaces. Segmentation of variables, number of clusters and their dimensions are selected based on BIC. Candidate models are identified based on many runs of the K-means algorithm with different random initializations of cluster centers.
Data driven strategy to find hidden groups of patients with complex diseases using clinical data. ClustAll facilitates the unsupervised identification of multiple robust stratifications. ClustAll is able to overcome the most common limitations found when dealing with clinical data (missing values, correlated data, mixed data types).
HMP2Data is a Bioconductor package of the Human Microbiome Project 2 (HMP2) 16S rRNA sequencing data. Processed data is provided as phyloseq, SummarizedExperiment, and MultiAssayExperiment class objects. Individual matrices and data.frames used for building these S4 class objects are also provided in the package.
svaRetro contains functions for detecting retrotransposed transcripts (RTs) from structural variant calls. It takes structural variant calls in GRanges of breakend notation and identifies RTs by exon-exon junctions and insertion sites. The candidate RTs are reported by events and annotated with information of the inserted transcripts.
`tomoseqr` is an R package for analyzing Tomo-seq data. Tomo-seq is a genome-wide RNA tomography method that combines high-throughput RNA sequencing with cryosectioning for spatially resolved transcriptomics. `tomoseqr` reconstructs 3D expression patterns from Tomo-seq data and visualizes the reconstructed 3D expression patterns.
This package implements functions for simulation-based inference. In particular, it implements functions to perform likelihood inference from data summaries whose distributions are simulated. The package implements more advanced methods than the ones first described in: Rousset, Gouy, Almoyna and Courtiol (2017) <doi:10.1111/1755-0998.12627>.
MaAsLin2 is a comprehensive R package for efficiently determining multivariable association between clinical metadata and microbial meta'omic features. This package relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, and offers a variety of data exploration, normalization, and transformation methods.
This package can be used to normalize cytometry samples when a control sample is taken along in each of the batches. This is done by first identifying multiple clusters/cell types, learning the batch effects from the control samples and applying quantile normalization on all markers of interest.
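The last step described above, learning a batch effect from control samples and correcting it by matching quantiles, can be sketched for a single marker as follows. This is a simplified Python illustration under stated assumptions, not the package's implementation: it skips the clustering step, treats one batch's control as the reference, and uses the hypothetical names `quantile_knots` and `make_normalizer`.

```python
import bisect
import statistics


def quantile_knots(sample, n_quantiles=10):
    """Minimum, inner quantile cut points, and maximum of a sample."""
    cuts = statistics.quantiles(sample, n=n_quantiles, method="inclusive")
    return [min(sample)] + cuts + [max(sample)]


def make_normalizer(batch_control, reference_control, n_quantiles=10):
    """Learn a map that aligns a batch's control quantiles with the reference."""
    src = quantile_knots(batch_control, n_quantiles)
    dst = quantile_knots(reference_control, n_quantiles)

    def normalize(x):
        # Outside the range seen in the control, shift by the boundary offset.
        if x <= src[0]:
            return dst[0] + (x - src[0])
        if x >= src[-1]:
            return dst[-1] + (x - src[-1])
        # Otherwise interpolate linearly between neighbouring quantiles.
        j = bisect.bisect_right(src, x)  # src[j - 1] <= x < src[j]
        t = (x - src[j - 1]) / (src[j] - src[j - 1])
        return dst[j - 1] + t * (dst[j] - dst[j - 1])

    return normalize


reference = [float(v) for v in range(21)]
batch = [2 * v + 3 for v in reference]  # simulated scale-and-shift batch effect
normalize = make_normalizer(batch, reference)
print([normalize(x) for x in (3.0, 13.0, 43.0)])
```

The map is learned from the control samples only and then applied to every sample in the same batch, so the biological differences between non-control samples are not used to fit the correction.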
This is a package for random number generation for the truncated multivariate normal and Student t distribution. It computes probabilities, quantiles and densities, including one-dimensional and bivariate marginal densities. It computes first and second moments (i.e. mean and covariance matrix) for the double-truncated multinormal case.
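To make the idea of truncated sampling concrete, the following Python sketch draws from a one-dimensional doubly truncated normal by the inverse-CDF method. It is only an illustration: the package targets the multivariate case, which needs other techniques, and the function name `sample_truncated_normal` is hypothetical. The sketch assumes bounds where the normal CDF lies strictly between 0 and 1, so it is not suitable for truncation far in the tails.

```python
import random
from statistics import NormalDist


def sample_truncated_normal(mu, sigma, lower, upper, size, seed=0):
    """Draw `size` values from N(mu, sigma^2) restricted to [lower, upper]."""
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    dist = NormalDist(mu, sigma)
    p_lower, p_upper = dist.cdf(lower), dist.cdf(upper)
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        # A uniform draw on [p_lower, p_upper) mapped through the quantile function.
        p = p_lower + rng.random() * (p_upper - p_lower)
        x = dist.inv_cdf(p)
        # Guard against rounding error at the boundaries.
        draws.append(min(max(x, lower), upper))
    return draws


draws = sample_truncated_normal(mu=0.0, sigma=1.0, lower=-1.0, upper=1.0, size=2000)
print(min(draws), max(draws), sum(draws) / len(draws))
```

Because the truncation here is symmetric around the mean, the sample average should sit close to zero while every draw stays inside the interval.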
This package provides functions for interacting directly with the ALTADATA API. With this R package, developers can build applications around the ALTADATA API without having to deal with accessing and managing requests and responses. ALTADATA is a curated data marketplace; for more information, go to <https://www.altadata.io>.
Fit Generalized Additive Models (GAM) using mgcv with 'parsnip'/'tidymodels' via additive <doi:10.5281/zenodo.4784245>. tidymodels is a collection of packages for machine learning; see Kuhn and Wickham (2020) <https://www.tidymodels.org>. The technical details of mgcv are described in Wood (2017) <doi:10.1201/9781315370279>.