Variable and interaction selection are essential to classification in high-dimensional settings. In this package, we provide an implementation of the SODA procedure, a forward-backward algorithm that selects both main and interaction effects under logistic regression and quadratic discriminant analysis. We also provide an extension, S-SODA, for variable selection in semi-parametric models with continuous responses.
Support for reading/writing simple feature ('sf') spatial objects from/to Parquet files. Parquet files are an open-source, column-oriented data storage format from Apache (<https://parquet.apache.org/>), now popular across programming languages. This implementation converts simple feature list geometries into well-known binary format for use by 'arrow', and coordinate reference system information is maintained in a standard metadata format.
Algorithms for accelerating the convergence of slow, monotone sequences from smooth contraction mappings such as the EM and MM algorithms. They can be used to accelerate any smooth, linearly convergent iterative scheme. A tutorial-style introduction to this package is available in a vignette on the CRAN download page or, when the package is loaded in an R session, with vignette("turboEM").
Read, manipulate and write voxel spaces. Voxel spaces are read from text-based output files of the AMAPVox software. AMAPVox is a LiDAR point cloud voxelisation software that aims at estimating leaf area through several theoretical/numerical approaches. See more in the article Vincent et al. (2017) <doi:10.23708/1AJNMP> and the technical note Vincent et al. (2021) <doi:10.23708/1AJNMP>.
Estimation of association between disease or death counts (e.g. COVID-19) and socio-environmental risk factors using a zero-inflated Bayesian spatiotemporal model. Non-spatiotemporal models and/or models without zero-inflation are also included for comparison. Functions to produce corresponding maps are also included. See Chakraborty et al. (2022) <doi:10.1007/s13253-022-00487-1> for more details on the method.
Generate URLs and hyperlinks to commonly used biological databases and resources based on standard identifiers. This is primarily useful when writing dynamic reports that reference things like gene symbols in text or tables, allowing you to, for example, convert gene identifiers to hyperlinks pointing to their entry in the NCBI Gene database. Currently supports NCBI Gene, 'PubMed', Gene Ontology, 'KEGG', CRAN and Bioconductor.
Simple functions for plotting linear calibration functions and estimating standard errors for measurements according to the Handbook of Chemometrics and Qualimetrics: Part A by Massart et al. (1997). There are also functions for estimating the limit of detection (LOD) and limit of quantification (LOQ). The functions work on model objects from (optionally weighted) linear regression ('lm') or robust linear regression ('rlm' from the MASS package).
Expectile regression is a useful tool for estimating the conditional expectiles of a response variable given a set of covariates. This package implements a regression-tree-based gradient boosting estimator for nonparametric multiple expectile regression, proposed by Yang, Y., Qian, W. and Zou, H. (2018) <doi:10.1080/00949655.2013.876024>. The code is based on the gbm package originally developed by Greg Ridgeway.
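To make the notion of an expectile concrete, the following is a minimal sketch of the unconditional sample expectile, computed as the fixed point of an asymmetrically weighted mean. It is not the boosting estimator from the package; the function name `sample_expectile` and its parameters are hypothetical and chosen only for illustration.

```python
def sample_expectile(values, tau, tol=1e-12, max_iter=1000):
    """Return the tau-expectile of a sample (illustrative helper, not a package API).

    The tau-expectile e minimizes sum_i w_i * (y_i - e)^2 with
    w_i = tau if y_i > e, else 1 - tau. It is found here by repeatedly
    recomputing the weighted mean until it stops changing.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if not values:
        raise ValueError("values must be non-empty")

    estimate = sum(values) / len(values)  # tau = 0.5 gives the ordinary mean
    for _ in range(max_iter):
        weights = [tau if v > estimate else 1.0 - tau for v in values]
        updated = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(updated - estimate) < tol:
            return updated
        estimate = updated
    return estimate


data = [1.0, 2.0, 3.0, 10.0]
print(sample_expectile(data, 0.5))  # the sample mean
print(sample_expectile(data, 0.9))  # pulled toward the large observation
```

Expectile regression replaces the constant `estimate` with a function of the covariates while keeping the same asymmetric squared-error loss.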
Dynamic and Interactive Maps with R, powered by leaflet <https://leafletjs.com>. evolMap generates a web page with interactive and dynamic maps to which you can add geometric entities (points, lines or colored geographic areas), and/or markers with optional links between them. The dynamic ability of these maps allows their components to evolve over a continuous period of time or by periods.
For ordinal rating data, uses an accelerated EM algorithm to estimate and test models within the family of CUB models (where CUB stands for Combination of a discrete Uniform and a shifted Binomial distribution). The procedure is built upon Louis' identity for the observed information matrix. Best-subset variable selection is then implemented, since it becomes more feasible from the computational point of view.
This package provides methods to construct and power group sequential clinical trial designs for outcomes at multiple times. Outcomes at earlier times provide information on the final (primary) outcome. A range of recruitment and correlation models are available, as are methods to simulate data in order to explore design operating characteristics. For more details see Parsons (2024) <doi:10.1186/s12874-024-02174-w>.
Fits sparse interaction models for continuous and binary responses subject to the strong (or weak) hierarchy restriction that an interaction between two variables only be included if both (or at least one of) the variables are included as main effects. For more details, see Bien, J., Taylor, J., Tibshirani, R. (2013) "A Lasso for Hierarchical Interactions." Annals of Statistics. 41(3). 1111-1141.
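The hierarchy restriction described above can be stated as a simple check on a selected model. The sketch below only tests whether a given set of main effects and interactions satisfies strong or weak hierarchy; it does not fit the lasso itself, and the function name `satisfies_hierarchy` is a hypothetical helper, not part of the package.

```python
def satisfies_hierarchy(main_effects, interactions, strong=True):
    """Check the strong or weak hierarchy restriction (illustrative helper).

    main_effects: iterable of variable names included as main effects.
    interactions: iterable of (var_a, var_b) pairs included as interactions.
    strong=True  requires both variables of each pair to be main effects.
    strong=False requires at least one of them to be a main effect.
    """
    included = set(main_effects)
    required = 2 if strong else 1
    for var_a, var_b in interactions:
        present = (var_a in included) + (var_b in included)
        if present < required:
            return False
    return True


model_main = ["x1", "x2"]
model_inter = [("x1", "x2"), ("x1", "x3")]
print(satisfies_hierarchy(model_main, model_inter, strong=True))   # False: x3 is not a main effect
print(satisfies_hierarchy(model_main, model_inter, strong=False))  # True: x1 is a main effect
```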
Simulates respiratory virus epidemics using meta-population compartmental models following Fadikar et al. (2025) <doi:10.1101/2025.05.05.25327021>. MetaRVM implements a stochastic SEIRD (Susceptible-Exposed-Infected-Recovered-Dead) framework with demographic stratification by age, race, and geographic zones. It supports complex epidemiological scenarios including asymptomatic and presymptomatic transmission, hospitalization dynamics, vaccination schedules, and time-varying contact patterns via mixing matrices.
Comprehensive network analysis package. Calculates correlation networks quickly and accelerates many analyses through parallel computing. Supports multi-omics data and sub-network search. Handles large data sets with more than 10,000 nodes per omics layer. Offers various layout methods for multi-omics networks and interfaces to other software ('Gephi', 'Cytoscape', 'ggplot2') for easy visualization. Provides comprehensive calculation of topological indices, including ecological network stability.
Implementation of a next-generation, multi-stock age-structured fisheries assessment model. multiSA is intended for use in mixed fisheries where stock composition cannot be readily identified from fishery data alone, e.g., from catch and age/length composition. Models can be fitted to genetic data, e.g., stock composition of catches and close-kin pairs, with seasonal stock availability and movement.
The penalized inverse-variance weighted (pIVW) estimator is a Mendelian randomization method for estimating the causal effect of an exposure variable on an outcome of interest based on summary-level GWAS data. The pIVW estimator accounts for weak instruments and balanced horizontal pleiotropy simultaneously. See Xu S., Wang P., Fung W.K. and Liu Z. (2022) <doi:10.1111/biom.13732>.
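For orientation, the sketch below computes the classical (unpenalized) inverse-variance weighted estimate from summary-level data. It deliberately omits the penalization and bias correction that define pIVW, so it should be read as background for the description above rather than as the package's estimator. The function and argument names are hypothetical.

```python
def ivw_estimate(exposure_betas, outcome_betas, outcome_ses):
    """Classical IVW causal effect estimate (illustrative, not pIVW).

    exposure_betas: per-variant SNP-exposure association estimates.
    outcome_betas:  per-variant SNP-outcome association estimates.
    outcome_ses:    standard errors of the SNP-outcome estimates.
    Returns sum(g * G / s^2) / sum(g^2 / s^2).
    """
    if not (len(exposure_betas) == len(outcome_betas) == len(outcome_ses)):
        raise ValueError("all inputs must have the same length")
    numerator = sum(
        g * big_g / se ** 2
        for g, big_g, se in zip(exposure_betas, outcome_betas, outcome_ses)
    )
    denominator = sum(
        g ** 2 / se ** 2 for g, se in zip(exposure_betas, outcome_ses)
    )
    if denominator == 0:
        raise ValueError("exposure associations are all zero")
    return numerator / denominator


# Made-up summary statistics in which the outcome effect is exactly twice the exposure effect.
print(ivw_estimate([0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.05, 0.05, 0.1]))
```

When many instruments are weak, the denominator of this ratio is estimated with noise and the classical estimate is biased toward zero, which is the situation the penalized estimator is designed to address.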
Extends the mlr3 ecosystem to functional analysis by adding support for irregular and regular functional data as defined in the tf package. The package provides PipeOps for preprocessing functional columns and for extracting scalar features, thereby allowing standard machine learning algorithms to be applied afterwards. Available operations include simple functional features such as the mean or maximum, smoothing, interpolation, flattening, and functional PCA.
This package provides transfusion-related differential tests on near-infrared spectroscopy (NIRS) time series with detection limits, using two test statistics: Mean Area Under the Curve (MAUC) and a slope statistic. The package applies a penalized spline method within an imputation setting. Testing is conducted by a nested permutation approach within imputation. Refer to Guo et al. (2018) <doi:10.1177/0962280218786302> for further details.
Distributed reproducible computing framework, adopting ideas from git, docker and other software. By defining a lightweight interface around the inputs and outputs of an analysis, a lot of the repetitive work for reproducible research can be automated. We define a simple format for organising and describing work that facilitates collaborative reproducible research and acknowledges that all analyses are run multiple times over their lifespans.
XKCD described a supposedly "bad" colormap that it called a "Painbow" (see <https://xkcd.com/2537/>). But simple tests demonstrate that under some circumstances, the colormap can perform very well, and people can find information that is difficult to detect with the ggplot2 default and even supposedly "good" colormaps like viridis. This library lets you use the Painbow in your own ggplot graphs.
Estimates the Shapley values using the algorithm in the paper by Liuqing Yang, Yongdao Zhou, Haoda Fu, Min-Qian Liu and Wei Zheng (2024) <doi:10.1080/01621459.2023.2257364>, "Fast Approximation of the Shapley Values Based on Order-of-Addition Experimental Designs". You provide the data and define the value function; the package returns the estimated Shapley values based on sampling methods or experimental designs.
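As a point of reference for what is being approximated, the following sketch computes exact Shapley values by enumerating every order in which players can be added. This brute-force approach is only feasible for a handful of players and is not the sampling or design-based method of the paper; the names `exact_shapley` and `additive_value` are hypothetical.

```python
from itertools import permutations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values by full enumeration of addition orders (illustrative).

    players: list of hashable player labels.
    value:   function mapping a frozenset of players to a number.
    Each player's Shapley value is its marginal contribution averaged
    over all len(players)! orders of addition.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for player in order:
            grown = coalition | {player}
            totals[player] += value(grown) - value(coalition)
            coalition = grown
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Toy additive game: each player's Shapley value equals its own weight.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}


def additive_value(coalition):
    return sum(weights[p] for p in coalition)


print(exact_shapley(list(weights), additive_value))
```

Because the number of orders grows factorially, approximation methods evaluate only a subset of orders, chosen either at random or by an experimental design.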
Consolidated data simulation, sample size calculation and analysis functions for several snSMART (small sample sequential, multiple assignment, randomized trial) designs under one library. See Wei, B., Braun, T.M., Tamura, R.N. and Kidwell, K.M. "A Bayesian analysis of small n sequential multiple assignment randomized trials (snSMARTs)." (2018) Statistics in Medicine, 37(26), pp.3723-3732 <doi:10.1002/sim.7900>.
Indirect method for the estimation of reference intervals (RIs) using Real-World Data ('RWD') and methods for comparing and verifying RIs. Estimates RIs by applying advanced statistical methods to routine diagnostic test measurements, which include both pathological and non-pathological samples, to model the distribution of non-pathological samples. This distribution is then used to derive reference intervals and support RI verification, i.e., deciding if a specific RI is suitable for the local population. The package also provides functions for printing and plotting algorithm results. See ?refineR for a detailed description of features. Version 1.0 of the algorithm is described in Ammer et al. (2021) <doi:10.1038/s41598-021-95301-2>. Additional guidance is in Ammer et al. (2023) <doi:10.1093/jalm/jfac101>. The verification method is described in Beck et al. (2025) <doi:10.1515/cclm-2025-0728>.
This package provides a set of tools for creation, manipulation, and modeling of tensors with an arbitrary number of modes. A tensor in the context of data analysis is a multidimensional array. rTensor does this by providing an S4 class Tensor that wraps around the base array class. rTensor provides common tensor operations as methods, including matrix unfolding, summing/averaging across modes, calculating the Frobenius norm, and taking the inner product between two tensors. Familiar array operations are overloaded, such as index subsetting via [ and element-wise operations. rTensor also implements various tensor decompositions, including CP, GLRAM, MPCA, PVD, and Tucker. For tensors with 3 modes, rTensor also implements transpose, t-product, and t-SVD, as defined in Kilmer et al. (2013). Some auxiliary functions include the Khatri-Rao product, Kronecker product, and the Hadamard product for a list of matrices.
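Two of the operations named above are easy to illustrate on plain nested lists. The sketch below shows the Khatri-Rao (column-wise Kronecker) product of two matrices and the Frobenius norm of a matrix. It is written in Python for illustration and does not reproduce rTensor's R interface; the function names and the row-ordering convention are assumptions, and only the two-matrix case is covered rather than a list of matrices.

```python
from math import sqrt


def khatri_rao(a, b):
    """Column-wise Kronecker product of two matrices given as lists of rows.

    a has shape (I, K) and b has shape (J, K); the result has shape (I*J, K),
    with row i*J + j holding the element-wise product of a[i] and b[j].
    """
    n_cols = len(a[0])
    if len(b[0]) != n_cols:
        raise ValueError("both matrices must have the same number of columns")
    return [
        [a_row[k] * b_row[k] for k in range(n_cols)]
        for a_row in a
        for b_row in b
    ]


def frobenius_norm(m):
    """Square root of the sum of squared entries of a matrix."""
    return sqrt(sum(x * x for row in m for x in row))


a = [[1, 2], [3, 4]]
b = [[1, 0], [0, 1]]
print(khatri_rao(a, b))                # [[1, 0], [0, 2], [3, 0], [0, 4]]
print(frobenius_norm([[3, 0], [0, 4]]))  # 5.0
```

Each column of the result is the Kronecker product of the matching columns of the inputs, which is why the Khatri-Rao product appears in CP-type decompositions.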