The agghoo procedure is an alternative to usual cross-validation. Instead of choosing the best model trained on V subsamples, it determines a winner model for each subsample, and then aggregates the V outputs. For the details, see "Aggregated hold-out" by Guillaume Maillard, Sylvain Arlot, Matthieu Lerasle (2021) <arXiv:1909.04890> published in Journal of Machine Learning Research 22(20):1--55.
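The idea of selecting one winner per subsample and then aggregating can be sketched as follows. This is a minimal illustration in Python, not the package's code: the function names (`knn_predictor`, `agghoo`), the 1-D k-nearest-neighbour candidates, the 80/20 split and the averaging rule for regression are all assumptions made for the example.

```python
import random
from statistics import mean


def knn_predictor(train, k):
    """Return a 1-D k-nearest-neighbour regressor fitted on (x, y) pairs."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def mse(predict, data):
    """Mean squared error of a predictor on held-out (x, y) pairs."""
    return mean((predict(x) - y) ** 2 for x, y in data)


def agghoo(data, ks, n_splits=5, train_frac=0.8, seed=0):
    """Aggregated hold-out: pick one winner per split, then average the winners."""
    rng = random.Random(seed)
    winners = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train, valid = shuffled[:cut], shuffled[cut:]
        # Candidate models are trained on this split only.
        candidates = [knn_predictor(train, k) for k in ks]
        # The winner of this split minimises the hold-out error.
        winners.append(min(candidates, key=lambda f: mse(f, valid)))

    def aggregated(x):
        # Regression case: average the predictions of the V winners.
        return mean(f(x) for f in winners)

    return aggregated


# Toy data (assumed for illustration): y = x^2 plus small Gaussian noise.
noise = random.Random(1)
data = [(x / 10, (x / 10) ** 2 + noise.gauss(0, 0.05)) for x in range(60)]
predict = agghoo(data, ks=[1, 3, 5, 9])
estimate = predict(3.0)
```

For classification, the averaging step would typically be replaced by a majority vote over the winners.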
This package implements the Bayesian Synthetic Control method for causal inference in comparative case studies. It provides tools for estimating treatment effects in settings with a single treated unit and multiple control units, allowing for uncertainty quantification and flexible modeling of time-varying effects. The methodology is based on the paper by Vives and Martinez (2022) <doi:10.48550/arXiv.2206.01779>.
Bindings for additional tree-based model engines for use with the parsnip package. Models include gradient boosted decision trees with LightGBM (Ke et al., 2017), conditional inference trees and conditional random forests with partykit (Hothorn and Zeileis, 2015; Hothorn et al., 2006 <doi:10.1198/106186006X133933>), and accelerated oblique random forests with aorsf (Jaeger et al., 2022 <doi:10.5281/zenodo.7116854>).
This package contains functions to estimate smoothed and non-smoothed (empirical) time-dependent receiver operating characteristic curves, the corresponding area under the curve, and the optimal cutoff point for right-censored and interval-censored survival data. See Beyene and El Ghouch (2020) <doi:10.1002/sim.8671> and Beyene and El Ghouch (2022) <doi:10.1002/bimj.202000382>.
This package provides a framework for specifying and running flexible linear-time reachability-based algorithms for graphical causal inference. Rule tables are used to encode and customize the reachability algorithm to typical causal and probabilistic reasoning tasks such as finding d-connected nodes or more advanced applications. For more information, see Wienöbst, Weichwald and Henckel (2025) <doi:10.48550/arXiv.2506.15758>.
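As an illustration of the kind of task mentioned above, finding the nodes d-connected to a source given a conditioning set can be done with a linear-time reachability search over (node, direction) pairs. The sketch below follows the standard textbook procedure in Python; it does not use the package's rule tables or API, and the function name `d_connected` and the `parents` dictionary format are assumptions for the example.

```python
from collections import deque


def d_connected(parents, source, given):
    """Nodes d-connected to `source` given `given` in a DAG.

    `parents` maps every node to the set of its parents (assumed input format).
    """
    given = set(given)
    children = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].add(v)

    # Ancestors of the conditioning set (including the set itself):
    # a collider is open only if it lies in this set.
    ancestors = set()
    stack = list(given)
    while stack:
        v = stack.pop()
        if v not in ancestors:
            ancestors.add(v)
            stack.extend(parents[v])

    visited, reachable = set(), set()
    # "up": the node was entered from a child; "down": entered from a parent.
    queue = deque([(source, "up")])
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in given:
            reachable.add(node)
        if direction == "up" and node not in given:
            queue.extend((p, "up") for p in parents[node])
            queue.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in given:
                # Chain: continue to the children.
                queue.extend((c, "down") for c in children[node])
            if node in ancestors:
                # Open collider: bounce back up to the parents.
                queue.extend((p, "up") for p in parents[node])
    return reachable


chain = {"A": set(), "B": {"A"}, "C": {"B"}}
collider = {"A": set(), "B": set(), "C": {"A", "B"}}
```

In the chain, conditioning on B blocks the path from A to C; in the collider, conditioning on C opens the path from A to B. The returned set includes the source itself.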
We provide 70 data sets of females of reproductive age (15 to 49 years) from 19 Asian countries. The data sets are extracted from demographic and health surveys that were conducted over an extended period of time. The functions also compute Whipple's index and classify age reporting quality as very rough, rough, approximate, accurate, or highly accurate.
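For orientation, Whipple's index measures age heaping on ages ending in 0 or 5. The Python sketch below uses the conventional definition over ages 23 to 62 and the commonly cited United Nations quality thresholds; both are assumptions here, since the package may use a different age window (its data cover ages 15 to 49) or different cutoffs. The function names are hypothetical.

```python
def whipple_index(age_counts, low=23, high=62):
    """Whipple's index from a dict mapping single year of age to a count.

    Conventional definition (assumed): counts at ages ending in 0 or 5
    divided by one fifth of all counts in [low, high], times 100.
    """
    ages = range(low, high + 1)
    total = sum(age_counts.get(a, 0) for a in ages)
    if total == 0:
        raise ValueError("no observations in the age window")
    heaped = sum(age_counts.get(a, 0) for a in ages if a % 5 == 0)
    return 100.0 * heaped / (total / 5)


def reporting_quality(index):
    """Map the index to a quality label (assumed UN-style thresholds)."""
    if index < 105:
        return "highly accurate"
    if index < 110:
        return "accurate"
    if index < 125:
        return "approximate"
    if index < 175:
        return "rough"
    return "very rough"


# No heaping: every age reported equally often.
uniform = {age: 10 for age in range(23, 63)}
# Extreme heaping: every age reported as a multiple of five.
heaped = {age: 50 for age in range(25, 61, 5)}
```

An index of 100 indicates no preference for ages ending in 0 or 5, while 500 means every reported age ends in one of those digits.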
Some wrappers, functions and data sets for spatial point pattern analysis (mainly based on spatstat), used in the book "Introduccion al Analisis Espacial de Datos en Ecologia y Ciencias Ambientales: Metodos y Aplicaciones" and in the papers by De la Cruz et al. (2008) <doi:10.1111/j.0906-7590.2008.05299.x> and Olano et al. (2009) <doi:10.1051/forest:2008074>.
Create and maintain delayed-data packages (ddp's). Data stored in a ddp are available on demand, but do not take up memory until requested. You attach a ddp with g.data.attach(), then read from it and assign to it in a manner similar to S-PLUS, except that you must run g.data.save() to actually commit to disk.
This package provides a unified framework for sparse-group regularization and precision matrix estimation in Gaussian graphical models. It implements multiple sparse-group penalties, including sparse-group lasso, sparse-group adaptive lasso, sparse-group SCAD, and sparse-group MCP, and solves them efficiently using ADMM-based optimization. The package is designed for high-dimensional network inference where both sparsity and group structure are present.
Parse, trim, join, visualise and analyse data from Itrax sediment core multi-parameter scanners manufactured by Cox Analytical Systems, Sweden. Functions are provided for parsing XRF-peak area files, line-scan optical images, and radiographic images, alongside accompanying metadata. A variety of data wrangling tasks like trimming, joining and reducing XRF-peak area data are simplified. Multivariate methods are implemented with appropriate data transformation.
This package provides a fast negative binomial mixed model for conducting association analysis of multi-subject single-cell data. It can be used for identifying marker genes, differential expression and co-expression analyses. The model includes subject-level random effects to account for the hierarchical structure in multi-subject single-cell data. See He et al. (2021) <doi:10.1038/s42003-021-02146-6>.
Perform a stratified weighted log-rank test in a randomized controlled trial. Tests can be visualized as a difference in average score on the two treatment arms. These methods are described in Magirr and Burman (2018) <doi:10.48550/arXiv.1807.11097>, Magirr (2020) <doi:10.48550/arXiv.2007.04767>, and Magirr and Jimenez (2022) <doi:10.48550/arXiv.2201.10445>.
Using the R package reticulate, this package creates an interface to the pysd toolset. The package provides an R interface to a number of pysd functions, and can read files in Vensim mdl format and xmile format. The resulting simulations are returned as a tibble, and from that the results can be processed using dplyr and ggplot2. The package has been tested using python3.
This package implements tools for the analysis of partially ordered data, with a particular focus on the evaluation of multidimensional systems of indicators and on the analysis of poverty. References: Fattore M. (2016) <doi:10.1007/s11205-015-1059-6>; Fattore M., Arcagni A. (2016) <doi:10.1007/s11205-016-1501-4>; Arcagni A. (2017) <doi:10.1007/978-3-319-45421-4_19>.
Convert Chinese characters into Pinyin (the official romanization system for Standard Chinese in mainland China, Malaysia, Singapore, and Taiwan; see <https://en.wikipedia.org/wiki/Pinyin> for details), Sijiao (four or five numerical digits per character; see <https://en.wikipedia.org/wiki/Four-Corner_Method>), Wubi (an input method with five strokes; see <https://en.wikipedia.org/wiki/Wubi_method>), or user-defined codes.
This package provides tools for the evaluation of interim analysis plans for sequentially monitored trials on a survival endpoint: tools to construct efficacy and futility boundaries, tools to derive the power of a sequential design at a specified alternative, and a template for evaluating the performance of candidate plans at a set of time-varying alternatives. See Izmirlian, G. (2014) <doi:10.4310/SII.2014.v7.n1.a4>.
This package provides a method for predicting environmental conditions from transcriptome data linked with environmental gradients. It provides functions to give an overview of gene-environment relationships, to construct the prediction model, and to predict the environmental conditions under which the transcriptomes were generated. The package can search for candidate genes for model construction even in transcriptomes of non-model organisms without any genetic information.
Shortest paths between points in grids. Optional barriers and custom transition functions. Applications regarding planet Earth, as well as spheres and planes in general. Optimized for computational performance, customizability, and user friendliness. Graph-theoretical implementation tailored to gridded data. Currently focused on Dijkstra's (1959) <doi:10.1007/BF01386390> algorithm. Future updates will broaden the scope to other least cost path algorithms and to centrality measures.
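The combination of a grid, barriers and a custom transition function can be illustrated with a small Dijkstra search. This is a sketch in Python under simplifying assumptions (a planar 4-connected grid, barriers marked as 1, a user-supplied `cost` function with a default of 1 per step); the names are hypothetical and the code does not reflect the package's interface or its spherical geometry.

```python
import heapq
import math


def shortest_path(grid, start, goal, cost=lambda a, b: 1.0):
    """Dijkstra's algorithm on a 4-connected grid.

    grid: list of rows, 0 = free cell, 1 = barrier (assumed encoding).
    cost: transition function giving the non-negative cost of moving a -> b.
    Returns (distance, path); (inf, []) if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        d, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + cost(cell, nxt)
                if nd < dist.get(nxt, math.inf):
                    dist[nxt] = nd
                    prev[nxt] = cell
                    heapq.heappush(queue, (nd, nxt))
    if goal not in dist:
        return math.inf, []
    # Walk the predecessor links back from the goal to the start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


grid = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
distance, path = shortest_path(grid, (0, 0), (0, 3))

walled = [
    [0, 1, 0],
    [0, 1, 0],
]
blocked_distance, blocked_path = shortest_path(walled, (0, 0), (0, 2))
```

Passing a different `cost` function (for example one based on elevation or travel speed between the two cells) changes the least-cost route without touching the search itself.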
This package provides functions to manipulate PDF files: fill out PDF forms; merge multiple PDF files into one; remove selected pages from a file; rename multiple files in a directory; rotate an entire PDF document; rotate selected pages of a PDF file; select pages from a file; split a single input PDF document into individual pages; split a single input PDF document into parts at given points.
Set of functions to quantify and map the behaviour of winds generated by tropical storms and cyclones in space and time. It includes functions to compute and analyze fields such as the maximum sustained wind field, power dissipation index and duration of exposure to winds above a given threshold. It also includes functions to map the trajectories as well as characteristics of the storms.
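Two of the quantities named above can be illustrated at a single location from a time series of wind speeds. The Python sketch below assumes the power dissipation index is the time integral of the cubed wind speed (approximated with the trapezoidal rule, without any air-density or drag-coefficient scaling that the package may apply) and counts exposure only over intervals whose two endpoints both reach the threshold. The function names and both conventions are assumptions for the example.

```python
def power_dissipation_index(times_s, speeds_ms):
    """Trapezoidal approximation of the integral of v^3 over time.

    times_s: increasing observation times in seconds.
    speeds_ms: wind speeds in m/s at those times.
    """
    if len(times_s) != len(speeds_ms):
        raise ValueError("times and speeds must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (speeds_ms[i - 1] ** 3 + speeds_ms[i] ** 3) * dt
    return total


def exposure_duration(times_s, speeds_ms, threshold_ms):
    """Seconds spent in intervals whose endpoints both reach the threshold."""
    total = 0.0
    for i in range(1, len(times_s)):
        if speeds_ms[i - 1] >= threshold_ms and speeds_ms[i] >= threshold_ms:
            total += times_s[i] - times_s[i - 1]
    return total


# Hourly observations of a constant 10 m/s wind over two hours (toy data).
times = [0, 3600, 7200]
speeds = [10.0, 10.0, 10.0]
pdi = power_dissipation_index(times, speeds)
exposed = exposure_duration(times, speeds, threshold_ms=8.0)
```

Mapping these quantities, as the description mentions, would amount to repeating the calculation at every grid cell of the study area.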
Spatial stratified heterogeneity (SSH) denotes the coexistence of within-strata homogeneity and between-strata heterogeneity. Information consistency-based methods provide a rigorous approach to quantify SSH and evaluate its role in spatial processes, grounded in principles of geographical stratification and information theory (Bai, H. et al. (2023) <doi:10.1080/24694452.2023.2223700>; Wang, J. et al. (2024) <doi:10.1080/24694452.2023.2289982>).
This package provides a utility for working with women's basketball data. A scraping and aggregating interface for the WNBA Stats API <https://stats.wnba.com/> and ESPN's <https://www.espn.com> women's college basketball and WNBA statistics. It provides users with the capability to access the game play-by-plays, box scores, standings and results to analyze the data for themselves.
The R Analytic Tool To Learn Easily (Rattle) provides a collection of utility functions for the data scientist. A Gnome (RGtk2) based graphical interface is included with the aim of providing a simple and intuitive introduction to R for data science, allowing a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (predictive modelling markup language) or as scores. A key aspect of the GUI is that all R commands are logged and commented through the log tab. The log can be saved as a standalone R script file and serves as an aid for the user to learn R or to copy-and-paste directly into R itself. Note that RGtk2 and cairoDevice have been archived on CRAN. See <https://rattle.togaware.com> for installation instructions.
This package lets you carry out network-based gene set analysis by incorporating external information about interactions among genes, as well as novel interactions learned from data. It implements methods described in Shojaie A, Michailidis G (2010) <doi:10.1093/biomet/asq038>, Shojaie A, Michailidis G (2009) <doi:10.1089/cmb.2008.0081>, and Ma J, Shojaie A, Michailidis G (2016) <doi:10.1093/bioinformatics/btw410>.