An implementation of popular evaluation metrics commonly used in survival prediction, including the Concordance Index, Brier Score, Integrated Brier Score, Integrated Square Error, Integrated Absolute Error and Mean Absolute Error. For detailed information on the different evaluation metrics, see (Ishwaran H, Kogalur UB, Blackstone EH and Lauer MS (2008) <doi:10.1214/08-AOAS169>), (Moradian H, Larocque D and Bellavance F (2017) <doi:10.1007/s10985-016-9372-1>) and (Hanpu Zhou, Hong Wang, Sizheng Wang and Yi Zou (2023) <doi:10.32614/rj-2023-009>).
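As an illustration of the first metric listed above, here is a minimal sketch of a Harrell-type concordance index. It is not taken from the package: the function name `concordance_index` and its arguments are hypothetical, tied event times are ignored, and higher risk scores are assumed to mean shorter expected survival.

```python
def concordance_index(times, events, risks):
    """Harrell-type C-index (illustrative sketch, not the package's code).

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk scores; higher is assumed to mean earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event has higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1 means every comparable pair is ordered correctly by the risk score, and 0.5 corresponds to random ordering.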
Balancing computational and statistical efficiency, subsampling techniques offer a practical solution for handling large-scale data analysis. Subsampling methods enhance statistical modeling for massive datasets by efficiently drawing representative subsamples from the full dataset based on tailored sampling probabilities. These probabilities are optimized for specific goals, such as minimizing the variance of coefficient estimates or reducing prediction error. Based on specified modeling assumptions and subsampling techniques, the package provides functions to draw subsamples from the full data, fit the model on the subsamples, and perform statistical inference.
Additive copula regression for regression problems with a binary outcome via gradient boosting [Brant, Hobæk Haff (2022); <arXiv:2208.04669>]. The fitting process includes a specialised model selection algorithm for each component, where each component is found (by greedy optimisation) among all the D-vines with only Gaussian pair-copulas of a fixed dimension, as specified by the user. Once the variables and structure have been selected, the algorithm re-fits the component, allowing the pair-copula distributions to differ from Gaussian if specified.
Computes diagnostics for linear regression when treatment effects are heterogeneous. The output of hettreatreg represents ordinary least squares (OLS) estimates of the effect of a binary treatment as a weighted average of the average treatment effect on the treated (ATT) and the average treatment effect on the untreated (ATU). The program estimates the OLS weights on these parameters, computes the associated model diagnostics, and reports the implicit OLS estimate of the average treatment effect (ATE). See Sloczynski (2019), <http://people.brandeis.edu/~tslocz/Sloczynski_paper_regression.pdf>.
This system allows one to model a multi-variate, multi-response problem with interaction effects. It combines the usual squared error loss for the multi-response problem with penalty terms that encourage correlated responses to form groups and that allow for modeling main and interaction effects that exist within the covariates. The optimization method employed is the Alternating Direction Method of Multipliers (ADMM). The implementation is based on the methodology presented in Quachie Asenso, T., & Zucknick, M. (2023) <doi:10.48550/arXiv.2303.11155>.
Utilities to retrieve and tidy U.S. macroeconomic data series from public government data providers. Functions streamline access to series from the Federal Reserve Bank of St. Louis Federal Reserve Economic Data (FRED), the Bureau of Labor Statistics flat files, and the Bureau of Economic Analysis National Income and Product Accounts tables, then return consistent, tidy data frames ready for modeling and graphics. The package includes helpers for date alignment, log-linear projections, and common macro diagnostics, along with convenience plot builders for quick publication-quality charts.
Fits yield curves using Nelson-Siegel (1987) <doi:10.1086/296409>, Svensson (1994) <doi:10.3386/w4871>, and cubic spline methods. Extracts forward rates, discount factors, and par rates from fitted curves. Computes duration and convexity risk measures. Computes Z-spread and key rate durations. Provides principal component decomposition following Litterman and Scheinkman (1991) <doi:10.3905/jfi.1991.692347>, carry and roll-down analysis, and slope measures. All methods are pure computation with no external dependencies beyond base R; works with yield data from any source.
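To make the Nelson-Siegel form mentioned above concrete, here is a minimal sketch that evaluates the curve and a discount factor. The function names are hypothetical, the decay parameter is assumed to enter as maturity divided by `lam` (some references use the reciprocal convention), and continuous compounding is assumed for the discount factor. None of this is taken from the package itself.

```python
import math


def nelson_siegel_yield(maturity, beta0, beta1, beta2, lam):
    """Zero rate at `maturity` under the Nelson-Siegel form (sketch).

    beta0: long-run level, beta1: slope, beta2: curvature,
    lam: decay parameter, assumed here to scale maturity as maturity / lam.
    """
    if maturity <= 0 or lam <= 0:
        raise ValueError("maturity and lam must be positive")
    x = maturity / lam
    # (1 - exp(-x)) / x, computed with expm1 for accuracy at short maturities
    slope_loading = -math.expm1(-x) / x
    curvature_loading = slope_loading - math.exp(-x)
    return beta0 + beta1 * slope_loading + beta2 * curvature_loading


def discount_factor(maturity, zero_rate):
    """Discount factor assuming continuous compounding (an assumption)."""
    return math.exp(-zero_rate * maturity)
```

Under this parameterization the short end of the curve approaches `beta0 + beta1` and the long end approaches `beta0`, which is a quick sanity check on fitted parameters.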
This package provides functions to manipulate binary fingerprints of arbitrary length. A fingerprint is represented by an object of S4 class fingerprint. The bitwise logical functions in R are overridden so that they can be used directly with fingerprint objects. A number of distance metrics are also available. Fingerprints can be converted to Euclidean vectors (i.e., points on the unit hypersphere) and can also be folded. Arbitrary fingerprint formats can be handled via line handlers. Currently handlers are provided for CDK, MOE and BCI fingerprint data.
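The two operations named above, a distance (or similarity) between fingerprints and folding, can be sketched as follows. This is an illustration only: fingerprints are represented as sets of on-bit positions rather than as the package's S4 objects, the function names are hypothetical, and folding is assumed to mean OR-ing the two halves of the bit string together.

```python
def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a & b) / len(union)


def fold(bits, length):
    """Fold a fingerprint of even `length` in half by OR-ing the two halves.

    Returns the folded on-bit positions and the new length.
    """
    if length <= 0 or length % 2 != 0:
        raise ValueError("length must be a positive even number")
    if any(p < 0 or p >= length for p in bits):
        raise ValueError("bit position out of range")
    half = length // 2
    # a bit at position p in the upper half lands on position p - half
    return {p % half for p in bits}, half
```

For example, `fold({0, 5, 7}, 8)` maps bits 5 and 7 onto positions 1 and 3 of a 4-bit fingerprint, while bit 0 stays in place.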
Analyses districted electoral systems of any magnitude by computing district-party conversion ratios and seats-to-votes deviations, decomposing the sources of deviation. Traditional indexes are also computed. References: Kedar, O., Harsgor, L. and Sheinerman, R.A. (2016). <doi:10.1111/ajps.12225>. Penades, A. and Pavia, J.M. (2025) "The decomposition of seats-to-votes distortion in elections: mean, variance, malapportionment and participation". Acknowledgements: The authors wish to thank Consellería de Educación, Cultura, Universidades y Empleo, Generalitat Valenciana (grant CIACO/2023/031) for supporting this research.
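The description does not say which traditional indexes are computed. As an assumption, the sketch below shows two widely used disproportionality indexes, Loosemore-Hanby and Gallagher, computed from party vote and seat shares. The function names are hypothetical and are not the package's API.

```python
import math


def loosemore_hanby(vote_shares, seat_shares):
    """Loosemore-Hanby index: half the sum of absolute share differences."""
    if len(vote_shares) != len(seat_shares):
        raise ValueError("one vote share and one seat share per party")
    return 0.5 * sum(abs(v - s) for v, s in zip(vote_shares, seat_shares))


def gallagher(vote_shares, seat_shares):
    """Gallagher (least squares) index: sqrt of half the sum of squared differences."""
    if len(vote_shares) != len(seat_shares):
        raise ValueError("one vote share and one seat share per party")
    return math.sqrt(
        0.5 * sum((v - s) ** 2 for v, s in zip(vote_shares, seat_shares))
    )
```

Both functions expect shares on the same scale (proportions or percentages) and return the index on that scale.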
Reference datasets commonly used in the geosciences. These include standard atomic weights of the elements, a periodic table, a list of minerals including their abbreviations and chemistry, geochemical data of reservoirs (primitive mantle, continental crust, mantle, basalts, etc.), decay constants and isotopic ratios frequently used in geochronology, and color codes of the chronostratigraphic chart. In addition, the package provides functions for basic queries of atomic weights, the list of minerals, and chronostratigraphic chart colors. All datasets are fully referenced, and a BibTeX file containing the references is included.
Develop outstanding shiny apps for iOS and Android as well as beautiful shiny gadgets. shinyMobile is built on top of the latest Framework7 template <https://framework7.io>. Discover 14 new input widgets (sliders, vertical sliders, stepper, grouped action buttons, toggles, picker, smart select, ...), 2 themes (light and dark), 12 new widgets (expandable cards, badges, chips, timelines, gauges, progress bars, ...) combined with the power of server-side notifications such as alerts, modals, toasts, action sheets, sheets (and more) as well as 3 layouts (single, tabs and split).
This package provides tools for decomposing differences in rate metrics between two groups into contributions from individual subgroups and visualizing them as a "Theseus Plot". Inspired by the story of the Ship of Theseus, the method replaces subgroup data from one group with that of another step by step, recalculating the overall metric at each stage to quantify subgroup contributions. A Theseus Plot combines the stepwise progression of a waterfall plot with the comparative bars of a bar chart, offering an intuitive way to understand subgroup-level effects.
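The stepwise replacement described above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: each group is represented as a mapping from subgroup name to an `(events, total)` pair, the function name is hypothetical, and the subgroups are replaced in one fixed order.

```python
def theseus_contributions(group_a, group_b, order=None):
    """Stepwise subgroup contributions to the rate difference (sketch).

    group_a, group_b : dict mapping subgroup -> (events, total)
    order            : replacement order; defaults to sorted subgroup names
    """
    def overall_rate(data):
        events = sum(e for e, _ in data.values())
        total = sum(t for _, t in data.values())
        return events / total

    current = dict(group_a)          # start from group A, plank by plank
    previous = overall_rate(current)
    contributions = {}
    for key in (order or sorted(group_a)):
        current[key] = group_b[key]  # swap in group B's data for this subgroup
        updated = overall_rate(current)
        contributions[key] = updated - previous
        previous = updated
    return contributions
```

By construction the contributions sum to the difference between the two overall rates. In this simple form the individual values depend on the replacement order, so a single fixed order is only one possible choice.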
This package is a port of the new matplotlib color maps (viridis, magma, plasma and inferno) to R. matplotlib is a popular plotting library for Python. These color maps are designed to be analytically perfectly perceptually uniform, both in regular form and when converted to black-and-white. They are also designed to be perceived by readers with the most common form of color blindness. This is the lite version of the more complete viridis package.
The Internet Engineering Task Force (IETF) and the Internet Society (ISOC) publish various Internet-related protocols and specifications as "Request for Comments" (RFC) documents and Internet Standard (STD) documents. RFCs and STDs are published in a simple text form. This package provides an Emacs major mode, rfcview-mode, which makes it more pleasant to read these documents in Emacs. It prettifies the text and adds hyperlinks/menus for easier navigation. It also provides functions for browsing the index of RFC documents and fetching them from remote servers or local directories.
Subset of BAM files of human lung tumor and pooled normal samples by targeted panel sequencing. [Zhao et al 2014. Targeted Sequencing in Non-Small Cell Lung Cancer (NSCLC) Using the University of North Carolina (UNC) Sequencing Assay Captures Most Previously Described Genetic Aberrations in NSCLC. In preparation.] Each sample is a 10 percent random subsample drawn from the original sequencing data. The pooled normal sample has been rescaled according to the total number of normal samples in the "pool". Here provided is the subsampled data on chr6 (hg19).
This package provides a toolbox for programming Clinical Data Standards Interchange Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <https://www.cdisc.org/standards/foundational/adam>). The package is an extension package of the admiral package for pediatric clinical trials.
Generalized additive model selection via approximate Bayesian inference is provided. Bayesian mixed model-based penalized splines with spike-and-slab-type coefficient prior distributions are used to facilitate fitting and selection. The approximate Bayesian inference engine options are: (1) Markov chain Monte Carlo and (2) mean field variational Bayes. Markov chain Monte Carlo has better Bayesian inferential accuracy, but requires a longer run-time. Mean field variational Bayes is faster, but less accurate. The methodology is described in He and Wand (2024) <doi:10.1007/s10182-023-00490-y>.
Supplies functions for the creation and handling of missing data as well as tools to evaluate missing data methods. Nearly all possibilities of generating missing data discussed by Santos et al. (2019) <doi:10.1109/ACCESS.2019.2891360>, and some additional ones, are implemented. Functions are supplied to compare parameter estimates and imputed values to true values to evaluate missing data methods. Evaluations of these types are done, for example, by Cetin-Berber et al. (2019) <doi:10.1177/0013164418805532> and Kim et al. (2005) <doi:10.1093/bioinformatics/bth499>.
Comprehensive toolkit for Environmental Phillips Curve analysis featuring multidimensional instrumental variable creation, transfer entropy causal discovery, network analysis, and state-of-the-art econometric methods. Implements geographic, technological, migration, geopolitical, financial, and natural risk instruments with robust diagnostics and visualization. Provides 24 different instrumental variable approaches with empirical validation. Methods based on Phillips (1958) <doi:10.1111/j.1468-0335.1958.tb00003.x>, transfer entropy by Schreiber (2000) <doi:10.1103/PhysRevLett.85.461>, and weak instrument tests by Stock and Yogo (2005) <doi:10.1017/CBO9780511614491.006>.
An R interface to the MinIO Client. The MinIO Client ('mc') provides a modern alternative to UNIX commands like 'ls', 'cat', 'cp', 'mirror', 'diff', 'find' etc. It supports filesystems and Amazon "S3" compatible cloud storage service ("AWS" Signature v2 and v4). This package provides convenience functions for installing the MinIO client and running any operations, as described in the official documentation, <https://min.io/docs/minio/linux/reference/minio-mc.html?ref=docs-redirect>. This package provides a flexible and high-performance alternative to 'aws.s3'.
Evaluate the predictive performance of an existing (i.e. previously developed) prediction/prognostic model given relevant information about the existing prediction model (e.g. coefficients) and a new dataset. Provides a range of model updating methods that help tailor the existing model to the new dataset; see Su et al. (2018) <doi:10.1177/0962280215626466>. Techniques to aggregate multiple existing prediction models on the new data are also provided; see Debray et al. (2014) <doi:10.1002/sim.6080> and Martin et al. (2018) <doi:10.1002/sim.7586>.
This package provides methods for inference using stacked multiple imputations augmented with weights. The vignette provides example R code for implementation in general multiple imputation settings. For additional details about the estimation algorithm, we refer the reader to Beesley, Lauren J and Taylor, Jeremy M G (2020) "A stacked approach for chained equations multiple imputation incorporating the substantive model" <doi:10.1111/biom.13372>, and Beesley, Lauren J and Taylor, Jeremy M G (2021) "Accounting for not-at-random missingness through imputation stacking" <arXiv:2101.07954>.
This is a tidy implementation for heatmaps. At the moment it is based on the (great) package 'ComplexHeatmap'. The goal of this package is to interface a tidy data frame with this powerful tool. Some of the advantages are: Row and/or column colour annotations are easy to integrate by specifying just one parameter (column names). Custom grouping of rows is easy to specify by providing a grouped tbl. For example: df %>% group_by(...). Label size is adjusted by the total number of rows and columns. Default use of Brewer and Viridis palettes.
The web version WebGestalt <https://www.webgestalt.org> supports 12 organisms, 354 gene identifiers and 321,251 function categories. Users can upload the data and functional categories with their own gene identifiers. In addition to the Over-Representation Analysis, WebGestalt also supports Gene Set Enrichment Analysis and Network Topology Analysis. The user-friendly output report allows interactive and efficient exploration of enrichment results. The WebGestaltR package not only supports all of the above functions but can also be integrated into other pipelines or used to analyze multiple gene lists simultaneously.