Most of the time, floating-point arithmetic does approximately the right thing. However, when summing or multiplying numbers that differ greatly in magnitude, floating-point results can be inaccurate. This package implements the Kahan (1965) sum <doi:10.1145/363707.363723>, the Neumaier (1974) sum <doi:10.1002/zamm.19740540106>, the pairwise sum (adapted from 'NumPy'; see Castaldo (2008) <doi:10.1137/070679946> for a discussion of accuracy), and the arbitrary-precision sum (adapted from 'fsum' in 'Python'; Shewchuk (1997) <https://people.eecs.berkeley.edu/~jrs/papers/robustr.pdf>). In addition, products are computed in long double precision, or converted into a log-sum, for accuracy.
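To illustrate the compensated sums named in the entry above, here is a minimal Python sketch of the Kahan and Neumaier algorithms. It is not the package's own code; the function names kahan_sum and neumaier_sum are hypothetical and chosen only for this example.

```python
import math


def kahan_sum(values):
    """Kahan compensated sum (illustrative sketch, not the package's code)."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - comp            # apply the correction from the previous step
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost
        total = t
    return total


def neumaier_sum(values):
    """Neumaier's variant, which also handles terms larger than the running total."""
    total = 0.0
    comp = 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x  # low-order bits of x were lost
        else:
            comp += (x - t) + total  # low-order bits of total were lost
        total = t
    return total + comp


data = [1.0, 1e100, 1.0, -1e100]
naive = sum(data)              # the two 1.0 terms are absorbed by 1e100
improved = neumaier_sum(data)  # the compensation term keeps them
exact = math.fsum(data)        # Python's Shewchuk-based accurate sum
```

On this input the plain sum loses both unit terms, while the Neumaier variant and math.fsum recover them. The Kahan variant can still lose them when a term is much larger than the running total, which is the case Neumaier's modification addresses.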
This package provides functions to assess the performance of risk models. It contains functions for the various measures that are used in empirical studies, including univariate and multivariate odds ratios (OR) of the predictors, the c-statistic (or area under the receiver operating characteristic (ROC) curve (AUC)), the Hosmer-Lemeshow goodness-of-fit test, the reclassification table, net reclassification improvement (NRI) and integrated discrimination improvement (IDI). Also included are functions to create plots, such as risk distributions, ROC curves, calibration plots, discrimination box plots and predictiveness curves. In addition to functions to assess the performance of risk models, the package includes functions to obtain weighted and unweighted risk scores as well as predicted risks using logistic regression analysis. These logistic regression functions are specifically written for models that include genetic variables, but they can also be applied to models that are based on non-genetic risk factors only. Finally, the package includes a function to construct a simulated dataset with genotypes, genetic risks, and disease status for a hypothetical population, which is used for the evaluation of genetic risk models.
Early generation breeding trials are to be conducted in multiple environments where it may not be possible to replicate all the lines in each environment due to scarcity of resources. For such situations, partially replicated (p-Rep) designs have wide application potential, as only a proportion of the test lines are replicated at each environment. A collection of several utility functions related to p-Rep designs has been developed. The package contains six functions for a complete stepwise analytical study of these designs. Five functions, pRep1(), pRep2(), pRep3(), pRep4() and pRep5(), are used to generate five new series of p-Rep designs and also compute average variance factors and canonical efficiency factors of the generated designs. A sixth function, NCEV(), is used to generate the incidence matrix (N), information matrix (C), canonical efficiency factor (E) and average variance factor (V). This function is general in nature and can be used for studying the characterization properties of any block design. A construction procedure for p-Rep designs was given by Williams et al. (2011) <doi:10.1002/bimj.201000102>, which was tedious and time consuming. In this package, five different methods are given to generate p-Rep designs easily.
This package implements fast, safe, and customizable assertion routines, which can be used in place of base::stopifnot().
This package provides simple and crisp publication-quality graphics for the ExPosition family of packages. See "An ExPosition of the Singular Value Decomposition in R" (Beaton et al. 2014) <doi:10.1016/j.csda.2013.11.006>.
Provides functions to diagnose and make inferences from various linear models, such as those obtained from 'aov', 'lm', 'glm', 'gls', 'lme', 'lmer', 'glmmTMB' and 'semireg'. Inferences include predicted means and standard errors, contrasts, multiple comparisons, permutation tests, adjusted R-square and graphs.
Based on different statistical definitions of discrimination, several methods have been proposed to detect and mitigate social inequality in machine learning models. This package aims to provide an alternative approach to fairness treatment in predictive models. The ROC (reject option based classification) method implemented in this package is described by Kamiran, Karim and Zhang (2012) <https://ieeexplore.ieee.org/document/6413831/>.
Preprocess numeric data matrices into desired transformed representations. Standardization, Unitization, Cubitization and adaptive intervals are offered.
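As a rough illustration of two of the transformations named in the entry above, the following Python sketch applies column-wise standardization and unitization to a small matrix. The definitions used here are assumptions for illustration only (standardization as zero mean and unit sample standard deviation, unitization as rescaling to the interval [0, 1]); the package's own definitions and function names may differ, and cubitization and adaptive intervals are not sketched.

```python
from statistics import mean, stdev


def standardize(column):
    """Assumed definition: subtract the mean, divide by the sample standard deviation."""
    m = mean(column)
    s = stdev(column)  # a constant column gives s == 0 and a ZeroDivisionError
    return [(x - m) / s for x in column]


def unitize(column):
    """Assumed definition: rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)  # a constant column gives hi == lo
    return [(x - lo) / (hi - lo) for x in column]


def transform_columns(matrix, fn):
    """Apply fn to every column of a row-major matrix (list of rows)."""
    columns = [list(col) for col in zip(*matrix)]
    transformed = [fn(col) for col in columns]
    return [list(row) for row in zip(*transformed)]


matrix = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
unit = transform_columns(matrix, unitize)
z = transform_columns(matrix, standardize)
```

Both helpers work on one column at a time, so columns measured on different scales end up comparable after the transformation.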
Multi-state models are essential tools in longitudinal data analysis. One primary goal of these models is the estimation of transition probabilities, a critical metric for predicting clinical prognosis across various stages of diseases or medical conditions. Traditionally, inference in multi-state models relies on the Aalen-Johansen (AJ) estimator, which is consistent under the Markov assumption. However, in many practical applications the Markov property is not guaranteed, which limits the applicability of the AJ estimator in more complex scenarios. This package extends the landmark Aalen-Johansen estimator (Putter and Spitoni (2018) <doi:10.1177/0962280216674497>) by incorporating presmoothing techniques described by Soutinho, Meira-Machado and Oliveira (2020) <doi:10.1080/03610918.2020.1762895>, offering a robust alternative for estimating transition probabilities in non-Markovian multi-state models with multiple states and potentially reversible transitions.
An experiment data package to supplement the preciseTAD package, containing pre-trained models and the variable importances of each genomic annotation used to build the models, parsed into list objects and available in ExperimentHub. In total, preciseTADhub provides access to n=84 random forest classification models optimized to predict TAD/chromatin loop boundary regions and stored as .RDS files. The value n comes from the fact that we considered l=2 cell lines (GM12878, K562), g=2 ground truth boundaries (Arrowhead, Peakachu), and c=21 autosomal chromosomes (CHR1, CHR2, ..., CHR22, omitting CHR9), so that n = l x g x c = 2 x 2 x 21 = 84. Furthermore, each object is itself a two-item list containing: (1) the model object, and (2) the variable importances for CTCF, RAD21, SMC3, and ZNF143 used to predict boundary regions. Each model is trained via a "holdout" strategy, in which data from chromosomes CHR1, CHR2, ..., CHRi-1, CHRi+1, ..., CHR22 were used to build the model and the ith chromosome was reserved for testing. See https://doi.org/10.1101/2020.09.03.282186 for more detail on the model building strategy.
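The model count and the holdout scheme described in the entry above can be sketched in Python as follows. This only enumerates the combinations; it does not reflect how objects are actually named or indexed in ExperimentHub. The helper holdout_split is hypothetical, and treating CHR9 as excluded from the training chromosomes as well is an assumption.

```python
from itertools import product

cell_lines = ["GM12878", "K562"]
ground_truths = ["Arrowhead", "Peakachu"]
# 22 autosomes minus CHR9 leaves 21 chromosomes
chromosomes = [f"CHR{i}" for i in range(1, 23) if i != 9]

# One model per (cell line, ground truth, held-out chromosome) combination
combinations = list(product(cell_lines, ground_truths, chromosomes))
n_models = len(combinations)


def holdout_split(test_chromosome):
    """Hypothetical helper: train on every listed chromosome except the held-out one."""
    train = [c for c in chromosomes if c != test_chromosome]
    return train, test_chromosome


train_chromosomes, test_chromosome = holdout_split("CHR1")
```

Each of the 21 chromosomes is held out once per cell line and ground truth, which gives the 2 x 2 x 21 = 84 models.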
This package provides a library of core pre-processing and normalization routines.
Sample data for the PREDA package (annotation objects synchronized with GeneAnnot custom CDFs version 2.2.0).
Consider a linear predictive regression setting with a potentially large set of candidate predictors. This work is concerned with detecting the presence of out-of-sample predictability based on the out-of-sample mean squared error comparisons given in Gonzalo and Pitarakis (2023) <doi:10.1016/j.ijforecast.2023.10.005>.
An implementation of the reliability estimation methods described by Bosnic and Kononenko (2008) <doi:10.1007/s10489-007-0084-9>, which allows you to test the reliability of a single predicted instance made by your model and prediction function. It also allows you to run a correlation test to estimate which reliability estimate is the most accurate for your model.
This package provides a set of functions useful when evaluating the results of presence-absence models. The package includes functions for calculating threshold-dependent measures such as confusion matrices, percent correctly classified (PCC), sensitivity, specificity, and Kappa, and for producing plots of each measure as the threshold is varied. It can calculate the optimal threshold according to a choice of optimization criteria. It also includes functions to plot the threshold-independent ROC curves along with the associated AUC (area under the curve).
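The threshold-dependent measures listed in the entry above can be sketched in Python as follows. This is an illustration only, not the package's implementation: the function names are hypothetical, and classifying a prediction as "present" when it is greater than or equal to the threshold is an assumption about the convention.

```python
def confusion_counts(observed, predicted, threshold):
    """Count true/false positives and negatives at one threshold.

    Assumption: a predicted probability >= threshold counts as 'present'.
    """
    tp = fp = fn = tn = 0
    for obs, prob in zip(observed, predicted):
        pred = 1 if prob >= threshold else 0
        if obs == 1 and pred == 1:
            tp += 1
        elif obs == 0 and pred == 1:
            fp += 1
        elif obs == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def threshold_measures(observed, predicted, threshold):
    """PCC, sensitivity, specificity and Cohen's Kappa at one threshold.

    Assumes both classes occur in observed and that chance agreement is below 1.
    """
    tp, fp, fn, tn = confusion_counts(observed, predicted, threshold)
    n = tp + fp + fn + tn
    pcc = (tp + tn) / n          # proportion correctly classified
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (pcc - expected) / (1 - expected)
    return {"pcc": pcc, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa}


observed = [1, 1, 0, 0]
predicted = [0.9, 0.4, 0.6, 0.1]
# Evaluate the measures over a few thresholds, as a plot of measure vs. threshold would
by_threshold = {t: threshold_measures(observed, predicted, t) for t in (0.3, 0.5, 0.7)}
```

Sweeping the threshold and recording each measure gives the data behind the per-measure plots. Choosing the threshold that maximizes one of these measures would be one possible optimization criterion.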
This package provides a selection of tools that make it easier to place elements onto a (base R) plot exactly where you want them. It allows users to identify points and distances on a plot in terms of inches, pixels, margin lines, data units, and proportions of the plotting space, all in a simpler manner than manipulating par().
A common problem faced by journal reviewers and authors is the question of whether the results of a replication study are consistent with the original published study. One solution to this problem is to examine the effect size from the original study and generate the range of effect sizes that could reasonably be obtained (due to random sampling) in a replication attempt (i.e., calculate a prediction interval). This package has functions that calculate the prediction interval for the correlation (i.e., r), standardized mean difference (i.e., d-value), and mean.
Dynamize headers or R code within Rmd files to prevent the proliferation of Rmd files for similar reports. Embed an external HTML document within an rmarkdown-rendered HTML document.
In this record linkage package, data preprocessing is designed to cover a wide range of datasets, with variable names standardized using synonyms. This approach facilitates data integration and analysis across datasets. Users can modify variable names, but changes are permitted only when they do not compromise data consistency or the meaning of essential variables.