Create an interactive pizza chart visualizing a specific player's statistics across various attributes in a sports dataset. The chart is constructed based on input parameters: data', a dataframe containing player data for any sports; player_stats_col', a vector specifying the names of the columns from the dataframe that will be used to create slices in the pizza chart, with statistics ranging between 0 and 100; name_col', specifying the name of the column in the dataframe that contains the player names; and player_name', representing the specific player whose statistics will be visualized in the chart, serving as the chart title.
The MSstatsLOBD
package allows calculation and visualization of limit of blac (LOB) and limit of detection (LOD). We define the LOB as the highest apparent concentration of a peptide expected when replicates of a blank sample containing no peptides are measured. The LOD is defined as the measured concentration value for which the probability of falsely claiming the absence of a peptide in the sample is 0.05, given a probability 0.05 of falsely claiming its presence. These functionalities were previously a part of the MSstats package. The methodology is described in Galitzine (2018) <doi:10.1074/mcp.RA117.000322>.
Fit and simulate bivariate correlated frailty models with proportional hazard structure. Frailty distributions, such as gamma and lognormal models are supported for semiparametric procedures. Frailty variances of the two subjects can be varied or equal. Details on the models are available in book of Wienke (2011,ISBN:978-1-4200-7388-1). Bivariate gamma fit is obtained using the approach given in Iachine (1995) with modifications. Lognormal fit is based on the approach by Ripatti and Palmgren (2000) <doi:10.1111/j.0006-341X.2000.01016.x>. Frailty distributions, such as gamma, inverse gaussian and power variance frailty models are supported for parametric approach.
This package provides functions representing some useful empirical and data-driven models of heat loss, corrosion diagnostics, reliability and predictive maintenance of pipeline systems. The package is an option for technical engineering departments of heat generating and heat transfer companies that use or plan to use regulatory calculations in their activities. Methods are described in Timashev et al. (2016) <doi:10.1007/978-3-319-25307-7>, A.C.Reddy (2017) <doi:10.1016/j.matpr.2017.07.081>, Minenergo (2008) <https://docs.cntd.ru/document/902148459>, Minenergo (2005) <https://docs.cntd.ru/document/1200035568>, Xing LU. (2014) <doi:10.1080/23744731.2016.1258371>.
Samples large data such that spectral clustering is possible while preserving density information in edge weights. More specifically, given a matrix of coordinates as input, SamSPECTRAL
first builds the communities to sample the data points. Then, it builds a graph and after weighting the edges by conductance computation, the graph is passed to a classic spectral clustering algorithm to find the spectral clusters. The last stage of SamSPECTRAL
is to combine the spectral clusters. The resulting "connected components" estimate biological cell populations in the data. See the vignette for more details on how to use this package, some illustrations, and simple examples.
Generates a list, with a size defined by the user, containing the main scientific references and the frequency distribution of authors and journals in the list obtained. The database is a dataframe with academic production metadata made available by bibliographic collections such as Scopus, Web of Science, etc. The temporal evolution of scientific production on a given topic is presented and ordered lists of articles are constructed by number of citations and of authors and journals by level of productivity. Massimo Aria, Corrado Cuccurullo. (2017) <doi:10.1016/j.joi.2017.08.007>. Caibo Zhou, Wenyan Song. (2021) <doi:10.1016/j.jclepro.2021.126943>.
Randomly reassigns the group identifications to one of the variables of the database, say Treatment, and randomly reassigns the observation numbers of the dataset. Reorders the observations according to these new numbers. Centers each group of Treatment at the grand mean in order to further mask the treatment. An unmasking function is provided so that the user can identify the potential outliers in terms of their original values when blinding is no longer needed. It is suggested that a forward search procedure be performed on the masked data. Details of some forward search functions may be found in <https://CRAN.R-project.org/package=forsearch>.
This k-means algorithm is able to cluster data with missing values and as a by-product completes the data set. The implementation can deal with missing values in multiple variables and is computationally efficient since it iteratively uses the current cluster assignment to define a plausible distribution for missing value imputation. Weights are used to shrink early random draws for missing values (i.e., draws based on the cluster assignments after few iterations) towards the global mean of each feature. This shrinkage slowly fades out after a fixed number of iterations to reflect the increasing credibility of cluster assignments. See the vignette for details.
This package provides several functions to simplify using the glmnet package: converting data frames into matrices ready for glmnet'; b) imputing missing variables multiple times; c) fitting and applying prediction models straightforwardly; d) assigning observations to folds in a balanced way; e) cross-validate the models; f) selecting the most representative model across imputations and folds; and g) getting the relevance of the model regressors; as described in several publications: Solanes et al. (2022) <doi:10.1038/s41537-022-00309-w>, Palau et al. (2023) <doi:10.1016/j.rpsm.2023.01.001>, Sobregrau et al. (2024) <doi:10.1016/j.jpsychores.2024.111656>.
The landmark approach allows survival predictions to be updated dynamically as new measurements from an individual are recorded. The idea is to set predefined time points, known as "landmark times", and form a model at each landmark time using only the individuals in the risk set. This package allows the longitudinal data to be modelled either using the last observation carried forward or linear mixed effects modelling. There is also the option to model competing risks, either through cause-specific Cox regression or Fine-Gray regression. To find out more about the methods in this package, please see <https://isobelbarrott.github.io/Landmarking/articles/Landmarking>.
This package provides a flexible and easy-to use interface for the soil vegetation atmosphere transport (SVAT) model LWF-BROOK90, written in Fortran. The model simulates daily transpiration, interception, soil and snow evaporation, streamflow and soil water fluxes through a soil profile covered with vegetation, as described in Hammel & Kennel (2001, ISBN:978-3-933506-16-0) and Federer et al. (2003) <doi:10.1175/1525-7541(2003)004%3C1276:SOAETS%3E2.0.CO;2>. A set of high-level functions for model set up, execution and parallelization provides easy access to plot-level SVAT simulations, as well as multi-run and large-scale applications.
Programs to find the sample size or power of studies using the Sequential Parallel Comparison Design (SPCD) and programs to analyze such studies. This is a clinical trial design where patients initially on placebo who did not respond are re-randomized between placebo and active drug in a second phase and the results of the two phases are pooled. The method of analyzing binary data with this design is described in Fava,Evins, Dorer and Schoenfeld(2003) <doi:10.1159/000069738>, and the method of analyzing continuous data is described in Chen, Yang, Hung and Wang (2011) <doi:10.1016/j.cct.2011.04.006>.
This package provides tools to assess the association between two spatial processes. Currently, several methodologies are implemented: A modified t-test to perform hypothesis testing about the independence between the processes, a suitable nonparametric correlation coefficient, the codispersion coefficient, and an F test for assessing the multiple correlation between one spatial process and several others. Functions for image processing and computing the spatial association between images are also provided. Functions contained in the package are intended to accompany Vallejos, R., Osorio, F., Bevilacqua, M. (2020). Spatial Relationships Between Two Georeferenced Variables: With Applications in R. Springer, Cham <doi:10.1007/978-3-030-56681-4>.
This package provides a conditional independence test that can be applied both to univariate and multivariate random variables. The test is based on a weighted form of the sample covariance of the residuals after a nonlinear regression on the conditioning variables. Details are described in Scheidegger, Hoerrmann and Buehlmann (2021) "The Weighted Generalised Covariance Measure" <arXiv:2111.04361>
. The test is a generalisation of the Generalised Covariance Measure (GCM) implemented in the R package GeneralisedCovarianceMeasure
by Jonas Peters and Rajen D. Shah based on Shah and Peters (2020) "The Hardness of Conditional Independence Testing and the Generalised Covariance Measure" <arXiv:1804.07203>
.
The main aim is to further facilitate the creation of exercises based on the package exams by Grün, B., and Zeileis, A. (2009) <doi:10.18637/jss.v029.i10>. Creating effective student exercises involves challenges such as creating appropriate data sets and ensuring access to intermediate values for accurate explanation of solutions. The functionality includes the generation of univariate and bivariate data including simple time series, functions for theoretical distributions and their approximation, statistical and mathematical calculations for tasks in basic statistics courses as well as general tasks such as string manipulation, LaTeX/HTML
formatting and the editing of XML task files for Moodle'.
Function and support for medication and dosing information extraction from free-text clinical notes. Medication entities for the basic medExtractR
implementation that can be extracted include drug name, strength, dose amount, dose, frequency, intake time, dose change, and time of last dose. The basic medExtractR
is outlined in Weeks, Beck, McNeer
, Williams, Bejan, Denny, Choi (2020) <doi: 10.1093/jamia/ocz207>. The extended medExtractR_tapering
implementation is intended to extract dosing information for more tapering schedules, which are far more complex. The tapering extension allows for the extraction of additional entities including dispense amount, refills, dose schedule, time keyword, transition, and preposition.
Get z-scores, percentiles, absolute values, and percent of predicted of a reference cohort. Functionality requires installing the data packages adiposerefdata and musclerefdata'. For more information on the underlying research, please visit our website which also includes a graphical interface. The models and underlying data are described in Marquardt JP et al.(planned publication 2025; reserved doi 10.1097/RLI.0000000000001104), "Subcutaneous and Visceral adipose tissue Reference Values from Framingham Heart Study Thoracic and Abdominal CT", *Investigative Radiology* and Tonnesen PE et al. (2023), "Muscle Reference Values from Thoracic and Abdominal CT for Sarcopenia Assessment [column] The Framingham Heart Study", *Investigative Radiology*, <doi:10.1097/RLI.0000000000001012>.
Efficiently implements the Graphical Lasso algorithm, utilizing the Armadillo C++ library for rapid computation. This algorithm introduces an L1 penalty to derive sparse inverse covariance matrices from observations of multivariate normal distributions. Features include the generation of random and structured sparse covariance matrices, beneficial for simulations, statistical method testing, and educational purposes in graphical modeling. A unique function for regularization parameter selection based on predefined sparsity levels is also offered, catering to users with specific sparsity requirements in their models. The methodology for sparse inverse covariance estimation implemented in this package is based on the work of Friedman, Hastie, and Tibshirani (2008) <doi:10.1093/biostatistics/kxm045>.
Automate the explanatory analysis of machine learning predictive models. Generate advanced interactive model explanations in the form of a serverless HTML site with only one line of code. This tool is model-agnostic, therefore compatible with most of the black-box predictive models and frameworks. The main function computes various (instance and model-level) explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. It is possible to easily save the dashboard and share it with others. modelStudio
facilitates the process of Interactive Explanatory Model Analysis introduced in Baniecki et al. (2023) <doi:10.1007/s10618-023-00924-w>.
This package provides a comprehensive framework for batch effect diagnostics, harmonization, and post-harmonization downstream analysis. Features include interactive visualization tools, robust statistical tests, and a range of harmonization techniques. Additionally, ComBatFamQC
enables the creation of life-span age trend plots with estimated age-adjusted centiles and facilitates the generation of covariate-corrected residuals for analytical purposes. Methods for harmonization are based on approaches described in Johnson et al., (2007) <doi:10.1093/biostatistics/kxj037>, Beer et al., (2020) <doi:10.1016/j.neuroimage.2020.117129>, Pomponio et al., (2020) <doi:10.1016/j.neuroimage.2019.116450>, and Chen et al., (2021) <doi:10.1002/hbm.25688>.
Calculates the probabilities of k successes given n trials of a binomial random variable with non-negative correlation across trials. The function takes as inputs the scalar values the level of correlation or association between trials, the success probability, the number of trials, an optional input specifying the number of bits of precision used in the calculation, and an optional input specifying whether the calculation approach to be used is from Witt (2014) <doi:10.1080/03610926.2012.725148> or from Kuk (2004) <doi:10.1046/j.1467-9876.2003.05369.x>. The output is a (trials+1)-dimensional vector containing the likelihoods of 0, 1, ..., trials successes.
The stepwise variable selection procedure (with iterations between the forward and backward steps) can be used to obtain the best candidate final regression model in regression analysis. All the relevant covariates are put on the variable list to be selected. The significance levels for entry (SLE) and for stay (SLS) are usually set to 0.15 (or larger) for being conservative. Then, with the aid of substantive knowledge, the best candidate final regression model is identified manually by dropping the covariates with p value > 0.05 one at a time until all regression coefficients are significantly different from 0 at the chosen alpha level of 0.05.
The SparseArray
package is an infrastructure package that provides an array-like container for efficient in-memory representation of multidimensional sparse data in R. The package defines the SparseArray
virtual class and two concrete subclasses: COO_SparseArray
and SVT_SparseArray
. Each subclass uses its own internal representation of the nonzero multidimensional data, the "COO layout" and the "SVT layout", respectively. SVT_SparseArray
objects mimic as much as possible the behavior of ordinary matrix and array objects in base R. In particular, they support most of the "standard matrix and array API" defined in base R and in the matrixStats
package from CRAN.
Implementations of the multiple testing procedures for discrete tests described in the paper Döhler, Durand and Roquain (2018) "New FDR bounds for discrete and heterogeneous tests" <doi:10.1214/18-EJS1441>. The main procedures of the paper (HSU and HSD), their adaptive counterparts (AHSU and AHSD), and the HBR variant are available and are coded to take as input the results of a test procedure from package DiscreteTests
', or a set of observed p-values and their discrete support under their nulls. A shortcut function to obtain such p-values and supports is also provided, along with a wrapper allowing to apply discrete procedures directly to data.