This package provides wrappers for various machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of interpretable machine learning, there are more and more new ideas for explaining black-box models that are implemented in R. DALEXtra creates DALEX (Biecek (2018) <arXiv:1806.08915>) explainers for many types of models, including those created using the Python scikit-learn and keras libraries and the Java h2o library. An important part of the package is Champion-Challenger analysis and an innovative approach to model performance across subsets of test data, presented in the Funnel Plot.
This package provides functions to compute coefficients measuring the dependence between two or more variables. The functions can be used to gain information about functional dependencies between the variables, with emphasis on monotone functions. The statistics describe how well one response variable can be approximated by a monotone function of other variables. In regression analysis, variable selection is an important issue. In this framework the functions could be useful tools in modeling the regression function. Detailed explanations on the subject can be found in the papers Liebscher (2014) <doi:10.2478/demo-2014-0004>; Liebscher (2017) <doi:10.1515/demo-2017-0012>; Liebscher (2019, submitted).
This package provides a comprehensive visualization toolkit built with coders of all skill levels and color-vision impaired audiences in mind. It allows creation of finely-tuned, publication-quality figures from single function calls. Visualizations include scatter plots, compositional bar plots, violin, box, and ridge plots, and more. Customization ranges from size and title adjustments to discrete-group circling and labeling, hidden data overlay upon cursor hovering via ggplotly() conversion, and many more, all with simple, discrete inputs. Color blindness friendliness is powered by legend adjustments (enlarged keys), and by allowing the use of shapes or letter-overlay in addition to the carefully selected dittoColors().
Implementation of Das Gupta's standardisation and decomposition of population rates, as set out in "Standardization and decomposition of rates: A user's manual", Das Gupta (1993) <https://www2.census.gov/library/publications/1993/demographics/p23-186.pdf>. The goal of these methods is to calculate adjusted rates based on compositional factors and quantify the contribution of each factor to the difference in crude rates between populations. The package offers functionality to handle various scenarios for any number of factors and populations, where said factors can be comprised of vectors across sub-populations (including cross-classified population breakdowns), and with the option to specify user-defined rate functions.
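As an illustration of the decomposition idea, the simplest case is a rate that is the product of two factors, R = a * b. The sketch below is written in Python and is not the package's interface; the function name and the example values are hypothetical. Each factor's effect is the change in that factor weighted by the average of the other factor across the two populations, so the two effects add up to the difference in crude rates.

```python
def das_gupta_two_factor(a1, b1, a2, b2):
    """Decompose rate2 - rate1 for a rate R = a * b into factor effects.

    Hypothetical helper for illustration only: two populations, two factors.
    """
    rate1 = a1 * b1
    rate2 = a2 * b2
    # Effect of factor a: change in a, holding b at its two-population average.
    a_effect = (b1 + b2) / 2 * (a2 - a1)
    # Effect of factor b: change in b, holding a at its two-population average.
    b_effect = (a1 + a2) / 2 * (b2 - b1)
    return {
        "rate1": rate1,
        "rate2": rate2,
        "difference": rate2 - rate1,
        "a_effect": a_effect,
        "b_effect": b_effect,
    }


# Made-up factor values for two populations.
result = das_gupta_two_factor(a1=0.5, b1=0.25, a2=0.75, b2=0.5)
print(result)
```

With more than two factors the weights become averages over all combinations of the remaining factors, which is the bookkeeping the package automates.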
This package provides a toolbox for estimating vector fields from intensive longitudinal data and for constructing potential landscapes from them. The vector fields can be estimated with two nonparametric methods: the Multivariate Vector Field Kernel Estimator (MVKE) by Bandi & Moloche (2018) <doi:10.1017/S0266466617000305> and the Sparse Vector Field Consensus (SparseVFC) algorithm by Ma et al. (2013) <doi:10.1016/j.patcog.2013.05.017>. The potential landscapes can be constructed with a simulation-based approach with the simlandr package (Cui et al., 2021) <doi:10.31234/osf.io/pzva3>, or the Bhattacharya et al. (2011) method for path integration <doi:10.1186/1752-0509-5-85>.
We consider studies in which information from error-prone diagnostic tests or self-reports is gathered sequentially to determine the occurrence of a silent event. Using a likelihood-based approach incorporating the proportional hazards assumption, we provide functions to estimate the survival distribution and covariate effects. We also provide functions for power and sample size calculations for this setting. Please refer to Xiangdong Gu, Yunsheng Ma, and Raji Balasubramanian (2015) <doi:10.1214/15-AOAS810>, Xiangdong Gu and Raji Balasubramanian (2016) <doi:10.1002/sim.6962>, Xiangdong Gu, Mahlet G Tadesse, Andrea S Foulkes, Yunsheng Ma, and Raji Balasubramanian (2020) <doi:10.1186/s12911-020-01223-w>.
Allows biomechanical pressure data from a range of systems to be imported and processed in a reproducible manner. Automatic and manual tools are included to let the user define regions (masks) to be analyzed. Also includes functions for visualizing and animating pressure data. Example methods are described in Shi et al. (2022) <doi:10.1038/s41598-022-19814-0>, Lee et al. (2014) <doi:10.1186/1757-1146-7-18>, van der Zward et al. (2014) <doi:10.1186/1757-1146-7-20>, Najafi et al. (2010) <doi:10.1016/j.gaitpost.2009.09.003>, Cavanagh and Rodgers (1987) <doi:10.1016/0021-9290(87)90255-7>.
Generate common data forms for complex data suitable for conversions and transmission by decomposition as paths or primitives. Paths are sequentially-linked records, primitives are basic atomic elements and both can model many forms and be grouped into hierarchical structures. The universal models SC0 (structural) and SC (labelled, relational) are composed of edges and can represent any hierarchical form. Specialist models PATH, ARC and TRI provide the most common intermediate forms used for converting from one form to another. The methods are inspired by the simplicial complex <https://en.wikipedia.org/wiki/Simplicial_complex> and provide intermediate forms that relate spatial data structures to this mathematical construct.
The main objective of the epistack package is the visualization of stacks of genomic tracks (such as, but not restricted to, ChIP-seq, ATAC-seq, DNA methylation or genomic conservation data) centered at genomic regions of interest. epistack needs three different inputs: 1) a genomic score object, such as ChIP-seq coverage or DNA methylation values, provided as a `GRanges` (easily obtained from `bigwig` or `bam` files); 2) a list of features of interest, such as peaks or transcription start sites, provided as a `GRanges` (easily obtained from `gtf` or `bed` files); 3) a score to sort the features, such as peak height or gene expression value.
This package provides a lightweight, dependency-free toolbox for pre-processing XY data from experimental methods (i.e. any signal that can be measured along a continuous variable). It provides methods for baseline estimation and correction, smoothing, normalization, integration and peak detection. Baseline correction methods include polynomial fitting as described in Lieber and Mahadevan-Jansen (2003) <doi:10.1366/000370203322554518>, the Rolling Ball algorithm after Kneen and Annegarn (1996) <doi:10.1016/0168-583X(95)00908-6>, the SNIP algorithm after Ryan et al. (1988) <doi:10.1016/0168-583X(88)90063-8>, 4S Peak Filling after Liland (2015) <doi:10.1016/j.mex.2015.02.009> and more.
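To give a sense of how peak-clipping baseline estimation works, here is a minimal Python sketch of the core SNIP iteration. It is a simplification under stated assumptions: it omits the log-log-square-root transform that is often applied before clipping, the function name is hypothetical, and it is not the package's own implementation. At each iteration with half-width p, every point is replaced by the smaller of its value and the mean of its two neighbours at distance p, which progressively erodes peaks while leaving a slowly varying baseline.

```python
def snip_baseline(signal, iterations):
    """Estimate a baseline by iterative peak clipping (simplified SNIP).

    Hypothetical helper: no LLS transform, increasing half-width p = 1..iterations.
    """
    baseline = list(signal)
    n = len(baseline)
    for p in range(1, iterations + 1):
        previous = baseline[:]  # clip against the previous iteration's values
        for i in range(p, n - p):
            neighbour_mean = (previous[i - p] + previous[i + p]) / 2
            baseline[i] = min(previous[i], neighbour_mean)
    return baseline


# Made-up signal: flat baseline at 1.0 with a small triangular peak.
signal = [1.0] * 21
signal[9:12] = [3.0, 6.0, 3.0]

baseline = snip_baseline(signal, iterations=4)
corrected = [s - b for s, b in zip(signal, baseline)]
print(corrected[8:13])
```

The number of iterations should be at least the half-width of the widest peak, otherwise part of the peak is left in the baseline estimate.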
Automates delta log-normal boosted regression tree abundance prediction. Loops through the parameters provided (LR (learning rate), TC (tree complexity), BF (bag fraction)), chooses the best combination, simplifies the model, generates line, dot and bar plots, outputs these together with predictions and a report, and makes predicted abundance maps and unrepresentativeness surfaces. The package core is built around the gbm (gradient boosting machine) functions in dismo (Hijmans, Phillips, Leathwick & Jane Elith, 2020 & ongoing), itself built around gbm (Greenwell, Boehmke, Cunningham & Metcalfe, 2020 & ongoing, originally by Ridgeway). Indebted to Elith/Leathwick/Hastie 2008 Working Guide <doi:10.1111/j.1365-2656.2008.01390.x>; workflow follows Appendix S3. See <https://www.simondedman.com/> for published guides and papers using this package.
This package provides a shiny application, which allows you to perform single- and multi-omics analyses using your own omics datasets. After the upload of the omics datasets and a metadata file, single-omics analysis is performed for feature selection and dataset reduction. The reduced datasets are used for pairwise- and multi-omics analyses, where automatic tuning is done to identify correlations between the datasets, which is the end goal of the recommended Holomics workflow. Methods used in the package were implemented in the package mixomics by Florian Rohart, Benoît Gautier, Amrit Singh, Kim-Anh Lê Cao (2017) <doi:10.1371/journal.pcbi.1005752> and are described there in further detail.
An implementation of Ichimoku Kinko Hyo, also commonly known as cloud charts. Static and interactive visualizations with tools for creating, backtesting and developing quantitative ichimoku strategies. As described in Sasaki (1996, ISBN:4925152009), the technique is a refinement on candlestick charting, originating from Japan and now in widespread use in technical analysis worldwide. Translating as "one-glance equilibrium chart", it allows the price action and market structure of financial securities to be determined at a glance. Incorporates an interface with the OANDA fxTrade API <https://developer.oanda.com/> for retrieving historical and live streaming price data for major currencies, metals, commodities, government bonds and stock indices.
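For readers unfamiliar with cloud charts, the following Python sketch computes the commonly cited cloud lines from high and low prices. It assumes the conventional 9, 26 and 52 period settings and is not taken from the package; the function name and the price series are hypothetical. Each line is the midpoint of the highest high and lowest low over a rolling window, and the two leading spans are conventionally plotted 26 periods ahead (the shift is not applied here).

```python
def midpoint_line(highs, lows, period):
    """Rolling (highest high + lowest low) / 2; None until the window is full."""
    out = []
    for i in range(len(highs)):
        if i + 1 < period:
            out.append(None)
        else:
            window_high = max(highs[i - period + 1 : i + 1])
            window_low = min(lows[i - period + 1 : i + 1])
            out.append((window_high + window_low) / 2)
    return out


# Made-up, steadily rising price series of 60 periods.
lows = [float(i) for i in range(60)]
highs = [low + 1.0 for low in lows]

tenkan = midpoint_line(highs, lows, 9)    # conversion line
kijun = midpoint_line(highs, lows, 26)    # base line
span_b = midpoint_line(highs, lows, 52)   # leading span B (before shifting)
# Leading span A is the average of the conversion and base lines (before shifting).
span_a = [
    None if t is None or k is None else (t + k) / 2
    for t, k in zip(tenkan, kijun)
]

print(tenkan[-1], kijun[-1], span_a[-1], span_b[-1])
```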
This package provides a simulation modeling framework which significantly extends capabilities from the MGDrivE simulation package via a new mathematical and computational framework based on stochastic Petri nets. For more information about MGDrivE, see our publication: Sánchez et al. (2019) <doi:10.1111/2041-210X.13318>. Some of the notable capabilities of MGDrivE2 include: incorporation of human populations, epidemiological dynamics, time-varying parameters, and a continuous-time simulation framework with various sampling algorithms for both deterministic and stochastic interpretations. MGDrivE2 relies on the genetic inheritance structures provided in package MGDrivE, so we suggest installing that package first.
This package provides a collection of functions for visualizing the free induction decay of one-dimensional nuclear magnetic resonance (NMR) spectra and converting it into an audio file. It facilitates the conversion of Bruker datasets into WAV files. The sound of NMR signals could provide an alternative to the current representation of the individual metabolic fingerprint and supply equally significant information. The package also includes NMR spectra of urine samples provided by four healthy donors. Based on Cacciatore S, Saccenti E, Piccioli M. Hypothesis: the sound of the individual metabolic phenotype? Acoustic detection of NMR experiments. OMICS. 2015;19(3):147-56. <doi:10.1089/omi.2014.0131>.
Generates Weibull-parameterized estimates of phenology for any percentile of a distribution using the framework established in Cooke (1979) <doi:10.1093/biomet/66.2.367>. Extensive testing against other estimators suggests the weib_percentile() function is especially useful in generating more accurate and less biased estimates of onset and offset (Belitz et al. 2020 <doi:10.1111/2041-210X.13448>). Non-parametric bootstrapping can be used to generate confidence intervals around those estimates, although this is computationally expensive. Additionally, this package offers an easy way to perform non-parametric bootstrapping to generate confidence intervals for quantile estimates, mean estimates, or any statistical function of interest.
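The non-parametric bootstrap mentioned above can be sketched generically in Python as follows. This is a plain percentile bootstrap under stated assumptions, not the package's own routine; the function name, the default settings and the observation dates are made up for illustration. The data are resampled with replacement many times, the statistic is recomputed on each resample, and the interval is read off the empirical quantiles of those estimates.

```python
import random
import statistics


def bootstrap_ci(data, statistic, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = sorted(
        statistic([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lower = estimates[int(alpha * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lower, upper


# Hypothetical observation dates (day of year) for a phenological event.
onset_days = [98, 101, 103, 104, 107, 110, 112, 115, 119, 124]

low, high = bootstrap_ci(onset_days, statistics.mean)
print(low, high)
```

Any callable that maps a list of values to a number can be passed as the statistic, for example `statistics.median`; the cost grows linearly with the number of resamples, which is why bootstrapping an expensive estimator is slow.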
This package provides a collection of functions for data manipulation, plotting and statistical computing, to use separately or with the book "Visual Statistics. Use R!": Shipunov (2020) <http://ashipunov.info/shipunov/software/r/r-en.htm>. Dr Alexey Shipunov died in December 2022. Most useful functions: Bclust(), Jclust() and BootA(), which bootstrap hierarchical clustering; Recode(), which does multiple recoding in a fast, simple and flexible way; Misclass(), which outputs a confusion matrix even if classes are not concerted; Overlap(), which measures group separation on any projection; Biarrows(), which converts any scatterplot into a biplot; and Pleiad(), which is a fast and flexible correlogram.
Time series prediction is a critical task in data analysis, requiring not only the selection of appropriate models, but also suitable data preprocessing and tuning strategies. TSPredIT (Time Series Prediction with Integrated Tuning) is a framework that provides a seamless integration of data preprocessing, decomposition, model training, hyperparameter optimization, and evaluation. Unlike other frameworks, TSPredIT emphasizes the co-optimization of both preprocessing and modeling steps, improving predictive performance. It supports a variety of statistical and machine learning models, filtering techniques, outlier detection, data augmentation, and ensemble strategies. More information is available in Salles et al. <doi:10.1007/978-3-662-68014-8_2>.
Bayesian power/type I error calculation and model fitting using the power prior and the normalized power prior for generalized linear models. Detailed examples of applying the package are available at <doi:10.32614/RJ-2023-016>. Models for time-to-event outcomes are implemented in the R package BayesPPDSurv. The Bayesian clinical trial design methodology is described in Chen et al. (2011) <doi:10.1111/j.1541-0420.2011.01561.x>, and Psioda and Ibrahim (2019) <doi:10.1093/biostatistics/kxy009>. The normalized power prior is described in Duan et al. (2006) <doi:10.1002/env.752> and Ibrahim et al. (2015) <doi:10.1002/sim.6728>.
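The power prior idea can be illustrated with a conjugate toy case in Python. This is a sketch under stated assumptions: a binomial outcome with a Beta initial prior stands in for the generalized linear models the package handles, and the function name and the counts are hypothetical. The historical likelihood is raised to a discounting power a0 between 0 and 1, so historical successes and failures enter the posterior as fractional pseudo-observations.

```python
def power_prior_posterior(y, n, y0, n0, a0, alpha0=1.0, beta0=1.0):
    """Beta posterior for a binomial proportion under a power prior.

    y, n   : successes and trials in the current data
    y0, n0 : successes and trials in the historical data
    a0     : discounting power in [0, 1] applied to the historical likelihood
    alpha0, beta0 : parameters of the initial Beta prior (assumed here)
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    alpha = alpha0 + y + a0 * y0
    beta = beta0 + (n - y) + a0 * (n0 - y0)
    return alpha, beta


# Made-up counts: borrow half of the historical information.
alpha, beta = power_prior_posterior(y=12, n=40, y0=20, n0=50, a0=0.5)
posterior_mean = alpha / (alpha + beta)
print(alpha, beta, posterior_mean)
```

Setting a0 = 0 ignores the historical data entirely and a0 = 1 pools it fully with the current data.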
This package provides a set of functions for counterfactual decomposition (cfdecomp). The functions available in this package decompose differences in an outcome attributable to a mediating variable (or sets of mediating variables) between groups based on counterfactual (causal inference) theory. By using Monte Carlo (MC) integration (simulations based on empirical estimates from multivariable models) we provide added flexibility compared to existing (analytical) approaches, at the cost of computational power or time. The added flexibility means that we can decompose differences between groups in any outcome and with any mediator (any variable type and distribution). See Sudharsanan & Bijlsma (2019) <doi:10.4054/MPIDR-WP-2019-004> for more information.
This package performs the identification of differential risk hotspots (Briz-Redon et al. 2019) <doi:10.1016/j.aap.2019.105278> along a linear network. Given a marked point pattern lying on the linear network, the method implemented uses a network-constrained version of kernel density estimation (McSwiggan et al. 2017) <doi:10.1111/sjos.12255> to approximate the probability of occurrence across space for the type of event specified by the user through the marks of the pattern (Kelsall and Diggle 1995) <doi:10.2307/3318678>. The goal is to detect microzones of the linear network where the type of event indicated by the user is overrepresented.
Processes the raw data from closed loop flux chamber (or tent) setups into ecosystem gas fluxes usable for analysis. It goes from a data frame of gas concentration over time (which can contain several measurements) and a metadata file indicating which measurement was done when, to a data frame of ecosystem gas fluxes including quality diagnostics. Functions provided include different models (exponential as described in Zhao et al. (2018) <doi:10.1016/j.agrformet.2018.08.022>, quadratic and linear) to estimate the fluxes from the raw data, quality assessment, plotting for visual checks and calculation of fluxes based on the setup-specific parameters (chamber size, plot area, ...).
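The linear model is the easiest of the three to illustrate. The Python sketch below fits a least-squares slope to concentration over time and then scales it by the chamber setup. The scaling step is an assumption on my part (an ideal-gas conversion from a mixing-ratio slope to a flux per unit area), not a formula taken from the package, and all function names and numbers are hypothetical.

```python
def linear_slope(times, concentrations):
    """Ordinary least-squares slope of concentration against time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_c = sum(concentrations) / n
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, concentrations))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


def flux_from_slope(slope_ppm_per_s, pressure_atm, volume_l, temperature_k, area_m2):
    """Convert a slope in ppm/s to a flux in micromol m^-2 s^-1.

    Assumed conversion: moles of air in the chamber from the ideal gas law
    (P * V / (R * T)), times the mixing-ratio slope, divided by the plot area.
    """
    gas_constant = 0.082057  # L atm K^-1 mol^-1
    moles_of_air = pressure_atm * volume_l / (gas_constant * temperature_k)
    return slope_ppm_per_s * moles_of_air / area_m2


# Made-up measurement: CO2 rising by 0.5 ppm per second over 40 seconds.
times = [0.0, 10.0, 20.0, 30.0, 40.0]
co2_ppm = [400.0 + 0.5 * t for t in times]

slope = linear_slope(times, co2_ppm)
flux = flux_from_slope(slope, pressure_atm=1.0, volume_l=24.5,
                       temperature_k=288.15, area_m2=0.0625)
print(slope, flux)
```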
With this package, it is possible to compute nonparametric simultaneous confidence intervals for relative contrast effects in the unbalanced one-way layout. Moreover, it computes simultaneous p-values. The simultaneous confidence intervals can be computed using the multivariate normal distribution, the multivariate t-distribution with a Satterthwaite approximation of the degrees of freedom, or multivariate range-preserving transformations with logit or probit as the transformation function. Two-sample comparisons can be performed with the same methods described above. There is no assumption on the underlying distribution function, only that the data have to be at least ordinal numbers. See Konietschke et al. (2015) <doi:10.18637/jss.v064.i09> for details.
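The quantity underlying such comparisons is the relative effect between two samples, p = P(X < Y) + 0.5 * P(X = Y). The Python sketch below estimates it by counting pairs. It illustrates the estimand only, not the package's interval or p-value computations, and the function name and data are hypothetical. Because only the ordering of observations matters, the estimate is valid for ordinal data and handles ties by counting them as one half.

```python
def relative_effect(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two samples by pair counting."""
    total = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                total += 1.0
            elif xi == yj:
                total += 0.5  # ties contribute one half
    return total / (len(x) * len(y))


# Made-up ordinal scores for two groups.
group_a = [1, 2, 3]
group_b = [2, 3, 4]

p_hat = relative_effect(group_a, group_b)
print(p_hat)
```

A value of 0.5 means neither group tends to produce larger values than the other; values above 0.5 mean the second group tends to be larger.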
Implementation of the SIC epsilon-telescope method, either using single or distributional (multiparameter) regression. Includes classical regression with normally distributed errors and robust regression, where the errors are from the Laplace distribution. The "smooth generalized normal distribution" is used, where the estimation of an additional shape parameter allows the user to move smoothly between both types of regression. See O'Neill and Burke (2022) "Robust Distributional Regression with Automatic Variable Selection" for more details. <arXiv:2212.07317>. This package also contains the data analyses from O'Neill and Burke (2023). "Variable selection using a smooth information criterion for distributional regression models". <doi:10.1007/s11222-023-10204-8>.