Machine Learning models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance, but such black-box models usually lack interpretability. The DALEX package contains various explainers that help to understand the link between input variables and model output.
This package provides tools for making the descriptive "Table 1" used in medical articles, a transition plot for showing changes between categories (also known as a Sankey diagram), flow charts by extending the grid package, a method for variable selection based on the SVD, Bezier lines with arrows complementing the ones in the grid package, and more.
This package provides functions to perform reproducible parallel foreach loops, using independent random streams as generated by L'Ecuyer's combined multiple-recursive generator. It makes it easy to convert standard %dopar% loops into fully reproducible loops, independently of the number of workers, the task scheduling strategy, or the chosen parallel environment and associated foreach backend.
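The reproducibility idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it gives each task its own generator seeded from the pair (seed, task index), which is a simple stand-in and not L'Ecuyer's combined multiple-recursive generator, so it does not carry the same statistical guarantees about stream independence. The names `task` and `reproducible_map` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def task(seed, i):
    # Hypothetical task: each task builds its own generator from
    # (seed, task index), so its draws do not depend on which worker
    # runs it or on the order in which tasks are scheduled.
    rng = random.Random(f"{seed}-{i}")
    return rng.random()


def reproducible_map(n_tasks, seed, workers):
    # Run the tasks on a thread pool; results come back in task order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: task(seed, i), range(n_tasks)))


# Same seed, different numbers of workers.
run_a = reproducible_map(8, seed=123, workers=1)
run_b = reproducible_map(8, seed=123, workers=4)
print(run_a == run_b)
```

Because the random state belongs to the task rather than to the worker, changing the number of workers leaves the results unchanged.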
ggdag is built on top of dagitty, an R package that uses the DAGitty web tool for creating and analyzing DAGs. ggdag makes it easy to tidy and plot dagitty objects using ggplot2 and ggraph, and to apply common analytic and graphical functions, such as determining adjustment sets and node relationships.
Compare differential gene expression results with those from known cellular perturbations (such as gene knock-down, overexpression or small molecules) derived from the Connectivity Map. Such analyses make it possible not only to infer the molecular causes of the observed difference in gene expression but also to identify small molecules that could drive or revert specific transcriptomic alterations.
An R implementation of the mCOPA package published by Wang et al. (2012). Oppar provides methods for Cancer Outlier Profile Analysis. Although initially developed to detect outlier genes in cancer studies, the methods presented in oppar can be used for outlier profile analysis in general. In addition, tools are provided for gene set enrichment and pathway analysis.
The package provides `rlang` data masks for the SummarizedExperiment class. This enables the evaluation of unquoted expressions in different contexts of the SummarizedExperiment object, with optional access to other contexts. The goal of `plyxp` is for evaluation to feel like working with a data.frame object without ever needing to unwind to a rectangular data.frame.
The method models RNA-seq reads using a mixture of 3 beta-binomial distributions to generate posterior probabilities for genotyping bi-allelic single nucleotide polymorphisms. Elena Vigorito, Anne Barton, Costantino Pitzalis, Myles J. Lewis and Chris Wallace (2023) <doi:10.1093/bioinformatics/btad393> "BBmix: a Bayesian beta-binomial mixture model for accurate genotyping from RNA-sequencing".
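As a rough illustration of how a three-component beta-binomial mixture yields genotype posteriors, the sketch below computes the posterior probability of each component given the alternative-allele read count at one site. It makes several assumptions: the three components are taken to correspond to the genotypes homozygous reference, heterozygous and homozygous alternative, and the weights and shape parameters in `COMPONENTS` are made-up values, not estimates from the cited paper. It is not the BBmix implementation.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    # Log of the beta function B(a, b).
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_betabinom_pmf(k, n, a, b):
    # Log probability of k successes in n trials under a
    # beta-binomial distribution with shape parameters a and b.
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


# Hypothetical mixture: (weight, alpha, beta) per genotype component.
COMPONENTS = {
    "hom_ref": (1 / 3, 1.0, 50.0),
    "het": (1 / 3, 20.0, 20.0),
    "hom_alt": (1 / 3, 50.0, 1.0),
}


def genotype_posteriors(alt_reads, total_reads, components=COMPONENTS):
    # Bayes' rule: posterior is proportional to weight times likelihood.
    log_joint = {
        g: log(w) + log_betabinom_pmf(alt_reads, total_reads, a, b)
        for g, (w, a, b) in components.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_joint.values())
    norm = sum(exp(v - m) for v in log_joint.values())
    return {g: exp(v - m) / norm for g, v in log_joint.items()}


post = genotype_posteriors(alt_reads=9, total_reads=20)
print(max(post, key=post.get))
```

With these illustrative parameters, a site with 9 alternative reads out of 20 is assigned most of its posterior mass to the heterozygous component.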
This package provides Portuguese translations of the following datasets: airlines, airports, ames_raw, AwardsManagers, babynames, Batting, diamonds, faithful, fueleconomy, Fielding, flights, gapminder, gss_cat, iris, Managers, mpg, mtcars, atmos, penguins, People, Pitching, pixarfilms, planes, presidential, table1, table2, table3, table4a, table4b, table5, vehicles, weather, who.
This package provides functions to compute state-specific and marginal life expectancies. The computation is based on a fitted continuous-time multi-state model that includes an absorbing death state; see Van den Hout (2017, ISBN:9781466568402). The fitted multi-state model should be estimated with the msm package, using age as the time scale.
Given a set of parameters describing model dynamics and a corresponding cost function, FAMoS performs a dynamic forward-backward model selection on a specified selection criterion. It also applies a non-local swap search method. Works on any cost function. For detailed information see Gabel et al. (2019) <doi:10.1371/journal.pcbi.1007230>.
This package provides optimized C++ code for computing the partial Receiver Operating Characteristic (ROC) test used in niche and species distribution modeling. The implementation follows Peterson et al. (2008) <doi:10.1016/j.ecolmodel.2007.11.008>. Parallelization via OpenMP was implemented with assistance from the DeepSeek Artificial Intelligence Assistant (<https://www.deepseek.com/>).
Analysis of Bayesian adaptive enrichment clinical trials using the Free-Knot Bayesian Model Averaging (FK-BMA) method of Maleyeff et al. (2024) for Gaussian data. Maleyeff, L., Golchi, S., Moodie, E. E. M., & Hudson, M. (2024) "An adaptive enrichment design using Bayesian model averaging for selection and threshold-identification of predictive variables" <doi:10.1093/biomtc/ujae141>.
Wrapper for computing parameters for univariate distributions using MLE. It creates an object that stores d, p, q, r functions as well as parameters and statistics for diagnostics. Currently supports automated fitting of distributions from the base and actuar packages. A manual distribution fitting function is included to support directly specifying parameters for any distribution from ancillary packages.
Utilizes generative artificial intelligence models such as GPT-4 and Gemini Pro as coding and writing assistants for R users. Through these models, GenAI offers a variety of functions, encompassing text generation, code optimization, natural language processing, chat, and image interpretation. The goal is to aid R users in streamlining laborious coding and language processing tasks.
Apply an adaptation of the SuperFastHash algorithm to any R object. Hash whole R objects or, for vectors or lists, hash R objects to obtain a set of hash values that is stored in a structure equivalent to the input. See <http://www.azillionmonkeys.com/qed/hash.html> for a description of the hash algorithm.
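The difference between hashing a whole object and hashing its elements while keeping the input's structure can be sketched as follows. This is a minimal illustration, not the package's code: it uses a truncated SHA-256 of the value's `repr` as a stand-in hash rather than SuperFastHash, and the function names `hash_value` and `hash_elements` are hypothetical.

```python
import hashlib


def hash_value(obj):
    # Stand-in hash (NOT SuperFastHash): first 32 bits of the
    # SHA-256 digest of the object's repr.
    digest = hashlib.sha256(repr(obj).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_elements(obj):
    # Recurse into lists and dicts so the result mirrors the input's
    # structure, with each leaf replaced by its hash value.
    if isinstance(obj, list):
        return [hash_elements(x) for x in obj]
    if isinstance(obj, dict):
        return {k: hash_elements(v) for k, v in obj.items()}
    return hash_value(obj)


data = {"ids": [1, 2, 2], "name": "sample"}
whole = hash_value(data)        # one hash for the whole object
per_leaf = hash_elements(data)  # same shape as data, hashes at the leaves
print(per_leaf["ids"][1] == per_leaf["ids"][2])
```

Equal leaves map to equal hash values, so the element-wise result can be used to compare or deduplicate entries without losing their position in the original object.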
These are data and functions to support quantitative peace science research. The data are important state-year information on democracy and wealth, which require periodic updates and regular maintenance. The functions permit some exploratory and diagnostic assessment of the kinds of data in demand by the community, but do not impose many dependencies on the user.
Helps make implicit data assumptions explicit by attaching keys to flat-file data that error when those assumptions are violated. Designed for CSV-first workflows without database infrastructure or version control. Provides key definition, assumption checks, join diagnostics, and automatic drift detection via watched data frames that snapshot before each transformation and report cell-level changes.
Under an L0 penalty framework, a computationally efficient implementation of change point detection is developed. By integrating active set algorithms with warm start initialization, the package achieves linear-time complexity for solving change point detection problems. References: Wen et al. (2020) <doi:10.18637/jss.v094.i04>; Zhu et al. (2020) <doi:10.1073/pnas.2014241117>.
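To make the L0-penalised objective concrete, the sketch below finds change points in the mean of a sequence by minimising the within-segment sum of squared errors plus a fixed penalty per change point. Note the swap: it uses a plain quadratic-time dynamic programme (optimal partitioning), not the active-set algorithm with warm starts described above, so it illustrates the objective only and not the package's linear-time method. The function name `l0_changepoints` and the `penalty` parameter are assumptions for this example.

```python
def l0_changepoints(y, penalty):
    n = len(y)
    # Prefix sums of y and y**2 give any segment's squared error in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def seg_cost(i, j):
        # Sum of squared deviations of y[i:j] from its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[t]: minimal penalised cost of y[:t]; prev[t]: start of last segment.
    # best[0] = -penalty so that the first segment carries no penalty.
    best = [-penalty] + [float("inf")] * n
    prev = [0] * (n + 1)
    for t in range(1, n + 1):
        for s in range(t):
            cand = best[s] + seg_cost(s, t) + penalty
            if cand < best[t]:
                best[t] = cand
                prev[t] = s

    # Backtrack through segment starts to recover the change points.
    cps = []
    t = n
    while t > 0:
        s = prev[t]
        if s > 0:
            cps.append(s)
        t = s
    return sorted(cps)


signal = [0.0] * 10 + [5.0] * 10
print(l0_changepoints(signal, penalty=1.0))  # [10]
```

A larger `penalty` yields fewer change points; each reported index is the position at which a new segment starts.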
An S4 implementation of the unbiased extension of the model-assisted synthetic-regression estimator proposed by Mandallaz (2013) <DOI:10.1139/cjfr-2012-0381>, Mandallaz et al. (2013) <DOI:10.1139/cjfr-2013-0181> and Mandallaz (2014) <DOI:10.1139/cjfr-2013-0449>. It yields smaller variances than the standard bias correction, the generalised regression estimator.
This package provides tools for data analysis with multivariate Bayesian structural time series (MBSTS) models. Specifically, the package provides facilities for implementing general structural time series models, flexibly adding on different time series components (trend, season, cycle, and regression), simulating them, fitting them to multivariate correlated time series data, and conducting feature selection on the regression component.
This package provides tools for analysing multivariate time series with wavelets. This includes: simulation of a multivariate locally stationary wavelet (mvLSW) process from a multivariate evolutionary wavelet spectrum (mvEWS); estimation of the mvEWS, local coherence and local partial coherence. See Park, Eckley and Ombao (2014) <doi:10.1109/TSP.2014.2343937> for details.
Multiple imputation using XGBoost, subsampling, and predictive mean matching as described in Deng and Lumley (2024) <doi:10.1080/10618600.2023.2252501>. The package supports various types of variables, offers flexible settings, and enables saving an imputation model to impute new data. Data processing and memory usage have been optimised to speed up the imputation process.
Permutation-based non-parametric analysis of CRISPR screen data. Details of the algorithm are published in BMC Genomics, Jia et al. (2017) <doi:10.1186/s12864-017-3938-5>: "A permutation-based non-parametric analysis of CRISPR screen data". Please cite this paper if you use this algorithm in your work.