Use probability theory under the Bayesian framework for calculating the risk of selecting candidates in a multi-environment context. Contained are functions used to fit a Bayesian multi-environment model (based on the available presets), extract posterior values and maximum posterior values, compute the variance components, check the modelâ s convergence, and calculate the probabilities. For both across and within-environments scopes, the package computes the probability of superior performance and the pairwise probability of superior performance. Furthermore, the probability of superior stability and the pairwise probability of superior stability across environments is estimated. A joint probability of superior performance and stability is also provided.
Compositional data consisting of three-parts can be color mapped with a ternary color scale. Such a scale is provided by the tricolore packages with options for discrete and continuous colors, mean-centering and scaling. See Jonas Schöley (2021) "The centered ternary balance scheme. A technique to visualize surfaces of unbalanced three-part compositions" <doi:10.4054/DemRes.2021.44.19>, Jonas Schöley, Frans Willekens (2017) "Visualizing compositional data on the Lexis surface" <doi:10.4054/DemRes.2017.36.21>, and Ilya Kashnitsky, Jonas Schöley (2018) "Regional population structures at a glance" <doi:10.1016/S0140-6736(18)31194-2>.
This package implements the whitening methods (ZCA, PCA, Cholesky, ZCA-cor, and PCA-cor) discussed in Kessy, Lewin, and Strimmer (2018) "Optimal whitening and decorrelation", <doi:10.1080/00031305.2016.1277159>, as well as the whitening approach to canonical correlation analysis allowing negative canonical correlations described in Jendoubi and Strimmer (2019) "A whitening approach to probabilistic canonical correlation analysis for omics data integration", <doi:10.1186/s12859-018-2572-9>. The package also offers functions to simulate random orthogonal matrices, compute (correlation) loadings and explained variation. It also contains four example data sets (extended UCI wine data, TCGA LUSC data, nutrimouse data, extended pitprops data).
The CellScore package contains functions to evaluate the cell identity of a test sample, given a cell transition defined with a starting (donor) cell type and a desired target cell type. The evaluation is based upon a scoring system, which uses a set of standard samples of known cell types, as the reference set. The functions have been carried out on a large set of microarray data from one platform (Affymetrix Human Genome U133 Plus 2.0). In principle, the method could be applied to any expression dataset, provided that there are a sufficient number of standard samples and that the data are normalized.
HiCBricks is a library designed for handling large high-resolution Hi-C datasets. Over the years, the Hi-C field has experienced a rapid increase in the size and complexity of datasets. HiCBricks is meant to overcome the challenges related to the analysis of such large datasets within the R environment. HiCBricks offers user-friendly and efficient solutions for handling large high-resolution Hi-C datasets. The package provides an R/Bioconductor framework with the bricks to build more complex data analysis pipelines and algorithms. HiCBricks already incorporates example algorithms for calling domain boundaries and functions for high quality data visualization.
Calculates key indicators such as fertility rates (Total Fertility Rate (TFR), General Fertility Rate (GFR), and Age Specific Fertility Rate (ASFR)) using Demographic and Health Survey (DHS) women/individual data, childhood mortality probabilities and rates such as Neonatal Mortality Rate (NNMR), Post-neonatal Mortality Rate (PNNMR), Infant Mortality Rate (IMR), Child Mortality Rate (CMR), and Under-five Mortality Rate (U5MR), and adult mortality indicators such as the Age Specific Mortality Rate (ASMR), Age Adjusted Mortality Rate (AAMR), Age Specific Maternal Mortality Rate (ASMMR), Age Adjusted Maternal Mortality Rate (AAMMR), Age Specific Pregnancy Related Mortality Rate (ASPRMR), Age Adjusted Pregnancy Related Mortality Rate (AAPRMR), Maternal Mortality Ratio (MMR) and Pregnancy Related Mortality Ratio (PRMR). In addition to the indicators, the DHS.rates package estimates sampling errors indicators such as Standard Error (SE), Design Effect (DEFT), Relative Standard Error (RSE) and Confidence Interval (CI). The package is developed according to the DHS methodology of calculating the fertility indicators and the childhood mortality rates outlined in the "Guide to DHS Statistics" (Croft, Trevor N., Aileen M. J. Marshall, Courtney K. Allen, et al. 2018, <https://dhsprogram.com/Data/Guide-to-DHS-Statistics/index.cfm>) and the DHS methodology of estimating the sampling errors indicators outlined in the "DHS Sampling and Household Listing Manual" (ICF International 2012, <https://dhsprogram.com/pubs/pdf/DHSM4/DHS6_Sampling_Manual_Sept2012_DHSM4.pdf>).
This package provides functions in this package fit a stratified Cox proportional hazards and a proportional subdistribution hazards model by extending Zhang et al., (2007) <doi: 10.1016/j.cmpb.2007.07.010> and Zhang et al., (2011) <doi: 10.1016/j.cmpb.2010.07.005> respectively to clustered right-censored data. The functions also provide the estimates of the cumulative baseline hazard along with their standard errors. Furthermore, the adjusted survival and cumulative incidence probabilities are also provided along with their standard errors. Finally, the estimate of cumulative incidence and survival probabilities given a vector of covariates along with their standard errors are also provided.
Reaction rate dynamics can be retrieved from metabolite concentration time courses. User has to provide corresponding stoichiometric matrix but not a regulation model (Michaelis-Menten or similar). Instead of solving an ordinary differential equation (ODE) system describing the evolution of concentrations, we use B-splines to catch the concentration and rate dynamics then solve a least square problem on their coefficients with non-negativity (and optionally monotonicity) constraints. Constraints can be also set on initial values of concentration. The package dynafluxr can be used as a library but also as an application with command line interface dynafluxr::cli("-h") or graphical user interface dynafluxr::gui().
Interactive labelling of scatter plots, volcano plots and Manhattan plots using a shiny and plotly interface. Users can hover over points to see where specific points are located and click points on/off to easily label them. Labels can be dragged around the plot to place them optimally. Plots can be exported directly to PDF for publication. For plots with large numbers of points, points can optionally be rasterized as a bitmap, while all other elements (axes, text, labels & lines) are preserved as vector objects. This can dramatically reduce file size for plots with millions of points such as Manhattan plots, and is ideal for publication.
DNA methylation (6mA) is a major epigenetic process by which alteration in gene expression took place without changing the DNA sequence. Predicting these sites in-vitro is laborious, time consuming as well as costly. This EpiSemble package is an in-silico pipeline for predicting DNA sequences containing the 6mA sites. It uses an ensemble-based machine learning approach by combining Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting approach to predict the sequences with 6mA sites in it. This package has been developed by using the concept of Chen et al. (2019) <doi:10.1093/bioinformatics/btz015>.
Bayesian model averaging (BMA) algorithms for univariate link latent Gaussian models (ULLGMs). For detailed information, refer to Steel M.F.J. & Zens G. (2024) "Model Uncertainty in Latent Gaussian Models with Univariate Link Function" <doi:10.48550/arXiv.2406.17318>. The package supports various g-priors and a beta-binomial prior on the model space. It also includes auxiliary functions for visualizing and tabulating BMA results. Currently, it offers an out-of-the-box solution for model averaging of Poisson log-normal (PLN) and binomial logistic-normal (BiL) models. The codebase is designed to be easily extendable to other likelihoods, priors, and link functions.
Quantitative RT-PCR data are analyzed using generalized linear mixed models based on lognormal-Poisson error distribution, fitted using MCMC. Control genes are not required but can be incorporated as Bayesian priors or, when template abundances correlate with conditions, as trackers of global effects (common to all genes). The package also implements a lognormal model for higher-abundance data and a "classic" model involving multi-gene normalization on a by-sample basis. Several plotting functions are included to extract and visualize results. The detailed tutorial is available here: <https://matzlab.weebly.com/uploads/7/6/2/2/76229469/mcmc.qpcr.tutorial.v1.2.4.pdf>.
Cooperative learning combines the usual squared error loss of predictions with an agreement penalty to encourage the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that include the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty (Ding, D., Li, S., Narasimhan, B., Tibshirani, R. (2021) <doi:10.1073/pnas.2202113119>).
This package provides gradient-based MCMC sampling algorithms for use with the MCMC engine provided by the nimble package. This includes two versions of Hamiltonian Monte Carlo (HMC) No-U-Turn (NUTS) sampling, and (under development) Langevin samplers. The `NUTS_classic` sampler implements the original HMC-NUTS algorithm as described in Hoffman and Gelman (2014) <doi:10.48550/arXiv.1111.4246>. The `NUTS` sampler is a modern version of HMC-NUTS sampling matching the HMC sampler available in version 2.32.2 of Stan (Stan Development Team, 2023). In addition, convenience functions are provided for generating and modifying MCMC configuration objects which employ HMC sampling.
We develop a new class of distribution free multiple testing rules for false discovery rate (FDR) control under general dependence. A key element in our proposal is a symmetrized data aggregation (SDA) approach to incorporating the dependence structure via sample splitting, data screening and information pooling. The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data driven threshold along the ranking to control the FDR. For more information, see the website below and the accompanying paper: Du et al. (2020), "False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation", <arXiv:2002.11992>.
Hail is an open-source, general-purpose, python based data analysis tool with additional data types and methods for working with genomic data, see <https://hail.is/>. Hail is built to scale and has first-class support for multi-dimensional structured data, like the genomic data in a genome-wide association study (GWAS). Hail is exposed as a python library, using primitives for distributed queries and linear algebra implemented in scala', spark', and increasingly C++'. The sparkhail is an R extension using sparklyr package. The idea is to help R users to use hail functionalities with the well-know tidyverse syntax, see <https://www.tidyverse.org/>.
This package provides a set of functions devoted to multivariate exploratory statistics on textual data. Classical methods such as correspondence analysis and agglomerative hierarchical clustering are available. Chronologically constrained agglomerative hierarchical clustering enriched with labelled-by-words trees is offered. Given a division of the corpus into parts, their characteristic words and documents are identified. Further, accessing to FactoMineR functions is very easy. Two of them are relevant in textual domain. MFA() addresses multiple lexical table allowing applications such as dealing with multilingual corpora as well as simultaneously analyzing both open-ended and closed questions in surveys. See <http://xplortext.unileon.es> for examples.
DEPRECATED. Do not start building new projects based on this package. (The (in-house) APD file format was initially developed to store Affymetrix probe-level data, e.g. normalized CEL intensities. Chip types can be added to APD file and similar to methods in the affxparser package, this package provides methods to read APDs organized by units (probesets). In addition, the probe elements can be arranged optimally such that the elements are guaranteed to be read in order when, for instance, data is read unit by unit. This speeds up the read substantially. This package is supporting the Aroma framework and should not be used elsewhere.).
This package performs adjustments of a user-supplied independence loglikelihood function using a robust sandwich estimator of the parameter covariance matrix, based on the methodology in Chandler and Bate (2007) <doi:10.1093/biomet/asm015>. This can be used for cluster correlated data when interest lies in the parameters of the marginal distributions or for performing inferences that are robust to certain types of model misspecification. Functions for profiling the adjusted loglikelihoods are also provided, as are functions for calculating and plotting confidence intervals, for single model parameters, and confidence regions, for pairs of model parameters. Nested models can be compared using an adjusted likelihood ratio test.
The Markowitz criterion is a multicriteria decision-making method that stands out in risk and uncertainty analysis in contexts where probabilities are known. This approach represents an evolution of Pascal's criterion by incorporating the dimension of variability. In this framework, the expected value reflects the anticipated return, while the standard deviation serves as a measure of risk. The markowitz package provides a practical and accessible tool for implementing this method, enabling researchers and professionals to perform analyses without complex calculations. Thus, the package facilitates the application of the Markowitz criterion. More details on the method can be found in Octave Jokung-Nguéna (2001, ISBN 2100055372).
This package provides spatially survey balanced designs using the quasi-random number method described Robinson et al. (2013) <doi:10.1111/biom.12059> and adjusted in Robinson et al. (2017) <doi:10.1016/j.spl.2017.05.004>. Designs using MBHdesign can: 1) accommodate, without substantial detrimental effects on spatial balance, legacy sites (Foster et al., 2017 <doi:10.1111/2041-210X.12782>); 2) be based on points or transects (foster et al. 2020 <doi:10.1111/2041-210X.13321> and produce clustered samples (Foster et al. (in press). Additional information about the package use itself is given in Foster (2021) <doi:10.1111/2041-210X.13535>.
Website generator with HTML summaries for predictive models. This package uses DALEX explainers to describe global model behavior. We can see how well models behave (tabs: Model Performance, Auditor), how much each variable contributes to predictions (tabs: Variable Response) and which variables are the most important for a given model (tabs: Variable Importance). We can also compare Concept Drift for pairs of models (tabs: Drifter). Additionally, data available on the website can be easily recreated in current R session. Work on this package was financially supported by the NCN Opus grant 2017/27/B/ST6/01307 at Warsaw University of Technology, Faculty of Mathematics and Information Science.
Procedures for testing for group-wide signal in clusters of variables. Tests can be performed for single groups in isolation (univariate) or multiple groups together (multivariate). Specific tests include the exact and approximate (un)selective likelihood ratio tests described in Reid et al (2015), the selective F test and marginal screening prototype test of Reid and Tibshirani (2015). User may pre-specify columns to be included in prototype formation, or allow the function to select them itself. A mixture of these two is also possible. Any variable selection is accounted for using the selective inference framework. Options for non-sampling and hit-and-run null reference distributions.
This package provides a versatile R visualization package that empowers researchers with comprehensive visualization tools for seamlessly mapping peptides to protein sequences, identifying distinct domains and regions of interest, accentuating mutations, and highlighting post-translational modifications, all while enabling comparisons across diverse experimental conditions. Potential applications of PepMapViz include the visualization of cross-software mass spectrometry results at the peptide level for specific protein and domain details in a linearized format and post-translational modification coverage across different experimental conditions; unraveling insights into disease mechanisms. It also enables visualization of Major histocompatibility complex-presented peptide clusters in different antibody regions predicting immunogenicity in antibody drug development.