Efficient implementations for analyzing pre-clinical multiple drug combination datasets. The package provides (1) the popular synergy scoring models, including HSA, Loewe, Bliss, and ZIP, to quantify the degree of drug combination synergy; (2) higher-order drug combination data analysis and synergy landscape visualization for an unlimited number of drugs in a combination; (3) statistical analysis of drug combination synergy and sensitivity with confidence intervals and p-values; (4) a synergy barometer for harmonizing multiple synergy scoring methods to provide a consensus metric of synergy; and (5) evaluation of synergy and sensitivity simultaneously to provide an unbiased interpretation of the clinical potential of the drug combinations. Based on this package, we also provide a web application (<http://www.synergyfinder.org>) for users who prefer a graphical user interface.
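As a rough illustration of two of the scoring models named in the entry above, the sketch below computes HSA and Bliss synergy for a single pair of drugs. It assumes responses are expressed as fractional inhibition in [0, 1] and that synergy is the observed combination response minus the model's expected response; the function names are hypothetical and are not the package's API. Loewe and ZIP require fitted dose-response curves and are left out.

```python
def bliss_expected(a: float, b: float) -> float:
    """Expected combined inhibition if the two drugs act independently (Bliss)."""
    return a + b - a * b


def hsa_expected(a: float, b: float) -> float:
    """Expected combined inhibition under Highest Single Agent (HSA)."""
    return max(a, b)


def synergy_score(observed: float, a: float, b: float, model: str = "bliss") -> float:
    """Observed minus expected inhibition; positive values suggest synergy.

    All inputs are fractional inhibitions in [0, 1] (an assumption of this sketch).
    """
    for value in (observed, a, b):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inhibition values must lie in [0, 1]")
    if model == "bliss":
        expected = bliss_expected(a, b)
    elif model == "hsa":
        expected = hsa_expected(a, b)
    else:
        raise ValueError(f"unsupported model: {model}")
    return observed - expected
```

For example, two drugs that each inhibit 50% alone have a Bliss expectation of 75%, so an observed 80% inhibition scores about 0.05 under Bliss and 0.30 under HSA.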
An R API providing access to a relational database with macroeconomic data for Africa. The database contains >700 macroeconomic time series from mostly international sources, grouped into 50 macroeconomic and development-related topics. Series are carefully selected on the basis of data coverage for Africa, frequency, and relevance to the macro-development context. The project is part of the Kiel Institute Africa Initiative <https://www.ifw-kiel.de/institute/initiatives/kiel-institute-africa-initiative/>, which, amongst other things, aims to develop a parsimonious database with highly relevant indicators to monitor macroeconomic developments in Africa, accessible through a fast API and a web-based platform at <https://africamonitor.ifw-kiel.de/>. The database is maintained at the Kiel Institute for the World Economy <https://www.ifw-kiel.de/>.
Simulation-based sensitivity analysis for causal mediation studies. It numerically and graphically evaluates the sensitivity of causal mediation analysis results to the presence of unmeasured pretreatment confounding. The proposed method has three primary advantages over existing methods. First, by using the conditional associations of an unmeasured pretreatment confounder with the treatment, mediator, and outcome as sensitivity parameters, the method enables users to intuitively assess sensitivity in reference to prior knowledge about the strength of a potential unmeasured pretreatment confounder. Second, the method accurately reflects the influence of unmeasured pretreatment confounding on the efficiency of estimation of the causal effects. Third, the method can be implemented in different causal mediation analysis approaches, including regression-based, simulation-based, and propensity score-based methods. It is applicable to both randomized experiments and observational studies.
Fast, optimal, and reproducible weighted univariate clustering by dynamic programming. Four problems are solved, including univariate k-means (Wang & Song 2011) <doi:10.32614/RJ-2011-015> (Song & Zhong 2020) <doi:10.1093/bioinformatics/btaa613>, k-median, k-segments, and multi-channel weighted k-means. Dynamic programming is used to minimize the sum of (weighted) within-cluster distances using respective metrics. Its advantage over heuristic clustering in efficiency and accuracy is pronounced when there are many clusters. Multi-channel weighted k-means groups multiple univariate signals into k clusters. An auxiliary function generates histograms adaptive to patterns in data. This package provides a powerful set of tools for univariate data analysis with guaranteed optimality, efficiency, and reproducibility, useful for peak calling on temporal, spatial, and spectral data.
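To make the dynamic programming idea in the entry above concrete, here is a minimal sketch of optimal univariate k-means for unweighted data. It uses the plain O(k n^2) recurrence over sorted points with prefix sums, which is an assumption chosen for readability and not the package's implementation; the function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(values, k):
    """Optimal 1-D k-means by dynamic programming (unweighted, O(k * n^2)).

    Returns (clusters, total within-cluster sum of squares).
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")

    # Prefix sums give the sum of squares of any contiguous run in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Within-cluster sum of squares of xs[i:j] (half-open, j > i)."""
        total = s1[j] - s1[i]
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    inf = float("inf")
    # cost[c][j]: minimal total SSE of the first j sorted points in c clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last cluster is xs[i:j]
                candidate = cost[c - 1][i] + sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk the stored split points backwards to recover the clusters.
    clusters, j = [], n
    for c in range(k, 0, -1):
        i = start[c][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return clusters, cost[k][n]
```

Because every optimal cluster is a contiguous run of the sorted data, the recurrence examines all split points and the result does not depend on random initialization, which is the source of the reproducibility mentioned above.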
Estimates the precision of transdimensional Markov chain Monte Carlo (MCMC) output, which is often used for Bayesian analysis of models with different dimensionality (e.g., model selection). Transdimensional MCMC (e.g., reversible jump MCMC) relies on sampling a discrete model-indicator variable to estimate the posterior model probabilities. If only a few switches occur between the models, precision may be low and an assessment based on the assumption of independent samples may be misleading. Based on the observed transition matrix of the indicator variable, the method of Heck, Overstall, Gronau, & Wagenmakers (2019, Statistics & Computing, 29, 631-643) <doi:10.1007/s11222-018-9828-0> draws posterior samples of the stationary distribution to (a) assess the uncertainty in the estimated posterior model probabilities and (b) estimate the effective sample size of the MCMC output.
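The role of the observed transition matrix in the entry above can be sketched as follows. This simplified example counts the transitions of a model-indicator sequence, row-normalises the counts, and finds the stationary distribution by power iteration. It yields only a point estimate: the cited method instead draws posterior samples of the stationary distribution, which is not reproduced here. Function names are hypothetical, and power iteration is assumed to converge (it does not for periodic chains).

```python
def transition_counts(indicator, n_models):
    """Count observed transitions i -> j in a sequence of model indices."""
    counts = [[0] * n_models for _ in range(n_models)]
    for current, following in zip(indicator, indicator[1:]):
        counts[current][following] += 1
    return counts


def stationary_distribution(counts, max_iter=10_000, tol=1e-12):
    """Stationary distribution of the row-normalised count matrix."""
    n = len(counts)
    transition = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("every model must be left at least once")
        transition.append([c / total for c in row])

    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of pi <- pi P.
        updated = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(u - p) for u, p in zip(updated, pi)) < tol:
            return updated
        pi = updated
    return pi
```

For the sequence `[0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0]` the counts are `[[5, 2], [2, 1]]`, and the estimated posterior model probabilities are approximately 0.7 and 0.3.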
Objective Bayesian inference procedures for the parameters of the multivariate random effects model with application to multivariate meta-analysis. The posteriors of the model parameters, namely the overall mean vector and the between-study covariance matrix, are assessed by constructing Markov chains based on the Metropolis-Hastings algorithms developed in Bodnar and Bodnar (2021) (<arXiv:2104.02105>). The Metropolis-Hastings algorithms are designed under the assumption of the normal distribution and the t-distribution when the Berger and Bernardo reference prior and the Jeffreys prior are assigned to the model parameters. Convergence properties of the generated Markov chains are investigated using the rank plots and the split R-hat estimate based on rank normalization, which are proposed in Vehtari et al. (2021) (<DOI:10.1214/20-BA1221>).
This package provides a collection of user-submitted functions to aid in the analysis of hydrological data, particularly for users in Canada. The functions focus on the use of Canadian data sets, are suited to Canadian hydrology (including the important cold-region hydrological processes), and work with Canadian hydrological models. The functions are grouped into several themes, currently including Statistical hydrology, Basic data manipulations, Visualization, and Spatial hydrology. Functions developed by the Floodnet project are also included. CSHShydRology has been developed with the assistance of the Canadian Society for Hydrological Sciences (CSHS), which is an affiliated society of the Canadian Water Resources Association (CWRA). As of version 1.2.6, functions fail gracefully when attempting to download data from a URL that is unavailable.
This package provides functions to access data from the BrasilAPI, REST Countries API, Nager.Date API, and World Bank API, related to Brazil's postal codes, banks, holidays, company registrations, international country indicators, public holidays information, and economic development data. Additionally, the package includes curated datasets related to Brazil, covering topics such as demographic data (males and females by state and year), river levels, environmental emission factors, film festivals, and yellow fever outbreak records. The package supports research and analysis focused on Brazil by integrating open APIs with high-quality datasets from multiple domains. For more information on the APIs, see: BrasilAPI <https://brasilapi.com.br/>, Nager.Date <https://date.nager.at/Api>, World Bank API <https://datahelpdesk.worldbank.org/knowledgebase/articles/889392>, and REST Countries API <https://restcountries.com/>.
The main goal of this package is to draw the membership function of the fuzzy p-value, which is defined as a fuzzy set on the unit interval, for the following three problems: (1) testing crisp hypotheses based on fuzzy data, (2) testing fuzzy hypotheses based on crisp data, and (3) testing fuzzy hypotheses based on fuzzy data. In all cases, the fuzziness of the data and/or the fuzziness of the boundary of the null fuzzy hypothesis is transported via the p-value function and produces the fuzzy p-value. If the p-value is fuzzy, it is more appropriate to consider a fuzzy significance level for the problem. Therefore, the fuzzy p-value and the fuzzy significance level are compared by a fuzzy ranking method in this package.
Predict Scope 1, 2 and 3 carbon emissions for UK Small and Medium-sized Enterprises (SMEs), using Standard Industrial Classification (SIC) codes and annual turnover data. The carbonpredict package provides single and batch prediction, plotting, and workflow tools for carbon accounting and reporting. The package utilises pre-trained models, leveraging rich classified transaction data to accurately predict Scope 1, 2 and 3 carbon emissions for UK SMEs as well as identifying emissions hotspots. The methodology used to produce the estimates in this package is fully detailed in the following peer-reviewed publication in the Journal of Industrial Ecology: Phillpotts, A., Owen, A., Norman, J., Trendl, A., Gathergood, J., Jobst, N., Leake, D. (2025) <doi:10.1111/jiec.70106> "Bridging the SME Reporting Gap: A New Model for Predicting Scope 1 and 2 Emissions".
This package provides tools for Bayesian power analysis and assurance calculations using the statistical frameworks of brms and INLA. Includes simulation-based approaches, support for multiple decision rules (direction, threshold, ROPE), sequential designs, and visualisation helpers. Methods are based on Kruschke (2014, ISBN:9780124058880) "Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan", O'Hagan & Stevens (2001) <doi:10.1177/0272989X0102100307> "Bayesian Assessment of Sample Size for Clinical Trials of Cost-Effectiveness", Kruschke (2018) <doi:10.1177/2515245918771304> "Rejecting or Accepting Parameter Values in Bayesian Estimation", Rue et al. (2009) <doi:10.1111/j.1467-9868.2008.00700.x> "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations", and Bürkner (2017) <doi:10.18637/jss.v080.i01> "brms: An R Package for Bayesian Multilevel Models using Stan".
Multi-stage selection is practiced in numerous fields of life and social sciences and particularly in breeding. A special characteristic of multi-stage selection is that candidates are evaluated in successive stages with increasing intensity and effort, and only a fraction of the superior candidates is selected and promoted to the next stage. For the optimum design of such selection programs, the selection gain plays a crucial role. It can be calculated by integration of a truncated multivariate normal (MVN) distribution. While mathematical formulas for calculating the selection gain and the variance among selected candidates were developed a long time ago, solutions for numerical calculation were not available. This package can also be used for optimizing multi-stage selection programs for a given total budget and different costs of evaluating the candidates in each stage.
Computes various stability parameters from Additive Main Effects and Multiplicative Interaction (AMMI) analysis results such as Modified AMMI Stability Value (MASV), Sums of the Absolute Value of the Interaction Principal Component Scores (SIPC), Sum Across Environments of Genotype-Environment Interaction Modelled by AMMI (AMGE), Sum Across Environments of Absolute Value of Genotype-Environment Interaction Modelled by AMMI (AV_(AMGE)), AMMI Stability Index (ASI), Modified ASI (MASI), AMMI Based Stability Parameter (ASTAB), Annicchiarico's D Parameter (DA), Zhang's D Parameter (DZ), Averages of the Squared Eigenvector Values (EV), Stability Measure Based on Fitted AMMI Model (FA), and Absolute Value of the Relative Contribution of IPCs to the Interaction (Za). Further calculates the Simultaneous Selection Index for Yield and Stability from the computed stability parameters. See the vignette for a complete list of citations for the methods implemented.
We introduce an R function one_two_sample(), which handles one and two (normal) samples; see Ying-Ying Zhang and Yi Wei (2012) <doi:10.2991/asshm-13.2013.29>. For one normal sample x, the function reports descriptive statistics, a plot, interval estimation, and hypothesis tests for x. For two normal samples x and y, the function reports descriptive statistics, plots, interval estimation, and hypothesis tests for x and y, respectively. It also reports interval estimation and hypothesis tests for mu1-mu2 (the difference of the means of x and y) and sigma1^2 / sigma2^2 (the ratio of the variances of x and y), tests whether x and y are from the same population, and computes the correlation coefficient of x and y if they have the same length.
The developed function generates soil salinity indices from satellite data, using multiple spectral bands such as Blue, Green, Red, Near-Infrared (NIR), and Shortwave Infrared (SWIR1, SWIR2). It computes 24 different salinity indices that are crucial for monitoring and analyzing salt-affected soils efficiently. For more details, see Rani et al. (2022) <doi:10.1007/s12517-022-09682-3>. One of the key features of the function is its flexibility. Users can provide any combination of the required spectral bands, and the function will automatically calculate only the relevant indices based on the available data. This dynamic capability ensures that users can maximize the utility of their data without needing all spectral bands, making the package versatile and user-friendly. Outputs are provided in GeoTIFF format, facilitating easy integration with GIS workflows.
Develop and evaluate treatment rules based on: (1) the standard indirect approach of split-regression, which fits regressions separately in both treatment groups and assigns an individual to the treatment option under which the predicted outcome is more desirable; (2) the direct approach of outcome-weighted-learning proposed by Yingqi Zhao, Donglin Zeng, A. John Rush, and Michael Kosorok (2012) <doi:10.1080/01621459.2012.695674>; and (3) the direct approach, which we refer to as direct-interactions, proposed by Shuai Chen, Lu Tian, Tianxi Cai, and Menggang Yu (2017) <doi:10.1111/biom.12676>. Please see the vignette for a walk-through of how to start with an observational dataset whose design is understood scientifically and end up with a treatment rule that is trustworthy statistically, along with an estimation of rule benefit in an independent sample.
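The split-regression approach described in the entry above can be sketched in a few lines. The example assumes a single covariate, ordinary least squares in each arm, treatment coded 0/1, and that a larger outcome is more desirable; all names are hypothetical and the package's own interface is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def split_regression_rule(x, treatment, y):
    """Fit one regression per treatment arm and return a treatment rule.

    The returned function maps a covariate value to the arm (0 or 1)
    with the larger predicted outcome.
    """
    models = {}
    for arm in (0, 1):
        xs = [xi for xi, t in zip(x, treatment) if t == arm]
        ys = [yi for yi, t in zip(y, treatment) if t == arm]
        models[arm] = fit_line(xs, ys)

    def rule(x_new):
        predicted = {arm: b0 + b1 * x_new for arm, (b0, b1) in models.items()}
        return 1 if predicted[1] > predicted[0] else 0

    return rule
```

The rule should then be evaluated on data not used for fitting, in line with the independent-sample estimation of rule benefit mentioned above.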
Top-Down mass spectrometry aims to identify entire proteins as well as their (post-translational) modifications or ions bound (e.g., Chen et al. (2018) <doi:10.1021/acs.analchem.7b04747>). The pattern of internal fragments (Haverland et al. (2017) <doi:10.1007/s13361-017-1635-x>) may reveal important information about the original structure of the proteins studied (Skinner et al. (2018) <doi:10.1038/nchembio.2515> and Li et al. (2018) <doi:10.1038/nchem.2908>). However, the number of possible internal fragments grows very large with longer proteins, and subsequent identification of internal fragments remains challenging, in particular since the accuracy of measurements with current mass spectrometers represents a limiting factor. This package attempts to deal with the complexity of internal fragments and allows identification of terminal and internal fragments from deconvoluted mass-spectrometry data.
Multi-state models are essential tools in longitudinal data analysis. One primary goal of these models is the estimation of transition probabilities, a critical metric for predicting clinical prognosis across various stages of diseases or medical conditions. Traditionally, inference in multi-state models relies on the Aalen-Johansen (AJ) estimator, which is consistent under the Markov assumption. However, in many practical applications the Markov property is not guaranteed, limiting the applicability of the AJ estimator in more complex scenarios. This package extends the landmark Aalen-Johansen estimator (Putter and Spitoni (2018) <doi:10.1177/0962280216674497>) by incorporating presmoothing techniques described by Soutinho, Meira-Machado and Oliveira (2020) <doi:10.1080/03610918.2020.1762895>, offering a robust alternative for estimating transition probabilities in non-Markovian multi-state models with multiple states and potentially reversible transitions.
This is another exporter for Org-mode that translates an Org-mode file into a beautiful PDF file. Example Org file header:
#+title:Readme ox-notes
#+author: Matthias David
#+options: toc:nil
#+ou:Zoom
#+quand: 20/2/2021
#+projet: ox-minutes
#+absent: C. Robert,T. tartanpion
#+present: K. Soulet,I. Payet
#+excuse:Sophie Fonsec,Karine Soulet
#+logo: logo.png
In self-reported or anonymised data the user often encounters heaped data, i.e. data which are rounded (to a possibly different degree of coarseness). While this is mostly a minor problem in parametric density estimation, the bias can be very large for non-parametric methods such as kernel density estimation. This package implements a partly Bayesian algorithm that treats the true unknown values as additional parameters and estimates the rounding parameters to give a corrected kernel density estimate. It supports various standard bandwidth selection methods. Varying rounding probabilities (depending on the true value) and asymmetric rounding are estimable as well: Gross, M. and Rendtel, U. (2016) (<doi:10.1093/jssam/smw011>). Additionally, bivariate non-parametric density estimation for rounded data, Gross, M. et al. (2016) (<doi:10.1111/rssa.12179>), as well as data aggregated on areas, are supported.
Lag-sequential analysis is a method of assessing patterns (what tends to follow what?) in sequences of codes. The codes are typically for discrete behaviors or states. The functions in this package read a stream of codes, or a frequency transition matrix, and produce a variety of lag sequential statistics, including transitional frequencies, expected transitional frequencies, transitional probabilities, z values, adjusted residuals, Yule's Q values, likelihood ratio tests of stationarity across time and homogeneity across groups or segments, transformed kappas for unidirectional dependence, bidirectional dependence, parallel and nonparallel dominance, and significance levels based on both parametric and randomization tests. The methods are described in Bakeman & Quera (2011) <doi:10.1017/CBO9781139017343>, O'Connor (1999) <doi:10.3758/BF03200753>, Wampold & Margolin (1982) <doi:10.1037/0033-2909.92.3.755>, and Wampold (1995, ISBN:0-89391-919-5).
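The first three statistics listed in the entry above can be sketched directly from a stream of codes. The example below tallies transitional frequencies at a given lag, converts them to transitional probabilities (conditional on the antecedent code), and computes expected frequencies as row total times column total divided by the number of transitions, which is the usual independence expectation and an assumption of this sketch. The function name is hypothetical; z values, adjusted residuals, and the other statistics are not covered.

```python
from collections import Counter


def lag_table(codes, lag=1):
    """Observed, conditional-probability, and expected tables for one lag."""
    observed = Counter(zip(codes, codes[lag:]))
    total = sum(observed.values())

    # Marginal totals for antecedent (row) and consequent (column) codes.
    row, col = Counter(), Counter()
    for (given, target), n in observed.items():
        row[given] += n
        col[target] += n

    probability = {pair: n / row[pair[0]] for pair, n in observed.items()}
    expected = {(g, t): row[g] * col[t] / total for g in row for t in col}
    return observed, probability, expected
```

For the stream `A B A B C A B`, code B follows A in all three transitions out of A, against an expected frequency of 1.5 under independence.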
The document converter pandoc <https://pandoc.org/> is widely used in the R community. One feature of pandoc is that it can produce and consume JSON-formatted abstract syntax trees (AST). This makes it possible to transform a given source document into a JSON-formatted AST, alter it with so-called filters, and pass the altered JSON-formatted AST back to pandoc. This package provides functions for writing such filters in native R code. Although this package is inspired by the Python package pandocfilters <https://github.com/jgm/pandocfilters/>, it provides additional convenience functions which make it simple to use the pandocfilters package as a report generator. Since pandocfilters inherits most of its functionality from pandoc, it can create documents in many formats (for more information see <https://pandoc.org/>) but is also bound to the same limitations as pandoc.
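The filter mechanism described in the entry above can be illustrated with a minimal sketch that walks a JSON-formatted AST and rewrites matching elements. It is written in Python, not with this package's R functions, and relies only on the pandoc convention that AST elements are objects with a type field `t` and an optional content field `c`. The helper names are hypothetical.

```python
import json


def walk(node, action):
    """Recursively apply `action` to every element of a pandoc JSON AST.

    `action(kind, content)` returns a replacement element, or None to
    leave the element unchanged and continue into its children.
    """
    if isinstance(node, list):
        return [walk(child, action) for child in node]
    if isinstance(node, dict):
        if "t" in node:
            replacement = action(node["t"], node.get("c"))
            if replacement is not None:
                return replacement
        return {key: walk(value, action) for key, value in node.items()}
    return node


def upper_str(kind, content):
    """Example filter action: upper-case every Str element."""
    if kind == "Str":
        return {"t": "Str", "c": content.upper()}
    return None


def apply_filter(ast_json, action):
    """Parse a JSON AST, apply the filter action, and serialise it again."""
    return json.dumps(walk(json.loads(ast_json), action))
```

In a pipeline, the JSON would typically come from `pandoc -t json` and the filtered result would be passed back with `pandoc -f json`.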
This package provides a unified method, called the M statistic, for detecting phylogenetic signals in continuous traits, discrete traits, and multi-trait combinations. Blomberg and Garland (2002) <doi:10.1046/j.1420-9101.2002.00472.x> provided a widely accepted statistical definition of the phylogenetic signal, which is the "tendency for related species to resemble each other more than they resemble species drawn at random from the tree". The M statistic strictly adheres to this definition, formulating an index and developing a method of testing in strict accordance with it, instead of relying on correlation analysis or evolutionary models. The method expresses the textual definition of the phylogenetic signal equivalently as an inequality involving the phylogenetic and trait distances, from which the M statistic is constructed. More distance-based methods are under development.
An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. t-test, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.