Multivariate data analysis is the simultaneous observation of more than one characteristic. In contrast to the analysis of univariate data, this approach is not limited to a single variable or the relation between two variables: the relations between many attributes can be considered. For the statistical analysis of chemical data one has to take into account the special structure of this type of data. This package contains about 30 functions, mostly for regression, classification and model evaluation, and includes some data sets used in the R help examples. It was designed as an R companion to the book "Introduction to Multivariate Statistical Analysis in Chemometrics" written by K. Varmuza and P. Filzmoser (2009).
This package provides functions for identification and transportation of causal effects. It provides a conditional causal effect identification algorithm (IDC) by Shpitser, I. and Pearl, J. (2006) <http://ftp.cs.ucla.edu/pub/stat_ser/r329-uai.pdf>, an algorithm for transportability from multiple domains with limited experiments by Bareinboim, E. and Pearl, J. (2014) <http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf>, and a selection bias recovery algorithm by Bareinboim, E. and Tian, J. (2015) <http://ftp.cs.ucla.edu/pub/stat_ser/r445.pdf>. All of the previously mentioned algorithms are based on a causal effect identification algorithm by Tian, J. (2002) <http://ftp.cs.ucla.edu/pub/stat_ser/r309.pdf>.
Relatively easy access is provided to the 2023 version of the Maddison project data, downloaded 2025-08-28. This project collates all the credible data on population and GDP for 169 countries, with some dating back to the year 1 of the current era. One function makes it easy to find the leaders for each year, allowing users to delete countries with narrow economies, such as OPEC members, to focus on technology leaders. Another function makes it easy to plot data for only selected countries or years. Another function makes it relatively easy to obtain references to the original sources, which must be cited per the copyright rules of the Maddison Project for different uses of their data.
adverSCarial is an R package designed for generating adversarial attacks on scRNA-seq classifiers and analyzing the vulnerability of those classifiers to such attacks. The package is versatile and provides a format for integrating any type of classifier. It offers functions for studying and generating two types of attacks, the single-gene attack and the max-change attack. The single-gene attack involves making a small modification to the input to alter the classification. The max-change attack involves making a large modification to the input without changing its classification. In addition, the CGD attack is based on an estimated gradient descent. The package provides a comprehensive solution for evaluating the robustness of scRNA-seq classifiers against adversarial attacks.
This package provides functions to perform comparative causal mediation analysis to compare the mediation effects of different treatments via a common mediator. Results contain the estimates and confidence intervals for the two comparative causal mediation analysis estimands, as well as the ATE and ACME for each treatment. Functions provided in the package will automatically assess the comparative causal mediation analysis scope conditions (i.e. for each comparative causal mediation estimand, a numerator and denominator that are both estimated with the desired statistical significance and of the same sign). Results will be returned for each comparative causal mediation estimand only if scope conditions are met for it. See details in Bansak (2020) <doi:10.1017/pan.2019.31>.
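The scope conditions described above (numerator and denominator both statistically significant and of the same sign) can be illustrated with a minimal Python sketch. This is not the package's code: the `Estimate` class, the function names, and the use of "confidence interval excludes zero" as the significance criterion are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical container for a point estimate and its confidence interval."""
    point: float
    ci_low: float
    ci_high: float


def is_significant(e: Estimate) -> bool:
    # Assumption: "significant" means the confidence interval excludes zero.
    return e.ci_low > 0 or e.ci_high < 0


def scope_conditions_met(numerator: Estimate, denominator: Estimate) -> bool:
    # Both quantities must be significant and share the same sign.
    same_sign = (numerator.point > 0) == (denominator.point > 0)
    return is_significant(numerator) and is_significant(denominator) and same_sign
```

Under this sketch, an estimand would only be reported when `scope_conditions_met` returns `True` for its numerator and denominator.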
This package provides tools for crop breeding analysis including Genetic Coefficient of Variation (GCV), Phenotypic Coefficient of Variation (PCV), heritability, genetic advance calculations, stability analysis using the Eberhart-Russell model, two-way ANOVA for genotype-environment interactions, and Additive Main Effects and Multiplicative Interaction (AMMI) analysis. These tools are developed for crop breeding research and stability evaluation under various environmental conditions. The methods are based on established statistical and biometrical principles. Refer to Eberhart and Russell (1966) <doi:10.2135/cropsci1966.0011183X000600010011x> for stability parameters, Fisher (1935) "The Design of Experiments" <ISBN:9780198522294>, Falconer (1996) "Introduction to Quantitative Genetics" <ISBN:9780582243026>, and Singh and Chaudhary (1985) "Biometrical Methods in Quantitative Genetic Analysis" <ISBN:9788122433764> for foundational methodologies.
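As a rough illustration of the variability parameters listed above, the following Python sketch computes GCV, PCV, broad-sense heritability and genetic advance from variance components using the usual textbook formulas. The function name and argument names are hypothetical, and the default selection differential `k = 2.06` assumes 5% selection intensity; the package's own functions may differ in interface and defaults.

```python
import math


def genetic_parameters(var_g, var_p, grand_mean, k=2.06):
    """Textbook variability parameters from variance components.

    var_g: genotypic variance, var_p: phenotypic variance,
    grand_mean: overall trait mean, k: selection differential
    (2.06 is the conventional value for 5% selection intensity).
    """
    gcv = math.sqrt(var_g) / grand_mean * 100   # genotypic coefficient of variation (%)
    pcv = math.sqrt(var_p) / grand_mean * 100   # phenotypic coefficient of variation (%)
    h2 = var_g / var_p                          # broad-sense heritability
    ga = k * h2 * math.sqrt(var_p)              # expected genetic advance
    gam = ga / grand_mean * 100                 # genetic advance as % of mean
    return {"GCV": gcv, "PCV": pcv, "h2": h2, "GA": ga, "GAM": gam}
```

The variance components themselves would normally come from an ANOVA of the trial data, which this sketch does not attempt.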
This package provides functions to calculate weights, estimates of changes and corresponding variance estimates for panel data with non-response. Partially overlapping samples are handled. Initially, weights are calculated by linear calibration. By default, the survey package is used for this purpose. It is also possible to use ReGenesees, which can be installed from <https://github.com/DiegoZardetto/ReGenesees>. Variances of linear combinations (changes and averages) and ratios are calculated from a covariance matrix based on residuals according to the calibration model. The methodology was presented at the conference, The Use of R in Official Statistics, and is described in Langsrud (2016) <http://www.revistadestatistica.ro/wp-content/uploads/2016/06/RRS2_2016_A021.pdf>.
Create a skeleton shiny application with create_template() that is reproducible, can be saved and meets academic standards for attribution. Forked from 'wallace'. Code is split into modules that are loaded and linked together automatically and each calls one function. Guidance pages explain modules to users and flexible logging informs them of any errors. Options enable asynchronous operations, viewing of source code, interactive maps and data tables. Use to create complex analytical applications, following best practices in open science and software development. Includes functions for automating repetitive development tasks and an example application at run_shinyscholar() that requires install.packages("shinyscholar", dependencies = TRUE). A guide to developing applications can be found on the package website.
This package provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNA) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently supports five types of CRISPR modalities (modes of perturbations): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-targeting nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.
This package implements a likelihood method to present evidence for evaluating bioequivalence (BE). The functions use bioequivalence data [area under the blood concentration-time curve (AUC) and peak concentration (Cmax)] from various crossover designs commonly used in BE studies, including a fully replicated design, a partially replicated design, and a conventional 2x2 crossover design. They calculate the profile likelihoods for the mean difference, total standard deviation ratio, and within-subject standard deviation ratio for a test and a reference drug. A plot of a standardized profile likelihood can be generated along with the maximum likelihood estimate and likelihood intervals, which present evidence for bioequivalence. See Liping Du and Leena Choi (2015) <doi:10.1002/pst.1661>.
Empirical Bayes thresholding using the methods developed by I. M. Johnstone and B. W. Silverman. The basic problem is to estimate a mean vector given a vector of observations of the mean vector plus white noise, taking advantage of possible sparsity in the mean vector. Within a Bayesian formulation, the elements of the mean vector are modelled as having, independently, a distribution that is a mixture of an atom of probability at zero and a suitable heavy-tailed distribution. The mixing parameter can be estimated by a marginal maximum likelihood approach. This leads to an adaptive thresholding approach on the original data. Extensions of the basic method, in particular to wavelet thresholding, are also implemented within the package.
Subgroup analyses are routinely performed in clinical trial analyses. From a methodological perspective, two key issues of subgroup analyses are multiplicity (even if only predefined subgroups are investigated) and the low sample sizes of subgroups, which lead to highly variable estimates; see e.g. Yusuf et al (1991) <doi:10.1001/jama.1991.03470010097038>. This package implements subgroup estimates based on Bayesian shrinkage priors, see Carvalho et al (2009) <https://proceedings.mlr.press/v5/carvalho09a.html>. In addition, estimates based on penalized likelihood inference are available, based on Simon et al (2011) <doi:10.18637/jss.v039.i05>. The corresponding shrinkage-based forest plots address the aforementioned issues and can complement standard forest plots in practical clinical trial analyses.
Google offers public access to global search volumes from its search engine through the Google Trends portal. The package downloads these search volumes provided by Google Trends and uses them to measure and analyze the distribution of search scores across countries or within countries. The package allows researchers and analysts to use these search scores to investigate global trends based on patterns within these scores. This offers insights such as degree of internationalization of firms and organizations or dissemination of political, social, or technological trends across the globe or within single countries. An outline of the package's methodological foundations and potential applications is available as a working paper: <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3969013>.
This R package provides a single procedure guix.install(), which allows users to install R packages via Guix right from within their running R session. If the requested R package does not exist in Guix at this time, the package and all its missing dependencies will be imported recursively and the generated package definitions will be written to ~/.Rguix/packages.scm. This record of imported packages can be used later to reproduce the environment, and to add the packages in question to a proper Guix channel (or Guix itself). guix.install() not only supports installing packages from CRAN, but also from Bioconductor or even arbitrary git or mercurial repositories, replacing the need for installation via devtools.
EventPointer is an R package to identify alternative splicing events that involve either simple (case-control experiment) or complex experimental designs such as time course experiments and studies including paired samples. The algorithm can be used to analyze data from either junction arrays (Affymetrix Arrays) or sequencing data (RNA-Seq). The software returns a data.frame with the detected alternative splicing events: gene name, type of event (cassette, alternative 3', etc.), genomic position, statistical significance and increment of the percent spliced in (Delta PSI) for all the events. The algorithm can generate a series of files to visualize the detected alternative splicing events in IGV. This eases the interpretation of results and the design of primers for standard PCR validation.
Uses a calibrated model fusion approach to optimally combine multiple surrogate markers. Specifically, two initial estimates of optimal composite scores of the markers are obtained; the optimal calibrated combination of the two estimated scores is then constructed which ensures both validity of the final combined score and optimality with respect to the proportion of treatment effect explained (PTE) by the final combined score. The primary function, pte.estimate.multiple(), estimates the PTE of the identified combination of multiple surrogate markers. Details are described in Wang et al (2022) <doi:10.1111/biom.13677>. A tutorial for the package is available at <https://www.laylaparast.com/cmfsurrogate> and a Shiny App is available at <https://parastlab.shinyapps.io/CMFsurrogateApp/>.
Implementations of several multiple testing procedures that control the family-wise error rate (FWER) designed specifically for discrete tests. Included are discrete adaptations of the Bonferroni, Holm, Hochberg and Šidák procedures as described in the papers Döhler (2010) "Validation of credit default probabilities using multiple-testing procedures" <doi:10.21314/JRMV.2010.062> and Zhu & Guo (2019) "Family-Wise Error Rate Controlling Procedures for Discrete Data" <doi:10.1080/19466315.2019.1654912>. The main procedures of this package take as input the results of a test procedure from package DiscreteTests or a set of observed p-values and their discrete support under their nulls. A shortcut function to apply discrete procedures directly to data is also provided.
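For orientation, the sketch below shows the classical (continuous) Holm step-down adjustment in Python. It is not the discrete adaptation implemented by the package, which additionally uses the discrete support of each p-value; the function name `holm_adjust` is hypothetical.

```python
def holm_adjust(p_values):
    """Classical Holm step-down adjusted p-values (controls the FWER)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is multiplied by (m - rank).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted p-values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

A hypothesis is rejected at level alpha when its adjusted p-value is at most alpha. Discrete procedures can be less conservative because they exploit the attainable values of each p-value under its null.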
Differential exon usage test for RNA-Seq data via an empirical Bayes shrinkage method for the dispersion parameter that utilizes inclusion-exclusion data to analyze the propensity to skip an exon across groups. The input data consists of two matrices where each row represents an exon and the columns represent the biological samples. The first matrix is the count of the number of reads expressing the exon for each sample. The second matrix is the count of the number of reads that either express the exon or explicitly skip the exon across the samples, a.k.a. the total count matrix. Dividing the two matrices yields proportions representing the propensity to express the exon versus skipping the exon for each sample.
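The element-wise division of the two input matrices can be sketched as follows in Python, using plain nested lists (rows are exons, columns are samples). The function name and the choice to return NaN where the total count is zero are assumptions for illustration; the package's handling of zero totals is not stated in the description.

```python
def inclusion_proportions(inclusion_counts, total_counts):
    """Divide the inclusion count matrix by the total count matrix, entry by entry."""
    proportions = []
    for inc_row, tot_row in zip(inclusion_counts, total_counts):
        row = []
        for inc, tot in zip(inc_row, tot_row):
            # Assumption: an exon with no informative reads in a sample gets NaN.
            row.append(inc / tot if tot > 0 else float("nan"))
        proportions.append(row)
    return proportions


# Hypothetical toy data: 2 exons (rows) by 2 samples (columns).
example = inclusion_proportions([[5, 10], [0, 3]], [[10, 10], [4, 0]])
```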
Presents two methods to estimate the parameters mu, sigma, and tau of an ex-Gaussian distribution. Those methods are Quantile Maximization Likelihood Estimation ('QMLE') and Bayesian. The QMLE method allows a choice between three different estimation algorithms for these parameters: neldermead ('NEMD'), fminsearch ('FMIN'), and nlminb ('NLMI'). For more details about the methods, refer to the following references: Brown, S., & Heathcote, A. (2003) <doi:10.3758/BF03195527>; McCormack, P. D., & Wright, N. M. (1964) <doi:10.1037/h0083285>; Van Zandt, T. (2000) <doi:10.3758/BF03214357>; El Haj, A., Slaoui, Y., Solier, C., & Perret, C. (2021) <doi:10.19139/soic-2310-5070-1251>; Gilks, W. R., Best, N. G., & Tan, K. K. C. (1995) <doi:10.2307/2986138>.
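To make the role of the three parameters concrete, here is a minimal Python sketch of the ex-Gaussian density, i.e. the density of the sum of a Normal(mu, sigma) variable and an independent exponential variable with mean tau. It only evaluates the density; it does not implement the QMLE or Bayesian estimators, and the function name is hypothetical.

```python
import math


def exgaussian_pdf(x, mu, sigma, tau):
    """Density of Normal(mu, sigma) plus an independent Exponential with mean tau."""
    lam = 1.0 / tau  # rate of the exponential component
    exponent = (lam / 2.0) * (2.0 * mu + lam * sigma ** 2 - 2.0 * x)
    z = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    return (lam / 2.0) * math.exp(exponent) * math.erfc(z)
```

The mean of this distribution is mu + tau and its variance is sigma^2 + tau^2, which is why tau governs the heavy right tail typical of response time data. For x far below mu the exponential term can overflow, so a production implementation would work on the log scale.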
This package provides a fast C++ implementation of the design-based Diffusion Decision Model (DDM) and the Linear Ballistic Accumulation (LBA) model. It enables the user to optimise the choice response time model by connecting with the Differential Evolution Markov Chain Monte Carlo (DE-MCMC) sampler implemented in the ggdmc package. The package combines hierarchical modelling, Bayesian inference, choice response time models and factorial designs, allowing users to build their own design-based models. For more information on the underlying models, see the works by Voss, Rothermund, and Voss (2004) <doi:10.3758/BF03196893>, Ratcliff and McKoon (2008) <doi:10.1162/neco.2008.12-06-420>, and Brown and Heathcote (2008) <doi:10.1016/j.cogpsych.2007.12.002>.
This package provides a simple to use, intuitive, and extensible interface to several stochastic simulation algorithms for generating simulated trajectories of finite population continuous-time models. Currently it implements Gillespie's exact stochastic simulation algorithm (Direct method) and several approximate methods (Explicit tau-leap, Binomial tau-leap, and Optimized tau-leap). The package also contains a library of template models that can be run as demo models and can easily be customized and extended. Currently the following models are included: Decaying-Dimerization reaction set, linear chain system, logistic growth model, Lotka predator-prey model, Rosenzweig-MacArthur predator-prey model, Kermack-McKendrick SIR model, and a metapopulation SIRS model. See Pineda-Krch et al. (2008) <doi:10.18637/jss.v025.i12>.
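The Direct method mentioned above can be sketched in a few lines of Python. This is an illustrative implementation under assumed names (`gillespie_direct`, `propensities`, `state_changes`), not the package's interface: each step draws an exponential waiting time from the total propensity, then picks one reaction with probability proportional to its propensity.

```python
import random


def gillespie_direct(x0, propensities, state_changes, t_max, rng):
    """Simulate one trajectory with Gillespie's Direct method.

    x0: initial state (list of counts)
    propensities: list of functions mapping the state to a reaction rate
    state_changes: list of state-change vectors, one per reaction
    """
    t = 0.0
    x = list(x0)
    trajectory = [(t, tuple(x))]
    while t < t_max:
        rates = [a(x) for a in propensities]
        total = sum(rates)
        if total <= 0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_max:
            break
        # Choose reaction j with probability rates[j] / total.
        threshold = rng.random() * total
        cumulative = 0.0
        for j, rate in enumerate(rates):
            cumulative += rate
            if threshold < cumulative:
                break
        x = [xi + d for xi, d in zip(x, state_changes[j])]
        trajectory.append((t, tuple(x)))
    return trajectory


# Hypothetical toy model: pure decay X -> 0 with per-capita rate 0.5.
decay = gillespie_direct([20], [lambda x: 0.5 * x[0]], [[-1]], 1000.0, random.Random(1))
```

The tau-leap variants trade this exactness for speed by firing many reactions per step instead of one.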
An integrative toolbox of word embedding research that provides: (1) a collection of pre-trained static word vectors in the .RData compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a group of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; and (4) a set of training methods to locally train (static) word vectors from text corpora, including Word2Vec <doi:10.48550/arXiv.1301.3781>, GloVe <doi:10.3115/v1/D14-1162>, and FastText <doi:10.48550/arXiv.1607.04606>.
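The core quantity behind the Word Embedding Association Test is the differential association of a target word with two attribute sets, measured by cosine similarity. The Python sketch below illustrates that single quantity; the function names and the toy two-dimensional vectors in the usage note are hypothetical, and the permutation test and effect size are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """Mean similarity of w to attribute set A minus mean similarity to set B."""
    mean_a = sum(cosine(w, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(w, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b
```

For example, `association([1.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])` is positive because the target vector points in the direction of the first attribute set and is orthogonal to the second.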
This package provides a web-based shiny interface for the StepReg package, which enables stepwise regression analysis across linear, generalized linear (including logistic, Poisson, Gamma, and negative binomial), and Cox models. It supports forward, backward, bidirectional, and best-subset selection under a range of criteria. The package also extends stepwise regression to multivariate settings, allowing multiple dependent variables to be modeled simultaneously. Users can explore and combine multiple selection strategies and criteria to optimize model selection. For enhanced robustness, the package offers optional randomized forward selection to reduce overfitting, and a data-splitting workflow for more reliable post-selection inference. Additional features include logging and visualization of the selection process, as well as the ability to export results in common formats.
This tool fits a non-parametric Bayesian model called a "hierarchically coupled mixture model with local dependence (HCMM-LD)" to the original microdata in order to generate synthetic microdata for privacy protection. The non-parametric feature of the adopted model is useful for capturing the joint distribution of the original input data in a highly flexible manner, leading to the generation of synthetic data whose distributional features are similar to those of the input data. The package allows the original input data to have missing values and imputes them with the posterior predictive distribution, so no missing values exist in the synthetic data output. The method builds on the work of Murray and Reiter (2016) <doi:10.1080/01621459.2016.1174132>.