This package provides functions to perform Welch's one-way ANOVA with fixed effects based on summary statistics (sample sizes, means, standard deviations), the Games-Howell post hoc test for multiple comparisons, and the adjusted omega squared effect size estimator. In addition, sample size estimation can be computed based on Levy's method, and a Monte Carlo simulation is included to assess residual normality and homoscedasticity by bootstrapping. References: Welch, B. L. (1951) <doi:10.1093/biomet/38.3-4.330>; Kirk, R. E. (1996) <doi:10.1177/0013164496056005002>; Carroll, R. M., & Nordholm, L. A. (1975) <doi:10.1177/001316447503500304>; Albers, C., & Lakens, D. (2018) <doi:10.1016/j.jesp.2017.09.004>; Games, P. A., & Howell, J. F. (1976) <doi:10.2307/1164979>; Levy, K. J. (1978a) <doi:10.1080/00949657808810246>; Show-Li, J., & Gwowen, S. (2014) <doi:10.1111/bmsp.12006>.
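As background for the summary-statistics approach, Welch's (1951) F statistic can be computed directly from group sizes, means, and standard deviations. The sketch below is a minimal base-R illustration of that formula, not this package's API; the function name and arguments are ours.

    # Welch's F from summary statistics alone (illustrative, base R only).
    welch_anova_summary <- function(n, m, s) {
      k <- length(n)
      w <- n / s^2                               # precision weights n_i / s_i^2
      mw <- sum(w * m) / sum(w)                  # weighted grand mean
      tmp <- sum((1 - w / sum(w))^2 / (n - 1))
      Fstat <- (sum(w * (m - mw)^2) / (k - 1)) /
        (1 + 2 * (k - 2) / (k^2 - 1) * tmp)
      df2 <- (k^2 - 1) / (3 * tmp)               # Welch's approximate denominator df
      c(F = Fstat, df1 = k - 1, df2 = df2,
        p = pf(Fstat, k - 1, df2, lower.tail = FALSE))
    }
    welch_anova_summary(n = c(10, 12, 9), m = c(5.1, 6.3, 4.8), s = c(1.2, 1.5, 0.9))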
The Bayesian optimal interval based on both efficacy and toxicity outcomes (BOIN-ET) design is a model-assisted oncology phase I/II trial design that aims to establish an optimal biological dose accounting for efficacy and toxicity in the framework of dose finding. Extensions of the BOIN-ET design are also available to allow for time-to-event efficacy and toxicity outcomes based on cumulative and pending data (time-to-event BOIN-ET: TITE-BOIN-ET), ordinal graded efficacy and toxicity outcomes (generalized BOIN-ET: gBOIN-ET), and their combination (TITE-gBOIN-ET). boinet implements the BOIN-ET design family and supports simulation studies to assess the operating characteristics of BOIN-ET, TITE-BOIN-ET, gBOIN-ET, and TITE-gBOIN-ET, where users can choose design parameters in flexible and straightforward ways depending on their own application.
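A simulation study along the lines described above might be set up as sketched below. This is a hypothetical call shape: the function name boinet() and every argument shown (n.dose, start.dose, size.cohort, n.cohort, toxprob, effprob, phi, delta, n.sim) are assumptions inferred from the design parameters named in the description, not a verified interface.

    # Hypothetical BOIN-ET simulation sketch; all names are assumptions.
    library(boinet)
    res <- boinet(
      n.dose      = 5,                                 # number of dose levels
      start.dose  = 1,                                 # starting dose level
      size.cohort = 3, n.cohort = 12,                  # 12 cohorts of 3 patients
      toxprob = c(0.02, 0.06, 0.12, 0.25, 0.40),       # true toxicity curve
      effprob = c(0.10, 0.20, 0.35, 0.50, 0.60),       # true efficacy curve
      phi   = 0.30,                                    # target toxicity probability
      delta = 0.60,                                    # target efficacy probability
      n.sim = 1000                                     # number of simulated trials
    )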
Calculates bark beetle phenology based on raster or point-related data. Multiple models are implemented for two bark beetle species. The models can be customized, and their submodels (onset of infestation, beetle development, diapause initiation, mortality) can be combined. The following models are available in the package: PHENIPS-Clim (first released in this package), PHENIPS (Baier et al. 2007) <doi:10.1016/j.foreco.2007.05.020>, RITY (Ogris et al. 2019) <doi:10.1016/j.ecolmodel.2019.108775>, CHAPY (Ogris et al. 2020) <doi:10.1016/j.ecolmodel.2020.109137>, BSO (Jakoby et al. 2019) <doi:10.1111/gcb.14766>, Lange et al. (2008) <doi:10.1007/978-3-540-85081-6_32>, and Jönsson et al. (2011) <doi:10.1007/s10584-011-0038-4>. The package may be extended with models for other bark beetle species in the future.
Includes wrapper functions around existing functions for the analysis of categorical data and introduces functions for calculating risk differences and matched odds ratios. R currently supports a wide variety of tools for the analysis of categorical data; however, many functions are spread across a variety of packages with differing syntax and poor compatibility with one another. prop_test() combines the functions binom.test(), prop.test() and BinomCI() into one output. prop_power() allows for power and sample size calculations for both balanced and unbalanced designs. riskdiff() is used for calculating risk differences, and matched_or() is used for calculating matched odds ratios. For further information on methods used that are not documented in other packages, see Nathan Mantel and William Haenszel (1959) <doi:10.1093/jnci/22.4.719> and Alan Agresti (2002) <ISBN:0-471-36093-7>.
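A minimal usage sketch follows. The wrapper functions are named in the description, but the argument names passed to prop_test() and prop_power() below are assumptions, not a verified signature; the base-R calls shown for comparison are standard.

    # Base-R tests that prop_test() is described as combining:
    binom.test(x = 12, n = 50, p = 0.20)            # exact binomial test
    prop.test(x = c(12, 20), n = c(50, 55))         # two-sample proportion test
    # Package wrappers (assumed interfaces):
    prop_test(x = 12, n = 50, p = 0.20)             # combined single-proportion output
    prop_power(p1 = 0.20, p2 = 0.40, n1 = 60, n2 = 60)  # unbalanced-design power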
Interpreting the differences between mean scale scores across various forms of an assessment can be challenging. This difficulty arises from different mappings between raw scores and scale scores, complex mathematical relationships, adjustments based on judgmental procedures, and diverse equating functions applied to different assessment forms. An alternative method involves running simulations to explore the effect of incrementing raw scores on mean scale scores. The idmact package provides an implementation of this approach based on the algorithm detailed in Schiel (1998) <https://www.act.org/content/dam/act/unsecured/documents/ACT_RR98-01.pdf>, which was developed to help interpret differences between mean scale scores on the American College Testing (ACT) assessment. The function idmact_subj() within the package offers a framework for running simulations on subject-level scores. In contrast, the idmact_comp() function provides a framework for conducting simulations on composite scores.
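A subject-level simulation might look like the sketch below. idmact_subj() is named in the description, but the argument names (raw, inc, map_raw, map_scale) and input shapes are assumptions, not a verified API.

    # Hypothetical idmact_subj() call; arguments are assumptions.
    library(idmact)
    res <- idmact_subj(
      raw       = list(sample(1:40, 100, replace = TRUE)),  # simulated raw scores
      inc       = 1,                  # increment each raw score by one point
      map_raw   = list(1:40),         # raw-score side of the concordance map
      map_scale = list(1:40)          # scale-score side of the map
    )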
This package provides a graphical user interface to calculate the rainfall-runoff relation using the Natural Resources Conservation Service Curve Number method (NRCS-CN method), including the modifications by Hawkins et al. (2002) regarding the initial abstraction. The GUI follows the programming logic of previously published software (Hernandez-Guzman et al., 2011) <doi:10.1016/j.envsoft.2011.07.006>. It is a raster-based GIS tool that outputs runoff estimates from land use/land cover and hydrologic soil group maps. The package has been published in the Journal of Hydroinformatics (Hernandez-Guzman et al., 2021) <doi:10.2166/hydro.2020.087> and is under constant development at the Institute for Natural Resources Research (INIRENA) of the Universidad Michoacana de San Nicolás de Hidalgo; it represents a collaborative effort between the Hydro-Geomatic Lab (INIRENA) and the Environmental Management Lab (CIAD, A.C.).
This package provides post-estimation methods for categorical response models (CRM). Inputs from objects of class serp(), clm(), polr(), multinom(), mlogit(), vglm() and glm() are currently supported. Available tests include the Hosmer-Lemeshow tests for binary, multinomial and ordinal logistic regression, and the Lipsitz and Pulkstenis-Robinson tests for ordinal models. The proportional odds, adjacent-category, and constrained continuation-ratio models are particularly supported at the ordinal level. Tests of the proportional odds assumption in ordinal models are also possible with the Brant and likelihood-ratio tests. Moreover, several summary measures of predictive strength (pseudo R-squared) and some useful error metrics, including the Brier score, misclassification rate and logloss, are available for binary, multinomial and ordinal models. Ugba, E. R. and Gertheiss, J. (2018) <http://www.statmod.org/workshops_archive_proceedings_2018.html>.
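A typical workflow is sketched below: fit an ordinal model with MASS::polr(), then pass the fitted object to the tests described above. polr() and the housing data are standard MASS; the test function names hosmerlem(), lipsitz() and brant.test() are assumptions about this package's interface.

    # Fit an ordinal model, then run assumed goodness-of-fit tests on it.
    library(MASS)
    fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
    hosmerlem(fit, group = 10)   # ordinal Hosmer-Lemeshow test (assumed name)
    lipsitz(fit, group = 10)     # Lipsitz test (assumed name)
    brant.test(fit)              # proportional odds check (assumed name)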
The saemix package implements the Stochastic Approximation EM (SAEM) algorithm for parameter estimation in (non)linear mixed effects models. It (i) computes the maximum likelihood estimator of the population parameters, without any approximation of the model (linearisation, quadrature approximation, ...), using the SAEM algorithm; (ii) provides standard errors for the maximum likelihood estimator; and (iii) estimates the conditional modes, conditional means and conditional standard deviations of the individual parameters using the Hastings-Metropolis algorithm (see Comets et al. (2017) <doi:10.18637/jss.v080.i03>). Many applications of SAEM in agronomy, animal breeding and PKPD analysis have been published by members of the Monolix group. The full PDF documentation for the package, including references about the algorithm and examples, can be downloaded from the GitHub repository of the IAME research institute: <https://github.com/iame-researchCenter/saemix/blob/7638e1b09ccb01cdff173068e01c266e906f76eb/docsaem.pdf>.
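The sketch below condenses the package's classic theophylline example (data/model/fit via saemixData(), saemixModel() and saemix()); treat the option names and argument details as approximate, since they may differ across versions.

    # Condensed one-compartment PK fit with SAEM (adapted sketch).
    library(saemix)
    data(theo.saemix)
    sdata <- saemixData(name.data = theo.saemix, name.group = "Id",
                        name.predictors = c("Dose", "Time"),
                        name.response = "Concentration")
    model1cpt <- function(psi, id, xidep) {        # one-compartment oral model
      dose <- xidep[, 1]; tim <- xidep[, 2]
      ka <- psi[id, 1]; V <- psi[id, 2]; CL <- psi[id, 3]
      k <- CL / V
      dose * ka / (V * (ka - k)) * (exp(-k * tim) - exp(-ka * tim))
    }
    smodel <- saemixModel(model = model1cpt,
                          psi0 = matrix(c(1, 20, 0.5), ncol = 3,
                                        dimnames = list(NULL, c("ka", "V", "CL"))))
    fit <- saemix(smodel, sdata, list(nbiter.saemix = c(200, 100)))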
This natural language processing toolkit provides language-agnostic tokenization, parts-of-speech tagging, lemmatization and dependency parsing of raw text. Next to text parsing, the package also allows you to train annotation models based on data of treebanks in CoNLL-U format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser, namely functionalities and algorithms for collocations, token co-occurrence, document-term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
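The core annotation workflow follows the package's documented download/load/annotate pattern:

    # Minimal udpipe annotation workflow.
    library(udpipe)
    ud_model <- udpipe_download_model(language = "english")    # fetch a pretrained model
    ud_model <- udpipe_load_model(file = ud_model$file_model)  # load it into memory
    x <- udpipe_annotate(ud_model, x = "The package parses raw text.")
    head(as.data.frame(x))   # tokens, lemmas, upos tags, dependency relations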
Learning, manipulation and evaluation of mixtures of truncated basis functions (MoTBFs), which include mixtures of polynomials (MOPs) and mixtures of truncated exponentials (MTEs). MoTBFs are a flexible framework for modelling hybrid Bayesian networks (I. Pérez-Bernabé, A. Salmerón, H. Langseth (2015) <doi:10.1007/978-3-319-20807-7_36>; H. Langseth, T.D. Nielsen, I. Pérez-Bernabé, A. Salmerón (2014) <doi:10.1016/j.ijar.2013.09.012>; I. Pérez-Bernabé, A. Fernández, R. Rumí, A. Salmerón (2016) <doi:10.1007/s10618-015-0429-7>). The package provides functionality for learning univariate, multivariate and conditional densities, with the possibility of incorporating prior knowledge. Structural learning of hybrid Bayesian networks is also provided. A set of useful tools is included, such as plotting, printing and likelihood evaluation. The package makes use of S3 objects, with two new classes called 'motbf' and 'jointmotbf'.
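Learning a univariate density might look like the sketch below; univMoTBF() and its POTENTIAL_TYPE argument are assumed from the package literature rather than verified here.

    # Hypothetical univariate MoTBF learning sketch; names are assumptions.
    library(MoTBFs)
    set.seed(1)
    x <- rnorm(500)                                # training sample
    fit <- univMoTBF(x, POTENTIAL_TYPE = "MOP")    # fit a mixture of polynomials
    print(fit)                                     # fitted 'motbf' object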
Power and sample size calculations for genetic association studies allowing for misspecification of the model of genetic susceptibility; see Hum Hered. 2019;84(6):256-271 <doi:10.1159/000508558> (Epub 2020 Jul 28). Power and/or sample size can be calculated for logistic (case/control study design) and linear (continuous phenotype) regression models, using additive, dominant, recessive or degree-of-freedom coding of the genetic covariate while assuming a true dominant, recessive or additive genetic effect. In addition, power and sample size calculations can be performed for gene-by-environment interactions. These methods extend Gauderman (2002) <doi:10.1093/aje/155.5.478> and Gauderman (2002) <doi:10.1002/sim.973> and are described in: Moore CM, Jacobson S, Fingerlin TE. Power and Sample Size Calculations for Genetic Association Studies in the Presence of Genetic Model Misspecification. American Society of Human Genetics, October 2018, San Diego.
This package covers several Bayesian Analysis of Variance (BANOVA) models used in the analysis of experimental designs in which both within- and between-subjects factors are manipulated. They can be applied to data that are common in the behavioral and social sciences. The package includes hierarchical Bayes ANOVA models with normal, t, Binomial (Bernoulli), Poisson, ordered multinomial and multinomial response variables. All models accommodate unobserved heterogeneity by including a normal distribution of the parameters across individuals. Outputs of the package include tables of sums of squares, effect sizes and p-values, and tables of predictions, which are easily interpretable for behavioral and social researchers. Floodlight analysis and mediation analysis based on these models are also provided. BANOVA uses Stan and JAGS as its computational platforms. References: Dong and Wedel (2017) <doi:10.18637/jss.v081.i09>; Wedel and Dong (2020) <doi:10.1002/jcpy.1111>.
This package provides a collection of tools and data for analyzing the Gause microcosm experiments and for fitting Lotka-Volterra models to time series data. It includes methods for fitting single-species logistic growth and multi-species interaction models, e.g. of competition, predator/prey relationships, or mutualism. See the documentation for individual functions for examples. In general, see the lv_optim() function for examples of how to fit parameter values in multi-species systems. Note that the general methods applied here, as well as the form of the differential equations we use, are described in detail in the Quantitative Ecology textbook by Lehman et al., available at <http://hdl.handle.net/11299/204551>, and in Lina K. Mühlbauer, Maximilienne Schulze, W. Stanley Harpole, and Adam T. Clark, 'gauseR: Simple methods for fitting Lotka-Volterra models describing Gause's Struggle for Existence', in the journal Ecology and Evolution.
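As background for the form of equations being fitted, a generic two-species Lotka-Volterra system can be simulated with deSolve; the parameter values below are our own illustrative choices, and this sketch is not gauseR code (lv_optim() remains the package's fitting entry point).

    # Generic two-species Lotka-Volterra simulation with deSolve.
    library(deSolve)
    lv <- function(t, n, parms) {
      with(as.list(parms), {
        dn1 <- n[1] * (r1 + a11 * n[1] + a12 * n[2])   # species 1 per-capita growth
        dn2 <- n[2] * (r2 + a21 * n[1] + a22 * n[2])   # species 2 per-capita growth
        list(c(dn1, dn2))
      })
    }
    out <- ode(y = c(5, 5), times = seq(0, 25, 0.1), func = lv,
               parms = c(r1 = 0.7, r2 = 0.5, a11 = -0.01, a12 = -0.005,
                         a21 = -0.004, a22 = -0.008))
    matplot(out[, 1], out[, -1], type = "l", xlab = "time", ylab = "abundance")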
This package provides a collection of clinical trial designs and methods, implemented in rstan and R, including: the Continual Reassessment Method by O'Quigley et al. (1990) <doi:10.2307/2531628>; EffTox by Thall & Cook (2004) <doi:10.1111/j.0006-341X.2004.00218.x>; the two-parameter logistic method of Neuenschwander, Branson & Gsponer (2008) <doi:10.1002/sim.3230>; the Augmented Binary method by Wason & Seaman (2013) <doi:10.1002/sim.5867>; and more. We provide functions to aid model-fitting and analysis. The rstan implementations may also serve as a cookbook for anyone looking to extend or embellish these models. We hope that this package encourages the use of Bayesian methods in clinical trials. There is a preponderance of early phase trial designs because this is where Bayesian methods are used most. If there is a method you would like implemented, please get in touch.
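A CRM fit might look like the hedged sketch below. stan_crm() is a trialr function, but treat the argument details and the accessor shown as assumptions about the exact interface.

    # Hedged CRM sketch; argument details are assumptions.
    library(trialr)
    fit <- stan_crm(
      outcome_str = "1NNN 2NTN 2NNN",               # doses given and tox outcomes
      skeleton = c(0.05, 0.12, 0.25, 0.40, 0.55),   # prior guess of the tox curve
      target = 0.25,                                # target toxicity rate
      model = "empiric", beta_sd = 1,               # empiric CRM, N(0, 1) slope prior
      seed = 123
    )
    fit$recommended_dose                            # assumed accessor name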
This package provides a simple approach to using probit or logit analysis to calculate lethal concentration (LC) or lethal time (LT) values and the appropriate fiducial confidence limits for selected LC or LT values in ecotoxicology studies (Finney 1971; Wheeler et al. 2006; Robertson et al. 2007). The simplicity of ecotox comes from the syntax of its functions, which is similar to that of functions like glm() and lm(). In addition to the simplicity of the syntax, a comprehensive data frame is produced which gives the user a predicted LC or LT value for the desired level and a suite of important parameters such as fiducial confidence limits and slope. Finney, D.J. (1971, ISBN: 052108041X); Wheeler, M.W., Park, R.M., and Bailer, A.J. (2006) <doi:10.1897/05-320R.1>; Robertson, J.L., Savin, N.E., Russell, R.M., and Preisler, H.K. (2007, ISBN: 0849323312).
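An LC estimation might look like the following glm()-style sketch. LC_probit() is an ecotox function, but the toy data and the exact argument usage here are assumptions.

    # Hedged LC50/LC99 sketch on toy dose-response data.
    library(ecotox)
    dat <- data.frame(dose  = c(1, 2, 4, 8, 16),
                      dead  = c(1, 4, 9, 13, 18),
                      total = rep(20, 5))
    lc <- LC_probit((dead / total) ~ log10(dose),
                    p = c(50, 99),        # request LC50 and LC99
                    weights = dat$total,  # binomial denominators
                    data = dat)
    lc                                    # LC estimates with fiducial limits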
The Gene Ontology (GO) Consortium <https://geneontology.org/> organizes genes into hierarchical categories based on biological process (BP), molecular function (MF) and cellular component (CC, i.e., subcellular localization). Tools such as GoMiner (see Zeeberg, B.R., Feng, W., Wang, G. et al. (2003) <doi:10.1186/gb-2003-4-4-r28>) can leverage GO to perform ontological analysis of microarray and proteomics studies, typically generating a list of significant functional categories. Microarray studies are usually analyzed with BP, whereas proteomics researchers often prefer CC. To capture the benefit of both of those ontologies, I developed a two-dimensional version of High-Throughput GoMiner ('HTGM2D'). I generate a 2D heat map whose axes are any two of BP, MF, or CC, and the value within a picture element of the heat map reflects the Jaccard metric p-value for the number of genes in common for the corresponding pair.
This package creates realistic random trajectories in a 3-D space between two given fix points, so-called conditional empirical random walks (CERWs). The trajectory generation is based on empirical distribution functions extracted from observed trajectories (training data) and thus reflects the geometrical movement characteristics of the mover. A digital elevation model (DEM), representing the Earth's surface, and a background layer of probabilities (e.g. food sources, uplift potential, waterbodies, etc.) can be used to influence the trajectories. Unterfinger M (2018). "3-D Trajectory Simulation in Movement Ecology: Conditional Empirical Random Walk". Master's thesis, University of Zurich. <https://www.geo.uzh.ch/dam/jcr:6194e41e-055c-4635-9807-53c5a54a3be7/MasterThesis_Unterfinger_2018.pdf>. Technitis G, Weibel R, Kranstauber B, Safi K (2016). "An algorithm for empirically informed random trajectory generation between two endpoints". GIScience 2016: Ninth International Conference on Geographic Information Science, 9, online. <doi:10.5167/uzh-130652>.
This package provides a toolkit that allows scientists to work with data from single cell sequencing technologies such as scRNA-seq, scVDJ-seq, scATAC-seq, CITE-Seq and Spatial Transcriptomics (ST). Single (i) Cell R package ('iCellR') provides unprecedented flexibility at every step of the analysis pipeline, including normalization, clustering, dimensionality reduction, imputation, visualization, and so on. Users can design both unsupervised and supervised models to best suit their research. In addition, the toolkit provides 2D and 3D interactive visualizations, differential expression analysis, filters based on cells, genes and clusters, data merging, normalizing for dropouts, data imputation methods, correcting for batch differences, pathway analysis, tools to find marker genes for clusters and conditions, predict cell types and pseudotime analysis. See Khodadadi-Jamayran, et al (2020) <doi:10.1101/2020.05.05.078550> and Khodadadi-Jamayran, et al (2020) <doi:10.1101/2020.03.31.019109> for more details.
This package provides a weather generator to simulate precipitation and temperature for regions with seasonality. Users input training data containing precipitation, temperature, and seasonality (up to 26 seasons). Including weather season as a training variable allows users to explore the effects of potential changes in season duration as well as in average start and end dates due to phenomena like climate change. Data for training should be a single time series but can originate from station data, basin averages, grid cells, etc. Bearup, L., Gangopadhyay, S., & Mikkelson, K. (2021). "Hydroclimate Analysis Lower Santa Cruz River Basin Study (Technical Memorandum No ENV-2020-056)." Bureau of Reclamation. Gangopadhyay, S., Bearup, L. A., Verdin, A., Pruitt, T., Halper, E., & Shamir, E. (2019, December 1). "A collaborative stochastic weather generator for climate impacts assessment in the Lower Santa Cruz River Basin, Arizona." Fall Meeting 2019, American Geophysical Union. <https://ui.adsabs.harvard.edu/abs/2019AGUFMGC41G1267G>.
This package performs iterative extrapolation of species haplotype accumulation curves using a nonparametric stochastic (Monte Carlo) optimization method for assessing specimen sampling completeness, based on the approach of Phillips et al. (2015) <doi:10.1515/dna-2015-0008>, Phillips et al. (2019) <doi:10.1002/ece3.4757> and Phillips et al. (2020) <doi:10.7717/peerj-cs.243>. HACSim outputs a number of useful summary statistics of sampling coverage ("Measures of Sampling Closeness"), including an estimate of the likely required sample size (along with confidence intervals at the desired level) necessary to recover a given number/proportion of observed unique species haplotypes. Any genomic marker can be targeted to assess likely required specimen sample sizes for genetic diversity assessment. The method is particularly well-suited to assessing sampling sufficiency for DNA barcoding initiatives. Users can also simulate their own DNA sequences according to various models of nucleotide substitution. A Shiny app is also available.
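A hypothetical-species simulation might be set up as below; HACHypothetical() and HAC.simrep() are function names assumed from the HACSim literature, and the argument details are not verified here.

    # Hedged haplotype accumulation simulation sketch; names are assumptions.
    library(HACSim)
    HACSObj <- HACHypothetical(N = 100,                # initial specimen sample size
                               Hstar = 10,             # observed unique haplotypes
                               probs = rep(1/10, 10),  # haplotype frequencies
                               p = 0.95)               # target proportion to recover
    HAC.simrep(HACSObj)                                # iterate until p is reached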
Incorporates Approximate Bayesian Computation to obtain a posterior distribution and to select an optimal model parameter for an observation point. Additionally, a meta-sampling heuristic algorithm is provided for parameter estimation, which requires no model runs and is dimension-independent. A sampling scheme is also presented that allows model runs and uses the meta-sampling for point generation, and a predictor is realized as the meta-sampling for the model output. All the algorithms leverage a machine learning method utilizing the maxima weighted Isolation Kernel approach ('MaxWiK'). The method involves transforming raw data to a Hilbert space (mapping) and measuring the similarity between simulated points and the maxima weighted Isolation Kernel mapping corresponding to the observation point. Comprehensive details of the methodology can be found in the papers Iurii Nagornov (2024) <doi:10.1007/978-3-031-66431-1_16> and Iurii Nagornov (2023) <doi:10.1007/978-3-031-29168-5_18>.
Pathway analysis statistically links observations on the molecular level to biological processes or pathways on the systems (i.e., organism, organ, tissue, cell) level. Traditionally, pathway analysis methods regard pathways as collections of single genes and treat all genes in a pathway as equally informative. However, this can lead to identifying spurious pathways as statistically significant, since components are often shared amongst pathways. SIGORA seeks to avoid this pitfall by focusing on genes or gene pairs that are (as a combination) specific to a single pathway. In relying on such pathway gene-pair signatures (Pathway-GPS), SIGORA inherently uses the status of other genes in the experimental context to identify the most relevant pathways. The current version allows for pathway analysis of human and mouse datasets. In addition, it contains pre-computed Pathway-GPS data for pathways in the KEGG and Reactome pathway repositories and mechanisms for extracting GPS for user-supplied repositories.
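A signature-based analysis run might look like the sketch below. sigora() with GPSrepo/queryList/level arguments and the bundled kegH repository are assumed from the package documentation; the query list here is a toy example.

    # Hedged SIGORA sketch; interface details are assumptions.
    library(sigora)
    data(kegH)                          # precomputed KEGG human GPS repository
    myGenes <- c("TP53", "MDM2", "CDKN1A", "ATM", "CHEK2")  # toy query list
    res <- sigora(GPSrepo = kegH, queryList = myGenes, level = 4)
    res$summary_results                 # pathways ranked by GPS-based p-value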
This package implements the conditionally symmetric multidimensional Gaussian mixture model (csmGmm) for large-scale testing of composite null hypotheses in genetic association applications such as mediation analysis, pleiotropy analysis, and replication analysis. In such analyses, we typically have J sets of K test statistics, where K is a small number (e.g. 2 or 3) and J is large (e.g. 1 million). For each of the J sets, we want to know whether we can reject all K individual nulls. Please see the vignette for a quickstart guide. The paper describing these methods is "Testing a Large Number of Composite Null Hypotheses Using Conditionally Symmetric Multidimensional Gaussian Mixtures in Genome-Wide Studies" by Sun R, McCaw Z, & Lin X (2024, <doi:10.1080/01621459.2024.2422124>). The paper is accepted and published online (but not yet in print) in the Journal of the American Statistical Association as of December 1, 2024.