This package provides a multiple-testing procedure for high-dimensional mediation hypotheses. Mediation analysis is of rising interest in epidemiology and clinical trials. Among existing methods for mediation analyses, the popular joint significance (JS) test yields an overly conservative type I error rate and therefore low power. In the R package HDMT we implement a multiple-testing procedure that accurately controls the family-wise error rate (FWER) and the false discovery rate (FDR) when using the JS test for high-dimensional mediation hypotheses. The core of our procedure is based on estimating the proportions of three component null hypotheses and deriving the corresponding mixture distribution of null p-values. Results of the data examples include better-behaved quantile-quantile plots and improved detection of novel mediation relationships on the role of DNA methylation in genetic regulation of gene expression. With increasing interest in mediation by molecular intermediaries such as gene expression, the proposed method addresses an unmet methodological challenge. The methods used in the package are described in James Y. Dai, Janet L. Stanford & Michael LeBlanc (2020) <doi:10.1080/01621459.2020.1765785>.
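A minimal sketch of the JS test and the mixture-null correction on simulated p-values follows; the HDMT function names (null_estimation(), fdr_est()) and their signatures are recalled from the package documentation and should be treated as assumptions to verify.

```r
library(HDMT)

set.seed(1)
m  <- 1000                       # number of candidate mediators
p1 <- runif(m)                   # p-values for exposure -> mediator
p2 <- runif(m)                   # p-values for mediator -> outcome
input_pvalues <- cbind(p1, p2)

# Naive JS test: a mediator is significant only if both component tests
# are, so the JS p-value is the maximum of the two component p-values.
p_js <- pmax(p1, p2)

# HDMT correction: estimate the proportions of the three component nulls,
# then compute FDR under the implied mixture null (assumed API; verify).
nulls <- null_estimation(input_pvalues)
fdr   <- fdr_est(nulls$alpha00, nulls$alpha01, nulls$alpha10,
                 nulls$alpha1, nulls$alpha2, input_pvalues)
head(fdr)
```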
This package implements the covariate balancing propensity score (CBPS) proposed by Imai and Ratkovic (2014) <DOI:10.1111/rssb.12027>. The propensity score is estimated such that it maximizes the resulting covariate balance as well as the prediction of treatment assignment. The method therefore avoids an iteration between model fitting and balance checking. The package also implements optimal CBPS from Fan et al. (in press) <DOI:10.1080/07350015.2021.2002159>, as well as several extensions of the CBPS beyond the cross-sectional, binary treatment setting. These include the CBPS for longitudinal settings so that it can be used in conjunction with marginal structural models from Imai and Ratkovic (2015) <DOI:10.1080/01621459.2014.956872>, three- and four-valued treatment variables, continuous-valued treatments from Fong, Hazlett, and Imai (2018) <DOI:10.1214/17-AOAS1101>, propensity score estimation with a large number of covariates from Ning, Peng, and Imai (2020) <DOI:10.1093/biomet/asaa020>, and the situation in which multiple distinct binary treatments are administered simultaneously. In the future, it will be extended to other settings, including the generalization of experimental and instrumental variable estimates.
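As a hedged illustration, the cross-sectional binary-treatment case might look as follows; CBPS() with a formula interface is the package's main entry point, but the simulated data and arguments here are only a sketch, not a tested recipe.

```r
library(CBPS)

set.seed(1)
n   <- 500
x1  <- rnorm(n); x2 <- rnorm(n)
tr  <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))   # confounded assignment
dat <- data.frame(tr, x1, x2)

# Estimate the propensity score so that covariate balance is maximized;
# fitted values can then be used as weights in an outcome model.
fit <- CBPS(tr ~ x1 + x2, data = dat)
summary(fit)
```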
Evaluates the probability density function, cumulative distribution function, quantile function, random number generation, survival function, hazard rate function, and maximum likelihood estimates for the following distributions: Bell exponential, Bell extended exponential, Bell Weibull, Bell extended Weibull, Bell-Fisk, Bell-Lomax, Bell Burr-XII, Bell Burr-X, complementary Bell exponential, complementary Bell extended exponential, complementary Bell Weibull, complementary Bell extended Weibull, complementary Bell-Fisk, complementary Bell-Lomax, complementary Bell Burr-XII and complementary Bell Burr-X distributions. Related work includes: a) Fayomi A., Tahir M. H., Algarni A., Imran M. and Jamal F. (2022). "A new useful exponential model with applications to quality control and actuarial data". Computational Intelligence and Neuroscience, 2022. <doi:10.1155/2022/2489998>. b) Alanzi, A. R., Imran M., Tahir M. H., Chesneau C., Jamal F., Shakoor S. and Sami, W. (2023). "Simulation analysis, properties and applications on a new Burr XII model based on the Bell-X functionalities". AIMS Mathematics, 8(3): 6970-7004. <doi:10.3934/math.2023352>. c) Algarni A. (2022). "Group Acceptance Sampling Plan Based on New Compounded Three-Parameter Weibull Model". Axioms, 11(9): 438. <doi:10.3390/axioms11090438>.
Determination of absolute protein quantities is necessary for multiple applications, such as mechanistic modeling of biological systems. Quantitative liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics can measure relative protein abundance on a system-wide scale. Estimating absolute quantitative information from these relative abundance measurements requires additional information, such as heavy-labeled references of known concentration. Multiple methods have been developed that use different references and strategies; some are easily available, whereas others require more effort on the user's end. Hence, we believe the field might benefit from making some of these methods available under an automated framework, which also facilitates validation of the chosen strategy. We have implemented the most commonly used absolute label-free protein abundance estimation methods for LC-MS/MS quantification on either the MS1 level, the MS2 level, or spectral counts, together with validation algorithms to enable automated data analysis and error estimation. Specifically, we used Monte Carlo cross-validation and bootstrapping for model selection and imputation of proteome-wide absolute protein quantity estimation. Our open-source software is written in the statistical programming language R and is validated and demonstrated on a synthetic sample.
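Below is a generic sketch of the Monte Carlo cross-validation idea used for model selection; the toy data and candidate models are stand-ins for illustration, not the package's own interface.

```r
set.seed(1)
n   <- 100
x   <- runif(n, 1, 10)
y   <- 2 * log(x) + rnorm(n, sd = 0.3)
dat <- data.frame(x, y)

# Repeatedly split into training and validation sets and average the
# held-out error of each candidate calibration model.
mc_cv <- function(formula, data, n_rep = 200, frac = 0.7) {
  errs <- replicate(n_rep, {
    idx <- sample(nrow(data), floor(frac * nrow(data)))
    fit <- lm(formula, data = data[idx, ])
    mean((data$y[-idx] - predict(fit, newdata = data[-idx, ]))^2)
  })
  mean(errs)
}

# Compare two candidate models on held-out mean squared error.
c(linear = mc_cv(y ~ x, dat), log = mc_cv(y ~ log(x), dat))
```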
Species Distribution Modeling (SDM) is a practical methodology that aims to estimate the area of distribution of a species. However, most of the work has focused on estimating static expressions of the correlation between environmental variables and species occurrences. The outputs of correlative species distribution models can be interpreted as maps of the suitable environment for a species, but not generally as maps of its actual distribution. Soberón and Peterson (2005) <doi:10.17161/bi.v2i0.4> presented the BAM scheme, a heuristic framework which states that the occupied area of a species occurs on sites that have been accessible through dispersal (M) and have both favorable biotic (B) and abiotic (A) conditions. The bamm package implements classes and functions to operate on each element of the BAM framework, using a cellular automaton model in which the occupied area of a species at time t is estimated by the multiplication of three binary matrices: one matrix represents movements (M), another abiotic (niche) tolerances (A), and a third biotic interactions (B). The theoretical background of the package can be found in Soberón and Osorio-Olvera (2023) <doi:10.1111/jbi.14587>.
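The cellular automaton can be sketched in a few lines of base R; this toy stands in for the package's classes and only illustrates the update in which occupancy at the next step is the elementwise product of accessibility (M), abiotic suitability (A), and biotic suitability (B).

```r
set.seed(1)
side <- 10
A <- matrix(rbinom(side^2, 1, 0.6), side)     # abiotic suitability layer
B <- matrix(1, side, side)                    # biotic layer (often set to 1)
occ <- matrix(0, side, side); occ[5, 5] <- 1  # initial occurrence

# One dispersal step: a cell is accessible if it or a neighbour is occupied.
dilate <- function(m) {
  p <- matrix(0, nrow(m) + 2, ncol(m) + 2)
  p[2:(nrow(m) + 1), 2:(ncol(m) + 1)] <- m
  out <- m
  for (i in 1:nrow(m)) for (j in 1:ncol(m))
    out[i, j] <- max(p[i:(i + 2), j:(j + 2)])
  out
}

for (t in 1:5) occ <- dilate(occ) * A * B     # occupied area at t+1 = M * A * B
sum(occ)                                      # occupied cells after 5 steps
```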
Supporting functionality to run caret with spatial or spatial-temporal data. caret is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using caret. It includes the newly suggested nearest neighbour distance matching cross-validation to estimate the performance of spatial prediction models, and allows for spatial variable selection to select suitable predictor variables in view of their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.
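As a generic illustration of the underlying idea, spatial (leave-location-out) folds can be passed to caret; caret::groupKFold is used here as a simple stand-in for CAST's distance-matched folds.

```r
library(caret)

set.seed(1)
n <- 300
dat <- data.frame(
  x1  = rnorm(n), x2 = rnorm(n),
  loc = sample(paste0("site", 1:10), n, replace = TRUE)
)
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(n)

# Folds never split a location, so performance reflects prediction
# at locations unseen during training.
folds <- groupKFold(dat$loc, k = 5)
ctrl  <- trainControl(method = "cv", index = folds)
model <- train(y ~ x1 + x2, data = dat, method = "lm", trControl = ctrl)
model$results
```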
Measure of the Effect ('MOTE') is an effect size calculator, including a wide variety of effect sizes in the mean differences family (all versions of d) and the variance overlap family (eta, omega, epsilon, r). MOTE provides non-central confidence intervals for each effect size, relevant test statistics, and output for reporting in APA Style (American Psychological Association, 2010, <ISBN:1433805618>) with LaTeX. In research, an over-reliance on p-values may conceal the fact that a study is under-powered (Halsey, Curran-Everett, Vowler, & Drummond, 2015 <doi:10.1038/nmeth.3288>). A test may be statistically significant, yet practically inconsequential (Fritz, Scherndl, & Kühberger, 2012 <doi:10.1177/0959354312436870>). Although the American Psychological Association has long advocated for the inclusion of effect sizes (Wilkinson & American Psychological Association Task Force on Statistical Inference, 1999 <doi:10.1037/0003-066X.54.8.594>), the vast majority of peer-reviewed, published academic studies stop short of reporting effect sizes and confidence intervals (Cumming, 2013, <doi:10.1177/0956797613504966>). MOTE simplifies the use and interpretation of effect sizes and confidence intervals. For more information, visit <https://www.aggieerin.com/shiny-server>.
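A hedged example of computing an independent-samples d with a non-central confidence interval follows; the d.ind.t() name and its arguments are recalled from the MOTE documentation, so treat them as assumptions to verify.

```r
library(MOTE)

# Summary statistics for two independent groups (illustrative values).
res <- d.ind.t(m1 = 5.2, m2 = 4.4, sd1 = 1.1, sd2 = 1.0,
               n1 = 40, n2 = 42, a = .05)

res$d                    # point estimate of d
c(res$dlow, res$dhigh)   # non-central confidence interval limits
res$statistic            # formatted test statistic for APA-style reporting
```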
This package provides functions that support estimating, assessing and mapping regional disaggregated indicators. So far, the estimation methods comprise direct estimation, the model-based unit-level approach Empirical Best Prediction (see "Small area estimation of poverty indicators" by Molina and Rao (2010) <doi:10.1002/cjs.10051>), the area-level model (see "Estimates of income for small places: An application of James-Stein procedures to Census Data" by Fay and Herriot (1979) <doi:10.1080/01621459.1979.10482505>) and various extensions of it (adjusted variance estimation methods, log and arcsin transformation, spatial, robust and measurement error models), as well as their precision estimates. The assessment of the used model is supported by a summary and diagnostic plots. For a suitable presentation of estimates, map plots can be easily created. Furthermore, results can easily be exported to Excel. For a detailed description of the package and the methods used, see "The R Package emdi for Estimating and Mapping Regionally Disaggregated Indicators" by Kreutzmann et al. (2019) <doi:10.18637/jss.v091.i07> and the second package vignette "A Framework for Producing Small Area Estimates Based on Area-Level Models in R".
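A hedged sketch of the unit-level EBP workflow is given below; the ebp() arguments, the eusilcA example data, and the variable names are recalled from the JSS paper and should be verified against the current documentation.

```r
library(emdi)

# Unit-level Empirical Best Prediction with the package's example data
# (data set and variable names assumed from the JSS paper; verify).
fit <- ebp(fixed = eqIncome ~ gender + eqsize + cash,
           pop_data = eusilcA_pop, pop_domains = "district",
           smp_data = eusilcA_smp, smp_domains = "district",
           transformation = "log", MSE = TRUE)

summary(fit)                                        # model assessment
head(estimators(fit, indicator = "Mean", MSE = TRUE))  # point + MSE estimates
```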
In practice, it is difficult to determine the number of decomposition modes, K, for Variational Mode Decomposition (VMD). To overcome this issue, this study offers Spearman Variational Mode Decomposition (SVMD), a method that uses the Spearman correlation coefficient to calculate the ideal mode number. Unlike the Pearson correlation coefficient, which only returns a perfect value when X and Y are linearly related, the Spearman correlation can be calculated without knowing the probability distributions of X and Y. The Spearman correlation coefficient, also called Spearman's rank correlation coefficient, is a nonparametric, rank-based measure of correlation. As VMD decomposes a signal, the Spearman correlation coefficient between the reconstructed and original sequences rises as the mode number K increases. Once the signal has been fully decomposed, subsequent increases in K cause the correlation to gradually level off. When the correlation reaches a specific level, VMD is said to have adequately decomposed the signal. Numerous experiments revealed that a threshold of 0.997 produces the best denoising effect, so the threshold is set at 0.997. This package has been developed using the concept of Yang et al. (2021) <doi:10.1016/j.aej.2021.01.055>.
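The stopping rule, not VMD itself, is what the toy below illustrates: K is increased until the Spearman correlation between the reconstruction and the original series reaches 0.997. A truncated Fourier reconstruction stands in for the VMD modes.

```r
set.seed(1)
t <- seq(0, 1, length.out = 512)
x <- sin(2 * pi * 5 * t) + 0.5 * sin(2 * pi * 20 * t) + rnorm(512, sd = 0.1)

# Stand-in "decomposition": keep the DC term plus the K strongest
# positive-frequency components (and their conjugates).
reconstruct <- function(x, K) {
  X    <- fft(x)
  keep <- rep(0 + 0i, length(x))
  ord  <- order(Mod(X[2:(length(x) / 2)]), decreasing = TRUE)[1:K]
  idx  <- c(1, ord + 1, length(x) - ord + 1)
  keep[idx] <- X[idx]
  Re(fft(keep, inverse = TRUE)) / length(x)
}

# Increase K until the Spearman correlation crosses the 0.997 threshold.
for (K in 1:10) {
  rho <- cor(x, reconstruct(x, K), method = "spearman")
  cat(sprintf("K = %d, Spearman rho = %.4f\n", K, rho))
  if (rho >= 0.997) { cat("selected K =", K, "\n"); break }
}
```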
Abstract of the manuscript: Differential gene expression analysis using RNA sequencing (RNA-seq) data is a standard approach for making biological discoveries. Ongoing large-scale efforts to process and normalize publicly available gene expression data enable rapid and systematic reanalysis. While several powerful tools systematically process RNA-seq data, enabling their reanalysis, few resources systematically recompute differentially expressed genes (DEGs) generated from individual studies. We developed a robust differential expression analysis pipeline to recompute 3162 human DEG lists from The Cancer Genome Atlas, the Genotype-Tissue Expression Consortium, and 142 studies within the Sequence Read Archive. After measuring the accuracy of the recomputed DEG lists, we built the Differential Expression Enrichment Tool (DEET), which enables users to interact with the recomputed DEG lists. DEET, available through CRAN and RShiny, systematically queries which of the recomputed DEG lists share similar genes, pathways, and TF targets with the user's own gene lists. DEET identifies relevant studies based on shared results with the user's gene lists, aiding in hypothesis generation and data-driven literature review. Sokolowski, Dustin J., et al. "Differential Expression Enrichment Tool (DEET): an interactive atlas of human differential gene expression." NAR Genomics and Bioinformatics (2023).
Intensive longitudinal data have become increasingly prevalent in various scientific disciplines. Many such data sets are noisy, multivariate, and multi-subject in nature. The change functions may also be continuous, or continuous but interspersed with periods of discontinuities (i.e., showing regime switches). dynr (Dynamic Modeling in R) is an R package that implements a set of computationally efficient algorithms for handling a broad class of linear and nonlinear discrete- and continuous-time models with regime-switching properties under the constraint of linear Gaussian measurement functions. The discrete-time models can generally take on the form of a state-space or difference equation model. The continuous-time models are generally expressed as a set of ordinary or stochastic differential equations. All estimation and computations are performed in C, but users are provided with the option to specify the model of interest via a set of simple and easy-to-learn model specification functions in R. Model fitting can be performed using single-subject time series data or multiple-subject longitudinal data. Ou, Hunter, & Chow (2019) <doi:10.32614/RJ-2019-012> provide a detailed introduction to the interface and more information on the algorithms.
Genetic predisposition for complex traits is often manifested through multiple tissues of interest at different time points in development. As an example, the genetic predisposition for obesity could be manifested through inherited variants that control metabolism through regulation of genes expressed in the brain and/or through the control of fat storage in the adipose tissue by dysregulation of genes expressed in adipose tissue. We present a method, eGST (eQTL-based genetic subtyper), that integrates tissue-specific eQTLs with genome-wide association study (GWAS) data for a complex trait to probabilistically assign a tissue of interest to the phenotype of each individual in the study. eGST estimates the posterior probability that an individual's phenotype can be assigned to a tissue, based on individual-level genotype data of tissue-specific eQTLs and marginal phenotype data in a GWAS cohort. Under a Bayesian mixture-model framework, eGST employs a maximum a posteriori (MAP) expectation-maximization (EM) algorithm to estimate the tissue-specific posterior probability across individuals. The methodology is available from: A Majumdar, C Giambartolomei, N Cai, MK Freund, T Haldar, T Schwarz, J Flint, B Pasaniuc (2019) <doi:10.1101/674226>.
In many studies across different disciplines, detailed measures of the variables of interest are available. If assumptions can be made regarding the direction of effects between the assessed variables, this has to be considered in the analysis. The functions in this package implement the novel approach CIEE (causal inference using estimating equations; Konigorski et al., 2018, <DOI:10.1002/gepi.22107>) for estimating and testing the direct effect of an exposure variable on a primary outcome, while adjusting for indirect effects of the exposure on the primary outcome through a secondary intermediate outcome and potential factors influencing the secondary outcome. The underlying directed acyclic graph (DAG) of this considered model is described in the vignette. CIEE can be applied to studies in many different fields, and it is implemented here for the analysis of a continuous primary outcome and a time-to-event primary outcome subject to censoring. CIEE uses estimating equations to obtain estimates of the direct effect and robust sandwich standard error estimates. Then, a large-sample Wald-type test statistic is computed for testing the absence of the direct effect. Additionally, standard multiple regression, regression of residuals, and the structural equation modeling approach are implemented for comparison.
This function takes a vector or matrix of data and smooths the data with an improved Savitzky-Golay transform. The Savitzky-Golay method for data smoothing and differentiation calculates convolution weights using Gram polynomials that exactly reproduce the results of least-squares polynomial regression. Use of the Savitzky-Golay method requires specification of both filter length and polynomial degree to calculate the convolution weights. For maximum smoothing of statistical noise in data, polynomials of low degree are desirable, while a high polynomial degree is necessary for accurate reproduction of peaks in the data. Extending the least-squares regression formalism with statistical testing of additional terms of polynomial degree relative to a heuristically chosen minimum for each data window leads to an adaptive-degree polynomial filter (ADPF). Based on noise reduction for data consisting of pure noise and on signal reproduction for data consisting purely of signal, ADPF performed nearly as well as the optimally chosen fixed-degree Savitzky-Golay filter and outperformed sub-optimally chosen Savitzky-Golay filters. For synthetic data consisting of noise and signal, ADPF outperformed both optimally and sub-optimally chosen fixed-degree Savitzky-Golay filters. See Barak, P. (1995) <doi:10.1021/ac00113a006> for more information.
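As a generic illustration of the fixed-degree case that ADPF adapts: for a window of half-width h and polynomial degree d, the convolution weights are the centre row of the least-squares hat matrix for the window, which ADPF extends with per-window statistical tests of the degree.

```r
# Savitzky-Golay smoothing weights from plain least squares (generic sketch).
sg_weights <- function(h, d) {
  z <- -h:h
  V <- outer(z, 0:d, "^")                 # polynomial basis over the window
  H <- V %*% solve(crossprod(V), t(V))    # hat matrix of the local fit
  H[h + 1, ]                              # weights for the window centre
}

set.seed(1)
x <- seq(0, 4 * pi, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.2)

w <- sg_weights(h = 7, d = 3)             # 15-point cubic filter
y_smooth <- stats::filter(y, w, sides = 2)
plot(x, y, col = "grey"); lines(x, y_smooth, lwd = 2)
```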
Commodity pricing models are (systems of) stochastic differential equations that are utilized for the valuation and hedging of commodity contingent claims (i.e. derivative products on the commodity) and other commodity-related investments. Commodity pricing models that capture market dynamics are of great importance to commodity market participants in order to exercise sound investment and risk-management strategies. Parameters of commodity pricing models are estimated through maximum likelihood estimation, using available term structure futures data of a commodity. NFCP (n-factor commodity pricing) provides a framework for the modeling, parameter estimation, probabilistic forecasting, option valuation and simulation of commodity prices through state space and Monte Carlo methods, risk-neutral valuation and Kalman filtering. NFCP allows the commodity pricing model to consist of n correlated factors, with both random walk and mean-reverting elements. The n-factor commodity pricing model framework was first presented in the work of Cortazar and Naranjo (2006) <doi:10.1002/fut.20198>. Examples presented in NFCP replicate the two-factor crude oil commodity pricing model presented in the seminal work of Schwartz and Smith (2000) <doi:10.1287/mnsc.46.7.893.12034>, with an approximation of the term structure futures data applied within that study provided in the NFCP package.
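A generic simulation of the two-factor Schwartz-Smith dynamics referenced above follows (log price as a long-run random walk plus a mean-reverting short-run deviation); the parameter values are illustrative, not those estimated by NFCP.

```r
set.seed(1)
n      <- 252
dt     <- 1 / 252
kappa  <- 1.5      # mean-reversion speed of the short-run factor
sigma1 <- 0.15     # volatility of the long-run factor (random walk)
sigma2 <- 0.25     # volatility of the short-run factor (OU process)
mu     <- 0.02     # drift of the long-run factor

# Long-run equilibrium factor: arithmetic random walk in logs.
xi <- cumsum(c(log(60), mu * dt + sigma1 * sqrt(dt) * rnorm(n)))

# Short-run deviation: Ornstein-Uhlenbeck process reverting to zero.
chi <- numeric(n + 1)
for (t in 1:n)
  chi[t + 1] <- chi[t] - kappa * chi[t] * dt + sigma2 * sqrt(dt) * rnorm(1)

spot <- exp(xi + chi)   # S_t = exp(xi_t + chi_t)
plot(spot, type = "l", ylab = "simulated spot price")
```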
Provide functions to estimate the coefficients in high-dimensional linear regressions via a tuning-free and robust approach. The method was published in Wang, L., Peng, B., Bradic, J., Li, R. and Wu, Y. (2020), "A Tuning-free Robust and Efficient Approach to High-dimensional Regression", Journal of the American Statistical Association, 115:532, 1700-1714 (JASA's discussion paper), <doi:10.1080/01621459.2020.1840989>. See also Wang, L., Peng, B., Bradic, J., Li, R. and Wu, Y. (2020), "Rejoinder to 'A tuning-free robust and efficient approach to high-dimensional regression'", Journal of the American Statistical Association, 115, 1726-1729, <doi:10.1080/01621459.2020.1843865>; Peng, B. and Wang, L. (2015), "An Iterative Coordinate Descent Algorithm for High-Dimensional Nonconvex Penalized Quantile Regression", Journal of Computational and Graphical Statistics, 24:3, 676-694, <doi:10.1080/10618600.2014.913516>; Clémençon, S., Colin, I., and Bellet, A. (2016), "Scaling-up empirical risk minimization: optimization of incomplete U-statistics", The Journal of Machine Learning Research, 17(1):2682-2717; Fan, J. and Li, R. (2001), "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties", Journal of the American Statistical Association, 96:456, 1348-1360, <doi:10.1198/016214501753382273>.
Fit unidimensional item response theory (IRT) models to test data that include both dichotomous and polytomous items, calibrate pretest item parameters, estimate examinees' abilities, and examine the IRT model-data fit at the item level in different ways, as well as provide useful functions related to IRT analyses such as IRT model-data fit evaluation and differential item functioning analysis. The bring.flexmirt() and write.flexmirt() functions were written by modifying the read.flexmirt() function (Pritikin & Falk (2022) <doi:10.1177/0146621620929431>). The bring.bilog() and bring.parscale() functions were written by modifying the read.bilog() and read.parscale() functions, respectively (Weeks (2010) <doi:10.18637/jss.v035.i12>). The bisection() function was written by modifying the bisection() function of Howard (2017, ISBN:9780367657918). The code for the inverse test characteristic curve scoring in the est_score() function was written by modifying the irt.eq.tse() function (González (2014) <doi:10.18637/jss.v059.i07>). In the est_score() function, the code for the weighted likelihood estimation method was written by referring to the Pi(), Ji(), and Ii() functions of the catR package (Magis & Barrada (2017) <doi:10.18637/jss.v076.c01>).
The Variable Infiltration Capacity (VIC) model is a macroscale hydrologic model that solves full water and energy balances, originally developed by Xu Liang at the University of Washington (UW). The version of the VIC source code used is 5.0.1, available at <https://github.com/UW-Hydro/VIC/>; see Hamman et al. (2018). Development and maintenance of the current official version of the VIC model is led by the UW Hydro (Computational Hydrology group) in the Department of Civil and Environmental Engineering at UW. VIC is a research model, and in its various forms it has been applied to most of the major river basins around the world, as well as globally <http://vic.readthedocs.io/en/master/Documentation/References/>. References: "Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges (1994), A simple hydrologically based model of land surface water and energy fluxes for general circulation models, J. Geophys. Res., 99(D7), 14415-14428, <doi:10.1029/94JD00483>"; "Hamman, J. J., Nijssen, B., Bohn, T. J., Gergel, D. R., and Mao, Y. (2018), The Variable Infiltration Capacity model version 5 (VIC-5): infrastructure improvements for new applications and reproducibility, Geosci. Model Dev., 11, 3481-3496, <doi:10.5194/gmd-11-3481-2018>".
Fits relative survival regression models with or without proportional excess hazards and with the additional possibility to correct for background mortality by one or more parameter(s). These models are relevant when the observed mortality in the studied group is not comparable to that of the general population, or in population-based studies where the available life tables used for net survival estimation are insufficiently stratified. In the latter case, the model proposed by Touraine et al. (2020) <doi:10.1177/0962280218823234> can be used. The user can also fit a model that relaxes the proportional expected hazards assumption considered in the Touraine et al. excess hazard model. This extension was proposed by Mba et al. (2020) <doi:10.1186/s12874-020-01139-z> to allow non-proportional effects of the additional variable on the general population mortality. In non-population-based studies, researchers can identify a source of non-comparability bias in terms of the expected mortality of the selected individuals. An excess hazard model correcting this selection bias is presented in Goungounga et al. (2019) <doi:10.1186/s12874-019-0747-3>. This class of models with a random effect at the cluster level on the excess hazard is presented in Goungounga et al. (2023) <doi:10.1002/bimj.202100210>.
This package provides a tool that allows users to generate various indices for evaluating statistical models. The fitstat() function computes indices based on the fitting data, and the valstat() function computes indices based on the validation data set. Both fitstat() and valstat() return the following indices: SSR, residual sum of squares; TRE, total relative error; Bias, mean bias; MRB, mean relative bias; MAB, mean absolute bias; MAPE, mean absolute percentage error; MSE, mean squared error; RMSE, root mean square error; Percent.RMSE, percentage root mean squared error; R2, coefficient of determination; R2adj, adjusted coefficient of determination; APC, Amemiya's prediction criterion; logL, log-likelihood; AIC, Akaike information criterion; AICc, corrected Akaike information criterion; BIC, Bayesian information criterion; HQC, Hannan-Quinn information criterion. Lower is better for the SSR, TRE, Bias, MRB, MAB, MAPE, MSE, RMSE, Percent.RMSE, APC, AIC, AICc, BIC and HQC indices; higher is better for the R2 and R2adj indices. References: Stoica, P., Selén, Y. (2004) <doi:10.1109/MSP.2004.1311138>; Zhou et al. (2023) <doi:10.3389/fpls.2023.1186250>; Ogana, F.N., Ercanli, I. (2021) <doi:10.1007/s11676-021-01373-1>; Musabbikhah et al. (2019) <doi:10.1088/1742-6596/1175/1/012270>.
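To make a few of the listed definitions concrete, here is a generic computation for an lm() fit; the package's fitstat()/valstat() return these (and more) in a single call.

```r
set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 0.5 * x + rnorm(100, sd = 0.4)
fit <- lm(y ~ x)

res  <- residuals(fit)
SSR  <- sum(res^2)                 # residual sum of squares
RMSE <- sqrt(mean(res^2))          # root mean square error
Bias <- mean(res)                  # mean bias
MAPE <- mean(abs(res / y)) * 100   # mean absolute percentage error
R2   <- summary(fit)$r.squared     # coefficient of determination

c(SSR = SSR, RMSE = RMSE, Bias = Bias, MAPE = MAPE, R2 = R2, AIC = AIC(fit))
```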
A multi-visit clinical trial may collect participant responses on an ordinal scale and may utilize a stratified design, such as randomization within centers, to assess treatment efficacy across multiple visits. Baseline characteristics may be strongly associated with the outcome, and adjustment for them can improve power. The win ratio (which ignores ties) and the win odds (which accounts for ties) can be useful when analyzing these types of data from randomized controlled trials. This package provides straightforward functions for adjustment of the win ratio and win odds for stratification and baseline covariates, facilitating the comparison of test and control treatments in multi-visit clinical trials. For additional information concerning the methodologies and applied examples within this package, please refer to the following publications: 1. Weideman, A.M.K., Kowalewski, E.K., & Koch, G.G. (2024). "Randomization-based covariance adjustment of win ratios and win odds for randomized multi-visit studies with ordinal outcomes." Journal of Statistical Research, 58(1), 33-48. <doi:10.3329/jsr.v58i1.75411>. 2. Kowalewski, E.K., Weideman, A.M.K., & Koch, G.G. (2023). "SAS macro for randomization-based methods for covariance and stratified adjustment of win ratios and win odds for ordinal outcomes." SESUG 2023 Proceedings, Paper 139-2023.
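A generic, unadjusted computation of the win ratio and win odds for an ordinal outcome follows, to make the definitions concrete; the package's functions additionally adjust for stratification and baseline covariates.

```r
set.seed(1)
test    <- sample(1:5, 60, replace = TRUE, prob = c(.1, .15, .25, .3, .2))
control <- sample(1:5, 60, replace = TRUE, prob = c(.2, .25, .25, .2, .1))

cmp    <- outer(test, control, `-`)   # all pairwise treatment-control comparisons
wins   <- sum(cmp > 0)
losses <- sum(cmp < 0)
ties   <- sum(cmp == 0)

win_ratio <- wins / losses                                # ignores ties
win_odds  <- (wins + 0.5 * ties) / (losses + 0.5 * ties)  # splits ties evenly
c(win_ratio = win_ratio, win_odds = win_odds)
```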
Supervised learning techniques designed for the situation when the dimensionality exceeds the sample size have a tendency to overfit as the dimensionality of the data increases. To remedy this high-dimensionality, low-sample-size (HDLSS) situation, we attempt to learn a lower-dimensional representation of the data before learning a classifier. That is, we project the data to a setting where the dimensionality is more manageable, and can then better apply standard classification or clustering techniques, since there are fewer dimensions to overfit. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package and the associated manuscript Vogelstein et al. (2017) <arXiv:1709.01233>, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities to simplify cross-validative efforts to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigative efforts into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications.
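A generic sketch of the project-then-classify idea for HDLSS data follows: reduce a p >> n problem to a few dimensions before fitting a standard classifier. PCA is used as the (unsupervised) projection here; the package provides label-aware alternatives under a common interface.

```r
library(MASS)

set.seed(1)
n <- 60; p <- 500
y <- rep(0:1, each = n / 2)
X <- matrix(rnorm(n * p), n, p)
X[y == 1, 1:10] <- X[y == 1, 1:10] + 1   # signal in the first 10 coordinates

pc   <- prcomp(X, rank. = 5)             # project to 5 dimensions
fit  <- lda(pc$x, grouping = y)          # LDA is now well-posed (5 << n)
pred <- predict(fit, pc$x)$class
mean(pred == y)                          # training accuracy of the pipeline
```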
The analysis of microarray time series promises a deeper insight into the dynamics of the cellular response following stimulation. A common observation in this type of data is that some genes respond with quick, transient dynamics, while other genes change their expression slowly over time. The existing methods for detecting significant expression dynamics often fail when the expression dynamics show a large heterogeneity. Moreover, these methods often cannot cope with irregular and sparse measurements. The method proposed here is specifically designed for the analysis of perturbation responses. It combines different scores to capture fast and transient dynamics as well as slow expression changes, and performs well in the presence of low replicate numbers and irregular sampling times. The results are given in the form of tables including links to figures showing the expression dynamics of the respective transcript. These allow the user to quickly recognise the relevance of a detection, to identify possible false positives, and to discriminate early and late changes in gene expression. An extension of the method allows the analysis of the expression dynamics of functional groups of genes, providing a quick overview of the cellular response. The performance of this package was tested on microarray data derived from lung cancer cells stimulated with epidermal growth factor (EGF). Paper: Albrecht, Marco, et al. (2017) <DOI:10.1186/s12859-016-1440-8>.
The tsgc package provides comprehensive tools for the analysis and forecasting of epidemic trajectories. It is designed to model the progression of an epidemic over time while accounting for the various uncertainties inherent in real-time data. Underpinned by a dynamic Gompertz model, the package adopts a state space approach, using the Kalman filter for flexible and robust estimation of the non-linear growth pattern commonly observed in epidemic data. The reinitialization feature enhances the model's ability to adapt to the emergence of new waves. The forecasts generated by the package are of value to public health officials and researchers who need to understand and predict the course of an epidemic to inform decision-making. Beyond its application in public health, the package is also a useful resource for researchers and practitioners in fields where the trajectories of interest resemble those of epidemics, such as innovation diffusion. The package includes functionalities for data preprocessing, model fitting, and forecast visualization, as well as tools for evaluating forecast accuracy. The core methodologies implemented in tsgc are based on well-established statistical techniques as described in Harvey and Kattuman (2020) <doi:10.1162/99608f92.828f40de>, Harvey and Kattuman (2021) <doi:10.1098/rsif.2021.0179>, and Ashby, Harvey, Kattuman, and Thamotheram (2024) <https://www.jbs.cam.ac.uk/wp-content/uploads/2024/03/cchle-tsgc-paper-2024.pdf>.
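A generic sketch of the idea behind the dynamic Gompertz model cited above: under Gompertz growth, the log growth rate of the cumulative series declines linearly in time, so forecasting that trend yields a trajectory forecast. tsgc replaces the fixed trend below with a Kalman-filtered stochastic trend; this toy uses a plain regression on simulated data.

```r
set.seed(1)
tt <- 1:60
Y  <- 10000 * exp(-4 * exp(-0.08 * tt))      # noiseless Gompertz cumulative curve
Y  <- round(Y * exp(rnorm(60, sd = 0.02)))   # multiplicative observation noise

g  <- diff(Y) / head(Y, -1)                  # growth rate of the cumulative total
tg <- tt[-1]
ok <- g > 0                                  # drop non-positive rates before logging

fit <- lm(log(g[ok]) ~ tg[ok])               # log growth rate ~ linear time trend
coef(fit)                                    # slope near -0.08, as simulated
```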