Univariate feature selection and compound covariate methods under the Cox model with high-dimensional features (e.g., gene expressions). Available are survival data for non-small-cell lung cancer patients with gene expressions (Chen et al 2007 New Engl J Med) <DOI:10.1056/NEJMoa060096>, statistical methods in Emura et al (2012 PLoS ONE) <DOI:10.1371/journal.pone.0047627>, Emura & Chen (2016 Stat Methods Med Res) <DOI:10.1177/0962280214533378>, and Emura et al (2019) <DOI:10.1016/j.cmpb.2018.10.020>. Algorithms for generating correlated gene expressions are also available. Estimation of survival functions via copula-graphic (CG) estimators is also implemented, which is useful for sensitivity analyses under dependent censoring (Yeh et al 2023 Biomedicines) <DOI:10.3390/biomedicines11030797> and factorial survival analyses (Emura et al 2024 Stat Methods Med Res) <DOI:10.1177/09622802231215805>.
If results from a meta-Genome-Wide-Association-Study (meta-GWAS) are used for validation in one of the cohorts that was included in the meta-analysis, the results will be biased (i.e. too optimistic). The validation cohort needs to be independent of the meta-GWAS results. MetaSubtract subtracts the results of the respective cohort from the meta-GWAS results analytically, following the leave-one-out methodology, without having to redo the meta-GWAS analysis. It can handle different meta-analysis methods and takes into account whether single or double genomic control correction was applied to the original meta-analysis. It can be used for a whole GWAS, but also for a limited set of genetic markers. For an application, see Nolte I.M. et al. (2017) <doi:10.1038/ejhg.2017.50>.
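The analytic subtraction described above can be illustrated with a minimal sketch. It assumes a fixed-effect, inverse-variance weighted meta-analysis with no genomic control correction; the function names are hypothetical and are not the MetaSubtract API.

```python
import math

def pool_fixed_effect(betas, ses):
    """Inverse-variance weighted fixed-effect pooling of per-cohort estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)

def subtract_cohort(beta_meta, se_meta, beta_cohort, se_cohort):
    """Remove one cohort from a pooled estimate without re-running the meta-analysis.

    Assumption: the pooled result came from inverse-variance weighting,
    so weights (1 / se^2) simply add up across cohorts.
    """
    w_meta = 1.0 / se_meta ** 2
    w_cohort = 1.0 / se_cohort ** 2
    w_rest = w_meta - w_cohort
    if w_rest <= 0:
        raise ValueError("cohort weight must be smaller than the meta-analysis weight")
    beta_rest = (w_meta * beta_meta - w_cohort * beta_cohort) / w_rest
    return beta_rest, math.sqrt(1.0 / w_rest)

# Illustrative numbers only: three cohorts, then remove the third.
betas = [0.10, 0.20, 0.15]
ses = [0.05, 0.08, 0.06]
beta_meta, se_meta = pool_fixed_effect(betas, ses)
beta_rest, se_rest = subtract_cohort(beta_meta, se_meta, betas[2], ses[2])
```

Under these assumptions the subtracted result equals what pooling only the remaining cohorts would give, which is the sense in which the subtraction replaces a leave-one-out re-analysis.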
Plan optimal sample size allocation and go/no-go decision rules for phase II/III drug development programs with time-to-event, binary or normally distributed endpoints when assuming fixed treatment effects or a prior distribution for the treatment effect, using methods from Kirchner et al. (2016) <doi:10.1002/sim.6624> and Preussler (2020). Optimal is in the sense of maximal expected utility, where the utility is a function taking into account the expected cost and benefit of the program. It is possible to extend to more complex settings with bias correction (Preussler S et al. (2020) <doi:10.1186/s12874-020-01093-w>), multiple phase III trials (Preussler et al. (2019) <doi:10.1002/bimj.201700241>), multi-arm trials (Preussler et al. (2019) <doi:10.1080/19466315.2019.1702092>), and multiple endpoints (Kieser et al. (2018) <doi:10.1002/pst.1861>).
This package implements novel nonparametric approaches to address biases and confounding when comparing treatments or exposures in observational studies of outcomes. While designed and appropriate for use in studies involving medicine and the life sciences, the package can be used in other situations involving outcomes with multiple confounders. The package implements a family of methods for non-parametric bias correction when comparing treatments in observational studies, including survival analysis settings, where competing risks and/or censoring may be present. The approach extends to bias-corrected personalized predictions of treatment outcome differences, and analysis of heterogeneity of treatment effect-sizes across patient subgroups. For further details, please see: Lauve NR, Nelson SJ, Young SS, Obenchain RL, Lambert CG. LocalControl: An R Package for Comparative Safety and Effectiveness Research. Journal of Statistical Software. 2020. p. 1-32. Available from <doi:10.18637/jss.v096.i04>.
Functions and data sets from the book "Nonlinear Time Series Analysis with R Applications" by B. Guris (2020). The book will be published in Turkish under the original title "R Uygulamali Dogrusal Olmayan Zaman Serileri Analizi". The functions in this package perform nonlinearity tests, nonlinear unit root tests and nonlinear cointegration tests, and estimate nonlinear error correction models. Unit root tests of the Momentum Threshold Autoregressive (MTAR), Smooth Threshold Autoregressive (STAR) and Self Exciting Threshold Autoregressive (SETAR) types are available, as are cointegration tests based on the MTAR, STAR and SETAR models. A Granger causality test based on nonlinear models can also be applied.
This package provides methods for frontier analysis, Data Envelopment Analysis (DEA), under different technology assumptions (fdh, vrs, drs, crs, irs, add/frh, and fdh+), and using different efficiency measures (input based, output based, hyperbolic graph, additive, super, and directional efficiency). Peers and slacks are available, partial price information can be included, and optimal cost, revenue and profit can be calculated. Evaluation of mergers is also supported. Methods for graphing the technology sets are also included. There is also support for comparative methods based on Stochastic Frontier Analyses (SFA) and for convex nonparametric least squares of convex functions (STONED). In general, the methods can be used to solve not only standard models, but also many other model variants. It complements the book, Bogetoft and Otto, Benchmarking with DEA, SFA, and R, Springer-Verlag, 2011, but can of course also be used as a stand-alone package.
Includes a collection of functions presented in "Measuring stability in ecological systems without static equilibria" by Clark et al. (2022) <doi:10.1002/ecs2.4328> in Ecosphere. These can be used to estimate the parameters of a stochastic state space model (i.e. a model where a time series is observed with error). The goal of this package is to estimate the variability around a deterministic process, both in terms of observation error - i.e. variability due to imperfect observations that does not influence system state - and in terms of process noise - i.e. stochastic variation in the actual state of the process. Unlike classical methods for estimating variability, this package does not necessarily assume that the deterministic state is fixed (i.e. a fixed-point equilibrium), meaning that variability around a dynamic trajectory can be estimated (e.g. stochastic fluctuations during predator-prey dynamics).
The BayesDLMfMRI package performs statistical analysis for task-based functional magnetic resonance imaging (fMRI) data at both individual and group levels. The analysis to detect brain activation at the individual level is based on modeling the fMRI signal using Matrix-Variate Dynamic Linear Models (MDLM). The analysis for the group stage is based on posterior distributions of the state parameter obtained from the modeling at the individual level. In this way, this package offers several R functions with different algorithms to perform inference on the state parameter to assess brain activation for both individual and group stages. Those functions allow for parallel computation when the analysis is performed for the entire brain as well as analysis at specific voxels when it is required. References: Cardona-Jiménez (2021) <doi:10.1016/j.csda.2021.107297>; Cardona-Jiménez (2021) <arXiv:2111.01318>.
The TissueEnrich package is used to calculate enrichment of tissue-specific genes in a set of input genes. For example, the user can input the most highly expressed genes from RNA-Seq data, or gene co-expression modules, to determine which tissue-specific genes are enriched in those datasets. Tissue-specific genes were defined by processing RNA-Seq data from the Human Protein Atlas (HPA) (Uhlén et al. 2015), GTEx (Ardlie et al. 2015), and mouse ENCODE (Shen et al. 2012) using the algorithm from the HPA (Uhlén et al. 2015). The hypergeometric test is used to determine whether the tissue-specific genes are enriched among the input genes. Along with tissue-specific gene enrichment, the TissueEnrich package can also be used to define tissue-specific genes from expression datasets provided by the user, which can then be used to calculate tissue-specific gene enrichments.
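The hypergeometric enrichment test mentioned above can be sketched in a few lines. This is a generic one-sided test written with the Python standard library, not the TissueEnrich implementation; the function and argument names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_pvalue(population, specific, drawn, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of genes in the background
    specific:   number of tissue-specific genes in the background
    drawn:      number of input genes
    overlap:    number of input genes that are tissue-specific
    """
    upper = min(specific, drawn)
    total = comb(population, drawn)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(specific, i) * comb(population - specific, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Illustrative numbers only: 3 of 3 input genes are tissue-specific
# in a background of 10 genes that contains 3 tissue-specific genes.
p_value = hypergeom_enrichment_pvalue(population=10, specific=3, drawn=3, overlap=3)
```

A small p-value indicates that the overlap between the input genes and the tissue-specific set is larger than expected by chance.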
Plot idiograms of karyotypes, plasmids and circular chromosomes from a set of data.frames with chromosome data and, optionally, mark data. Two styles of chromosomes can be used: without or with visible chromatids. Supports micrometers, cM and Mb or any unit. Three styles of centromeres are available: triangle, rounded and inProtein; and six styles of marks are available: square (squareLeft), dots, cM (cMLeft), cenStyle, upArrow (downArrow), exProtein (inProtein); their legend (label) can be drawn inline or to the right of karyotypes. Idiograms can also be plotted in concentric circles. It is possible to calculate chromosome indices by Levan et al. (1964) <doi:10.1111/j.1601-5223.1964.tb01953.x>, karyotype indices of Watanabe et al. (1999) <doi:10.1007/PL00013869> and Romero-Zarco (1986) <doi:10.2307/1221906>, and to classify chromosomes by morphology following Guerra (1986) and Levan et al. (1964).
This package provides a compilation of functions designed to assist users with the correlation analysis of crop yield and soil test values. The functions estimate crop response patterns to soil nutrient availability and critical soil test values using various approaches, such as: 1) the modified arcsine-log calibration curve (Correndo et al. (2017) <doi:10.1071/CP16444>); 2) the graphical Cate-Nelson quadrants analysis (Cate & Nelson (1965)); 3) the statistical Cate-Nelson quadrants analysis (Cate & Nelson (1971) <doi:10.2136/sssaj1971.03615995003500040048x>); 4) the linear-plateau regression (Anderson & Nelson (1975) <doi:10.2307/2529422>); 5) the quadratic-plateau regression (Bullock & Bullock (1994) <doi:10.2134/agronj1994.00021962008600010033x>); and 6) the Mitscherlich-type exponential regression (Melsted & Peck (1977) <doi:10.2134/asaspecpub29.c1>). The package development stemmed from ongoing work with the Fertilizer Recommendation Support Tool (FRST) and Feed the Future Innovation Lab for Collaborative Research on Sustainable Intensification (SIIL) projects.
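As an illustration of one of the approaches listed above, the following sketch picks a critical soil test value in the spirit of the statistical Cate-Nelson procedure: the observations are ordered by soil test value, and the split that maximizes the R-squared of a two-class partition of yield is chosen. It is a simplified stand-alone example with hypothetical names and made-up data, not the package's own function, and it does not determine the critical yield level.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def cate_nelson_split(soil_test, rel_yield):
    """Return (critical_x, r_squared) for the best two-class split on soil test value."""
    pairs = sorted(zip(soil_test, rel_yield))
    ys = [y for _, y in pairs]
    n = len(ys)
    total_ss = sum_sq(ys)
    best = None
    # Keep at least two observations in each class.
    for k in range(2, n - 1):
        within_ss = sum_sq(ys[:k]) + sum_sq(ys[k:])
        r_squared = (total_ss - within_ss) / total_ss
        # Place the critical value midway between the two neighbouring points.
        critical_x = (pairs[k - 1][0] + pairs[k][0]) / 2.0
        if best is None or r_squared > best[1]:
            best = (critical_x, r_squared)
    return best

# Made-up data: low relative yield below a soil test value of about 4.5.
soil = [1, 2, 3, 4, 5, 6, 7, 8]
yields = [40, 45, 42, 44, 95, 97, 96, 98]
critical_x, r_squared = cate_nelson_split(soil, yields)
```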
Not all seeds germinate at a single point in time, because physiological mechanisms determined by temperature vary among individual seeds in the population. Seeds germinate following the accumulation of thermal time in degree days/hours, quantified by multiplying the time to germination by the excess of temperature above the base temperature required by each seed for its germination, which follows a log-normal distribution. The theoretical germination course can be obtained by regressing the rate of germination at various fractions against temperature (Garcia et al., 1982), where the fraction-wise regression lines intersect the temperature axis at the base temperature; the methodology for determining the optimum base temperature was described by Ellis et al. (1987). This package helps to find the base temperature of seed germination using the algorithms of Garcia et al. (1982) <doi:10.1093/jxb/33.2.288> and Ellis et al. (1987) <doi:10.1093/JXB/38.6.1033>.
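The regression idea above can be sketched as follows: for one germination fraction, the germination rate (the reciprocal of the time to reach that fraction) is regressed linearly on temperature, and the base temperature is read off where the fitted line crosses the temperature axis. The sketch assumes a linear response in the sub-optimal temperature range and uses made-up data; the function name is hypothetical and the code is not the package's algorithm.

```python
def base_temperature(temperatures, rates):
    """Fit rate = intercept + slope * temperature by ordinary least squares.

    Returns (base_temp, thermal_time), where base_temp is the x-intercept
    of the fitted line and thermal_time is 1 / slope (degree-time units).
    """
    n = len(temperatures)
    mean_t = sum(temperatures) / n
    mean_r = sum(rates) / n
    sxx = sum((t - mean_t) ** 2 for t in temperatures)
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(temperatures, rates))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_t
    return -intercept / slope, 1.0 / slope

# Made-up example: times (days) to reach 50% germination at four temperatures.
temps = [10.0, 15.0, 20.0, 25.0]
days_to_50 = [20.0, 10.0, 100.0 / 15.0, 5.0]
rates = [1.0 / d for d in days_to_50]
t_base, theta = base_temperature(temps, rates)
```

Repeating the fit for several germination fractions gives one line per fraction; under the model described above these lines share the same intercept on the temperature axis.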
A standard test is observed on all specimens. We treat the second test (or sampled test) as being conducted on only a stratified sample of specimens. Verification bias arises when the selection of specimens for the second (sampled) test is not under investigator control. We treat the total sample as stratified two-phase sampling and use inverse probability weighting. We estimate diagnostic accuracy (category-specific classification probabilities, which for binary tests reduce to specificity and sensitivity, and also predictive values) and agreement statistics (percent agreement, percent agreement by category, Kappa (unweighted), Kappa (quadratic weighted) and symmetry tests (which reduce to McNemar's test for binary tests)). See: Katki HA, Li Y, Edelstein DW, Castle PE. Estimating the agreement and diagnostic accuracy of two diagnostic tests when one test is conducted on only a subsample of specimens. Stat Med. 2012 Feb 28; 31(5) <doi:10.1002/sim.4422>.
Single cell resolution data has been valuable in learning about tissue microenvironments and interactions between cells or spots. This package allows for the simulation of this level of data, be it single cell or "spots", in both univariate (single metric or cell type) and bivariate (2 or more metrics or cell types) ways. As more technologies come to market, more methods will be developed to derive spatial metrics from the data, which will require a way to benchmark methods against each other. Additionally, as the field currently stands, there is no gold standard method to compare against. We set out to develop an R package that allows users to simulate point patterns that can be biologically informed from different tissue domains, holes, and varying degrees of clustering/colocalization. The data can be exported as spatial files and a summary file (like 'HALO'). <https://github.com/FridleyLab/scSpatialSIM/>.
This package provides functions to prepare rankings data and fit the Plackett-Luce model jointly attributed to Plackett (1975) <doi:10.2307/2346567> and Luce (1959, ISBN:0486441369). The standard Plackett-Luce model is generalized to accommodate ties of any order in the ranking. Partial rankings, in which only a subset of items are ranked in each ranking, are also accommodated in the implementation. Disconnected/weakly connected networks implied by the rankings may be handled by adding pseudo-rankings with a hypothetical item. Optionally, a multivariate normal prior may be set on the log-worth parameters and ranker reliabilities may be incorporated as proposed by Raman and Joachims (2014) <doi:10.1145/2623330.2623654>. Maximum a posteriori estimation is used when priors are set. Methods are provided to estimate standard errors or quasi-standard errors for inference as well as to fit Plackett-Luce trees. See the package website or vignette for further details.
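For orientation, the standard Plackett-Luce model without ties assigns a ranking the product, over successive choices, of the chosen item's worth divided by the total worth of the items still available. The sketch below computes that probability for given worths; it is a plain illustration with hypothetical names and does not cover ties, priors or the fitting routines of the package.

```python
from itertools import permutations

def ranking_probability(ranking, worth):
    """Probability of an ordered ranking under the standard Plackett-Luce model.

    ranking: items from most to least preferred (may be a subset of all items)
    worth:   mapping from item to a positive worth parameter
    """
    remaining = sum(worth[item] for item in ranking)
    prob = 1.0
    for item in ranking:
        # Choose the next item in proportion to its worth among those left.
        prob *= worth[item] / remaining
        remaining -= worth[item]
    return prob

# Illustrative worths only.
worth = {"a": 2.0, "b": 1.0, "c": 1.0}
p_abc = ranking_probability(["a", "b", "c"], worth)
total = sum(ranking_probability(list(order), worth) for order in permutations(worth))
```

Because only the ranked items enter the denominators, the same function handles a partial ranking over a subset of the items.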
Simultaneous clustering of rows and columns, usually designated by biclustering, co-clustering or block clustering, is an important technique in two-way data analysis. It consists of estimating a mixture model which takes into account the block clustering problem on both the individual and variable sets. The blockcluster package provides a bridge between the C++ core library, built on top of the STK++ library, and the R statistical computing environment. This package allows co-clustering of binary <doi:10.1016/j.csda.2007.09.007>, contingency <doi:10.1080/03610920903140197>, continuous <doi:10.1007/s11634-013-0161-3> and categorical data sets <doi:10.1007/s11222-014-9472-2>. It also provides utility functions to visualize the results. This package may be useful for various applications in the fields of data mining, information retrieval, biology, computer vision and many more. More information about the project and a comprehensive tutorial can be found at the link mentioned in URL.
Evaluates the probability density function (PDF), cumulative distribution function (CDF), quantile function (QF), random numbers and maximum likelihood estimates (MLEs) of well-known complementary binomial-G, complementary negative binomial-G and complementary geometric-G families of distributions taking baseline models such as exponential, extended exponential, Weibull, extended Weibull, Fisk, Lomax, Burr-XII and Burr-X. The functions also allow computing the goodness-of-fit measures namely the Akaike-information-criterion (AIC), the Bayesian-information-criterion (BIC), the minimum value of the negative log-likelihood (-2L) function, Anderson-Darling (A) test, Cramer-Von-Mises (W) test, Kolmogorov-Smirnov test, P-value and convergence status. Moreover, some commonly used data sets from the fields of actuarial, reliability, and medical science are also provided. Related works include: a) Tahir, M. H., & Cordeiro, G. M. (2016). Compounding of distributions: a survey and new generalized classes. Journal of Statistical Distributions and Applications, 3, 1-35. <doi:10.1186/s40488-016-0052-1>.
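The information criteria named above follow directly from the maximized log-likelihood: AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, with k parameters and n observations. The sketch below illustrates this for a plain exponential baseline fitted by maximum likelihood; it does not implement the compounded families, and the function name is hypothetical.

```python
import math

def exponential_fit_summary(data):
    """Maximum likelihood fit of an exponential distribution with AIC and BIC."""
    n = len(data)
    total = sum(data)
    rate = n / total                      # closed-form MLE of the rate
    loglik = n * math.log(rate) - rate * total
    k = 1                                 # one estimated parameter
    return {
        "rate": rate,
        "loglik": loglik,
        "neg2loglik": -2.0 * loglik,
        "aic": 2 * k - 2.0 * loglik,
        "bic": k * math.log(n) - 2.0 * loglik,
    }

# Illustrative data only.
summary = exponential_fit_summary([1.0, 2.0, 3.0])
```

Lower values of either criterion indicate a better trade-off between fit and number of parameters when candidate models are compared on the same data.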
CRAN packages 'DoE.base' and 'Rmosek' and non-CRAN package 'gurobi' are enhanced with functionality for the creation of optimized arrays for experimentation, where optimization is in terms of generalized minimum aberration. It is also possible to optimally extend existing arrays to larger run size. The package writes MPS (Mathematical Programming System) files for use with any mixed integer optimization software that can process such files. If at least one of the commercial products Gurobi or Mosek (free academic licenses available for both) is available, the package also creates arrays by optimization. For installing Gurobi and its R package 'gurobi', follow instructions at <https://support.gurobi.com/hc/en-us/articles/14462206790033-How-do-I-install-Gurobi-for-R>. For installing Mosek and its R package 'Rmosek', follow instructions at <https://www.mosek.com/downloads/> and <https://docs.mosek.com/8.1/rmosek/install-interface.html>, or use the functionality in the stump CRAN R package 'Rmosek'.
This package provides covariate-adjusted comparison of two groups of right censored data, where the binary group variable has separate short-term and long-term effects on the hazard function, while effects of covariates such as age, blood pressure, etc. are proportional on the hazard. The model was studied in Yang and Prentice (2015) <doi:10.1002/sim.6453> and it extends the two sample version of the short-term and long-term hazard ratio model proposed in Yang and Prentice (2005) <doi:10.1093/biomet/92.1.1>. The model extends the usual Cox proportional hazards model to allow more flexible hazard ratio patterns, such as gradual onset of effect, diminishing effect, and crossing hazard or survival functions. This package provides the following: 1) point estimates and confidence intervals for model parameters; 2) point estimate and confidence interval of the average hazard ratio; and 3) plots of estimated hazard ratio function with point-wise and simultaneous confidence bands.
Estimates and plots (as a single plot and as a heat map) the rolling window correlation coefficients between two time series and computes their statistical significance, which is carried out through a non-parametric computing-intensive method. This method addresses the effects due to the multiple testing (inflation of the Type I error) when the statistical significance is estimated for the rolling window correlation coefficients. The method is based on Monte Carlo simulations by permuting one of the variables (e.g., the dependent) under analysis and keeping fixed the other variable (e.g., the independent). We improve the computational efficiency of this method to reduce the computation time through parallel computing. The NonParRolCor package also provides examples with synthetic and real-life environmental time series to exemplify its use. Methods derived from R. Telford (2013) <https://quantpalaeo.wordpress.com/2013/01/04/> and J.M. Polanco-Martinez and J.L. Lopez-Martinez (2021) <doi:10.1016/j.ecoinf.2021.101379>.
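The general idea of a permutation-based significance test for rolling window correlations can be sketched as below. One variable is shuffled while the other is kept fixed, and each observed coefficient is compared with the largest absolute coefficient obtained across all windows of each shuffled series. The max-statistic comparison is an assumption chosen here as one common way to account for multiple testing; it is not claimed to be the exact procedure of the package, and all names are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def rolling_correlation(x, y, window):
    """Correlation in every window of the given length, moved one step at a time."""
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]

def permutation_pvalues(x, y, window, n_perm=200, seed=0):
    """Monte Carlo p-values, adjusted through the maximum |r| over all windows."""
    rng = random.Random(seed)
    observed = rolling_correlation(x, y, window)
    exceed = [0] * len(observed)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)               # permute y, keep x fixed
        max_null = max(abs(r) for r in rolling_correlation(x, y_perm, window))
        for i, r in enumerate(observed):
            if max_null >= abs(r):
                exceed[i] += 1
    return observed, [(c + 1) / (n_perm + 1) for c in exceed]

# Illustrative series only: y is an exact linear function of x.
x = [float(i) for i in range(12)]
y = [2.0 * v + 1.0 for v in x]
coefs, pvals = permutation_pvalues(x, y, window=5)
```

Each permutation is independent of the others, which is why this kind of procedure lends itself to parallel computing.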
Helps a clinical trial team discuss the clinical goals of a well-defined biomarker with a diagnostic, staging, prognostic, or predictive purpose. From this discussion will come a statistical plan for a (non-randomized) validation trial. Both prospective and retrospective trials are supported. In a specific focused discussion, investigators should determine the range of "discomfort" for the NNT, number needed to treat. The meaning of the discomfort range, [NNTlower, NNTupper], is that within this range most physicians would feel discomfort either in treating or withholding treatment. A pair of NNT values bracketing that range, NNTpos and NNTneg, become the targets of the study's design. If the trial can demonstrate that a positive biomarker test yields an NNT less than NNTlower, and that a negative biomarker test yields an NNT greater than NNTupper, then the biomarker may be useful for patients. A highlight of the package is visualization of a "contra-Bayes" theorem, which produces criteria for retrospective case-control studies.
This package provides support for automation and visualization of flow cytometry data analysis pipelines. In its current state, the package focuses on the preprocessing and quality control part. The framework is based on two main S4 classes, i.e. CytoPipeline and CytoProcessingStep. The pipeline steps are linked to corresponding R functions, which are either provided in the CytoPipeline package itself, exported from a third-party package, or coded by the users themselves. The processing steps need to be specified centrally and explicitly, using either a json input file or step-by-step creation of a CytoPipeline object with dedicated methods. After the pipeline has run, the results obtained at all steps can be retrieved and visualized thanks to file caching (the running facility uses a BiocFileCache implementation). The package also provides specific visualization tools, such as a pipeline workflow summary display and 1D/2D comparison plots of the flowFrames obtained at various steps of the pipeline.
This package provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models, including the bias-reduced linearization estimator introduced by Bell and McCaffrey (2002) <http://www.statcan.gc.ca/pub/12-001-x/2002002/article/9058-eng.pdf> and developed further by Pustejovsky and Tipton (2017) <doi:10.1080/07350015.2016.1247004>. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling's T-squared distribution. Methods are provided for a variety of fitted models, including lm() and mlm objects, glm(), ivreg (from package AER), plm() (from package plm), gls() and lme() (from nlme), robu() (from robumeta), and rma.uni() and rma.mv() (from metafor).
Rolling Window Multiple Correlation ('RolWinMulCor') estimates the rolling (running) window correlation for the bi- and multi-variate cases between regular (sampled on identical time points) time series, with special emphasis on ecological data, although it can be applied to other kinds of data sets. RolWinMulCor is based on the concept of a rolling, running or sliding window and is useful for evaluating the evolution of correlation through time and time-scales. RolWinMulCor contains six functions. The first two focus on the bi-variate case: (1) rolwincor_1win() and (2) rolwincor_heatmap(), which estimate the correlation coefficients and their respective p-values for only one window-length (time-scale) and for all possible window-lengths or a band of window-lengths, respectively. The second two functions, (3) rolwinmulcor_1win() and (4) rolwinmulcor_heatmap(), are designed to analyze the multi-variate case and display the results in the same way as the bi-variate case, but the two approaches are methodologically different: the multi-variate case estimates the adjusted coefficients of determination instead of the correlation coefficients. The last two functions, (5) plot_1win() and (6) plot_heatmap(), graphically represent the outputs of the four aforementioned functions as simple plots or as heat maps. The functions contained in RolWinMulCor are highly flexible, since they contain several parameters to control the estimation of correlation and the features of the plot output, e.g. to remove the (linear) trend contained in the time series under analysis, to choose different p-value correction methods (which are used to address the multiple comparison problem) or to personalise the plot outputs. The RolWinMulCor package also provides examples with synthetic and real-life ecological time series to exemplify its use. Methods derived from H. Abdi (2007) <https://personal.utdallas.edu/~herve/Abdi-MCC2007-pretty.pdf>, R. Telford (2013) <https://quantpalaeo.wordpress.com/2013/01/04/>, J. M. Polanco-Martinez (2019) <doi:10.1007/s11071-019-04974-y>, and J. M. Polanco-Martinez (2020) <doi:10.1016/j.ecoinf.2020.101163>.