Includes a collection of functions presented in "Measuring stability in ecological systems without static equilibria" by Clark et al. (2022) <doi:10.1002/ecs2.4328> in Ecosphere. These can be used to estimate the parameters of a stochastic state space model (i.e. a model where a time series is observed with error). The goal of this package is to estimate the variability around a deterministic process, both in terms of observation error - i.e. variability due to imperfect observations that does not influence system state - and in terms of process noise - i.e. stochastic variation in the actual state of the process. Unlike classical methods for estimating variability, this package does not necessarily assume that the deterministic state is fixed (i.e. a fixed-point equilibrium), meaning that variability around a dynamic trajectory can be estimated (e.g. stochastic fluctuations during predator-prey dynamics).
The BayesDLMfMRI package performs statistical analysis for task-based functional magnetic resonance imaging (fMRI) data at both individual and group levels. The analysis to detect brain activation at the individual level is based on modeling the fMRI signal using Matrix-Variate Dynamic Linear Models (MDLM). The analysis for the group stage is based on posterior distributions of the state parameter obtained from the modeling at the individual level. In this way, this package offers several R functions with different algorithms to perform inference on the state parameter to assess brain activation for both individual and group stages. Those functions allow for parallel computation when the analysis is performed for the entire brain as well as analysis at specific voxels when it is required. References: Cardona-Jiménez (2021) <doi:10.1016/j.csda.2021.107297>; Cardona-Jiménez (2021) <arXiv:2111.01318>.
The TissueEnrich package is used to calculate enrichment of tissue-specific genes in a set of input genes. For example, the user can input the most highly expressed genes from RNA-Seq data, or gene co-expression modules, to determine which tissue-specific genes are enriched in those datasets. Tissue-specific genes were defined by processing RNA-Seq data from the Human Protein Atlas (HPA) (Uhlén et al. 2015), GTEx (Ardlie et al. 2015), and mouse ENCODE (Shen et al. 2012) using the algorithm from the HPA (Uhlén et al. 2015). The hypergeometric test is used to determine whether the tissue-specific genes are enriched among the input genes. Along with tissue-specific gene enrichment, the TissueEnrich package can also be used to define tissue-specific genes from expression datasets provided by the user, which can then be used to calculate tissue-specific gene enrichments.
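The hypergeometric enrichment test can be illustrated with a short sketch. It is written in Python rather than R and is not part of the TissueEnrich API. The function name `hypergeom_enrichment_pvalue` and its arguments are hypothetical. The function returns the one-sided probability of seeing at least `k` tissue-specific genes among `n` input genes, drawn from a background of `N` genes of which `K` are tissue-specific.

```python
from math import comb


def hypergeom_enrichment_pvalue(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    Hypothetical helper for illustration only.
    N: number of genes in the background
    K: number of tissue-specific genes in the background
    n: number of input genes
    k: number of input genes that are tissue-specific
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., upper overlapping genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

As an example, `hypergeom_enrichment_pvalue(20000, 200, 100, 5)` asks how surprising an overlap of five tissue-specific genes is in a list of 100 input genes. Exact integer arithmetic via `math.comb` avoids floating-point underflow in the individual terms.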
Plots idiograms of karyotypes, plasmids and circular chromosomes from a set of data.frames containing chromosome data and, optionally, mark data. Two styles of chromosomes can be used: without or with visible chromatids. Supports micrometers, cM, Mb or any other unit. Three styles of centromeres are available: triangle, rounded and inProtein; and six styles of marks are available: square (squareLeft), dots, cM (cMLeft), cenStyle, upArrow (downArrow) and exProtein (inProtein); the mark legend (label) can be drawn inline or to the right of the karyotypes. Idiograms can also be plotted in concentric circles. It is possible to calculate chromosome indices by Levan et al. (1964) <doi:10.1111/j.1601-5223.1964.tb01953.x> and karyotype indices of Watanabe et al. (1999) <doi:10.1007/PL00013869> and Romero-Zarco (1986) <doi:10.2307/1221906>, and to classify chromosomes by morphology following Guerra (1986) and Levan et al. (1964).
This package provides a compilation of functions designed to assist users in the correlation analysis of crop yield and soil test values. It includes functions to estimate crop response patterns to soil nutrient availability and critical soil test values using various approaches, such as: 1) the modified arcsine-log calibration curve (Correndo et al. (2017) <doi:10.1071/CP16444>); 2) the graphical Cate-Nelson quadrants analysis (Cate & Nelson (1965)); 3) the statistical Cate-Nelson quadrants analysis (Cate & Nelson (1971) <doi:10.2136/sssaj1971.03615995003500040048x>); 4) the linear-plateau regression (Anderson & Nelson (1975) <doi:10.2307/2529422>); 5) the quadratic-plateau regression (Bullock & Bullock (1994) <doi:10.2134/agronj1994.00021962008600010033x>); and 6) the Mitscherlich-type exponential regression (Melsted & Peck (1977) <doi:10.2134/asaspecpub29.c1>). The package development stemmed from ongoing work with the Fertilizer Recommendation Support Tool (FRST) and Feed the Future Innovation Lab for Collaborative Research on Sustainable Intensification (SIIL) projects.
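The linear-plateau approach (item 4) can be sketched in Python. This sketch assumes the fit is found by grid search over candidate join points, a common technique that is not necessarily how the package fits the model. For a fixed join point `c`, the model `yield = a + b * min(x, c)` is ordinary linear regression, and the `c` with the smallest residual sum of squares is taken as the critical soil test value. The function name `fit_linear_plateau` and the candidate grid are hypothetical.

```python
def fit_linear_plateau(x, y, candidates):
    """Grid-search fit of y = a + b * min(x, c) (linear rise, then plateau).

    Hypothetical helper for illustration. Returns (a, b, c, sse) for the best
    candidate join point c, or None if no candidate is usable.
    """
    best = None
    n = len(x)
    for c in candidates:
        z = [min(xi, c) for xi in x]  # hinge basis: constant beyond c
        zbar = sum(z) / n
        ybar = sum(y) / n
        szz = sum((zi - zbar) ** 2 for zi in z)
        if szz == 0:  # c at or below every x: slope is not identifiable
            continue
        b = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / szz
        a = ybar - b * zbar
        sse = sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best


soil_test = list(range(1, 11))
rel_yield = [2 + 3 * min(v, 5) for v in soil_test]
a, b, critical_value, sse = fit_linear_plateau(soil_test, rel_yield, [i * 0.5 for i in range(2, 21)])
```

A finer grid, or a nonlinear least-squares refinement around the best grid point, would give a more precise join point.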
Not all seeds germinate at the same point in time, because of physiological mechanisms determined by temperature that vary among individual seeds in the population. Seeds germinate once they have accumulated a certain thermal time, in degree-days or degree-hours. Thermal time is quantified by multiplying the time to germination by the excess of temperature above the base temperature, and the thermal time required by individual seeds follows a log-normal distribution. The theoretical germination course can be obtained by regressing the rate of germination at various fractions against temperature (Garcia et al., 1982). The fraction-wise regression lines intersect the temperature axis at the base temperature, and the methodology for determining the optimum base temperature has been described by Ellis et al. (1987). This package helps to find the base temperature of seed germination using the algorithms of Garcia et al. (1982) <doi:10.1093/jxb/33.2.288> and Ellis et al. (1987) <doi:10.1093/JXB/38.6.1033>.
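The regression step can be sketched in Python for a single germination fraction. This is a simplified illustration, not the package's implementation, and `base_temperature` is a hypothetical name. The germination rate is `1 / time`, and it is regressed linearly on temperature. If `rate = (T - Tb) / theta`, the regression line crosses the temperature axis at `Tb = -intercept / slope`, and the thermal time is `theta = 1 / slope`.

```python
def base_temperature(temps, times):
    """Estimate base temperature Tb and thermal time theta for one germination fraction.

    Hypothetical helper: fits rate = intercept + slope * T by ordinary least
    squares, where rate = 1 / time to reach the fraction.
    """
    rates = [1.0 / t for t in times]
    n = len(temps)
    tbar = sum(temps) / n
    rbar = sum(rates) / n
    slope = sum((T - tbar) * (r - rbar) for T, r in zip(temps, rates)) / sum(
        (T - tbar) ** 2 for T in temps
    )
    intercept = rbar - slope * tbar
    tb = -intercept / slope  # x-intercept: temperature at which rate is zero
    theta = 1.0 / slope      # thermal time (degree-days) for this fraction
    return tb, theta


# Synthetic data with Tb = 5 and theta = 100 degree-days.
temps = [10.0, 15.0, 20.0, 25.0]
times = [100.0 / (T - 5.0) for T in temps]
tb, theta = base_temperature(temps, times)
```

Repeating the fit for several fractions gives the family of regression lines described above. Choosing a single optimum base temperature across fractions is the step the Ellis et al. (1987) methodology addresses, and it is not shown here.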
This package assumes that a standard test is observed on all specimens. We treat the second test (or sampled test) as being conducted on only a stratified sample of specimens. Verification bias is the situation in which the selection of specimens for the second (sampled) test is not under investigator control. We treat the total sample as a stratified two-phase sample and use inverse probability weighting. We estimate diagnostic accuracy (category-specific classification probabilities, which for binary tests reduce to sensitivity and specificity, and also predictive values) and agreement statistics (percent agreement, percent agreement by category, unweighted Kappa, quadratic-weighted Kappa, and symmetry tests, which reduce to McNemar's test for binary tests). See: Katki HA, Li Y, Edelstein DW, Castle PE. Estimating the agreement and diagnostic accuracy of two diagnostic tests when one test is conducted on only a subsample of specimens. Stat Med. 2012 Feb 28; 31(5) <doi:10.1002/sim.4422>.
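The role of inverse probability weighting can be sketched in Python for two of the listed statistics, percent agreement and unweighted Kappa. This is a simplified point-estimate sketch and not the method or API of the package. It assumes each specimen that received both tests carries the known sampling probability of its stratum. Each such specimen is weighted by `1 / probability` so that it stands in for the unsampled specimens in its stratum. Variance estimation is omitted. The function name `ipw_kappa` is hypothetical.

```python
from collections import defaultdict


def ipw_kappa(pairs, probs):
    """Inverse-probability-weighted percent agreement and unweighted Cohen's kappa.

    Hypothetical helper for illustration.
    pairs: (result_test1, result_test2) for each specimen that received both tests
    probs: sampling probability of each specimen's stratum (0 < p <= 1)
    """
    table = defaultdict(float)  # weighted cross-classification table
    cats = set()
    total = 0.0
    for (a, b), p in zip(pairs, probs):
        w = 1.0 / p  # each sampled specimen represents 1/p specimens
        table[(a, b)] += w
        cats.update((a, b))
        total += w
    po = sum(table[(c, c)] for c in cats) / total  # observed agreement
    row = {c: sum(table[(c, d)] for d in cats) / total for c in cats}
    col = {c: sum(table[(d, c)] for d in cats) / total for c in cats}
    pe = sum(row[c] * col[c] for c in cats)  # chance-expected agreement
    return po, (po - pe) / (1 - pe)
```

With all probabilities equal to 1, the function reduces to the ordinary percent agreement and Cohen's kappa for a fully verified sample.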
Single-cell resolution data have been valuable for learning about tissue microenvironments and interactions between cells or spots. This package allows for the simulation of this level of data, be it single cells or 'spots', in both univariate (single metric or cell type) and bivariate (two or more metrics or cell types) settings. As more technologies come to market, more methods will be developed to derive spatial metrics from the data, which will require a way to benchmark methods against each other. Additionally, as the field currently stands, there is no gold-standard method to compare against. We set out to develop an R package that allows users to simulate point patterns that can be biologically informed by different tissue domains, holes, and varying degrees of clustering/colocalization. The data can be exported as spatial files and a summary file (like 'HALO'). <https://github.com/FridleyLab/scSpatialSIM/>.
This package provides functions to prepare rankings data and fit the Plackett-Luce model jointly attributed to Plackett (1975) <doi:10.2307/2346567> and Luce (1959, ISBN:0486441369). The standard Plackett-Luce model is generalized to accommodate ties of any order in the ranking. Partial rankings, in which only a subset of items are ranked in each ranking, are also accommodated in the implementation. Disconnected/weakly connected networks implied by the rankings may be handled by adding pseudo-rankings with a hypothetical item. Optionally, a multivariate normal prior may be set on the log-worth parameters and ranker reliabilities may be incorporated as proposed by Raman and Joachims (2014) <doi:10.1145/2623330.2623654>. Maximum a posteriori estimation is used when priors are set. Methods are provided to estimate standard errors or quasi-standard errors for inference as well as to fit Plackett-Luce trees. See the package website or vignette for further details.
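The standard Plackett-Luce model, without the tie extension described above, assigns a probability to a ranking as a product of choices. At each position, the item placed there is chosen from the items not yet ranked, with probability equal to its worth divided by the total worth of those remaining items. The Python sketch below is illustrative only; `plackett_luce_prob` is a hypothetical name and not part of the package. A partial ranking is handled in the same way, using only the items that appear in it.

```python
def plackett_luce_prob(ranking, worth):
    """Probability of a (possibly partial) ranking under the standard Plackett-Luce model.

    Hypothetical helper for illustration; ties are not supported.
    ranking: sequence of distinct items, best first
    worth: dict mapping item -> positive worth parameter
    """
    prob = 1.0
    remaining = list(ranking)
    for item in ranking:
        # Choose `item` from the items not yet ranked.
        prob *= worth[item] / sum(worth[j] for j in remaining)
        remaining.remove(item)
    return prob


worth = {"a": 2.0, "b": 1.0, "c": 1.0}
p_abc = plackett_luce_prob(("a", "b", "c"), worth)
```

Here `p_abc` equals (2/4) * (1/2) = 0.25. The log-worth parameters mentioned above are the logarithms of these worths.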
Simultaneous clustering of rows and columns, usually designated as biclustering, co-clustering or block clustering, is an important technique in two-way data analysis. It consists of estimating a mixture model which takes into account the block clustering problem on both the individual and variable sets. The blockcluster package provides a bridge between the C++ core library, built on top of the STK++ library, and the R statistical computing environment. This package allows users to co-cluster binary <doi:10.1016/j.csda.2007.09.007>, contingency <doi:10.1080/03610920903140197>, continuous <doi:10.1007/s11634-013-0161-3> and categorical data-sets <doi:10.1007/s11222-014-9472-2>. It also provides utility functions to visualize the results. This package may be useful for various applications in fields such as data mining, information retrieval, biology and computer vision. More information about the project and a comprehensive tutorial can be found at the link given in the URL field.
Evaluates the probability density function (PDF), cumulative distribution function (CDF), quantile function (QF), random numbers and maximum likelihood estimates (MLEs) of the well-known complementary binomial-G, complementary negative binomial-G and complementary geometric-G families of distributions, with baseline models such as exponential, extended exponential, Weibull, extended Weibull, Fisk, Lomax, Burr-XII and Burr-X. The functions also allow computing goodness-of-fit measures, namely the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the minimum value of the negative log-likelihood (-2L) function, the Anderson-Darling (A) test, the Cramér-von Mises (W) test, the Kolmogorov-Smirnov test, the P-value and the convergence status. Moreover, some commonly used data sets from the fields of actuarial, reliability and medical science are also provided. Related works include: a) Tahir, M. H., & Cordeiro, G. M. (2016). Compounding of distributions: a survey and new generalized classes. Journal of Statistical Distributions and Applications, 3, 1-35. <doi:10.1186/s40488-016-0052-1>.
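The likelihood-based goodness-of-fit measures can be illustrated with the plain exponential baseline. The sketch below is in Python and deliberately omits the complementary-G families. It uses the standard definitions -2L = -2 log L, AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of estimated parameters and n the sample size. The function name `exponential_fit_criteria` is hypothetical.

```python
import math


def exponential_fit_criteria(data):
    """Fit an exponential baseline by maximum likelihood; return (rate, -2L, AIC, BIC).

    Hypothetical helper for illustration; data must be positive.
    """
    n = len(data)
    rate = n / sum(data)  # MLE of the exponential rate: 1 / sample mean
    loglik = n * math.log(rate) - rate * sum(data)
    k = 1  # one estimated parameter
    minus2l = -2.0 * loglik
    aic = 2 * k + minus2l
    bic = k * math.log(n) + minus2l
    return rate, minus2l, aic, bic
```

For a richer family, only `loglik` and `k` change. This is why AIC and BIC can be compared across baseline models fitted to the same data.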
CRAN packages 'DoE.base' and 'Rmosek' and non-CRAN package 'gurobi' are enhanced with functionality for the creation of optimized arrays for experimentation, where optimization is in terms of generalized minimum aberration. It is also possible to optimally extend existing arrays to larger run size. The package writes MPS (Mathematical Programming System) files for use with any mixed integer optimization software that can process such files. If at least one of the commercial products Gurobi or Mosek (free academic licenses available for both) is available, the package also creates arrays by optimization. For installing Gurobi and its R package 'gurobi', follow instructions at <https://support.gurobi.com/hc/en-us/articles/14462206790033-How-do-I-install-Gurobi-for-R>. For installing Mosek and its R package 'Rmosek', follow instructions at <https://www.mosek.com/downloads/> and <https://docs.mosek.com/8.1/rmosek/install-interface.html>, or use the functionality in the stump CRAN R package 'Rmosek'.
This package provides covariate-adjusted comparison of two groups of right-censored data, where the binary group variable has separate short-term and long-term effects on the hazard function, while the effects of covariates such as age, blood pressure, etc. are proportional on the hazard. The model was studied in Yang and Prentice (2015) <doi:10.1002/sim.6453>, and it extends the two-sample version of the short-term and long-term hazard ratio model proposed in Yang and Prentice (2005) <doi:10.1093/biomet/92.1.1>. The model extends the usual Cox proportional hazards model to allow more flexible hazard ratio patterns, such as gradual onset of effect, diminishing effect, and crossing hazard or survival functions. This package provides the following: 1) point estimates and confidence intervals for model parameters; 2) point estimate and confidence interval of the average hazard ratio; and 3) plots of the estimated hazard ratio function with point-wise and simultaneous confidence bands.
This package provides support for automation and visualization of flow cytometry data analysis pipelines. In its current state, the package focuses on the preprocessing and quality control part. The framework is based on two main S4 classes, i.e. CytoPipeline and CytoProcessingStep. The pipeline steps are linked to corresponding R functions, which are either provided in the CytoPipeline package itself, exported from a third-party package, or coded by the users themselves. The processing steps need to be specified centrally and explicitly, using either a json input file or step-by-step creation of a CytoPipeline object with dedicated methods. After the pipeline has run, the results obtained at all steps can be retrieved and visualized thanks to file caching (the running facility uses a BiocFileCache implementation). The package also provides specific visualization tools, such as a pipeline workflow summary display and 1D/2D comparison plots of the flowFrames obtained at various steps of the pipeline.
Estimates and plots (as a single plot and as a heat map) the rolling window correlation coefficients between two time series and computes their statistical significance, which is carried out through a non-parametric, computationally intensive method. This method addresses the effects of multiple testing (inflation of the Type I error) when the statistical significance is estimated for the rolling window correlation coefficients. The method is based on Monte Carlo simulations that permute one of the variables under analysis (e.g., the dependent) while keeping the other variable fixed (e.g., the independent). We improve the computational efficiency of this method through parallel computing, which reduces the computation time. The NonParRolCor package also provides examples with synthetic and real-life environmental time series to exemplify its use. Methods derived from R. Telford (2013) <https://quantpalaeo.wordpress.com/2013/01/04/> and J.M. Polanco-Martinez and J.L. Lopez-Martinez (2021) <doi:10.1016/j.ecoinf.2021.101379>.
Helps a clinical trial team discuss the clinical goals of a well-defined biomarker with a diagnostic, staging, prognostic, or predictive purpose. From this discussion will come a statistical plan for a (non-randomized) validation trial. Both prospective and retrospective trials are supported. In a specific focused discussion, investigators should determine the range of "discomfort" for the NNT, the number needed to treat. The meaning of the discomfort range, [NNTlower, NNTupper], is that within this range most physicians would feel discomfort either in treating or in withholding treatment. A pair of NNT values bracketing that range, NNTpos and NNTneg, become the targets of the study's design. If the trial can demonstrate that a positive biomarker test yields an NNT less than NNTlower, and that a negative biomarker test yields an NNT greater than NNTupper, then the biomarker may be useful for patients. A highlight of the package is visualization of a "contra-Bayes" theorem, which produces criteria for retrospective case-control studies.
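The decision logic around the discomfort range can be sketched in Python. The sketch assumes the textbook definition NNT = 1 / (absolute risk reduction), computed from hypothetical event rates without and with treatment among patients who share a test result. The package may derive NNT differently, and the function names `nnt` and `classify_nnt` are hypothetical.

```python
import math


def nnt(event_rate_untreated, event_rate_treated):
    """Number needed to treat = 1 / absolute risk reduction (hypothetical helper)."""
    arr = event_rate_untreated - event_rate_treated
    return math.inf if arr <= 0 else 1.0 / arr


def classify_nnt(nnt_value, nnt_lower, nnt_upper):
    """Place an NNT relative to the discomfort range [nnt_lower, nnt_upper]."""
    if nnt_value < nnt_lower:
        return "treat"          # clearly worth treating
    if nnt_value > nnt_upper:
        return "withhold"       # clearly not worth treating
    return "discomfort"         # physicians would be uneasy either way


# A useful biomarker: test-positive patients fall below the range and
# test-negative patients fall above it.
pos = classify_nnt(nnt(0.30, 0.10), nnt_lower=10, nnt_upper=30)
neg = classify_nnt(nnt(0.05, 0.04), nnt_lower=10, nnt_upper=30)
```

In this example, `nnt(0.30, 0.10)` is about 5 and `nnt(0.05, 0.04)` is about 100, so the two test results land on opposite sides of the discomfort range.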
This package provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models, including the bias-reduced linearization estimator introduced by Bell and McCaffrey (2002) <http://www.statcan.gc.ca/pub/12-001-x/2002002/article/9058-eng.pdf> and developed further by Pustejovsky and Tipton (2017) <doi:10.1080/07350015.2016.1247004>. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling's T-squared distribution. Methods are provided for a variety of fitted models, including lm() and mlm objects, glm(), ivreg (from package AER), plm() (from package plm), gls() and lme() (from nlme), robu() (from robumeta), and rma.uni() and rma.mv() (from metafor).
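The sandwich structure shared by these estimators can be shown with the simplest member, the basic cluster-robust estimator (often called CR0), for a regression with an intercept and one slope. The formula is V = (X'X)^-1 [sum over clusters g of X_g' e_g e_g' X_g] (X'X)^-1. This Python sketch does not implement the bias-reduced linearization (CR2) adjustment or the small-sample tests described above. The function name is hypothetical.

```python
from collections import defaultdict


def cr0_simple_regression(x, y, cluster):
    """OLS of y on [1, x] with a basic (CR0) cluster-robust covariance matrix.

    Hypothetical helper for illustration. Returns ((b0, b1), V), where V is 2x2.
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^{-1}
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det

    # Per-cluster score sums X_g' e_g.
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, cluster):
        e = yi - (b0 + b1 * xi)
        scores[g][0] += e
        scores[g][1] += xi * e

    # Meat: sum over clusters of the outer products of the score sums.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

    return (b0, b1), matmul(matmul(bread, meat), bread)
```

With a single cluster, the meat is zero because OLS residuals are orthogonal to the regressors. This degenerate case is one reason small-sample corrections such as CR2 matter when clusters are few.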
Rolling Window Multiple Correlation ('RolWinMulCor') estimates the rolling (running) window correlation for the bi- and multi-variate cases between regular (sampled on identical time points) time series, with special emphasis on ecological data, although it can be applied to other kinds of data sets. RolWinMulCor is based on the concept of a rolling, running or sliding window and is useful for evaluating the evolution of correlation through time and time-scales. RolWinMulCor contains six functions. The first two focus on the bi-variate case: (1) rolwincor_1win() and (2) rolwincor_heatmap(), which estimate the correlation coefficients and their respective p-values for only one window-length (time-scale) and for all possible window-lengths or a band of window-lengths, respectively. The next two functions, (3) rolwinmulcor_1win() and (4) rolwinmulcor_heatmap(), are designed to analyze the multi-variate case. They follow the bi-variate case in how the results are visually displayed, but the two approaches are methodologically different: the multi-variate case estimates the adjusted coefficients of determination instead of the correlation coefficients. The last two functions, (5) plot_1win() and (6) plot_heatmap(), are used to represent graphically the outputs of the four aforementioned functions as simple plots or as heat maps. The functions contained in RolWinMulCor are highly flexible, since they contain several parameters to control the estimation of correlation and the features of the plot output. For example, they can remove the (linear) trend contained in the time series under analysis, choose different p-value correction methods (which are used to address the multiple comparison problem) or personalise the plot outputs. The RolWinMulCor package also provides examples with synthetic and real-life ecological time series to exemplify its use. Methods derived from H. Abdi (2007) <https://personal.utdallas.edu/~herve/Abdi-MCC2007-pretty.pdf>, R. Telford (2013) <https://quantpalaeo.wordpress.com/2013/01/04/>, J. M. Polanco-Martinez (2019) <doi:10.1007/s11071-019-04974-y>, and J. M. Polanco-Martinez (2020) <doi:10.1016/j.ecoinf.2020.101163>.
Constructs treatment and block designs for linear treatment models with crossed or nested block factors. The treatment design can be any feasible linear model, and the block design can be any feasible combination of crossed or nested block factors. The block design is a sum of one or more block factors and is optimized sequentially, with the levels of each successive block factor optimized conditional on all previously optimized block factors. D-optimality is used throughout, except for square or rectangular lattice block designs, which are constructed algebraically using mutually orthogonal Latin squares. Crossed block designs with interaction effects are optimized using a weighting scheme that allows for differential weighting of first- and second-order block effects. Outputs include a table showing the allocation of treatments to blocks and tables showing the achieved D-efficiency factors for each block and treatment design. Edmondson, R.N. Multi-level Block Designs for Comparative Experiments. JABES 25, 500-522 (2020) <doi:10.1007/s13253-020-00416-0>.
Biotechnology in spatial omics has advanced rapidly over the past few years, enhancing both throughput and resolution. However, existing annotation pipelines in spatial omics predominantly rely on clustering methods. These pipelines lack the flexibility to integrate the extensive annotated information available from single-cell RNA sequencing (scRNA-seq), because of discrepancies in spatial resolutions, species or modalities. Here we introduce the CAESAR suite, an open-source software package that provides image-based spatial co-embedding of locations and genomic features. It uniquely transfers labels from an scRNA-seq reference, enabling the annotation of spatial omics datasets across different technologies, resolutions, species and modalities, based on the conserved relationship between signature genes and cells/locations at an appropriate level of granularity. Notably, CAESAR enriches location-level pathways, allowing for the detection of gradual biological pathway activation within spatially defined domain types. More details on the methods can be found in our paper, which is currently under submission. A full reference to the paper will be provided in future versions once the paper is published.
This package provides a set of procedures for estimating risks related to extreme events via risk measures such as the expectile, Value-at-Risk, etc. Estimation methods for univariate independent observations and temporally dependent observations are available. The methodology is extended to the case of independent multidimensional observations. The statistical inference is performed through parametric and non-parametric estimators. Inferential procedures such as confidence intervals, confidence regions and hypothesis tests are obtained by exploiting the asymptotic theory. Adapts the methodologies derived in Padoan and Stupfler (2022) <doi:10.3150/21-BEJ1375>, Davison et al. (2023) <doi:10.1080/07350015.2022.2078332>, Daouia et al. (2018) <doi:10.1111/rssb.12254>, Drees (2000) <doi:10.1214/aoap/1019487617>, Drees (2003) <doi:10.3150/bj/1066223272>, de Haan and Ferreira (2006) <doi:10.1007/0-387-34471-3>, de Haan et al. (2016) <doi:10.1007/s00780-015-0287-6>, Padoan and Rizzelli (2024) <doi:10.3150/23-BEJ1668>, Daouia et al. (2024) <doi:10.3150/23-BEJ1632>.
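The expectile, one of the risk measures listed above, can be computed for a sample by asymmetric least squares. This Python sketch shows only that plain empirical expectile. It is not the extreme-value extrapolated estimators or the inference procedures of the package, and `sample_expectile` is a hypothetical name. The tau-expectile e solves tau * sum over x > e of (x - e) = (1 - tau) * sum over x <= e of (e - x). The sketch finds it by iterating a weighted mean, with weight tau above the current estimate and 1 - tau at or below it.

```python
def sample_expectile(x, tau, tol=1e-12, max_iter=1000):
    """Empirical tau-expectile via iteratively reweighted (asymmetric) least squares.

    Hypothetical helper for illustration; 0 < tau < 1.
    """
    e = sum(x) / len(x)  # the 0.5-expectile is the mean: a natural starting point
    for _ in range(max_iter):
        w = [tau if xi > e else 1.0 - tau for xi in x]
        new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(new - e) < tol:
            return new
        e = new
    return e
```

For example, `sample_expectile([0.0, 1.0], 0.8)` returns 0.8, because 0.8 * (1 - 0.8) = 0.2 * (0.8 - 0). At extreme levels such as tau close to 1, few observations lie above the estimate, which motivates the extrapolation methods cited above.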
Alluvial plots are similar to Sankey diagrams and visualise categorical data over multiple dimensions as flows (Rosvall M, Bergstrom CT (2010) Mapping Change in Large Networks. PLoS ONE 5(1): e8694. <doi:10.1371/journal.pone.0008694>). Their graphical grammar, however, is a bit more complex than that of regular x/y plots. The ggalluvial package does a great job of translating that grammar into ggplot2 syntax and gives you many options to tweak the appearance of an alluvial plot. However, a multi-layered complexity still remains that makes it difficult to use ggalluvial for explorative data analysis. easyalluvial provides a simple interface to this package that allows you to produce a decent alluvial plot from any dataframe, in either long or wide format, with a single line of code, while also handling continuous data. It is meant to allow a quick visualisation of entire dataframes, with a focus on different colouring options that can make alluvial plots a great tool for data exploration.
MHC (major histocompatibility complex) molecules are cell surface complexes that present antigens to T cells. The repertoire of antigens presented in a given genetic background largely depends on the sequence of the encoded MHC molecules, and thus, in humans, on the highly variable HLA (human leukocyte antigen) genes of the hyperpolymorphic HLA locus. More than 28,000 different HLA alleles have been reported, with significant differences in allele frequencies between human populations worldwide. Reproducible and consistent annotation of HLA alleles in large-scale bioinformatics workflows remains challenging, because the available reference databases and software tools often use different HLA naming schemes. The package immunotation provides tools for consistent annotation of HLA genes in typical immunoinformatics workflows, such as the prediction of MHC-presented peptides in different human donors. Converter functions that provide mappings between different HLA naming schemes are based on the MHC restriction ontology (MRO). The package also provides automated access to HLA allele frequencies in worldwide human reference populations stored in the Allele Frequency Net Database.
Set of generalised tools for the flexible computation of climate-related indicators defined by the user. Each method represents a specific mathematical approach, which is combined with the possibility of selecting an arbitrary time period to define the indicator. This enables a wide range of possibilities to tailor the most suitable indicator for each particular climate service application (agriculture, food security, energy, water management, health...). This package is intended for sub-seasonal, seasonal and decadal climate predictions, but its methods are also applicable to other time-scales, provided the dimensional structure of the input is maintained. Additionally, the outputs of the functions in this package are compatible with 'CSTools'. This package is described in Pérez-Zanón et al. (2023) <doi:10.1016/j.cliser.2023.100393>, and it was developed in the context of the H2020 MED-GOLD (776467) and S2S4E (776787) projects. See Lledó et al. (2019) <doi:10.1016/j.renene.2019.04.135> and Chou et al. (2023) <doi:10.1016/j.cliser.2023.100345> for details.