This package provides a modeling tool dedicated to biological network modeling (Bertrand and others 2020, <doi:10.1093/bioinformatics/btaa855>). It allows for single or joint modeling of, for instance, genes and proteins. It starts with the selection of the actors that will be used in the subsequent reverse engineering step. An actor can be included in that selection based on its differential measurement (for instance gene expression or protein abundance) or on its time course profile. Wrappers for actor clustering functions and cluster analysis are provided. It also allows reverse engineering of biological networks taking into account the observed time course patterns of the actors. Many inference functions are provided, each designed to give the inferred network specific features such as sparsity, robust links, high-confidence links or links that are stable under resampling. Some simulation and prediction tools are also available for cascade networks (Jung and others 2014, <doi:10.1093/bioinformatics/btt705>). Examples of use with microarray or RNA-Seq data are provided.
This package provides functions to develop simulated continuous data (e.g., gene expression) from a sigma covariance matrix derived from a graph structure in igraph objects. It is intended to extend mvtnorm to take igraph structures rather than sigma matrices as input. This allows the use of simulated data that correctly accounts for pathway relationships and correlations. The package presents a versatile statistical framework to simulate correlated gene expression data from biological pathways, by sampling from a multivariate normal distribution derived from a graph structure. For example, methods to infer biological pathways and gene regulatory networks from gene expression data can be tested on simulated datasets using this framework. This also allows for pathway structures to be considered as a confounding variable when simulating gene expression data to test the performance of genomic analyses.
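As a rough illustration of the idea (not the package's own code or interface), the sketch below derives a covariance matrix from a small connected graph and draws correlated samples through a Cholesky factor. The decay rule `rho ** distance`, the example pathway, and all function names are assumptions made for this example; the rule is positive definite for the chain graph used here but not for every graph, so the Cholesky step raises an error when it fails.

```python
import math
import random
from collections import deque

def shortest_path_lengths(adj):
    """All-pairs shortest path lengths (in edges) by BFS; assumes a connected graph."""
    dist = {}
    for src in adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[src] = seen
    return dist

def sigma_from_graph(adj, rho=0.8):
    """Assumed rule for illustration: correlation decays geometrically with graph distance."""
    nodes = sorted(adj)
    dist = shortest_path_lengths(adj)
    return nodes, [[rho ** dist[a][b] for b in nodes] for a in nodes]

def cholesky(m):
    """Lower-triangular factor L with L * L^T = m; fails if m is not positive definite."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def sample_mvn(sigma, n_samples, rng):
    """Zero-mean multivariate normal draws: x = L z with z standard normal."""
    low = cholesky(sigma)
    dim = len(sigma)
    out = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        out.append([sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(dim)])
    return out

# Hypothetical four-gene chain A - B - C - D
pathway = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
nodes, sigma = sigma_from_graph(pathway)
samples = sample_mvn(sigma, 2000, random.Random(1))
```

Each row of `samples` is one simulated expression profile; genes that are close in the graph are more strongly correlated than distant ones.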
Simplifies the process of estimating above ground biomass components for teak trees using a few basic inputs, based on the equations taken from the journal article "Allometric equations for estimating above ground biomass and leaf area of planted teak (Tectona grandis) forests under agroforestry management in East Java, Indonesia" (Purwanto & Shiba, 2006) <doi:10.60409/forestresearch.76.0_1>. The functions are most reliable when applied to trees from the same region where the equations were developed, specifically East Java, Indonesia. They estimate the stem diameter at the lowest major living branch (DB) using the stem diameter at breast height (R^2 = 0.969), the branch dry weight (WB) using the stem diameter at breast height and tree height (R^2 = 0.979), the stem weight (WS) using the stem diameter at breast height and tree height (R^2 = 0.997), and the leaf dry weight (WL) using the stem diameter at the lowest major living branch (R^2 = 0.996).
A Boolean network is a particular kind of discrete dynamical system where the variables are simple binary switches. Despite its simplicity, Boolean network modeling has been a successful method to describe the behavioral pattern of various phenomena. Applying stochastic noise to Boolean networks is a useful approach for representing the effects of various perturbing stimuli on complex systems. A number of methods have been developed to control noise effects on Boolean networks using parameters integrated into the update rules. This package provides functions to examine three such methods: Boolean network with perturbations (BNp), described by Trairatphisan et al. (2013) <doi:10.1186/1478-811X-11-46>, stochastic discrete dynamical systems (SDDS), proposed by Murrugarra et al. (2012) <doi:10.1186/1687-4153-2012-5>, and Boolean network with probabilistic edge weights (PEW), presented by Deritei et al. (2022) <doi:10.1371/journal.pcbi.1010536>. This package includes source code derived from the BoolNet package, which is licensed under the Artistic License 2.0.
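To make the notion of noise in the update rules concrete, here is a minimal sketch of a generic bit-flip scheme: a synchronous update followed by an independent flip of each node with probability `p`. This is a simplification written for illustration and is not claimed to reproduce BNp, SDDS or PEW exactly; those update rules are defined in the cited papers. The three-node network and all names are hypothetical.

```python
import random

def step_with_noise(state, rules, p, rng):
    """One synchronous update, then flip each node independently with probability p."""
    updated = {node: bool(rule(state)) for node, rule in rules.items()}
    return {node: (not value) if rng.random() < p else value
            for node, value in updated.items()}

# Hypothetical three-node network: each rule reads the previous state.
rules = {
    "a": lambda s: s["c"],
    "b": lambda s: s["a"] and not s["c"],
    "c": lambda s: s["a"] or s["b"],
}

start = {"a": True, "b": False, "c": False}
rng = random.Random(0)

# Simulate a short noisy trajectory.
state = dict(start)
trajectory = [state]
for _ in range(10):
    state = step_with_noise(state, rules, p=0.05, rng=rng)
    trajectory.append(state)
```

With `p = 0` the function reduces to the deterministic synchronous update, which makes it easy to compare noisy and noise-free trajectories from the same starting state.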
Optimizers for the torch deep learning library. These functions include recent results published in the literature and are not part of the optimizers offered in torch. Prospective users should test these optimizers with their data, since performance depends on the specific problem being solved. The package includes the following optimizers: (a) adabelief by Zhuang et al. (2020), <arXiv:2010.07468>; (b) adabound by Luo et al. (2019), <arXiv:1902.09843>; (c) adahessian by Yao et al. (2021), <arXiv:2006.00719>; (d) adamw by Loshchilov & Hutter (2019), <arXiv:1711.05101>; (e) madgrad by Defazio and Jelassi (2021), <arXiv:2101.11075>; (f) nadam by Dozat (2019), <https://openreview.net/pdf/OM0jvwB8jIp57ZJjtNEZ.pdf>; (g) qhadam by Ma and Yarats (2019), <arXiv:1810.06801>; (h) radam by Liu et al. (2019), <arXiv:1908.03265>; (i) swats by Keskar and Socher (2018), <arXiv:1712.07628>; (j) yogi by Zaheer et al. (2019), <https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization>.
Datasets from the most recent CCIIO DIY entry in a tidy format. These support the Centers for Medicare and Medicaid Services (CMS) risk adjustment Do-It-Yourself (DIY) process, which allows health insurance issuers to calculate member risk profiles under the Health and Human Services-Hierarchical Condition Categories (HHS-HCC) regression model. This regression model is used to calculate risk adjustment transfers. Risk adjustment is a selection mitigation program implemented under the Patient Protection and Affordable Care Act (ACA or Obamacare) in the USA. Under the ACA, health insurance issuers submit claims data to CMS in order for CMS to calculate a risk score under the HHS-HCC regression model. However, CMS does not inform issuers of their average risk score until after the data submission deadline. These data sets can be used by issuers to calculate their average risk score mid-year. More information about risk adjustment and the HHS-HCC model can be found here: <https://www.cms.gov/mmrr/Articles/A2014/MMRR2014_004_03_a03.html>.
Implementation of trigonometric functions to calculate the exposure of flat, tilted surfaces, such as leaves and slopes, to direct solar radiation. It implements the equations in A.G. Escribano-Rocafort, A. Ventre-Lespiaucq, C. Granado-Yela, et al. (2014) <doi:10.1111/2041-210X.12141> in a few user-friendly R functions. All functions handle data obtained with Ahmes 1.0 for Android, as well as more traditional data sources (compass, protractor, inclinometer). The main function (star()) calculates the potential exposure of flat, tilted surfaces to direct solar radiation (silhouette to area ratio, STAR). It is equivalent to the ratio of the leaf projected area to total leaf area, but instead of using area data it uses spatial position angles, such as pitch, roll and course, and information on the geographical coordinates, hour, and date. The package includes additional functions to recalculate STAR with custom settings of location and time, to calculate the tilt angle of a surface, and the minimum angle between two non-orthogonal planes.
Splines are efficiently represented through their Taylor expansion at the knots. The representation accounts for the support sets and is thus suitable for sparse functional data. Two cases of boundary conditions are considered: zero-boundary or periodic-boundary for all derivatives except the last. The periodic splines are represented graphically using polar coordinates. The B-splines and orthogonal bases of splines that reside on small total support are implemented. The orthogonal bases are referred to as splinets and are utilized for functional data analysis. A random spline generator is implemented, as well as all fundamental algebraic and calculus operations on splines. The optimal, in the least squares sense, functional fit by splinets to data consisting of sampled values of functions, as well as splines built over another set of knots, is obtained and used for functional data analysis. The S4 version of object-oriented R is used. <doi:10.48550/arXiv.2102.00733>, <doi:10.1016/j.cam.2022.114444>, <doi:10.48550/arXiv.2302.07552>.
This package provides Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of RcppArmadillo to speed up the computationally intensive parts of the functions. For more information, see "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, https://doi.org/10.18637/jss.v001.i04; "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, https://doi.org/10.1145/1772690.1772862; "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al (2016), The Journal of Open Source Software, https://doi.org/10.21105/joss.00026; "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, https://doi.org/10.1126/science.1136800.
The theory of cooperative games with transferable utility offers useful insights into the way parties can share gains from cooperation and secure sustainable agreements; see, for example, the books by Chakravarty, Mitra and Sarkar (2015, ISBN:978-1107058798) or by Driessen (1988, ISBN:978-9027727299) for more details. A comprehensive set of tools for cooperative game theory with transferable utility is provided. Users can create special families of cooperative games, such as bankruptcy games, cost sharing games and weighted voting games. There are functions to check various game properties and to compute five different set-valued solution concepts for cooperative games. A large number of point-valued solution concepts are available, reflecting the diverse application areas of cooperative game theory. Some of these point-valued solution concepts can be used to analyze weighted voting games and measure the influence of individual voters within a voting body. There are routines for visualizing both set-valued and point-valued solutions in the case of three or four players.
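As one concrete instance of a point-valued solution concept, the sketch below computes the Shapley value by averaging marginal contributions over all player orderings, and applies it to a weighted voting game (where it coincides with the Shapley-Shubik power index). The function names and the example game are assumptions for illustration and do not reflect the package's interface; the brute-force enumeration is only practical for a handful of players.

```python
from fractions import Fraction
from itertools import permutations

def shapley_value(players, v):
    """Average marginal contribution of each player over all orderings."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += Fraction(v(with_p)) - Fraction(v(coalition))
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}

def weighted_voting_game(weights, quota):
    """Characteristic function: 1 if the coalition's weight reaches the quota, else 0."""
    def v(coalition):
        return 1 if sum(weights[p] for p in coalition) >= quota else 0
    return v

# Hypothetical voting body: quota 51, weights 50, 49 and 1.
weights = {1: 50, 2: 49, 3: 1}
power = shapley_value([1, 2, 3], weighted_voting_game(weights, quota=51))
print(power)  # {1: Fraction(2, 3), 2: Fraction(1, 6), 3: Fraction(1, 6)}
```

The result shows why such indices are informative: voters 2 and 3 have very different weights but identical influence, because each is pivotal in exactly one of the six orderings.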
This package provides a design-based approach to statistical inference, with a focus on spatial data. Spatially balanced samples are selected using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm can be applied to finite resources (point geometries) and infinite resources (linear / linestring and areal / polygon geometries) and flexibly accommodates a diverse set of sampling design features, including stratification, unequal inclusion probabilities, proportional (to size) inclusion probabilities, legacy (historical) sites, a minimum distance between sites, and two options for replacement sites (reverse hierarchical order and nearest neighbor). Data are analyzed using a wide range of analysis functions that perform categorical variable analysis, continuous variable analysis, attributable risk analysis, risk difference analysis, relative risk analysis, change analysis, and trend analysis. spsurvey can also be used to summarize objects, visualize objects, select samples that are not spatially balanced, select panel samples, measure the amount of spatial balance in a sample, adjust design weights, and more. For additional details, see Dumelle et al. (2023) <doi:10.18637/jss.v105.i03>.
This package provides useful tools for cognitive diagnosis modeling (CDM). The package includes functions for empirical Q-matrix estimation and validation, such as the Hull method (Nájera, Sorrel, de la Torre, & Abad, 2021, <doi:10.1111/bmsp.12228>) and the discrete factor loading method (Wang, Song, & Ding, 2018, <doi:10.1007/978-3-319-77249-3_29>). It also contains dimensionality assessment procedures for CDM, including parallel analysis and automated fit comparison as explored in Nájera, Abad, and Sorrel (2021, <doi:10.3389/fpsyg.2021.614470>). Other relevant methods and features for CDM applications, such as the restricted DINA model (Nájera et al., 2023; <doi:10.3102/10769986231158829>), the general nonparametric classification method (Chiu et al., 2018; <doi:10.1007/s11336-017-9595-4>), and corrected estimation of the classification accuracy via multiple imputation (Kreitchmann et al., 2022; <doi:10.3758/s13428-022-01967-5>) are also available. Lastly, the package provides some useful functions for CDM simulation studies, such as random Q-matrix generation and detection of complete/identified Q-matrices.
This package provides a suite of functions that fit models that use PPM type priors for partitions. Models include hierarchical Gaussian and probit ordinal models with a (covariate dependent) PPM. If a covariate dependent product partition model is selected, then all the options detailed in Page, G.L.; Quintana, F.A. (2018) <doi:10.1007/s11222-017-9777-z> are available. If covariate values are missing, then the approach detailed in Page, G.L.; Quintana, F.A.; Mueller, P (2020) <doi:10.1080/10618600.2021.1999824> is employed. Also included in the package is a function that fits a Gaussian likelihood spatial product partition model that is detailed in Page, G.L.; Quintana, F.A. (2016) <doi:10.1214/15-BA971>, and multivariate PPM change point models that are detailed in Quinlan, J.J.; Page, G.L.; Castro, L.M. (2023) <doi:10.1214/22-BA1344>. In addition, a function that fits a univariate or bivariate functional data model that employs a PPM or a PPMx to cluster curves based on B-spline coefficients is provided.
This package provides a scaling method to obtain a standardized Moran's I measure. Moran's I is a measure of the spatial autocorrelation of a data set; it quantifies the similarity between data and their surroundings. In theory the range of this value is [-1,1], but this does not hold in practice. This package scales the Moran's I value and maps it into the theoretical range of [-1,1]. Once the Moran's I value is rescaled, comparison between projects becomes easier. For instance, a researcher can calculate Moran's I in a city in China, with a sample size of n1 and area of interest a1. Another researcher runs a similar experiment in a city in Mexico with a different sample size, n2, and an area of interest a2. Due to the differences between the conditions, it is not possible to compare Moran's I in a straightforward way. In this version of the package, the spatial autocorrelation Moran's I is calculated as proposed in Chen (2013) <arXiv:1606.03658>.
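For orientation, the sketch below computes the classical, unscaled Moran's I from a vector of values and a spatial weight matrix, using the textbook formula I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. It illustrates the quantity being rescaled only; it is neither Chen's formulation nor the package's scaling method, and the function name and toy data are assumptions.

```python
def morans_i(values, weights):
    """Classical Moran's I for values x_i and a square weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s0 = sum(sum(row) for row in weights)          # total weight
    cross = sum(weights[i][j] * dev[i] * dev[j]    # weighted cross-products
                for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)               # total squared deviation
    return (n / s0) * cross / spread

# Four sites on a line; neighbours share a binary weight of 1.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1, 2, 3, 4], w)          # smooth gradient: positive autocorrelation
alternating = morans_i([1, -1, 1, -1], w)  # checkerboard: negative autocorrelation
```

On this toy layout the smooth gradient gives 1/3 and the alternating pattern gives -1, which hints at how the attainable range depends on the layout and weights.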
This package provides functions for fitting discrete distribution models to count data. Included are the Poisson, the negative binomial, the Poisson-inverse Gaussian and, most importantly, a new implementation of the Poisson-beta distribution (density, distribution and quantile functions, and random number generator) together with a needed new implementation of Kummer's function (also: confluent hypergeometric function of the first kind). Three different implementations of the Gillespie algorithm allow data simulation based on the basic, switching or bursting mRNA generating processes. Moreover, likelihood functions for four variants of each of the three aforementioned distributions are also available. The variants include one population and two population mixtures, both with and without zero-inflation. The package depends on the MPFR libraries (<https://www.mpfr.org/>) which need to be installed separately (see description at <https://github.com/fuchslab/scModels>). This package is a supplement to the paper "A mechanistic model for the negative binomial distribution of single-cell mRNA counts" by Lisa Amrhein, Kumar Harsha and Christiane Fuchs (2019) <doi:10.1101/657619> available on bioRxiv.
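The Gillespie algorithm mentioned above can be sketched for the simplest case, a constitutive birth-death process in which mRNA is produced at a constant rate and each molecule degrades independently. This is a generic illustration with assumed parameter names, not the package's implementation; the switching and bursting processes add further reactions and are not shown.

```python
import random

def gillespie_birth_death(production, degradation, t_end, rng, n0=0):
    """Simulate the mRNA count at time t_end for a birth-death process."""
    t, n = 0.0, n0
    while True:
        birth = production           # propensity of producing one molecule
        death = degradation * n      # propensity of degrading one molecule
        total = birth + death
        if total == 0:
            return n                 # nothing can happen any more
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            return n
        if rng.random() * total < birth:
            n += 1
        else:
            n -= 1

rng = random.Random(42)
counts = [gillespie_birth_death(10.0, 1.0, 10.0, rng) for _ in range(500)]
mean_count = sum(counts) / len(counts)
```

For this basic process the stationary distribution is Poisson with mean `production / degradation`, so `mean_count` should be close to 10 in the run above.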
This package provides a comprehensive toolset for any useR conducting topological data analysis, specifically via the calculation of persistent homology in a Vietoris-Rips complex. The tools this package currently provides can be conveniently split into three main sections: (1) calculating persistent homology; (2) conducting statistical inference on persistent homology calculations; (3) visualizing persistent homology and statistical inference. The published form of TDAstats can be found in Wadhwa et al. (2018) <doi:10.21105/joss.00860>. For a general background on computing persistent homology for topological data analysis, see Otter et al. (2017) <doi:10.1140/epjds/s13688-017-0109-5>. To learn more about how the permutation test is used for nonparametric statistical inference in topological data analysis, read Robinson & Turner (2017) <doi:10.1007/s41468-017-0008-7>. To learn more about how TDAstats calculates persistent homology, you can visit the GitHub repository for Ripser, the software that works behind the scenes, at <https://github.com/Ripser/ripser>.
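To give a feel for what persistent homology records, the sketch below computes only the dimension-0 part for a Vietoris-Rips filtration: every point starts as its own connected component at scale 0, and a component dies at the edge length where it merges with another. It uses a union-find structure over the sorted pairwise distances. This is a teaching sketch with assumed names, far simpler than Ripser, and it does not compute higher-dimensional features such as loops.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Return (birth, death) pairs for connected components of a point cloud."""
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j      # two components merge at this scale
            pairs.append((0.0, length))  # one of them dies here
    pairs.append((0.0, math.inf))        # the last component never dies
    return pairs

diagram = h0_persistence([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
print(diagram)  # [(0.0, 1.0), (0.0, 2.0), (0.0, inf)]
```

Long-lived pairs correspond to well-separated clusters, while short-lived ones are typically treated as noise.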
A number of statistical tests have been proposed to compare two survival curves, including the difference in (or ratio of) t-year survival, difference in (or ratio of) p-th percentile survival, difference in (or ratio of) restricted mean survival time, and the weighted log-rank test. Despite the multitude of options, the convention in survival studies is to assume proportional hazards and to use the unweighted log-rank test for design and analysis. This package provides sample size and power calculation for all of the above statistical tests with allowance for flexible accrual, censoring, and survival (e.g., Weibull, piecewise-exponential, mixture cure). It is the companion R package to the paper by Yung and Liu (2020) <doi:10.1111/biom.13196>. Specific to the weighted log-rank test, users may specify which approximations they wish to use to estimate the large-sample mean and variance. The default option has been shown to provide substantial improvement over the conventional sample size and power equations based on Schoenfeld (1981) <doi:10.1093/biomet/68.1.316>.
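For reference, the conventional Schoenfeld-type equation that serves as the baseline here gives the required number of events for the unweighted log-rank test under proportional hazards as d = (z_{1-alpha/2} + z_{power})^2 / (p (1 - p) (log HR)^2), where p is the allocation fraction. The sketch below evaluates it; the function name and defaults are assumptions, and it does not implement the package's refined approximations or its handling of accrual and censoring.

```python
import math
from statistics import NormalDist

def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.8, allocation=0.5):
    """Events needed for a two-sided log-rank test under proportional hazards."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    events = (z_alpha + z_power) ** 2 / (
        allocation * (1.0 - allocation) * math.log(hazard_ratio) ** 2
    )
    return math.ceil(events)

print(schoenfeld_events(0.5))  # 66
print(schoenfeld_events(0.7))  # 247
```

Converting events into a number of participants still requires assumptions about accrual, follow-up and censoring, which is where more flexible design tools come in.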
This package provides the tools to produce catseye plots, principally through the catseyesplot() function, which calls R's standard plot() function internally, or alternatively through the catseyes() function, which overlays the catseye plot onto an existing R plot window. Catseye plots illustrate the normal distribution of the mean (picture a normal bell curve reflected over its base and rotated 90 degrees), with a shaded confidence interval; they are an intuitive way of illustrating and comparing normally distributed estimates, and are arguably a superior alternative to standard confidence intervals, since they show the full distribution rather than fixed quantile bounds. The catseyesplot() and catseyes() functions require pre-calculated means and standard errors (or standard deviations), provided as numeric vectors; this allows the flexibility of obtaining this information from a variety of sources, such as direct calculation or prediction from a model. Catseye plots, as illustrations of the normal distribution of the means, are described in Cumming (2013 & 2014). Cumming, G. (2013). The new statistics: Why and how. Psychological Science, 27, 7-29. <doi:10.1177/0956797613504966> pmid:24220629.
This package provides tools to calculate the theoretical hydrodynamic response of an aquifer undergoing harmonic straining or pressurization, or analyze measured responses. There are two classes of models here, designed for use with confined aquifers: (1) for sealed wells, based on the model of Kitagawa et al (2011, <doi:10.1029/2010JB007794>), and (2) for open wells, based on the models of Cooper et al (1965, <doi:10.1029/JZ070i016p03915>), Hsieh et al (1987, <doi:10.1029/WR023i010p01824>), Rojstaczer (1988, <doi:10.1029/JB093iB11p13619>), Liu et al (1989, <doi:10.1029/JB094iB07p09453>), and Wang et al (2018, <doi:10.1029/2018WR022793>). Wang's solution is a special exception which allows for leakage out of the aquifer (semi-confined); it is equivalent to Hsieh's model when there is no leakage (the confined case). These models treat strain (or aquifer head) as an input to the physical system, and fluid-pressure (or water height) as the output. The applicable frequency band of these models is characteristic of seismic waves, atmospheric pressure fluctuations, and solid earth tides.
Randomized clinical trials commonly follow participants for a time-to-event efficacy endpoint for a fixed period of time. Consequently, at the time when the last enrolled participant completes their follow-up, the number of observed endpoints is a random variable. Assuming data collected through an interim timepoint, simulation-based estimation and inferential procedures in the standard right-censored failure time analysis framework are conducted for the distribution of the number of endpoints--in total as well as by treatment arm--at the end of the follow-up period. The future (i.e., yet unobserved) enrollment, endpoint, and dropout times are generated according to mechanisms specified in the simTrial() function in the seqDesign package. A Bayesian model for the endpoint rate, offering the option to specify a robust mixture prior distribution, is used for generating future data (see the vignette for details). Inference can be restricted to participants who received treatment according to the protocol and are observed to be at risk for the endpoint at a specified timepoint. Plotting functions are provided for graphical display of results.
This package performs combination tests and sample size calculation for fixed designs with survival endpoints under either proportional or non-proportional hazards. The combination tests include the maximum weighted log-rank test and the projection test. The sample size calculation procedure is very flexible, allowing for a user-defined hazard ratio function and considering various trial conditions such as staggered entry and drop-out. The sample size calculation also applies to various cure models, such as the proportional hazards cure model and the cure model with (random) delayed treatment effects. A trial simulation function is also provided to facilitate empirical power calculation. The references for the projection test and the maximum weighted log-rank test include Brendel et al. (2014) <doi:10.1111/sjos.12059> and Cheng and He (2021) <arXiv:2110.03833>. The references for sample size calculation under proportional hazards include Schoenfeld (1981) <doi:10.1093/biomet/68.1.316> and Freedman (1982) <doi:10.1002/sim.4780010204>. The references for calculation under non-proportional hazards include Lakatos (1988) <doi:10.2307/2531910> and Cheng and He (2023) <doi:10.1002/bimj.202100403>.
The aim of postpack is to provide the infrastructure for a standardized workflow for mcmc.list objects. These objects can be used to store output from models fitted with Bayesian inference using JAGS, WinBUGS, OpenBUGS, NIMBLE, Stan, or even custom MCMC algorithms. Although the coda R package provides some methods for these objects, it is somewhat limited in easily performing post-processing tasks for specific nodes. Models are ever increasing in their complexity and the number of tracked nodes, and oftentimes a user may wish to summarize/diagnose sampling behavior for only a small subset of nodes at a time for a particular question or figure. Thus, many postpack functions support performing tasks on a subset of nodes, where the subset is specified with regular expressions. The functions in postpack streamline the extraction, summarization, and diagnostics of specific monitored nodes after model fitting. Further, because there is rarely only ever one model under consideration, postpack scales efficiently to perform the same tasks on output from multiple models simultaneously, facilitating rapid assessment of model sensitivity to changes in assumptions.
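The idea of selecting monitored nodes by regular expression can be sketched independently of any particular tool. The example below stores posterior draws in a plain dictionary keyed by node name and summarizes only the nodes whose names match a pattern. It is written in Python for illustration and does not reflect postpack's R interface; the node names, draws and function names are all hypothetical.

```python
import re
from statistics import mean

def match_nodes(node_names, pattern):
    """Return the node names that match a regular expression, in input order."""
    regex = re.compile(pattern)
    return [name for name in node_names if regex.search(name)]

def summarize(samples, pattern):
    """Posterior mean of every node whose name matches the pattern."""
    return {name: mean(samples[name]) for name in match_nodes(samples, pattern)}

# Hypothetical posterior draws for four monitored nodes.
samples = {
    "alpha": [0.9, 1.1, 1.0],
    "beta[1]": [2.0, 2.2, 1.8],
    "beta[2]": [-0.5, -0.4, -0.6],
    "sigma": [0.3, 0.35, 0.25],
}

betas = match_nodes(samples, r"^beta\[")
beta_means = summarize(samples, r"^beta\[")
```

Note that square brackets are regex metacharacters and must be escaped, a common stumbling block when node names contain indices.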
Analysis of relative cell type proportions in bulk gene expression data. Provides a well-validated set of brain cell type-specific marker genes derived from multiple types of experiments, as described in McKenzie (2018) <doi:10.1038/s41598-018-27293-5>. For brain tissue data sets, there are marker genes available for astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, and oligodendrocyte precursor cells, derived from each of human, mouse, and combined human/mouse data sets. However, if you have access to your own marker genes, the functions can be applied to bulk gene expression data from any tissue. Also implements multiple options for relative cell type proportion estimation using these marker genes, adapting and expanding on approaches from the CellCODE R package described in Chikina (2015) <doi:10.1093/bioinformatics/btv015>. The number of cell type marker genes used in a given analysis can be increased or decreased based on your preferences and the data set. Finally, provides functions to use the estimates to adjust for variability in the relative proportion of cell types across samples prior to downstream analyses.
Assess the calibration of an existing (i.e. previously developed) multistate model through calibration plots. Calibration is assessed using one of three methods. 1) Calibration methods for binary logistic regression models applied at a fixed time point in conjunction with inverse probability of censoring weights. 2) Calibration methods for multinomial logistic regression models applied at a fixed time point in conjunction with inverse probability of censoring weights. 3) Pseudo-values estimated using the Aalen-Johansen estimator of observed risk. All methods are applied in conjunction with landmarking when required. These calibration plots evaluate the calibration (in a validation cohort of interest) of the transition probabilities estimated from an existing multistate model. While package development has focused on multistate models, calibration plots can be produced for any model which utilises information post baseline to update predictions (e.g. dynamic models); competing risks models; or standard single outcome survival models, where predictions can be made at any landmark time. Please see Pate et al. (2024) <doi:10.1002/sim.10094> and Pate et al. (2024) <https://alexpate30.github.io/calibmsm/articles/Overview.html>.