Splines are efficiently represented through their Taylor expansions at the knots. The representation accounts for the support sets and is thus suitable for sparse functional data. Two cases of boundary conditions are considered: zero-boundary or periodic-boundary for all derivatives except the last. Periodic splines are represented graphically using polar coordinates. B-splines and orthogonal bases of splines that reside on small total supports are implemented. The orthogonal bases are referred to as splinets and are utilized for functional data analysis. A random spline generator is implemented, as are all fundamental algebraic and calculus operations on splines. The optimal (in the least squares sense) functional fit by splinets to data consisting of sampled values of functions, as well as of splines built over another set of knots, is obtained and used for functional data analysis. The S4 version of object-oriented R is used. <doi:10.48550/arXiv.2102.00733>, <doi:10.1016/j.cam.2022.114444>, <doi:10.48550/arXiv.2302.07552>.
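A minimal sketch of the intended workflow follows; the function names splinet(), rspline(), and project() are assumptions based on my reading of this description, not a confirmed API.

    # Hedged sketch: build a splinet basis over equidistant knots, draw
    # random splines on it, and project sampled data onto the basis.
    library(Splinets)
    k <- seq(0, 1, length.out = 10)           # equidistant knots on [0, 1]
    basis <- splinet(knots = k, smorder = 3)  # assumed: B-splines plus the orthogonal splinet
    rs <- rspline(basis$os, N = 5)            # assumed random spline generator
    # least-squares projection of discretized data onto the splinet
    x <- seq(0, 1, by = 0.01)
    fit <- project(cbind(x, sin(2 * pi * x)), knots = k)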
The theory of cooperative games with transferable utility offers useful insights into the way parties can share gains from cooperation and secure sustainable agreements; see, e.g., one of the books by Chakravarty, Mitra and Sarkar (2015, ISBN:978-1107058798) or by Driessen (1988, ISBN:978-9027727299) for more details. A comprehensive set of tools for cooperative game theory with transferable utility is provided. Users can create special families of cooperative games, such as bankruptcy games, cost sharing games, and weighted voting games. There are functions to check various game properties and to compute five different set-valued solution concepts for cooperative games. A large number of point-valued solution concepts are available, reflecting the diverse application areas of cooperative game theory. Some of these point-valued solution concepts can be used to analyze weighted voting games and measure the influence of individual voters within a voting body. There are routines for visualizing both set-valued and point-valued solutions in the case of three or four players.
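A minimal sketch, assuming the shapleyValue() function and the coalition ordering {1},{2},{3},{1,2},{1,3},{2,3},{1,2,3} for the game vector:

    # Hedged sketch: a three-player game entered by its coalition worths,
    # then a point-valued solution.
    library(CoopGame)
    v <- c(0, 0, 0, 60, 60, 60, 72)  # worths of the nonempty coalitions
    shapleyValue(v)                  # one fair split of v(N) = 72 among the players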
This package provides a design-based approach to statistical inference, with a focus on spatial data. Spatially balanced samples are selected using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm can be applied to finite resources (point geometries) and infinite resources (linear / linestring and areal / polygon geometries) and flexibly accommodates a diverse set of sampling design features, including stratification, unequal inclusion probabilities, proportional (to size) inclusion probabilities, legacy (historical) sites, a minimum distance between sites, and two options for replacement sites (reverse hierarchical order and nearest neighbor). Data are analyzed using a wide range of analysis functions that perform categorical variable analysis, continuous variable analysis, attributable risk analysis, risk difference analysis, relative risk analysis, change analysis, and trend analysis. spsurvey can also be used to summarize objects, visualize objects, select samples that are not spatially balanced, select panel samples, measure the amount of spatial balance in a sample, adjust design weights, and more. For additional details, see Dumelle et al. (2023) <doi:10.18637/jss.v105.i03>.
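A minimal sketch of a GRTS draw, assuming the grts() and sp_balance() functions and the NE_Lakes example data set described in Dumelle et al. (2023):

    # Hedged sketch: a spatially balanced sample of 50 base sites plus 10
    # reverse-hierarchically ordered replacement sites from a point frame.
    library(spsurvey)
    set.seed(51)
    eqprob <- grts(NE_Lakes, n_base = 50, n_over = 10)
    sp_balance(eqprob$sites_base, NE_Lakes)  # measure spatial balance of the draw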
This package provides Gaussian mixture models, k-means, mini-batch k-means, k-medoids, and affinity propagation clustering, with options to plot, validate, predict (new data), and estimate the optimal number of clusters. The package takes advantage of RcppArmadillo to speed up the computationally intensive parts of the functions. For more information, see "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, and Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al. (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; and "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck (2007), Science, 315(5814), 972-976, <doi:10.1126/science.1136800>.
Seahtrue organizes oxygen consumption and extracellular acidification analysis data from experiments performed on an XF analyzer into structured nested tibbles. This allows for detailed processing of raw data and advanced data visualization and statistics. Seahtrue introduces an open and reproducible way to analyze these XF experiments. It uses file paths to .xlsx files. These .xlsx files are supplied by the user and are generated in the Wave software from Agilent from the assay result files (.asyr). The .xlsx file contains different sheets of important data for the experiment: 1. Assay Information - details about how the experiment was set up; 2. Rate Data - information about the OCR and ECAR rates; 3. Raw Data - the original raw data collected during the experiment; 4. Calibration Data - data related to calibrating the instrument. Seahtrue focuses on extracting the specific data needed for analysis. Once this data is extracted, it is prepared for calculations through preprocessing. To make sure everything is accurate, both the initial data and the preprocessed data go through thorough checks.
The EMDomics algorithm is used to perform a supervised multi-class analysis to measure the magnitude and statistical significance of observed continuous genomics data between groups. Usually the data will be gene expression values from array-based or sequence-based experiments, but data from other types of experiments can also be analyzed (e.g., copy number variation). Traditional methods like Significance Analysis of Microarrays (SAM) and Linear Models for Microarray Data (LIMMA) use significance tests based on summary statistics (mean and standard deviation) of the distributions. This approach lacks power to identify expression differences between groups that show high levels of intra-group heterogeneity. The Earth Mover's Distance (EMD) algorithm instead computes the "work" needed to transform one distribution into another, thus providing a metric of the overall difference in shape between two distributions. Permutation of sample labels is used to generate q-values for the observed EMD scores. This package also incorporates the Kolmogorov-Smirnov (K-S) test and the Cramér-von Mises (CVM) test, which are both common distribution comparison tests.
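A minimal sketch on simulated expression data, assuming the calculate_emd() interface (a genes-by-samples matrix plus a named vector of group labels):

    # Hedged sketch: per-gene EMD scores with permutation-based q-values.
    library(EMDomics)
    dat <- matrix(rnorm(100 * 10), nrow = 100,
                  dimnames = list(paste0("gene", 1:100), paste0("s", 1:10)))
    groups <- setNames(rep(c("A", "B"), each = 5), colnames(dat))
    res <- calculate_emd(dat, groups, nperm = 50)
    head(res$emd)  # EMD score per gene (q-values in the companion slot)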
This package provides useful tools for cognitive diagnosis modeling (CDM). The package includes functions for empirical Q-matrix estimation and validation, such as the Hull method (Nájera, Sorrel, de la Torre, & Abad, 2021, <doi:10.1111/bmsp.12228>) and the discrete factor loading method (Wang, Song, & Ding, 2018, <doi:10.1007/978-3-319-77249-3_29>). It also contains dimensionality assessment procedures for CDM, including parallel analysis and automated fit comparison as explored in Nájera, Abad, and Sorrel (2021, <doi:10.3389/fpsyg.2021.614470>). Other relevant methods and features for CDM applications, such as the restricted DINA model (Nájera et al., 2023; <doi:10.3102/10769986231158829>), the general nonparametric classification method (Chiu et al., 2018; <doi:10.1007/s11336-017-9595-4>), and corrected estimation of the classification accuracy via multiple imputation (Kreitchmann et al., 2022; <doi:10.3758/s13428-022-01967-5>) are also available. Lastly, the package provides some useful functions for CDM simulation studies, such as random Q-matrix generation and detection of complete/identified Q-matrices.
This package provides a suite of functions that fit models that use PPM type priors for partitions. Models include hierarchical Gaussian and probit ordinal models with a (covariate dependent) PPM. If a covariate dependent product partition model is selected, then all the options detailed in Page, G.L.; Quintana, F.A. (2018) <doi:10.1007/s11222-017-9777-z> are available. If covariate values are missing, then the approach detailed in Page, G.L.; Quintana, F.A.; Mueller, P (2020) <doi:10.1080/10618600.2021.1999824> is employed. Also included in the package is a function that fits a Gaussian likelihood spatial product partition model that is detailed in Page, G.L.; Quintana, F.A. (2016) <doi:10.1214/15-BA971>, and multivariate PPM change point models that are detailed in Quinlan, J.J.; Page, G.L.; Castro, L.M. (2023) <doi:10.1214/22-BA1344>. In addition, a function that fits a univariate or bivariate functional data model that employs a PPM or a PPMx to cluster curves based on B-spline coefficients is provided.
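A minimal sketch of a covariate-dependent fit; gaussian_ppmx() and its MCMC control arguments (draws, burn, thin) are assumptions based on this description, so treat the exact signature as unverified.

    # Hedged sketch: a PPMx fit where the partition depends on a covariate.
    library(ppmSuite)
    set.seed(1)
    X <- data.frame(x = rnorm(100))
    y <- ifelse(X$x > 0, 2, -2) + rnorm(100, sd = 0.5)  # two covariate-driven clusters
    fit <- gaussian_ppmx(y, X = X, draws = 1100, burn = 100, thin = 1)
    table(fit$Si[1000, ])  # assumed: cluster labels from one posterior draw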
This package provides a scaling method to obtain a standardized Moran's I measure. Moran's I is a measure of the spatial autocorrelation of a data set; it quantifies the similarity between data points and their surroundings. The range of this value should be [-1, 1], but this does not happen in practice. This package scales the Moran's I value and maps it into the theoretical range of [-1, 1]. Once the Moran's I value is rescaled, it facilitates comparison between projects: for instance, one researcher can calculate Moran's I in a city in China, with a sample size of n1 and an area of interest a1, while another researcher runs a similar experiment in a city in Mexico with a different sample size, n2, and an area of interest a2. Due to the differences between the conditions, it is not possible to compare the two Moran's I values in a straightforward way. In this version of the package, the spatial autocorrelation Moran's I is calculated as proposed in Chen (2013) <arXiv:1606.03658>.
This package provides functions for fitting discrete distribution models to count data. Included are the Poisson, the negative binomial, the Poisson-inverse Gaussian and, most importantly, a new implementation of the Poisson-beta distribution (density, distribution, and quantile functions, and a random number generator), together with a needed new implementation of Kummer's function (also: the confluent hypergeometric function of the first kind). Three different implementations of the Gillespie algorithm allow data simulation based on the basic, switching, or bursting mRNA-generating processes. Moreover, likelihood functions for four variants of each of the three aforementioned distributions are also available. The variants include one-population and two-population mixtures, both with and without zero-inflation. The package depends on the MPFR libraries (<https://www.mpfr.org/>), which need to be installed separately (see the description at <https://github.com/fuchslab/scModels>). This package is a supplement to the paper "A mechanistic model for the negative binomial distribution of single-cell mRNA counts" by Lisa Amrhein, Kumar Harsha, and Christiane Fuchs (2019) <doi:10.1101/657619>, available on bioRxiv.
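A minimal sketch of the Poisson-beta d/p/q/r interface; the parameter names (alpha, beta, and a scaling constant c) are assumptions based on this description.

    # Hedged sketch: evaluate the Poisson-beta density and simulate counts.
    library(scModels)
    x <- 0:25
    dens <- dpb(x, alpha = 2, beta = 3, c = 20)    # assumed density function
    sims <- rpb(1000, alpha = 2, beta = 3, c = 20) # assumed random generator
    plot(table(sims) / 1000)
    lines(x, dens)  # compare empirical frequencies to the density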
This package provides a comprehensive toolset for any useR conducting topological data analysis, specifically via the calculation of persistent homology in a Vietoris-Rips complex. The tools this package currently provides can be conveniently split into three main sections: (1) calculating persistent homology; (2) conducting statistical inference on persistent homology calculations; (3) visualizing persistent homology and statistical inference. The published form of TDAstats can be found in Wadhwa et al. (2018) <doi:10.21105/joss.00860>. For a general background on computing persistent homology for topological data analysis, see Otter et al. (2017) <doi:10.1140/epjds/s13688-017-0109-5>. To learn more about how the permutation test is used for nonparametric statistical inference in topological data analysis, read Robinson & Turner (2017) <doi:10.1007/s41468-017-0008-7>. To learn more about how TDAstats calculates persistent homology, you can visit the GitHub repository for Ripser, the software that works behind the scenes, at <https://github.com/Ripser/ripser>.
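A minimal sketch: persistent homology of a noisy circle via calculate_homology(), then a persistence diagram with plot_persist().

    # Hedged sketch: the 1-cycle of the circle should appear as a
    # long-persisting feature in the diagram.
    library(TDAstats)
    set.seed(1)
    theta <- runif(100, 0, 2 * pi)
    circ <- cbind(cos(theta), sin(theta)) + rnorm(200, sd = 0.05)
    hom <- calculate_homology(circ, dim = 1)  # 0-cycles and 1-cycles
    plot_persist(hom)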
A number of statistical tests have been proposed to compare two survival curves, including the difference in (or ratio of) t-year survival, the difference in (or ratio of) p-th percentile survival, the difference in (or ratio of) restricted mean survival time, and the weighted log-rank test. Despite the multitude of options, the convention in survival studies is to assume proportional hazards and to use the unweighted log-rank test for design and analysis. This package provides sample size and power calculations for all of the above statistical tests, with allowance for flexible accrual, censoring, and survival (e.g., Weibull, piecewise-exponential, mixture cure). It is the companion R package to the paper by Yung and Liu (2020) <doi:10.1111/biom.13196>. Specific to the weighted log-rank test, users may specify which approximations they wish to use to estimate the large-sample mean and variance. The default option has been shown to provide a substantial improvement over the conventional sample size and power equations based on Schoenfeld (1981) <doi:10.1093/biomet/68.1.316>.
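A minimal sketch of the two-arm workflow, assuming the create_arm() and power_two_arm() functions with exponential survival and loss; the argument values are purely illustrative.

    # Hedged sketch: power of the default test for two arms with staggered
    # accrual, exponential survival, and exponential loss to follow-up.
    library(npsurvSS)
    arm0 <- create_arm(size = 120, accr_time = 6, surv_scale = 0.05,
                       loss_scale = 0.005, follow_time = 12)
    arm1 <- create_arm(size = 120, accr_time = 6, surv_scale = 0.03,
                       loss_scale = 0.005, follow_time = 12)
    power_two_arm(arm0, arm1)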
To facilitate fitting the sample selection models existing in the literature, we created the ssmodels package. Our package allows fitting the classic Heckman model (Heckman (1976); Heckman (1979) <doi:10.2307/1912352>), estimating its parameters via the maximum likelihood and two-step methods, as well as fitting the Heckman-t model, introduced in the literature by Marchenko and Genton (2012) <doi:10.1080/01621459.2012.656011>, and the Heckman-Skew model, introduced in the literature by Ogundimu and Hutton (2016) <doi:10.1111/sjos.12171>. We also implemented functions to fit the generalized version of the Heckman model, introduced by Bastos, Barreto-Souza, and Genton (2021) <doi:10.5705/ss.202021.0068>, which allows the inclusion of covariates in the dispersion and correlation parameters, and a function to fit the Heckman-BS model, introduced by Bastos and Barreto-Souza (2020) <doi:10.1080/02664763.2020.1780570>, which uses the Birnbaum-Saunders distribution as the joint distribution of the selection and primary regression variables.
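A minimal sketch of fitting the classic Heckman model on simulated data; HeckmanCL() and its two-formula interface (selection equation first, outcome equation second) are assumptions based on this description.

    # Hedged sketch: simulate correlated selection/outcome errors, then fit.
    library(ssmodels)
    set.seed(1)
    n <- 500
    x1 <- rnorm(n); x2 <- rnorm(n)
    e <- MASS::mvrnorm(n, c(0, 0), matrix(c(1, 0.5, 0.5, 1), 2))
    observed <- as.numeric(x1 + x2 + e[, 1] > 0)     # selection mechanism
    y <- ifelse(observed == 1, 1 + x1 + e[, 2], NA)  # outcome seen only if selected
    dat <- data.frame(y, observed, x1, x2)
    fit <- HeckmanCL(observed ~ x1 + x2, y ~ x1, data = dat)  # assumed interface
    summary(fit)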
This package provides the tools to produce catseye plots, principally via the catseyesplot() function, which calls R's standard plot() function internally, or alternatively via the catseyes() function, which overlays the catseye plot onto an existing R plot window. Catseye plots illustrate the normal distribution of the mean (picture a normal bell curve reflected over its base and rotated 90 degrees), with a shaded confidence interval; they are an intuitive way of illustrating and comparing normally distributed estimates, and are arguably a superior alternative to standard confidence intervals, since they show the full distribution rather than fixed quantile bounds. The catseyesplot and catseyes functions require pre-calculated means and standard errors (or standard deviations), provided as numeric vectors; this allows the flexibility of obtaining this information from a variety of sources, such as direct calculation or prediction from a model. Catseye plots, as illustrations of the normal distribution of the means, are described in Cumming (2013 & 2014). Cumming, G. (2013). The new statistics: Why and how. Psychological Science, 27, 7-29. <doi:10.1177/0956797613504966> pmid:24220629.
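A minimal sketch of a catseyesplot() call; the function name comes from the description above, but the argument names used here (x positions, means, standard errors) are hypothetical.

    # Hedged sketch: catseye plots for three pre-calculated group means.
    library(catseyes)
    grp_means <- c(1.2, 1.8, 2.5)     # pre-calculated means
    grp_ses   <- c(0.20, 0.25, 0.20)  # and their standard errors
    catseyesplot(x = 1:3, ymean = grp_means, yse = grp_ses)  # hypothetical arguments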
This package provides tools to calculate the theoretical hydrodynamic response of an aquifer undergoing harmonic straining or pressurization, or to analyze measured responses. There are two classes of models here, designed for use with confined aquifers: (1) for sealed wells, based on the model of Kitagawa et al. (2011, <doi:10.1029/2010JB007794>), and (2) for open wells, based on the models of Cooper et al. (1965, <doi:10.1029/JZ070i016p03915>), Hsieh et al. (1987, <doi:10.1029/WR023i010p01824>), Rojstaczer (1988, <doi:10.1029/JB093iB11p13619>), Liu et al. (1989, <doi:10.1029/JB094iB07p09453>), and Wang et al. (2018, <doi:10.1029/2018WR022793>). Wang's solution is a special exception which allows for leakage out of the aquifer (the semi-confined case); it is equivalent to Hsieh's model when there is no leakage (the confined case). These models treat strain (or aquifer head) as an input to the physical system, and fluid pressure (or water height) as the output. The applicable frequency band of these models is characteristic of seismic waves, atmospheric pressure fluctuations, and solid earth tides.
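A minimal sketch of an open-well response calculation; open_well_response() and the dotted argument names (T. for transmissivity, S. for storativity) follow my recollection of this package's conventions and should be treated as assumptions.

    # Hedged sketch: frequency response of an open well under the Hsieh model.
    library(kitagawa)
    omega <- 2 * pi * 10^seq(-4, 0, length.out = 200)  # angular frequencies [rad/s]
    resp <- open_well_response(omega, T. = 1e-6, S. = 1e-5, model = "hsieh")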
Randomized clinical trials commonly follow participants for a time-to-event efficacy endpoint for a fixed period of time. Consequently, at the time when the last enrolled participant completes their follow-up, the number of observed endpoints is a random variable. Assuming data collected through an interim timepoint, simulation-based estimation and inferential procedures in the standard right-censored failure time analysis framework are conducted for the distribution of the number of endpoints (in total as well as by treatment arm) at the end of the follow-up period. The future (i.e., yet unobserved) enrollment, endpoint, and dropout times are generated according to mechanisms specified in the simTrial() function in the seqDesign package. A Bayesian model for the endpoint rate, offering the option to specify a robust mixture prior distribution, is used for generating future data (see the vignette for details). Inference can be restricted to participants who received treatment according to the protocol and are observed to be at risk for the endpoint at a specified timepoint. Plotting functions are provided for graphical display of results.
This package performs combination tests and sample size calculation for fixed designs with survival endpoints, using combination tests under either proportional or non-proportional hazards. The combination tests include the maximum weighted log-rank test and the projection test. The sample size calculation procedure is very flexible, allowing for a user-defined hazard ratio function and considering various trial conditions such as staggered entry and drop-out. The sample size calculation also applies to various cure models, such as the proportional hazards cure model and cure models with (random) delayed treatment effects. A trial simulation function is also provided to facilitate empirical power calculation. The references for the projection test and the maximum weighted log-rank test include Brendel et al. (2014) <doi:10.1111/sjos.12059> and Cheng and He (2021) <arXiv:2110.03833>. The references for sample size calculation under proportional hazards include Schoenfeld (1981) <doi:10.1093/biomet/68.1.316> and Freedman (1982) <doi:10.1002/sim.4780010204>. The references for calculation under non-proportional hazards include Lakatos (1988) <doi:10.2307/2531910> and Cheng and He (2023) <doi:10.1002/bimj.202100403>.
The aim of postpack is to provide the infrastructure for a standardized workflow for mcmc.list objects. These objects can be used to store output from models fitted with Bayesian inference using JAGS, WinBUGS, OpenBUGS, NIMBLE, Stan, or even custom MCMC algorithms. Although the coda R package provides some methods for these objects, it is somewhat limited in easily performing post-processing tasks for specific nodes. Models are ever increasing in their complexity and the number of tracked nodes, and oftentimes a user may wish to summarize or diagnose sampling behavior for only a small subset of nodes at a time for a particular question or figure. Thus, many postpack functions support performing tasks on a subset of nodes, where the subset is specified with regular expressions. The functions in postpack streamline the extraction, summarization, and diagnostics of specific monitored nodes after model fitting. Further, because there is rarely only one model under consideration, postpack scales efficiently to perform the same tasks on output from multiple models simultaneously, facilitating rapid assessment of model sensitivity to changes in assumptions.
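A minimal sketch of regex-based node selection, assuming post_summ() takes a params regular expression, as I recall postpack's interface; the toy mcmc.list stands in for real model output.

    # Hedged sketch: summarize only the nodes whose names match "^b".
    library(coda)
    library(postpack)
    draws <- matrix(rnorm(3000), ncol = 3,
                    dimnames = list(NULL, c("b0", "b1", "sig")))
    post <- mcmc.list(mcmc(draws[1:500, ]), mcmc(draws[501:1000, ]))
    post_summ(post, params = "^b")  # posterior mean, sd, and quantiles for b0, b1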
Analysis of relative cell type proportions in bulk gene expression data. Provides a well-validated set of brain cell type-specific marker genes derived from multiple types of experiments, as described in McKenzie (2018) <doi:10.1038/s41598-018-27293-5>. For brain tissue data sets, marker genes are available for astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, and oligodendrocyte precursor cells, derived from each of human, mouse, and combined human/mouse data sets. However, if you have access to your own marker genes, the functions can be applied to bulk gene expression data from any tissue. The package also implements multiple options for relative cell type proportion estimation using these marker genes, adapting and expanding on approaches from the CellCODE R package described in Chikina (2015) <doi:10.1093/bioinformatics/btv015>. The number of cell type marker genes used in a given analysis can be increased or decreased based on your preferences and the data set. Finally, the package provides functions to use the estimates to adjust for variability in the relative proportion of cell types across samples prior to downstream analyses.
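A minimal sketch, assuming this is the BRETIGEA package, that brainCells() is its estimation helper, and that the aba_marker_expression example matrix ships with it; all three names are assumptions.

    # Hedged sketch: one surrogate proportion variable per brain cell type,
    # from a genes-by-samples expression matrix with gene-symbol rownames.
    library(BRETIGEA)
    ct <- brainCells(aba_marker_expression, nMarker = 50, species = "combined")
    head(ct)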
Assess the calibration of an existing (i.e. previously developed) multistate model through calibration plots. Calibration is assessed using one of three methods. 1) Calibration methods for binary logistic regression models applied at a fixed time point in conjunction with inverse probability of censoring weights. 2) Calibration methods for multinomial logistic regression models applied at a fixed time point in conjunction with inverse probability of censoring weights. 3) Pseudo-values estimated using the Aalen-Johansen estimator of observed risk. All methods are applied in conjunction with landmarking when required. These calibration plots evaluate the calibration (in a validation cohort of interest) of the transition probabilities estimated from an existing multistate model. While package development has focused on multistate models, calibration plots can be produced for any model which utilises information post baseline to update predictions (e.g. dynamic models); competing risks models; or standard single outcome survival models, where predictions can be made at any landmark time. Please see Pate et al. (2024) <doi:10.1002/sim.10094> and Pate et al. (2024) <https://alexpate30.github.io/calibmsm/articles/Overview.html>.
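A hypothetical sketch of a calibration-plot call; calib_msm(), its argument names, and the EBMT example objects (msebmtcal, ebmtcal, tps0) follow my recollection of the package vignette and should all be treated as assumptions.

    # Hedged sketch: calibration of transition probabilities out of state
    # j = 1 at landmark time s = 0, evaluated at t = 1826 days, using the
    # binary-logistic (BLR-IPCW) method.
    library(calibmsm)
    dat_calib <- calib_msm(data_ms = msebmtcal, data_raw = ebmtcal,
                           j = 1, s = 0, t = 1826, tp_pred = tps0,
                           calib_type = "blr")
    plot(dat_calib)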
This package implements estimation and testing procedures for evaluating an intermediate biomarker response as a principal surrogate of a clinical response to treatment (i.e., principal stratification effect modification analysis), as described in Juraska M, Huang Y, and Gilbert PB (2020), Inference on treatment effect modification by biomarker response in a three-phase sampling design, Biostatistics, 21(3): 545-560 <doi:10.1093/biostatistics/kxy074>. The methods avoid the restrictive placebo structural risk modeling assumption common to past methods and further improve robustness by the use of nonparametric kernel smoothing for biomarker density estimation. A randomized controlled two-group clinical efficacy trial is assumed with an ordered categorical or continuous univariate biomarker response measured at a fixed timepoint post-randomization and with a univariate baseline surrogate measure allowed to be observed in only a subset of trial participants with an observed biomarker response (see the flexible three-phase sampling design in the paper for details). Bootstrap-based procedures are available for pointwise and simultaneous confidence intervals and testing of four relevant hypotheses. Summary and plotting functions are provided for estimation results.
This package provides functions that fit two modern education-based value-added models. One of these models is the quantile value-added model. This model permits estimating a school's value-added based on specific quantiles of the post-test distribution. Estimating value-added based on quantiles of the post-test distribution provides a more complete picture of an education institution's contribution to learning for students of all abilities. See Page, G.L.; San Martín, E.; Orellana, J.; Gonzalez, J. (2017) <doi:10.1111/rssa.12195> for more details. The second model is a temporally dependent value-added model. This model takes into account the temporal dependence that may exist in school performance between two cohorts in one of two ways. The first is by modeling school random effects with a non-stationary AR(1) process. The second is by modeling school effects based on the previous cohort's post-test performance. In addition to more efficiently estimating value-added, this model permits making statements about the persistence of a school's effectiveness. The standard value-added model is also an option.
US VAERS vaccine data for 01/01/2018 - 06/14/2018. If you want to explore the full VAERS data for 1990 - present (data, symptoms, and vaccines), then check out the vaers package from the URL below. The URL and BugReports below correspond to the vaers package, of which vaersvax is a small subset (2018 only). vaers is not hosted on CRAN due to the large size of the data set. To install the suggested vaers and vaersND packages, use the following R code: devtools::install_git("https://gitlab.com/iembry/vaers.git", build_vignettes = TRUE) and devtools::install_git("https://gitlab.com/iembry/vaersND.git", build_vignettes = TRUE). "The Vaccine Adverse Event Reporting System (VAERS) is a national early warning system to detect possible safety problems in U.S.-licensed vaccines. VAERS is co-managed by the Centers for Disease Control and Prevention (CDC) and the U.S. Food and Drug Administration (FDA)." For more information about the data, visit <https://vaers.hhs.gov/>. For information about vaccination/immunization hazards, visit <http://www.questionuniverse.com/rethink.html#vaccine>.
This package provides a comprehensive Shiny application for analyzing Whole Genome Duplication (WGD) events. It offers a user-friendly Shiny web application that lets non-expert researchers prepare input data and execute command lines for several well-known WGD analysis tools, including wgd, ksrates, i-ADHoRe, OrthoFinder, and Whale. The package also provides the source code so that experienced researchers can adjust and install the package on their own server. Key features: (1) Input data preparation: users can conveniently upload and format their data, making it compatible with the various WGD analysis tools. (2) Command line generation: the package automatically generates the necessary command lines for the selected WGD analysis tools, reducing manual errors and saving time. (3) Visualization: the package offers interactive visualizations to explore and interpret WGD results, facilitating in-depth WGD analysis. (4) Comparative genomics: users can study and compare WGD events across different species, aiding evolutionary and comparative genomics studies. (5) User-friendly interface: the Shiny web application provides an intuitive and accessible interface, making WGD analysis accessible to researchers and bioinformaticians of all levels.