This package provides functions that fit two modern education-based value-added models. One of these models is the quantile value-added model. This model permits estimating a school's value-added based on specific quantiles of the post-test distribution. Estimating value-added based on quantiles of the post-test distribution provides a more complete picture of an education institution's contribution to learning for students of all abilities. See Page, G.L.; San Martín, E.; Orellana, J.; Gonzalez, J. (2017) <doi:10.1111/rssa.12195> for more details. The second model is a temporally dependent value-added model. This model takes into account the temporal dependence that may exist in school performance between two cohorts in one of two ways. The first is by modeling school random effects with a non-stationary AR(1) process. The second is by modeling school effects based on the previous cohort's post-test performance. In addition to more efficiently estimating value-added, this model permits making statements about the persistence of a school's effectiveness. The standard value-added model is also an option.
US VAERS vaccine data for 01/01/2018 - 06/14/2018. If you want to explore the full VAERS data for 1990 - Present (data, symptoms, and vaccines), then check out the vaers package from the URL below. The URL and BugReports below correspond to the vaers package, of which vaersvax is a small subset (2018 only). vaers is not hosted on CRAN due to the large size of the data set. To install the Suggested vaers and vaersND packages, use the following R code: devtools::install_git("https://gitlab.com/iembry/vaers.git", build_vignettes = TRUE) and devtools::install_git("https://gitlab.com/iembry/vaersND.git", build_vignettes = TRUE). "The Vaccine Adverse Event Reporting System (VAERS) is a national early warning system to detect possible safety problems in U.S.-licensed vaccines. VAERS is co-managed by the Centers for Disease Control and Prevention (CDC) and the U.S. Food and Drug Administration (FDA)." For more information about the data, visit <https://vaers.hhs.gov/>. For information about vaccination/immunization hazards, visit <http://www.questionuniverse.com/rethink.html#vaccine>.
This package provides a comprehensive Shiny application for analyzing Whole Genome Duplication ('WGD') events. It offers a user-friendly Shiny web application that helps non-experienced researchers prepare input data and execute command lines for several well-known WGD analysis tools, including 'wgd', 'ksrates', 'i-ADHoRe', 'OrthoFinder', and 'Whale'. The source code is also provided so that experienced researchers can adjust the package and install it on their own server. Key features: 1) Input data preparation: users can conveniently upload and format their data, making it compatible with various WGD analysis tools. 2) Command line generation: the package automatically generates the necessary command lines for selected WGD analysis tools, reducing manual errors and saving time. 3) Visualization: interactive visualizations allow users to explore and interpret WGD results, facilitating in-depth WGD analysis. 4) Comparative genomics: users can study and compare WGD events across different species, aiding in evolutionary and comparative genomics studies. 5) User-friendly interface: the Shiny web application provides an intuitive interface, making WGD analysis accessible to researchers and bioinformaticians of all levels.
Testing homogeneity of k multivariate distributions is a classical and challenging problem in statistics, and it becomes even more challenging when the dimension of the data exceeds the sample size. We construct exact level (size) alpha tests for this purpose based on clustering. These tests are easy to implement and distribution-free in finite sample situations. Under appropriate regularity conditions, these tests are consistent in the HDLSS (high dimension, low sample size) asymptotic regime, where the dimension of the data grows to infinity while the sample size remains fixed. We also consider a multiscale approach, where the results for different numbers of partitions are aggregated judiciously. Details are in Biplab Paul, Shyamal K De and Anil K Ghosh (2021) <doi:10.1016/j.jmva.2021.104897>; Soham Sarkar and Anil K Ghosh (2019) <doi:10.1109/TPAMI.2019.2912599>; William M Rand (1971) <doi:10.1080/01621459.1971.10482356>; Cyrus R Mehta and Nitin R Patel (1983) <doi:10.2307/2288652>; Joseph C Dunn (1973) <doi:10.1080/01969727308546046>; Sture Holm (1979) <doi:10.2307/4615733>; Yoav Benjamini and Yosef Hochberg (1995) <doi:10.2307/2346101>.
distinct is a statistical method to perform differential testing between two or more groups of distributions; differential testing is performed via hierarchical non-parametric permutation tests on the cumulative distribution functions (cdfs) of each sample. While most methods for differential expression target differences in the mean abundance between conditions, distinct, by comparing full cdfs, identifies both differential patterns involving changes in the mean and more subtle variations that do not involve the mean (e.g., unimodal vs. bi-modal distributions with the same mean). distinct is a general and flexible tool: due to its fully non-parametric nature, which makes no assumptions on how the data were generated, it can be applied to a variety of datasets. It is particularly suitable for differential state analyses on single cell data (i.e., differential analyses within sub-populations of cells), such as single cell RNA sequencing (scRNA-seq) and high-dimensional flow or mass cytometry (HDCyto) data. To use distinct, one needs data from two or more groups of samples (i.e., experimental conditions), with at least 2 samples (i.e., biological replicates) per group.
Statistical methods that quantify the conditions necessary to alter inferences, also known as sensitivity analysis, are becoming increasingly important to a variety of quantitative sciences. A series of recent works, including Frank (2000) <doi:10.1177/0049124100029002001> and Frank et al. (2013) <doi:10.3102/0162373713493129>, extend previous sensitivity analyses by considering the characteristics of omitted variables or unobserved cases that would change an inference if such variables or cases were observed. These analyses generate statements such as "an omitted variable would have to be correlated at xx with the predictor of interest (e.g., the treatment) and outcome to invalidate an inference of a treatment effect", or "one would have to replace pp percent of the observed data with data for which the treatment had no effect to invalidate the inference". We implement these recent developments of sensitivity analysis and provide modules to calculate these two robustness indices and generate such statements in R. In particular, the functions konfound(), pkonfound() and mkonfound() allow users to calculate the robustness of inferences for a user's own model, a single published study, and multiple studies, respectively.
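A minimal sketch of how these functions might be called, assuming the package is named konfound (inferred from the function names) and that pkonfound() accepts the estimated effect, its standard error, the sample size, and the number of covariates in that order; these details are assumptions, not confirmed by the description above:
  # Hedged sketch; package name and argument order are assumptions.
  library(konfound)
  # Robustness of a single published estimate (effect = 2, SE = 0.4,
  # n = 100, 3 covariates):
  pkonfound(2, 0.4, 100, 3)
  # Robustness of an inference from a user's own fitted model,
  # for the predictor of interest (here, wt):
  m <- lm(mpg ~ wt + hp, data = mtcars)
  konfound(m, wt)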
The developed function is designed for the generation of spatial grids based on user-specified longitude and latitude coordinates. The function first validates the input longitude and latitude values, ensuring they fall within the appropriate geographic ranges. It then creates a polygon from the coordinates and determines the appropriate Universal Transverse Mercator zone based on the provided hemisphere and longitude values. Subsequently, the input shapefile is transformed to the Universal Transverse Mercator projection when necessary. Finally, a spatial grid is generated with the specified interval and saved as a shapefile. For method details see Brus, D.J. (2022) <doi:10.1201/9781003258940>. The function takes into account crucial parameters such as the hemisphere (north or south), the desired grid interval, and the output shapefile path. The developed function is an efficient tool that simplifies the generation of empty spatial grids for applications such as geo-statistical analysis and digital soil mapping product generation. Whether for environmental studies, urban planning, or any other geo-spatial analysis, this package caters to the diverse needs of users working with spatial data, enhancing the accessibility and ease of spatial data processing and visualization.
We provide a solution for performing permutation tests on linear and mixed linear regression models. It allows users to obtain accurate p-values without making distributional assumptions about the data. By generating a null distribution of the test statistics through repeated permutations of the response variable, permutation tests provide a powerful alternative to traditional parametric tests (Holt et al. (2023) <doi:10.1007/s10683-023-09799-6>). In this early version, we focus on permutation tests over observed t values of beta coefficients, i.e., the original t values generated by parametric tests. After generating a null distribution of the test statistic through repeated permutations of the response variable, each observed t value is compared to the null distribution to generate a p-value. To improve efficiency, a stop criterion (Anscombe (1953) <doi:10.1111/j.2517-6161.1953.tb00121.x>) is adopted to force the permutation to stop if the estimated standard deviation of the p-value falls below a fraction of the estimated p-value. By doing so, we avoid the need for massive calculations in exact permutation methods while still generating stable and accurate p-values.
This package provides a tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones causing minimal distortion of the statistical information contained in the data set. Variables, which can be categorical or continuous, are synthesised one-by-one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric models or classification and regression tree (CART) models. Data are synthesised via the function syn(), which can be largely automated, if default settings are used, or controlled with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data. For a description of the implemented method see Nowok, Raab and Dibben (2016) <doi:10.18637/jss.v074.i11>. Functions to assess identity and attribute disclosure for the original and for the synthetic data are included in the package, and their use is illustrated in a vignette on disclosure (Practical Privacy Metrics for Synthetic Data).
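A minimal usage sketch, assuming the package is synthpop (inferred from the cited Nowok, Raab and Dibben (2016) paper) and that its bundled SD2011 survey data set with the columns used below is available; both are assumptions rather than details stated above:
  # Hedged sketch; package name, data set, and column names are assumptions.
  library(synthpop)
  ods <- SD2011[, c("sex", "age", "edu", "income")]  # original (confidential) data
  sds <- syn(ods, seed = 123)   # synthesise with the default sequential modelling
  summary(sds)                  # inspect the resulting synthetic data object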
Presentation-quality tables are displayed as plots on an R graphics device. Although there are other packages that format tables for display, this package is unique in combining two features: (a) It is aware of the logical structure of the table being presented, and makes use of that for automatic layout and styling of the table. This avoids the need for most manual adjustments to achieve an attractive result. (b) It displays tables using ggplot2 graphics. Therefore a table can be presented anywhere a graph could be, with no more effort. External software such as LaTeX or HTML or their viewers is not required. The package provides a full set of tools to control the style and appearance of tables, including titles, footnotes and reference marks, horizontal and vertical rules, and spacing of rows and columns. Methods are included to display matrices; data frames; tables created by R's ftable(), table(), and xtabs() functions; and tables created by the tables and xtable packages. Methods can be added to display other table-like objects. A vignette is included that illustrates usage and options available in the package.
Efficiently and flexibly preprocess data using a set of data filtering, deletion, and interpolation tools. These data preprocessing methods are built on the principles of completeness and accuracy, the threshold method, and linear interpolation, and they work through the setting of constraint conditions, time completion and recovery, and fast, efficient calculation and grouping. Key preprocessing steps include deletion of variables and observations, outlier removal, and interpolation of missing values (NA), which depend on how incomplete and dispersed the raw data are. Compared with ordinary methods, they clean data more accurately, keep more samples, and introduce no new outliers after interpolation. Auto-identification of consecutive NA values via run-length based grouping is used in observation deletion, outlier removal, and NA interpolation; thus, new outliers are not generated during interpolation. A conditional extremum is proposed to realize point-by-point weighted outlier removal that saves non-outliers from being removed. In addition, time series interpolation that refers to values within short periods further ensures reliable interpolation. These methods are based on and improved from the reference: Liang, C.-S., Wu, H., Li, H.-Y., Zhang, Q., Li, Z. & He, K.-B. (2020) <doi:10.1016/j.scitotenv.2020.140923>.
We developed the Keyboard package for designing single-agent, drug-combination, or phase I/II dose-finding clinical trials. The Keyboard designs are novel early phase trial designs that can be implemented simply and transparently, similar to the 3+3 design, but yield excellent performance, comparable to that of more complicated, model-based designs (Yan F, Mandrekar SJ, Yuan Y (2017) <doi:10.1158/1078-0432.CCR-17-0220>, Li DH, Whitmore JB, Guo W, Ji Y (2017) <doi:10.1158/1078-0432.CCR-16-1125>, Liu S, Johnson VE (2016) <doi:10.1093/biostatistics/kxv040>, Zhou Y, Lee JJ, Yuan Y (2019) <doi:10.1002/sim.8475>, Pan H, Lin R, Yuan Y (2020) <doi:10.1016/j.cct.2020.105972>). The Keyboard package provides tools for designing, conducting, and analyzing single-agent, drug-combination, and phase I/II dose-finding clinical trials. For more details about how to use this package, please refer to Li C, Sun H, Cheng C, Tang L, and Pan H (2022) "A software tool for both the maximum tolerated dose and the optimal biological dose finding trials in early phase designs". Manuscript submitted for publication.
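A minimal sketch of a single-agent design workflow; the function names get.boundary.kb() and select.mtd.kb() and their arguments are assumptions made for illustration and are not confirmed by the description above:
  # Hedged sketch; function names and arguments are assumptions.
  library(Keyboard)
  # Dose escalation/de-escalation boundaries for a target toxicity rate of 0.3,
  # 10 cohorts of size 3:
  get.boundary.kb(target = 0.3, ncohort = 10, cohortsize = 3)
  # Select the maximum tolerated dose after the trial, given the number of
  # patients and toxicities observed at each dose level:
  select.mtd.kb(target = 0.3, npts = c(3, 6, 12, 6, 3), ntox = c(0, 1, 2, 2, 2))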
This package facilitates analysis of both timecourse and steady state flow cytometry experiments. It was originally developed for quantifying the function of gene regulatory networks in yeast (strain W303) expressing fluorescent reporter proteins using BD Accuri C6 and SORP cytometers. However, the functions are for the most part general and may be adapted for analysis of other organisms using other flow cytometers. Functions in this package facilitate the annotation of flow cytometry data with experimental metadata, as often required for publication and general ease-of-reuse. Functions for creating, saving and loading gate sets are also included. In the past, we have typically generated summary statistics for each flowset for each timepoint and then annotated and analyzed these summary statistics. This method loses a great deal of the power that comes from the large amounts of individual cell data generated in flow cytometry, by essentially collapsing these data into a bulk measurement after subsetting. In addition to these summary functions, this package also contains functions to facilitate annotation and analysis of steady-state or time-lapse data utilizing all of the data collected from the thousands of individual cells in each sample.
There are several non-functional-form-based tests for interaction in unreplicated two-way layouts. However, no single test can detect all patterns of possible interaction, and each test is sensitive to a particular pattern of interaction. This package combines six non-functional-form-based interaction tests for testing additivity. These six tests were proposed by Boik (1993) <doi:10.1080/02664769300000004>, Piepho (1994), Kharrati-Kopaei and Sadooghi-Alvandi (2007) <doi:10.1080/03610920701386851>, Franck et al. (2013) <doi:10.1016/j.csda.2013.05.002>, Malik et al. (2016) <doi:10.1080/03610918.2013.870196> and Kharrati-Kopaei and Miller (2016) <doi:10.1080/00949655.2015.1057821>. The p-values of these six tests are combined by Bonferroni, Sidak, Jacobi polynomial expansion, and Gaussian copula methods to provide researchers with a testing approach which leverages many existing methods to detect disparate forms of non-additivity. This package is based on the following published paper: Shenavari and Kharrati-Kopaei (2018) "A Method for Testing Additivity in Unreplicated Two-Way Layouts Based on Combining Multiple Interaction Tests". In addition, several sentences in the help files and descriptions were copied from that paper.
This package provides a set of functions aimed at epidemiologists. The package includes commands for measures of association and impact for case control studies and cohort studies. It may be particularly useful for outbreak investigations, including univariable analysis and stratified analysis. The functions for cohort studies include the CS(), CSTable() and CSInter() commands. The functions for case control studies include the CC(), CCTable() and CCInter() commands. References - Cornfield, J. 1956. A statistical problem arising from retrospective studies. In Vol. 4 of Proceedings of the Third Berkeley Symposium, ed. J. Neyman, 135-148. Berkeley, CA: University of California Press. Woolf, B. 1955. On estimating the relation between blood group and disease. Annals of Human Genetics 19: 251-253. Reprinted in Evolution of Epidemiologic Ideas: Annotated Readings on Concepts and Methods, ed. S. Greenland, pp. 108-110. Newton Lower Falls, MA: Epidemiology Resources. Gilles Desve & Peter Makary, 2007. CSTABLE: Stata module to calculate summary table for cohort study. Statistical Software Components S456879, Boston College Department of Economics. Gilles Desve & Peter Makary, 2007. CCTABLE: Stata module to calculate summary table for case-control study. Statistical Software Components S456878, Boston College Department of Economics.
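A minimal sketch of calling the cohort and case-control commands; the package name (EpiStats, inferred from the command names and the Desve & Makary references), the argument order (data, then the case and exposure variable names), and the example data frame with its "ill" and "exposed" columns are all assumptions:
  # Hedged sketch; package name, argument order, and example data are assumptions.
  library(EpiStats)
  outbreak <- data.frame(ill     = rbinom(200, 1, 0.3),
                         exposed = rbinom(200, 1, 0.5))
  CS(outbreak, "ill", "exposed")   # risk ratio and impact measures, cohort design
  CC(outbreak, "ill", "exposed")   # odds ratio, case-control design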
Bayesian regularized quantile regression utilizing two major classes of shrinkage priors (the spike-and-slab priors and the horseshoe family of priors) leads to efficient Bayesian shrinkage estimation, variable selection and valid statistical inference. In this package, we have implemented robust Bayesian variable selection with spike-and-slab priors under high-dimensional linear regression models (Fan et al. (2024) <doi:10.3390/e26090794> and Ren et al. (2023) <doi:10.1111/biom.13670>), and regularized quantile varying coefficient models (Zhou et al. (2023) <doi:10.1016/j.csda.2023.107808>). In particular, robust Bayesian inferences under both models remain valid on finite samples in the presence of heavy-tailed errors. Additional models with spike-and-slab priors include robust Bayesian group LASSO and robust binary Bayesian LASSO (Fan and Wu (2025) <doi:10.1002/sta4.70078>). Besides, robust sparse Bayesian regression with the horseshoe family of priors (horseshoe, horseshoe+ and regularized horseshoe) has also been implemented and yields valid inference results under heavy-tailed model errors (Fan et al. (2025) <doi:10.48550/arXiv.2507.10975>). The Markov chain Monte Carlo (MCMC) algorithms of the proposed and alternative models are implemented in C++.
In agricultural, post-harvest and processing, engineering and industrial experiments, factors are often differentiated by the ease with which they can change from experimental run to experimental run. This is due to the fact that one or more factors may be expensive or time-consuming to change, i.e., hard-to-change factors. These factors restrict the use of complete randomization as it may make the experiment expensive and time consuming. Split plot designs can be used for such situations. In general, model estimation for split plot designs requires the use of generalized least squares (GLS). However, for some split-plot designs, ordinary least squares (OLS) estimates are equivalent to generalized least squares (GLS) estimates. These types of designs are known in the literature as equivalent-estimation split-plot designs. For method details see Macharia, H. and Goos, P. (2010) <doi:10.1080/00224065.2010.11917833>. Balanced split plot designs are designs which have an equal number of subplots within every whole plot. This package is used to construct equivalent estimation balanced split plot designs for different experimental set ups, along with different statistical criteria to measure the performance of these designs. It consists of the function equivalent_BSPD().
Obtaining accurate and stable estimates of regression coefficients can be challenging when the suggested statistical model has issues related to multicollinearity, convergence, or overfitting. One solution is to use principal component analysis (PCA) results in the regression, as discussed in Chan and Park (2005) <doi:10.1080/01446190500039812>. The swaprinc package streamlines comparisons between a raw regression model with the full set of raw independent variables and a principal component regression model where principal components are estimated on a subset of the independent variables, then swapped into the regression model in place of those variables. The swaprinc() function compares one raw regression model to one principal component regression model, while the compswap() function compares one raw regression model to many principal component regression models. Package functions include parameters to center, scale, and undo centering and scaling, as described by Harvey and Hansen (2022) <https://cran.r-project.org/package=LearnPCA/vignettes/Vig_03_Step_By_Step_PCA.pdf>. Additionally, the package supports using Gifi methods to extract principal components from categorical variables, as outlined by Rossiter (2021) <https://www.css.cornell.edu/faculty/dgr2/_static/files/R_html/NonlinearPCA.html#2_Package>.
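A minimal sketch, assuming swaprinc() accepts a data set, a model formula, the variables to be replaced by principal components, and the number of components to retain; the argument names below and the structure of the returned object are assumptions:
  # Hedged sketch; argument names and return structure are assumptions.
  library(swaprinc)
  res <- swaprinc(data = mtcars,
                  formula = "mpg ~ cyl + disp + hp + drat + wt",
                  pca_vars = c("disp", "hp", "drat", "wt"),
                  n_pca_components = 2)
  res   # inspect the raw-versus-principal-component model comparison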
We provide functions to fit finite mixtures of multivariate normal or t-distributions to data with various factor analytic structures adopted for the covariance/scale matrices. The factor analytic structures available include mixtures of factor analyzers and mixtures of common factor analyzers. The latter approach is so termed because the matrix of factor loadings is common to components before the component-specific rotation of the component factors to make them white noise. Note that the component-factor loadings are not common after this rotation. Maximum likelihood estimators of model parameters are obtained via the Expectation-Maximization algorithm. See descriptions of the algorithms used in McLachlan GJ, Peel D (2000) <doi:10.1002/0471721182.ch8>; McLachlan GJ, Peel D (2000) <ISBN:1-55860-707-2>; McLachlan GJ, Peel D, Bean RW (2003) <doi:10.1016/S0167-9473(02)00183-4>; McLachlan GJ, Bean RW, Ben-Tovim Jones L (2007) <doi:10.1016/j.csda.2006.09.015>; Baek J, McLachlan GJ, Flack LK (2010) <doi:10.1109/TPAMI.2009.149>; Baek J, McLachlan GJ (2011) <doi:10.1093/bioinformatics/btr112>; McLachlan GJ, Baek J, Rathnayake SI (2011) <doi:10.1002/9781119995678.ch9>.
Method and tool for generating hybrid time series forecasts using an error remodeling approach. These forecasting approaches utilize a recursive technique for modeling the linearity of the series using a linear method (e.g., ARIMA, Theta, etc.) and then model (forecast) the residuals of the linear forecaster using non-linear neural networks (e.g., ANN, ARNN, etc.). The hybrid architectures comprise three steps: first, the linear patterns of the series are forecasted; this is followed by an error re-modeling step; finally, the forecasts from both steps are combined to produce the final output. This method additionally provides the confidence intervals as needed. Ten different models can be implemented using this package. This package generates different types of hybrid error correction models for time series forecasting based on the algorithms by Zhang (2003), Chakraborty et al. (2019), Chakraborty et al. (2020), Bhattacharyya et al. (2021), Chakraborty et al. (2022), and Bhattacharyya et al. (2022) <doi:10.1016/S0925-2312(01)00702-0> <doi:10.1016/j.physa.2019.121266> <doi:10.1016/j.chaos.2020.109850> <doi:10.1109/IJCNN52387.2021.9533747> <doi:10.1007/978-3-030-72834-2_29> <doi:10.1007/s11071-021-07099-3>.
The name of the package is derived from the French 'pour ridge'; the package provides functionality for ridge-type estimation of a potpourri of models. Currently, this estimation concerns that of various Gaussian graphical models from different study designs. Among others, it considers the regular Gaussian graphical model and a mixture of such models. The porridge-package implements the estimation of the former either i) from data with replicated observations by penalized loglikelihood maximization using the regular ridge penalty on the parameters (van Wieringen, Chen, 2021) or ii) from non-replicated data by means of either a ridge estimator with multiple shrinkage targets (as presented in van Wieringen et al. 2020, <doi:10.1016/j.jmva.2020.104621>) or the generalized ridge estimator that allows for the inclusion of both quantitative and qualitative prior information on the precision matrix via element-wise penalization and shrinkage (van Wieringen, 2019, <doi:10.1080/10618600.2019.1604374>). Additionally, the porridge-package facilitates the ridge penalized estimation of a mixture of Gaussian graphical models (Aflakparast et al., 2018). On another note, the package also includes functionality for ridge-type estimation of the generalized linear model (as presented in van Wieringen, Binder, 2022, <doi:10.1080/10618600.2022.2035231>).
Fits right-truncated meta-analysis (RTMA), a bias correction for the joint effects of p-hacking (i.e., manipulation of results within studies to obtain significant, positive estimates) and traditional publication bias (i.e., the selective publication of studies with significant, positive results) in meta-analyses [see Mathur MB (2022). "Sensitivity analysis for p-hacking in meta-analyses." <doi:10.31219/osf.io/ezjsx>.]. Unlike publication bias alone, p-hacking that favors significant, positive results (termed "affirmative") can distort the distribution of affirmative results. To bias-correct results from affirmative studies would require strong assumptions on the exact nature of p-hacking. In contrast, joint p-hacking and publication bias do not distort the distribution of published nonaffirmative results when there is stringent p-hacking (e.g., investigators who hack always eventually obtain an affirmative result) or when there is stringent publication bias (e.g., nonaffirmative results from hacked studies are never published). This means that any published nonaffirmative results are from unhacked studies. Under these assumptions, RTMA involves analyzing only the published nonaffirmative results to essentially impute the full underlying distribution of all results prior to selection due to p-hacking and/or publication bias. The package also provides diagnostic plots described in Mathur (2022).
This package provides a collection of tools that allow users to perform critical steps in the process of assessing ecological niche evolution over phylogenies, with uncertainty incorporated explicitly in reconstructions. The method proposed here for ancestral reconstruction of ecological niches characterizes species niches using a bin-based approach that incorporates uncertainty in estimations. Compared to other existing methods, the approaches presented here reduce risk of overestimation of amounts and rates of ecological niche evolution. The main analyses include: initial exploration of environmental data in occurrence records and accessible areas, preparation of data for phylogenetic analyses, executing comparative phylogenetic analyses of ecological niches, and plotting for interpretations. Details on the theoretical background and methods used can be found in: Owens et al. (2020) <doi:10.1002/ece3.6359>, Peterson et al. (1999) <doi:10.1126/science.285.5431.1265>, Soberón and Peterson (2005) <doi:10.17161/bi.v2i0.4>, Peterson (2011) <doi:10.1111/j.1365-2699.2010.02456.x>, Barve et al. (2011) <doi:10.1111/ecog.02671>, Machado-Stredel et al. (2021) <doi:10.21425/F5FBG48814>, Owens et al. (2013) <doi:10.1016/j.ecolmodel.2013.04.011>, Saupe et al. (2018) <doi:10.1093/sysbio/syx084>, and Cobos et al. (2021) <doi:10.1111/jav.02868>.
In order to facilitate the fitting of the sample selection models existing in the literature, we created the ssmodels package. Our package allows fitting the classic Heckman model (Heckman (1976), Heckman (1979) <doi:10.2307/1912352>) and estimating the parameters of this model via the maximum likelihood method and the two-step method, in addition to fitting the Heckman-t model introduced in the literature by Marchenko and Genton (2012) <doi:10.1080/01621459.2012.656011> and the Heckman-Skew model introduced in the literature by Ogundimu and Hutton (2016) <doi:10.1111/sjos.12171>. We also implemented functions to fit the generalized version of the Heckman model, introduced by Bastos, Barreto-Souza, and Genton (2021) <doi:10.5705/ss.202021.0068>, which allows the inclusion of covariates in the dispersion and correlation parameters, and a function to fit the Heckman-BS model introduced by Bastos and Barreto-Souza (2020) <doi:10.1080/02664763.2020.1780570>, which uses the Birnbaum-Saunders distribution as the joint distribution of the selection and primary regression variables. This package extends and complements existing R packages such as sampleSelection (Toomet and Henningsen, 2008) and ssmrob (Zhelonkin et al., 2016), providing additional robust and flexible sample selection models.