This package provides a data generator of multivariate non-normal data in R. It combines two different methods to generate non-normal data, one with user-specified multivariate skewness and kurtosis (more details can be found in the paper: Qu, Liu, & Zhang, 2019 <doi:10.3758/s13428-019-01291-5>), and the other with the given marginal skewness and kurtosis. The latter one is the widely-used Vale and Maurelli's method. It also contains a function to calculate univariate and multivariate (Mardia's Test) skew and kurtosis.
We implement two least-squares estimators under k-monotony constraint using a method based on the Support Reduction Algorithm from Groeneboom et al (2008) <DOI:10.1111/j.1467-9469.2007.00588.x>. The first one is a projection estimator on the set of k-monotone discrete functions. The second one is a projection on the set of k-monotone discrete probabilities. This package provides functions to generate samples from the spline basis from Lefevre and Loisel (2013) <DOI:10.1239/jap/1378401239>, and from mixtures of splines.
This package performs smoothed (and non-smoothed) principal/independent components analysis of functional data. Various functional pre-whitening approaches are implemented as discussed in Vidal and Aguilera (2022) â Novel whitening approaches in functional settings", <doi:10.1002/sta4.516>. Further whitening representations of functional data can be derived in terms of a few principal components, providing an avenue to explore hidden structures in low dimensional settings: see Vidal, Rosso and Aguilera (2021) â Bi-smoothed functional independent component analysis for EEG artifact removalâ , <doi:10.3390/math9111243>.
This package provides tools for performing variable selection in three-way data using N-PLS in combination with L1 penalization, Selectivity Ratio and VIP scores. The N-PLS model (Rasmus Bro, 1996 <DOI:10.1002/(SICI)1099-128X(199601)10:1%3C47::AID-CEM400%3E3.0.CO;2-C>) is the natural extension of PLS (Partial Least Squares) to N-way structures, and tries to maximize the covariance between X and Y data arrays. The package also adds variable selection through L1 penalization, Selectivity Ratio and VIP scores.
Data sets and functions to support the books "Statistics: Data analysis and modelling" by Speekenbrink, M. (2021) <https://mspeekenbrink.github.io/sdam-book/> and "An R companion to Statistics: data analysis and modelling" by Speekenbrink, M. (2021) <https://mspeekenbrink.github.io/sdam-r-companion/>. All datasets analysed in these books are provided in this package. In addition, the package provides functions to compute sample statistics (variance, standard deviation, mode), create raincloud and enhanced Q-Q plots, and expand Anova results into omnibus tests and tests of individual contrasts.
This package provides a toolbox for meta-analysis. This package includes: 1,a robust multivariate meta-analysis of continuous or binary outcomes; 2, a bivariate Egger's test for detecting small study effects; 3, Galaxy Plot: A New Visualization Tool of Bivariate Meta-Analysis Studies; 4, a bivariate T&F method accounting for publication bias in bivariate meta-analysis, based on symmetry of the galaxy plot. Hong C. et al(2020) <doi:10.1093/aje/kwz286>, Chongliang L. et al(2020) <doi:10.1101/2020.07.27.20161562>.
DEqMS is developped on top of Limma. However, Limma assumes same prior variance for all genes. In proteomics, the accuracy of protein abundance estimates varies by the number of peptides/PSMs quantified in both label-free and labelled data. Proteins quantification by multiple peptides or PSMs are more accurate. DEqMS package is able to estimate different prior variances for proteins quantified by different number of PSMs/peptides, therefore acchieving better accuracy. The package can be applied to analyze both label-free and labelled proteomics data.
The fmcsR package introduces an efficient maximum common substructure (MCS) algorithms combined with a novel matching strategy that allows for atom and/or bond mismatches in the substructures shared among two small molecules. The resulting flexible MCSs (FMCSs) are often larger than strict MCSs, resulting in the identification of more common features in their source structures, as well as a higher sensitivity in finding compounds with weak structural similarities. The fmcsR package provides several utilities to use the FMCS algorithm for pairwise compound comparisons, structure similarity searching and clustering.
Three Shiny apps are provided that introduce Harvest Control Rules (HCR) for fisheries management. Introduction to HCRs provides a simple overview to how HCRs work. Users are able to select their own HCR and step through its performance, year by year. Biological variability and estimation uncertainty are introduced. Measuring performance builds on the previous app and introduces the idea of using performance indicators to measure HCR performance. Comparing performance allows multiple HCRs to be created and tested, and their performance compared so that the preferred HCR can be selected.
This package implements a modification to the Random Survival Forests algorithm for obtaining variable importance in high dimensional datasets. The proposed algorithm is appropriate for settings in which a silent event is observed through sequentially administered, error-prone self-reports or laboratory based diagnostic tests. The modified algorithm incorporates a formal likelihood framework that accommodates sequentially administered, error-prone self-reports or laboratory based diagnostic tests. The original Random Survival Forests algorithm is modified by the introduction of a new splitting criterion based on a likelihood ratio test statistic.
Calculate dissolved gas concentrations from raw MIMS (Membrane Inlet Mass Spectrometer) signal data. Use mimsy() on a formatted CSV file to return dissolved gas concentrations (mg and microMole) of N2, O2, Ar based on gas solubility at temperature, pressure, and salinity. See references Benson and Krause (1984), Garcia and Gordon (1992), Stull (1947), and Hamme and Emerson (2004) for more information. Easily save the output to a nicely-formatted multi-tab Excel workbook with mimsy.save(). Supports dual-temperature standard calibration for dual-bath MIMS setups.
Three estimating equation methods are provided in this package for marginal analysis of longitudinal ordinal data with misclassified responses and covariates. The naive analysis which is solely based on the observed data without adjustment may lead to bias. The corrected generalized estimating equations (GEE2) method which is unbiased requires the misclassification parameters to be known beforehand. The corrected generalized estimating equations (GEE2) with validation subsample method estimates the misclassification parameters based on a given validation set. This package is an implementation of Chen (2013) <doi:10.1002/bimj.201200195>.
Three generalizations of the synthetic control method (which has already an implementation in package Synth') are implemented: first, MSCMT allows for using multiple outcome variables, second, time series can be supplied as economic predictors, and third, a well-defined cross-validation approach can be used. Much effort has been taken to make the implementation as stable as possible (including edge cases) without losing computational efficiency. A detailed description of the main algorithms is given in Becker and Klöà ner (2018) <doi:10.1016/j.ecosta.2017.08.002>.
This package provides functions to calculate estimates of intrinsic and extrinsic noise from the two-reporter single-cell experiment, as in Elowitz, M. B., A. J. Levine, E. D. Siggia, and P. S. Swain (2002) Stochastic gene expression in a single cell. Science, 297, 1183-1186. Functions implement multiple estimators developed for unbiasedness or min Mean Squared Error (MSE) in Fu, A. Q. and Pachter, L. (2016). Estimating intrinsic and extrinsic noise from single-cell gene expression measurements. Statistical Applications in Genetics and Molecular Biology, 15(6), 447-471.
Perform an exploration and a preliminary analysis on the dose- response relationship of nanomaterial toxicity. Several functions are provided for data exploration, including functions for creating a subset of dataset, frequency tables and plots. Inference for order restricted dose- response data is performed by testing the significance of monotonic dose-response relationship, using Williams, Marcus, M, Modified M and Likelihood ratio tests. Several methods of multiplicity adjustment are also provided. Description of the methods can be found in <https://github.com/rahmasarina/dose-response-analysis/blob/main/Methodology.pdf>.
Offers a gene-based meta-analysis test with filtering to detect gene-environment interactions (GxE) with association data, proposed by Wang et al. (2018) <doi:10.1002/gepi.22115>. It first conducts a meta-filtering test to filter out unpromising SNPs by combining all samples in the consortia data. It then runs a test of omnibus-filtering-based GxE meta-analysis (ofGEM) that combines the strengths of the fixed- and random-effects meta-analysis with meta-filtering. It can also analyze data from multiple ethnic groups.
This package performs elementary probability calculations on finite sample spaces, which may be represented by data frames or lists. This package is meant to rescue some widely used functions from the archived prob package (see <https://cran.r-project.org/src/contrib/Archive/prob/>). Functionality includes setting up sample spaces, counting tools, defining probability spaces, performing set algebra, calculating probability and conditional probability, tools for simulation and checking the law of large numbers, adding random variables, and finding marginal distributions. Characteristic functions for all base R distributions are included.
Generic interface for the PX-Web/PC-Axis API. The PX-Web/PC-Axis API is used by organizations such as Statistics Sweden and Statistics Finland to disseminate data. The R package can interact with all PX-Web/PC-Axis APIs to fetch information about the data hierarchy, extract metadata and extract and parse statistics to R data.frame format. PX-Web is a solution to disseminate PC-Axis data files in dynamic tables on the web. Since 2013 PX-Web contains an API to disseminate PC-Axis files.
Complex machine learning models are often hard to interpret. However, in many situations it is crucial to understand and explain why a model made a specific prediction. Shapley values is the only method for such prediction explanation framework with a solid theoretical foundation. Previously known methods for estimating the Shapley values do, however, assume feature independence. This package implements methods which accounts for any feature dependence, and thereby produces more accurate estimates of the true Shapley values. An accompanying Python wrapper ('shaprpy') is available through the GitHub repository.
This package provides functions that provide statistical methods for interval-censored (grouped) data. The package supports the estimation of linear and linear mixed regression models with interval-censored dependent variables. Parameter estimates are obtained by a stochastic expectation maximization algorithm. Furthermore, the package enables the direct (without covariates) estimation of statistical indicators from interval-censored data via an iterative kernel density algorithm. Survey and Organisation for Economic Co-operation and Development (OECD) weights can be included into the direct estimation (see, Walter, P. (2019) <doi:10.17169/refubium-1621>).
We propose a general ensemble classification framework, RaSE algorithm, for the sparse classification problem. In RaSE algorithm, for each weak learner, some random subspaces are generated and the optimal one is chosen to train the model on the basis of some criterion. To be adapted to the problem, a novel criterion, ratio information criterion (RIC) is put up with based on Kullback-Leibler divergence. Besides minimizing RIC, multiple criteria can be applied, for instance, minimizing extended Bayesian information criterion (eBIC), minimizing training error, minimizing the validation error, minimizing the cross-validation error, minimizing leave-one-out error. There are various choices of base classifier, for instance, linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbour, logistic regression, decision trees, random forest, support vector machines. RaSE algorithm can also be applied to do feature ranking, providing us the importance of each feature based on the selected percentage in multiple subspaces. RaSE framework can be extended to the general prediction framework, including both classification and regression. We can use the selected percentages of variables for variable screening. The latest version added the variable screening function for both regression and classification problems.
mastR is an R package designed for automated screening of signatures of interest for specific research questions. The package is developed for generating refined lists of signature genes from multiple group comparisons based on the results from edgeR and limma differential expression (DE) analysis workflow. It also takes into account the background noise of tissue-specificity, which is often ignored by other marker generation tools. This package is particularly useful for the identification of group markers in various biological and medical applications, including cancer research and developmental biology.
This package provides a compendium of new geometries, coordinate systems, statistical transformations, scales and fonts for ggplot2, including splines, 1d and 2d densities, univariate average shifted histograms, a new map coordinate system based on the PROJ.4-library along with geom_cartogram() that mimics the original functionality of geom_map(), formatters for "bytes", a stat_stepribbon() function, increased plotly compatibility and the StateFace open source font ProPublica. Further new functionality includes lollipop charts, dumbbell charts, the ability to encircle points and coordinate-system-based text annotations.
This package works as a prelude replacement for Haskell, providing more functionality and types out of the box than the standard prelude (such as common data types like ByteString and Text), as well as removing common ``gotchas'', like partial functions and lazy I/O. The guiding principle here is:
If something is safe to use in general and has no expected naming conflicts, expose it.
If something should not always be used, or has naming conflicts, expose it from another module in the hierarchy.