Provides functions for overlapping clustering, fuzzy clustering and interval-valued data manipulation. The package implements the following algorithms: OKM (Overlapping Kmeans) from Cleuziou, G. (2007) <doi:10.1109/icpr.2008.4761079>; NEOKM (Non-exhaustive overlapping Kmeans) from Whang, J. J., Dhillon, I. S., and Gleich, D. F. (2015) <doi:10.1137/1.9781611974010.105>; Fuzzy Cmeans from Bezdek, J. C. (1981) <doi:10.1007/978-1-4757-0450-1>; Fuzzy I-Cmeans from de A.T. De Carvalho, F. (2005) <doi:10.1016/j.patrec.2006.08.014>.
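As a rough illustration of the fuzzy clustering idea (a Python sketch of the standard Fuzzy Cmeans membership update, not this package's R interface; the function name and the fuzzifier default m = 2 are assumptions made for the example), each point receives a degree of membership in every cluster that sums to one:

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Membership of one point in each cluster (standard Fuzzy Cmeans update).

    `m` is the fuzzifier and must be greater than 1. Hypothetical helper,
    written for illustration only.
    """
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    # If the point coincides with one or more centers, share full membership among them.
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]

memberships = fuzzy_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
```

The closer center receives the larger membership, and a point can belong partly to several clusters, which is what distinguishes fuzzy from hard clustering.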
This package provides a general-purpose computational engine for data analysis: drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch; there is native support for parallel and distributed computing; and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.
This package provides a novel framework able to automatically develop and deploy an accurate Multiple Classifier System based on the feature-clustering distribution achieved from an input dataset. D2MCS was developed with a focus on four main aspects: (i) the ability to determine an effective method to evaluate the independence of features, (ii) the identification of the optimal number of feature clusters, (iii) the training and tuning of ML models and (iv) the execution of voting schemes to combine the outputs of each classifier comprising the Multiple Classifier System.
The fftab package stores Fourier coefficients in a tibble and allows their manipulation in various ways. Functions are available for converting between complex, rectangular ('re', 'im'), and polar ('mod', 'arg') representations, as well as for extracting components as vectors or matrices. Inputs can include vectors, time series, and arrays of arbitrary dimensions, which are restored to their original form when inverting the transform. Since fftab stores Fourier frequencies as columns in the tibble, many standard operations on spectral data can be easily performed using tidy packages like 'dplyr'.
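To make the rectangular and polar representations concrete, here is a minimal Python sketch (not the fftab API; all function names are hypothetical, and a naive transform stands in for a real FFT) that stores one row per Fourier coefficient with 're', 'im', 'mod' and 'arg' fields:

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform returning complex coefficients."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def to_rect(coef):
    """Rectangular form: (real part, imaginary part)."""
    return coef.real, coef.imag

def to_polar(coef):
    """Polar form: (modulus, argument in radians)."""
    return abs(coef), cmath.phase(coef)

def from_polar(mod, arg):
    """Rebuild the complex coefficient from its polar form."""
    return cmath.rect(mod, arg)

coefs = dft([1.0, 0.0, -1.0, 0.0])
# One record per frequency index k, mimicking a table of coefficients.
rows = []
for k, c in enumerate(coefs):
    re, im = to_rect(c)
    mod, arg = to_polar(c)
    rows.append({"k": k, "re": re, "im": im, "mod": mod, "arg": arg})
```

Keeping both forms side by side in a table makes it easy to filter or rescale by modulus and then rebuild the complex coefficients with `from_polar` before inverting the transform.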
We implement various tests of the composite hypothesis of fit to the family of inverse Gaussian distributions. Included are methods presented by Allison, J.S., Betsch, S., Ebner, B., and Visagie, I.J.H. (2022) <doi:10.48550/arXiv.1910.14119>, as well as two tests from Henze and Klar (2002) <doi:10.1023/A:1022442506681>. Additionally, the package implements a test proposed by Baringhaus and Gaigall (2015) <doi:10.1016/j.jmva.2015.05.013>. For each test a parametric bootstrap procedure is implemented.
Lightweight utilities for nucleic acid melting curve analysis are important in life sciences and diagnostics. This software can be used for the analysis and presentation of melting curve data from microbead-based assays (surface melting curve analysis) and reactions in solution (e.g., quantitative PCR (qPCR), real-time isothermal amplification). Further information is given in two publications in The R Journal: <https://journal.r-project.org/archive/2013-2/roediger-bohm-schimke.pdf> and <https://journal.r-project.org/archive/2015-1/RJ-2015-1.pdf>.
This package provides a data generator of multivariate non-normal data in R. It combines two different methods to generate non-normal data, one with user-specified multivariate skewness and kurtosis (more details can be found in the paper: Qu, Liu, & Zhang, 2019 <doi:10.3758/s13428-019-01291-5>), and the other with given marginal skewness and kurtosis. The latter is the widely used Vale and Maurelli method. It also contains a function to calculate univariate and multivariate (Mardia's test) skewness and kurtosis.
This package performs smoothed (and non-smoothed) principal/independent components analysis of functional data. Various functional pre-whitening approaches are implemented as discussed in Vidal and Aguilera (2022) "Novel whitening approaches in functional settings", <doi:10.1002/sta4.516>. Further whitening representations of functional data can be derived in terms of a few principal components, providing an avenue to explore hidden structures in low dimensional settings: see Vidal, Rosso and Aguilera (2021) "Bi-smoothed functional independent component analysis for EEG artifact removal", <doi:10.3390/math9111243>.
We implement two least-squares estimators under a k-monotony constraint using a method based on the Support Reduction Algorithm from Groeneboom et al. (2008) <DOI:10.1111/j.1467-9469.2007.00588.x>. The first one is a projection estimator on the set of k-monotone discrete functions. The second one is a projection on the set of k-monotone discrete probabilities. This package also provides functions to generate samples from the spline basis from Lefevre and Loisel (2013) <DOI:10.1239/jap/1378401239>, and from mixtures of splines.
Data sets and functions to support the books "Statistics: Data analysis and modelling" by Speekenbrink, M. (2021) <https://mspeekenbrink.github.io/sdam-book/> and "An R companion to Statistics: data analysis and modelling" by Speekenbrink, M. (2021) <https://mspeekenbrink.github.io/sdam-r-companion/>. All datasets analysed in these books are provided in this package. In addition, the package provides functions to compute sample statistics (variance, standard deviation, mode), create raincloud and enhanced Q-Q plots, and expand Anova results into omnibus tests and tests of individual contrasts.
This package provides tools for performing variable selection in three-way data using N-PLS in combination with L1 penalization, Selectivity Ratio and VIP scores. The N-PLS model (Rasmus Bro, 1996 <DOI:10.1002/(SICI)1099-128X(199601)10:1%3C47::AID-CEM400%3E3.0.CO;2-C>) is the natural extension of PLS (Partial Least Squares) to N-way structures, and tries to maximize the covariance between X and Y data arrays. The package also adds variable selection through L1 penalization, Selectivity Ratio and VIP scores.
This package provides a toolbox for meta-analysis. It includes: (1) a robust multivariate meta-analysis of continuous or binary outcomes; (2) a bivariate Egger's test for detecting small study effects; (3) the galaxy plot, a new visualization tool for bivariate meta-analysis studies; (4) a bivariate T&F method accounting for publication bias in bivariate meta-analysis, based on symmetry of the galaxy plot. See Hong C. et al. (2020) <doi:10.1093/aje/kwz286> and Chongliang L. et al. (2020) <doi:10.1101/2020.07.27.20161562>.
Three Shiny apps are provided that introduce Harvest Control Rules (HCR) for fisheries management. 'Introduction to HCRs' provides a simple overview of how HCRs work. Users are able to select their own HCR and step through its performance, year by year. Biological variability and estimation uncertainty are introduced. 'Measuring performance' builds on the previous app and introduces the idea of using performance indicators to measure HCR performance. 'Comparing performance' allows multiple HCRs to be created and tested, and their performance compared so that the preferred HCR can be selected.
This package implements a modification to the Random Survival Forests algorithm for obtaining variable importance in high-dimensional datasets. The proposed algorithm is appropriate for settings in which a silent event is observed through sequentially administered, error-prone self-reports or laboratory-based diagnostic tests. The modified algorithm incorporates a formal likelihood framework that accommodates this observation scheme. The original Random Survival Forests algorithm is modified by the introduction of a new splitting criterion based on a likelihood ratio test statistic.
Calculate dissolved gas concentrations from raw MIMS (Membrane Inlet Mass Spectrometer) signal data. Use mimsy() on a formatted CSV file to return dissolved gas concentrations (mg and microMole) of N2, O2, and Ar based on gas solubility at temperature, pressure, and salinity. See references Benson and Krause (1984), Garcia and Gordon (1992), Stull (1947), and Hamme and Emerson (2004) for more information. Easily save the output to a nicely-formatted multi-tab Excel workbook with mimsy.save(). Supports dual-temperature standard calibration for dual-bath MIMS setups.
Three estimating equation methods are provided in this package for marginal analysis of longitudinal ordinal data with misclassified responses and covariates. The naive analysis, which is based solely on the observed data without adjustment, may lead to bias. The corrected generalized estimating equations (GEE2) method, which is unbiased, requires the misclassification parameters to be known beforehand. The corrected generalized estimating equations (GEE2) with validation subsample method estimates the misclassification parameters based on a given validation set. This package is an implementation of Chen (2013) <doi:10.1002/bimj.201200195>.
Three generalizations of the synthetic control method (which already has an implementation in package 'Synth') are implemented: first, MSCMT allows for using multiple outcome variables, second, time series can be supplied as economic predictors, and third, a well-defined cross-validation approach can be used. Much effort has been taken to make the implementation as stable as possible (including edge cases) without losing computational efficiency. A detailed description of the main algorithms is given in Becker and Klößner (2018) <doi:10.1016/j.ecosta.2017.08.002>.
This package provides functions to calculate estimates of intrinsic and extrinsic noise from the two-reporter single-cell experiment, as in Elowitz, M. B., A. J. Levine, E. D. Siggia, and P. S. Swain (2002) Stochastic gene expression in a single cell. Science, 297, 1183-1186. Functions implement multiple estimators developed for unbiasedness or minimum mean squared error (MSE) in Fu, A. Q. and Pachter, L. (2016). Estimating intrinsic and extrinsic noise from single-cell gene expression measurements. Statistical Applications in Genetics and Molecular Biology, 15(6), 447-471.
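For orientation, the plain moment-based estimates from the two-reporter design can be sketched as below. This is a Python illustration of the original Elowitz et al. (2002) definitions only, not the improved estimators of Fu and Pachter (2016) and not this package's interface; the function name and the toy measurements are assumptions.

```python
def noise_estimates(c, y):
    """Moment-based intrinsic and extrinsic noise (squared) from paired reporters.

    `c` and `y` hold the two reporter intensities measured in the same cells.
    Hypothetical helper, written for illustration only.
    """
    n = len(c)
    if n == 0 or n != len(y):
        raise ValueError("c and y must be non-empty and of equal length")
    mean_c = sum(c) / n
    mean_y = sum(y) / n
    mean_sq_diff = sum((ci - yi) ** 2 for ci, yi in zip(c, y)) / n
    mean_cy = sum(ci * yi for ci, yi in zip(c, y)) / n
    denom = mean_c * mean_y
    # Intrinsic noise: mean squared difference between reporters within a cell.
    intrinsic = mean_sq_diff / (2.0 * denom)
    # Extrinsic noise: normalised covariance of the two reporters across cells.
    extrinsic = (mean_cy - denom) / denom
    return intrinsic, extrinsic

# Toy data: the reporters agree perfectly within each cell, so intrinsic noise is zero.
intrinsic, extrinsic = noise_estimates([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Differences between the two reporters within a cell drive the intrinsic term, while variation shared by both reporters across cells drives the extrinsic term.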
Perform an exploration and a preliminary analysis of the dose-response relationship of nanomaterial toxicity. Several functions are provided for data exploration, including functions for subsetting a dataset and creating frequency tables and plots. Inference for order-restricted dose-response data is performed by testing the significance of a monotonic dose-response relationship, using Williams, Marcus, M, Modified M and Likelihood ratio tests. Several methods of multiplicity adjustment are also provided. A description of the methods can be found in <https://github.com/rahmasarina/dose-response-analysis/blob/main/Methodology.pdf>.
Offers a gene-based meta-analysis test with filtering to detect gene-environment interactions (GxE) with association data, proposed by Wang et al. (2018) <doi:10.1002/gepi.22115>. It first conducts a meta-filtering test to filter out unpromising SNPs by combining all samples in the consortia data. It then runs a test of omnibus-filtering-based GxE meta-analysis (ofGEM) that combines the strengths of the fixed- and random-effects meta-analysis with meta-filtering. It can also analyze data from multiple ethnic groups.
This package performs elementary probability calculations on finite sample spaces, which may be represented by data frames or lists. This package is meant to rescue some widely used functions from the archived prob package (see <https://cran.r-project.org/src/contrib/Archive/prob/>). Functionality includes setting up sample spaces, counting tools, defining probability spaces, performing set algebra, calculating probability and conditional probability, tools for simulation and checking the law of large numbers, adding random variables, and finding marginal distributions. Characteristic functions for all base R distributions are included.
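The kind of calculation described here can be sketched in a few lines. The following is a Python illustration under the assumption of equally likely outcomes, not the package's R interface; `roll_dice` and `prob` are hypothetical names chosen for the example.

```python
from fractions import Fraction
from itertools import product

def roll_dice(times):
    """Sample space for rolling a fair six-sided die `times` times."""
    return list(product(range(1, 7), repeat=times))

def prob(space, event, given=None):
    """Probability of `event` on an equally likely sample space.

    `event` and `given` are predicates on outcomes. When `given` is supplied,
    the space is first restricted to outcomes satisfying it (conditioning).
    """
    if given is not None:
        space = [outcome for outcome in space if given(outcome)]
        if not space:
            raise ValueError("conditioning event has probability zero")
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))

space = roll_dice(2)
p_sum_seven = prob(space, lambda o: sum(o) == 7)
p_eight_given_double = prob(space, lambda o: sum(o) == 8, given=lambda o: o[0] == o[1])
```

Using exact fractions avoids floating-point rounding, which keeps small textbook examples easy to check by hand.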
Generic interface for the PX-Web/PC-Axis API. The PX-Web/PC-Axis API is used by organizations such as Statistics Sweden and Statistics Finland to disseminate data. The R package can interact with all PX-Web/PC-Axis APIs to fetch information about the data hierarchy, extract metadata and extract and parse statistics to R data.frame format. PX-Web is a solution to disseminate PC-Axis data files in dynamic tables on the web. Since 2013 PX-Web contains an API to disseminate PC-Axis files.
Complex machine learning models are often hard to interpret. However, in many situations it is crucial to understand and explain why a model made a specific prediction. Shapley values are the only prediction explanation framework with a solid theoretical foundation. Previously known methods for estimating the Shapley values do, however, assume feature independence. This package implements methods which account for any feature dependence, and thereby produce more accurate estimates of the true Shapley values. An accompanying Python wrapper ('shaprpy') is available through the GitHub repository.
This package provides statistical methods for interval-censored (grouped) data. The package supports the estimation of linear and linear mixed regression models with interval-censored dependent variables. Parameter estimates are obtained by a stochastic expectation maximization algorithm. Furthermore, the package enables the direct (without covariates) estimation of statistical indicators from interval-censored data via an iterative kernel density algorithm. Survey and Organisation for Economic Co-operation and Development (OECD) weights can be included in the direct estimation (see Walter, P. (2019) <doi:10.17169/refubium-1621>).