Nonparametric methods for analysis of covariance (ANCOVA) are distribution-free and provide a flexible statistical framework for situations where the assumptions of parametric ANCOVA are violated or when the response variable is ordinal. This package implements several well-known nonparametric ANCOVA procedures, including Quade, Puri and Sen, McSweeney and Porter, Burnett and Barr, Hettmansperger and McKean, Shirley, and Puri-Sen-Harwell-Serlin. The package provides user-friendly functions to apply these methods in practice. These methods are described in Olejnik et al. (1985) <doi:10.1177/0193841X8500900104> and Harwell et al. (1988) <doi:10.1037/0033-2909.104.2.268>.
This package provides permutation-based randomization. To provide a quality control (QC) check, QC samples can be randomized within strata. A second function makes it possible to 'switch' samples to meet set requirements and performs a certain amount of minimization on these switches. The functions are flexible, allowing users to specify the stratum size and the number of QC samples per stratum. The randomization meets the following requirements: QC samples are not adjacent, and QC samples from the same mother must follow certain patterns; matched sample sets must lie within a single stratum and next to each other.
The constructs used to study human psychology have many definitions and corresponding instructions for eliciting and coding qualitative data pertaining to construct content and for measuring the constructs. This plethora of definitions and instructions necessitates unequivocal reference to specific definitions and instructions in empirical and secondary research. This package implements a human- and machine-readable standard, based on 'YAML', for specifying construct definitions and instructions for measurement and qualitative research. This standard facilitates systematic, unequivocal reference to specific construct definitions and corresponding instructions in a decentralized manner (i.e. without requiring central curation; Peters (2020) <doi:10.31234/osf.io/xebhn>).
This package implements variational Bayesian algorithms to perform scalable variable selection for sparse, high-dimensional linear and logistic regression models. Features include a novel prioritized updating scheme, which uses a preliminary estimator of the variational means during initialization to generate an updating order that prioritizes larger, more relevant coefficients. Sparsity is induced via spike-and-slab priors with either Laplace or Gaussian slabs; by default, the heavier-tailed Laplace density is used. Formal derivations of the algorithms and asymptotic consistency results may be found in Kolyan Ray and Botond Szabo (JASA 2020) and Kolyan Ray, Botond Szabo, and Gabriel Clara (NeurIPS 2020).
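This description matches the sparsevb package; assuming that package and its svb.fit() interface, a minimal sketch of recovering a sparse signal might look as follows (the simulated data and the returned gamma component of posterior inclusion probabilities are assumptions to check against the documentation):

```r
library(sparsevb)  # assumed package name; svb.fit() interface as recalled

set.seed(1)
n <- 100; p <- 400; s <- 5
X <- matrix(rnorm(n * p), n, p)
theta <- c(rep(2, s), rep(0, p - s))           # s-sparse true coefficient vector
Y <- as.vector(X %*% theta + rnorm(n))

fit <- svb.fit(X, Y, family = "linear", slab = "laplace")
head(order(fit$gamma, decreasing = TRUE), s)   # variables with highest inclusion probability
```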
mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets, with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated and/or explain the biological outcome of interest. The data analysed with mixOmics may come from high-throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics, etc.), but also from beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data.
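As an illustration of the sparse methods, here is a hedged sketch of a sparse PLS-DA that selects 50 genes per component on the srbct data (dataset and argument choices are for illustration and should be checked against the package documentation):

```r
library(mixOmics)

data(srbct)  # small round blue cell tumour data shipped with the package
res <- splsda(X = srbct$gene, Y = srbct$class,
              ncomp = 2, keepX = c(50, 50))    # keep 50 variables per component
plotIndiv(res, legend = TRUE, ellipse = TRUE)  # samples in the latent space
```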
Normalizes a data matrix `data` by raking (using the RAS method by Bacharach, see references) the Nrows by Ncols matrix such that the row means and column means equal 1. The result is a normalized data matrix `K=RAS`, a product of row multipliers `R` and column multipliers `S` with the original matrix `A`. Missing information needs to be presented as `NA` values and not as zero values, because CONSTANd is able to ignore missing values when calculating the means. Using CONSTANd normalization allows for the direct comparison of values between samples within the same, and even across different, CONSTANd-normalized data matrices.
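As an illustration of the raking idea (not the package's API; the function name ras_normalize is hypothetical), alternating row and column scaling converges to the required unit row and column means:

```r
# Illustrative RAS raking: rescale rows, then columns, until both sets of
# means are ~1; NAs are ignored when computing the means.
ras_normalize <- function(A, tol = 1e-8, max_iter = 50) {
  K <- A
  for (i in seq_len(max_iter)) {
    K <- K / rowMeans(K, na.rm = TRUE)                 # apply row multipliers R
    K <- sweep(K, 2, colMeans(K, na.rm = TRUE), "/")   # apply column multipliers S
    if (max(abs(rowMeans(K, na.rm = TRUE) - 1),
            abs(colMeans(K, na.rm = TRUE) - 1)) < tol) break
  }
  K
}
```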
The caroline R library contains dozens of functions useful for: database migration (dbWriteTable2), database-style joins & aggregation (nerge, groupBy, & bestBy), data structure conversion (nv, tab2df), legend table making (sstable & leghead), automatic legend positioning for scatter and box plots, plot annotation (labsegs & mvlabs), data visualization (pies, sparge, confound.grid & raPlot), character string manipulation (m & pad), file I/O (write.delim), batch scripting, data exploration, and more. The package's greatest contributions lie in the database-style merge, aggregation and interface functions, as well as in its extensive use and propagation of row, column and vector names in most functions.
This package provides diagnostic graphic tools for GLMs, the beta-binomial regression model (estimated by the VGAM package), the beta regression model (estimated by the betareg package) and the negative binomial regression model (estimated by the MASS package). Since most of the functions implemented in glmxdiag already exist in other packages, the aim is to provide the user with a single set of functions that works on almost all of the regression models specified above. Details about some of the implemented functions can be found in Brown (1992) <doi:10.2307/2347617>, Dunn and Smyth (1996) <doi:10.2307/1390802>, O'Hara Hines and Carter (1993) <doi:10.2307/2347405>, and Wang (1985) <doi:10.2307/1269708>.
With the deprecation of the mocking capabilities shipped with testthat as of edition 3, it is left to third-party packages to replace this functionality, which in some test scenarios is essential for running unit tests in limited environments (such as those without an Internet connection). Mocking in this setting means temporarily substituting a function with a stub that acts in some sense like the original function (for example, by serving an HTTP response that has been cached as a file). The only exported function, with_mock(), is modeled after the eponymous testthat function, with the intention of providing a drop-in replacement.
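A minimal sketch of the intended drop-in usage, assuming the classic testthat-style interface in which named arguments supply the stubs; fetch_title() and the cached response file are hypothetical:

```r
# Hypothetical example: fetch_title() is assumed to wrap curl::curl_fetch_memory(),
# and a cached response file stands in for a real HTTP round trip.
with_mock(
  `curl::curl_fetch_memory` = function(url, ...) readRDS("cached_response.rds"),
  {
    res <- fetch_title("https://example.org")
    stopifnot(is.character(res))
  }
)
```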
This package provides functions for making run charts [Anhoej, Olesen (2014) <doi:10.1371/journal.pone.0113825>] and basic Shewhart control charts [Mohammed, Worthington, Woodall (2008) <doi:10.1136/qshc.2004.012047>] for measure and count data. The main function, qic(), creates run and control charts and has a simple interface with a rich set of options to control data analysis and plotting, including options for automatic data aggregation by subgroups, easy analysis of before-and-after data, exclusion of one or more data points from analysis, and splitting charts into sequential time periods. Missing values and empty subgroups are handled gracefully.
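For instance, a control chart for simulated weekly counts (a hedged sketch; the data are simulated and the argument choices are for illustration):

```r
library(qicharts2)

set.seed(1)
d <- data.frame(week  = 1:24,
                count = rpois(24, lambda = 8))  # simulated count data

# C chart of weekly counts; chart = "run" would give a run chart instead.
qic(week, count, data = d, chart = "c",
    title = "Weekly counts", ylab = "Count", xlab = "Week")
```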
Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that allows users to apply various methods of this package.
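A brief sketch of a typical anonymization workflow, assuming the testdata set shipped with the package and its documented variable names (to be checked against the package examples):

```r
library(sdcMicro)

data("testdata", package = "sdcMicro")          # example microdata (assumed)
sdc <- createSdcObj(testdata,
                    keyVars = c("urbrur", "roof", "walls", "sex"),  # categorical keys
                    numVars = c("expend", "income"))                # continuous variables
sdc <- localSuppression(sdc, k = 3)             # k-anonymity via local suppression
print(sdc, "risk")                              # re-identification risk summary
```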
Recently, regularized variable selection has emerged as a powerful tool to identify and dissect gene-environment (G×E) interactions. Nevertheless, in longitudinal studies with high-dimensional genetic factors, regularization methods for G×E interactions have not been systematically developed. This package provides the implementation of sparse group variable selection, based on both the quadratic inference function (QIF) and generalized estimating equations (GEE), to accommodate bi-level selection for longitudinal G×E studies with high-dimensional genomic features. Alternative methods conducting only group-level or individual-level selection are also included. The core modules of the package have been developed in C++.
This package provides utilities to create and use lenses to simplify data manipulation. Lenses are composable getter/setter pairs that provide a functional approach to manipulating deeply nested data structures, e.g., elements within list columns in data frames. The implementation is based on the earlier lenses R package <https://github.com/cfhammill/lenses>, which was inspired by the Haskell lens package by Kmett (2012) <https://github.com/ekmett/lens>, one of the most widely referenced implementations of lenses. For additional background and history on the theory of lenses, see the lens package wiki: <https://github.com/ekmett/lens/wiki/History-of-Lenses>.
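As a conceptual illustration of what a lens is, here is a hand-rolled sketch in base R (not necessarily this package's exact API): a lens pairs a getter with a setter, and composing two lenses focuses deeper into a nested structure.

```r
# A lens is a getter/setter pair; compose() chains two lenses together.
lens <- function(get, set) list(get = get, set = set)

compose <- function(outer, inner) lens(
  get = function(x) inner$get(outer$get(x)),
  set = function(x, v) outer$set(x, inner$set(outer$get(x), v))
)

first_lens <- lens(get = function(x) x[[1]],
                   set = function(x, v) { x[[1]] <- v; x })

nested <- list(list("a", "b"), list("c"))
deep_first <- compose(first_lens, first_lens)  # focus on the first of the first
deep_first$get(nested)                         # "a"
deep_first$set(nested, "z")[[1]]               # list("z", "b"): updated immutably
```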
Rclone is a command line program to sync files and directories to and from different cloud storage providers.
Features include:
MD5/SHA1 hashes checked at all times for file integrity
Timestamps preserved on files
Partial syncs supported on a whole file basis
Copy mode to just copy new/changed files
Sync (one way) mode to make a directory identical
Check mode to check for file hash equality
Can sync to and from network, e.g., two different cloud accounts
Optional encryption (Crypt)
Optional cache (Cache)
Optional FUSE mount (rclone mount)
The DImodels package is suitable for analysing data from biodiversity and ecosystem function studies using the Diversity-Interactions (DI) modelling approach introduced by Kirwan et al. (2009) <doi:10.1890/08-1684.1>. Suitable data will contain proportions for each species and a community-level response variable, and may also include additional factors, such as blocks or treatments. The package can perform data manipulation tasks, such as computing pairwise interactions (the DI_data() function), can run an automated model selection process (the autoDI() function), and has the flexibility to fit a wide range of user-defined DI models (the DI() function).
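A hedged sketch of fitting one DI model with DI(), assuming the Switzerland grassland dataset and its column names ship with the package as in the documented examples:

```r
library(DImodels)

data("Switzerland")  # grassland experiment data (assumed bundled example)
# Fit an average-interactions ("AV") DI model with a nitrogen treatment term.
mod <- DI(y = "yield", prop = c("p1", "p2", "p3", "p4"),
          treat = "nitrogen", DImodel = "AV", data = Switzerland)
summary(mod)
```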
This tree-based method deals with high-dimensional longitudinal data with correlated features through the use of a piecewise random-effects model. FREE tree also exploits the network structure of the features by first clustering them using Weighted Gene Co-expression Network Analysis ('WGCNA'). It then conducts a screening step within each cluster of features and a selection step among the surviving features, which provides a relatively unbiased way to do feature selection. By using dominant principal components as regression variables at each leaf and the original features as splitting variables at splitting nodes, FREE tree delivers easily interpretable results while improving computational efficiency.
This package implements the integrated functional depth for partially observed functional data, with applications to visualization, outlier detection and classification. It implements the methods proposed in: Elías, A., Jiménez, R., Paganoni, A. M. and Sangalli, L. M. (2023), "Integrated Depth for Partially Observed Functional Data", Journal of Computational and Graphical Statistics, <doi:10.1080/10618600.2022.2070171>; Elías, A., Jiménez, R. and Shang, H. L. (2023), "Depth-based reconstruction method for incomplete functional data", Computational Statistics, <doi:10.1007/s00180-022-01282-9>; Elías, A. and Nagy, S. (2024), "Statistical properties of partially observed integrated functional depths", TEST, <doi:10.1007/s11749-024-00954-6>.
An interface to the fastText <https://github.com/facebookresearch/fastText> library for efficient learning of word representations and sentence classification. The fastText algorithm is explained in detail in (i) "Enriching Word Vectors with Subword Information", Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov, 2017, <doi:10.1162/tacl_a_00051>; (ii) "Bag of Tricks for Efficient Text Classification", Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov, 2017, <doi:10.18653/v1/e17-2068>; (iii) "FastText.zip: Compressing text classification models", Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Herve Jegou, Tomas Mikolov, 2016, <doi:10.48550/arXiv.1612.03651>.
In streaming data analysis, it is crucial to detect significant shifts in the data distribution or the accuracy of predictive models over time, a phenomenon known as concept drift. The package aims to identify when concept drift occurs and provide methodologies for adapting models in non-stationary environments. It offers a range of state-of-the-art techniques for detecting concept drift and maintaining model performance. Additionally, the package provides tools for adapting models in response to these changes, ensuring continuous and accurate predictions in dynamic contexts. Methods for concept drift detection are described in Tavares (2022) <doi:10.1007/s12530-021-09415-z>.
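As background on the kind of technique involved, one classic detector in this family is the Page-Hinkley test, which flags drift when the cumulative deviation of a monitored statistic from its running mean exceeds a threshold. The base-R sketch below is illustrative only and is not this package's API:

```r
# Illustrative Page-Hinkley detector: returns the index where drift is
# signalled, or NA if none. delta is a tolerance, lambda the alarm threshold.
page_hinkley <- function(x, delta = 0.005, lambda = 50) {
  mean_t <- 0; mt <- 0; Mt <- 0
  for (t in seq_along(x)) {
    mean_t <- mean_t + (x[t] - mean_t) / t   # running mean of the stream
    mt <- mt + x[t] - mean_t - delta         # cumulative deviation
    Mt <- min(Mt, mt)                        # running minimum of the deviation
    if (mt - Mt > lambda) return(t)          # alarm: distribution has shifted up
  }
  NA_integer_
}

page_hinkley(c(rnorm(200), rnorm(200, mean = 3)))  # drift injected at t = 201
```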
This package provides several functions to calculate two important indexes, the IBR (Integrated Biomarker Response) and the IBRv2 (Integrated Biological Response version 2). It also calculates the standardized enzyme-activity values underlying each index, and it includes a graphing function that produces radar plots, which are well suited to visualizing this type of data. Beliaeff, B., & Burgeot, T. (2002) <https://pubmed.ncbi.nlm.nih.gov/12069320/>. Sanchez, W., Burgeot, T., & Porcher, J.-M. (2013) <doi:10.1007/s11356-012-1359-1>. Devin, S., Burgeot, T., Giambérini, L., Minguez, L., & Pain-Devin, S. (2014) <doi:10.1007/s11356-013-2169-9>. Minato N. (2022) <https://minato.sip21c.org/msb/>.
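For reference, the IBR of Beliaeff and Burgeot (2002) standardizes each biomarker across stations, shifts the scores to be non-negative, and sums the triangular areas between consecutive radar-plot axes. An illustrative base-R version (the function name ibr is hypothetical; this is not the package's API):

```r
# X: matrix of biomarker measurements, rows = stations, cols = biomarkers.
ibr <- function(X) {
  Z <- scale(X)                                  # Y = (x - mean) / sd, per biomarker
  S <- sweep(Z, 2, abs(apply(Z, 2, min)), "+")   # shift each biomarker so S >= 0
  k <- ncol(S)
  apply(S, 1, function(s) {
    s_next <- c(s[-1], s[1])                     # score on the next axis, wrapping
    sum(s * s_next * sin(2 * pi / k) / 2)        # sum of triangular areas = IBR
  })
}
```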
Implementation of popular mortality models using the rstan package, which provides the R interface to the Stan C++ library for Bayesian estimation. The package supports well-known models proposed in the actuarial and demographic literature, including the Lee-Carter (1992) <doi:10.1080/01621459.1992.10475265> and the Cairns-Blake-Dowd (2006) <doi:10.1111/j.1539-6975.2006.00195.x> models. With a simple call, the user inputs deaths and exposures, and the package outputs the MCMC simulations for each parameter, the log-likelihoods and predictions. Moreover, the package includes tools for model selection and Bayesian model averaging by leave-future-out validation.
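For reference, the Lee-Carter model mentioned above decomposes the log central death rate at age $x$ in year $t$ as

$$\log m_{x,t} = a_x + b_x\,k_t + \varepsilon_{x,t}, \qquad \sum_x b_x = 1, \qquad \sum_t k_t = 0,$$

where $a_x$ is the average age profile of log mortality, $b_x$ the age-specific sensitivity to the period index, and $k_t$ the period mortality index whose time-series dynamics drive the forecasts (in the Bayesian setting, estimated jointly with the other parameters).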
This package implements a universal estimation approach that accommodates multi-category variables and multiple effect scales, addressing the deficiencies of existing approaches when dealing with non-binary exposures and complex models. The bootstrap-based estimation can simultaneously provide results of causal mediation on the risk difference (RD), odds ratio (OR) and risk ratio (RR) scales, with tests of the difference between effects. The estimation is also applicable to many other settings, e.g., moderated mediation, inconsistent covariates, and panel data. Its high flexibility and compatibility make it applicable to virtually any type of model, meeting the needs of current empirical research.
This package provides Cytometry dATa anALYSis Tools (CATALYST). Mass cytometry, such as cytometry by time-of-flight (CyTOF), uses heavy metal isotopes rather than fluorescent tags as reporters to label antibodies, thereby substantially decreasing spectral overlap and allowing for the examination of over 50 parameters at the single-cell level. While spectral overlap is significantly less pronounced in CyTOF than in flow cytometry, spillover due to detection sensitivity, isotopic impurities, and oxide formation can impede data interpretability. CATALYST was designed to provide a pipeline for preprocessing of cytometry data, including the steps below (a usage sketch follows the list):
normalization using bead standards;
single-cell deconvolution;
bead-based compensation.
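A hedged sketch of these three steps; the function names are recalled from the CATALYST workflow and should be checked against the package documentation, and `sce` (a SingleCellExperiment holding raw CyTOF data), `bc_key` (a debarcoding scheme) and `sm` (a spillover matrix) are assumed inputs:

```r
library(CATALYST)

res <- normCytof(sce, beads = "dvs", k = 50)   # 1. bead-based normalization
sce <- res$data                                #    (normalized cells, beads removed)
sce <- assignPrelim(sce, bc_key)               # 2. preliminary barcode assignment
sce <- applyCutoffs(estCutoffs(sce))           #    single-cell deconvolution
sce <- compCytof(sce, sm, method = "nnls")     # 3. spillover compensation
```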
Annotates data from liquid chromatography coupled to mass spectrometry (LC/MS) metabolomics experiments. Based on a network algorithm (O. Senan, A. Aguilar-Mogas, M. Navarro, O. Yanes, R. Guimerà and M. Sales-Pardo, Bioinformatics, 35(20), 2019), CliqueMS builds a weighted similarity network where nodes are features and edges are weighted according to the similarity of these features. It then searches for the most plausible division of the similarity network into cliques (fully connected components). Finally, it annotates the metabolites within each clique, obtaining for each annotated metabolite the neutral mass and its features, corresponding to isotopes, ionization adducts and fragmentation adducts of that metabolite.
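A hedged sketch of the annotation workflow the description outlines; the function and argument names are recalled from the package vignette and should be verified, `xobj` is assumed to be an xcms-preprocessed LC/MS object, and positive.adinfo an adduct table assumed to ship with the package:

```r
library(CliqueMS)

ex <- getCliques(xobj, filter = TRUE)          # build similarity network, find cliques
ex <- getIsotopes(ex, ppm = 10)                # annotate isotopes within each clique
ex <- getAnnotation(ex, ppm = 10,
                    adinfo = positive.adinfo,  # assumed bundled adduct table
                    polarity = "positive")     # adduct/fragment annotation, neutral masses
```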