Color schemes ready for each type of data (qualitative, diverging or sequential), with colors that are distinct for all people, including color-blind readers. This package provides an implementation of Paul Tol (2018) and Fabio Crameri (2018) <doi:10.5194/gmd-11-2541-2018> color schemes for use with graphics or ggplot2'. It provides tools to simulate color-blindness and to test how well the colors of any palette are identifiable. Several scientific thematic schemes (geologic timescale, land cover, FAO soils, etc.) are also implemented.
Incorporates a Bayesian monotonic single-index mixed-effect model with a multivariate skew-t likelihood, specifically designed to handle survey weights adjustments. Features include a simulation program and an associated Gibbs sampler for model estimation. The single-index function is constrained to be monotonic increasing, utilizing a customized Gaussian process prior for precise estimation. The model assumes random effects follow a canonical skew-t distribution, while residuals are represented by a multivariate Student-t distribution. Offers robust Bayesian adjustments to integrate survey weight information effectively.
Values below the limit of detection (LOD) are a problem in several fields of science, and there are numerous approaches for replacing the missing data. We present a new mathematical solution for maximum likelihood estimation that allows us to estimate the true values of the mean and standard deviation for normal distributions and is significantly faster than previous implementations. The article with the details was submitted to JSS and can be currently seen on <https://www2.arnes.si/~tverbo/LOD/Verbovsek_Sega_2_Manuscript.pdf>.
This package provides a collection of various techniques correcting statistical models for sample selection bias is provided. In particular, the resampling-based methods "stochastic inverse-probability oversampling" and "parametric inverse-probability bagging" are placed at the disposal which generate synthetic observations for correcting classifiers for biased samples resulting from stratified random sampling. For further information, see the article Krautenbacher, Theis, and Fuchs (2017) <doi:10.1155/2017/7847531>. The methods may be used for further purposes where weighting and generation of new observations is needed.
This package provides tools for analyzing spatial cell-cell interactions based on ligand-receptor pairs, including functions for local, regional, and global analysis using spatial transcriptomics data. Integrates with databases like CellChat <http://www.cellchat.org/>, CellPhoneDB <https://www.cellphonedb.org/>, Cellinker <https://www.rna-society.org/cellinker/index.html>, ICELLNET <https://github.com/soumelis-lab/ICELLNET>, and ConnectomeDB <https://humanconnectome.org/software/connectomedb/> to identify ligand-receptor pairs, visualize interactions through heatmaps, chord diagrams, and infer interactions on different spatial scales.
Spectral and Average Autocorrelation Zero Distance Density ('sazed') is a method for estimating the season length of a seasonal time series. sazed is aimed at practitioners, as it employs only domain-agnostic preprocessing and does not depend on parameter tuning or empirical constants. The computation of sazed relies on the efficient autocorrelation computation methods suggested by Thibauld Nion (2012, URL: <https://etudes.tibonihoo.net/literate_musing/autocorrelations.html>) and by Bob Carpenter (2012, URL: <https://lingpipe-blog.com/2012/06/08/autocorrelation-fft-kiss-eigen/>).
The goal of trainR is to provide a simple interface to the National Rail Enquiries (NRE) systems. There are few data feeds available, the simplest of them is Darwin, which provides real-time arrival and departure predictions, platform numbers, delay estimates, schedule changes and cancellations. Other data feeds provide historical data, Historic Service Performance (HSP), and much more. trainR simplifies the data retrieval, so that the users can focus on their analyses. For more details visit <https://www.nationalrail.co.uk/46391.aspx>.
Topological data analytic methods in machine learning rely on vectorizations of the persistence diagrams that encode persistent homology, as surveyed by Ali &al (2000) <doi:10.48550/arXiv.2212.09703>. Persistent homology can be computed using TDA and ripserr and vectorized using TDAvec'. The Tidymodels package collection modularizes machine learning in R for straightforward extensibility; see Kuhn & Silge (2022, ISBN:978-1-4920-9644-3). These recipe steps and dials tuners make efficient algorithms for computing and vectorizing persistence diagrams available for Tidymodels workflows.
Calculates total survey error (TSE) for a survey under multiple, different weighting schemes, using both scale-dependent and scale-independent metrics. Package works directly from the data set, with no hand calculations required: just upload a properly structured data set (see TESTWGT and its documentation), properly input column names (see functions documentation), and run your functions. For more on TSE, see: Weisberg, Herbert (2005, ISBN:0-226-89128-3); Biemer, Paul (2010) <doi:10.1093/poq/nfq058>; Biemer, Paul et.al. (2017, ISBN:9781119041672); etc.
This package provides an algorithm to detect and characterize disturbances (start, end dates, intensity) that can occur at different hierarchical levels by studying the dynamics of longitudinal observations at the unit level and group level based on Nadaraya-Watson's smoothing curves, but also a shiny app which allows to visualize the observations and the detected disturbances. Finally the package provides a dataframe mimicking a pig farming system subsected to disturbances simulated according to Le et al.(2022) <doi:10.1016/j.animal.2022.100496>.
ILoReg is a tool for identification of cell populations from scRNA-seq data. In particular, ILoReg is useful for finding cell populations with subtle transcriptomic differences. The method utilizes a self-supervised learning method, called Iteratitive Clustering Projection (ICP), to find cluster probabilities, which are used in noise reduction prior to PCA and the subsequent hierarchical clustering and t-SNE steps. Additionally, functions for differential expression analysis to find gene markers for the populations and gene expression visualization are provided.
This package has for objectives to provide a method to make Linear Models for high-dimensional designed data. limpca applies a GLM (General Linear Model) version of ASCA and APCA to analyse multivariate sample profiles generated by an experimental design. ASCA/APCA provide powerful visualization tools for multivariate structures in the space of each effect of the statistical model linked to the experimental design and contrarily to MANOVA, it can deal with mutlivariate datasets having more variables than observations. This method can handle unbalanced design.
Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with MatchIt', WeightIt', MatchThem', twang', Matching', optmatch', CBPS', ebal', cem', sbw', and designmatch for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with multi-category, continuous, or longitudinal treatments.
We design algorithms with linear time complexity with respect to the dimension for three commonly studied correlation structures, including exchangeable, decaying-product and K-dependent correlation structures, and extend the algorithms to generate binary data of general non-negative correlation matrices with quadratic time complexity. Jiang, W., Song, S., Hou, L. and Zhao, H. "A set of efficient methods to generate high-dimensional binary data with specified correlation structures." The American Statistician. See <doi:10.1080/00031305.2020.1816213> for a detailed presentation of the method.
This package provides a collection of functions which fit functional neural network models. In other words, this package will allow users to build deep learning models that have either functional or scalar responses paired with functional and scalar covariates. We implement the theoretical discussion found in Thind, Multani and Cao (2020) <arXiv:2006.09590> through the help of a main fitting and prediction function as well as a number of helper functions to assist with cross-validation, tuning, and the display of estimated functional weights.
It offers comprehensive tools for the analysis of functional time series data, focusing on white noise hypothesis testing and goodness-of-fit evaluations, alongside functions for simulating data and advanced visualization techniques, such as 3D rainbow plots. These methods are described in Kokoszka, Rice, and Shang (2017) <doi:10.1016/j.jmva.2017.08.004>, Yeh, Rice, and Dubin (2023) <doi:10.1214/23-EJS2112>, Kim, Kokoszka, and Rice (2023) <doi:10.1214/23-ss143>, and Rice, Wirjanto, and Zhao (2020) <doi:10.1111/jtsa.12532>.
Imputing blockwise missing data by imprecise imputation, featuring a domain-based, variable-wise, and case-wise strategy. Furthermore, the estimation of lower and upper bounds for unconditional and conditional probabilities based on the obtained imprecise data is implemented. Additionally, two utility functions are supplied: one to check whether variables in a data set contain set-valued observations; and another to merge two already imprecisely imputed data. The method is described in a technical report by Endres, Fink and Augustin (2018, <doi:10.5282/ubm/epub.42423>).
This package provides functions to perform all steps of genome-wide association meta-analysis for studying Genotype x Environment interactions, from collecting the data to the manhattan plot. The procedure accounts for the potential correlation between studies. In addition to the Fixed and Random models, one can investigate the relationship between QTL effects and some qualitative or quantitative covariate via the test of contrast and the meta-regression, respectively. The methodology is available from: (De Walsche, A., et al. (2025) \doi10.1371/journal.pgen.1011553).
PHATE is a tool for visualizing high dimensional single-cell data with natural progressions or trajectories. PHATE uses a novel conceptual framework for learning and visualizing the manifold inherent to biological systems in which smooth transitions mark the progressions of cells from one state to another. To see how PHATE can be applied to single-cell RNA-seq datasets from hematopoietic stem cells, human embryonic stem cells, and bone marrow samples, check out our publication in Nature Biotechnology at <doi:10.1038/s41587-019-0336-3>.
This package provides a system to increase the efficiency of dynamic web-scraping with RSelenium by leveraging parallel processing. You provide a function wrapper for your RSelenium scraping routine with a set of inputs, and parsel runs it in several browser instances. Chunked input processing as well as error catching and logging ensures seamless execution and minimal data loss, even when unforeseen RSelenium errors occur. You can additionally build safe scraping functions with minimal coding by utilizing constructor functions that act as wrappers around RSelenium methods.
Simulate complex data from a given directed acyclic graph and information about each individual node. Root nodes are simply sampled from the specified distribution. Child Nodes are simulated according to one of many implemented regressions, such as logistic regression, linear regression, poisson regression or any other function. Also includes a comprehensive framework for discrete-time simulation, and networks-based simulation which can generate even more complex longitudinal and dependent data. For more details, see Robin Denz, Nina Timmesfeld (2025) <doi:10.48550/arXiv.2506.01498>.
Analysis of species limits and DNA barcoding data. Included are functions for generating important summary statistics from DNA barcode data, assessing specimen identification efficacy, testing and optimizing divergence threshold limits, assessment of diagnostic nucleotides, and calculation of the probability of reciprocal monophyly. Additionally, a sliding window function offers opportunities to analyse information across a gene, often used for marker design in degraded DNA studies. Further information on the package has been published in Brown et al (2012) <doi:10.1111/j.1755-0998.2011.03108.x>.
Time series outlier detection with non parametric test. This is a new outlier detection methodology (washer): efficient for time saving elaboration and implementation procedures, adaptable for general assumptions and for needing very short time series, reliable and effective as involving robust non parametric test. You can find two approaches: single time series (a vector) and grouped time series (a data frame). For other informations: Andrea Venturini (2011) Statistica - Universita di Bologna, Vol.71, pp.329-344. For an informal explanation look at R-bloggers on web.
This package provides an implementation of multilayered visualizations for enhanced graphical representation of functional analysis data. It combines and integrates omics data derived from expression and functional annotation enrichment analyses. Its plotting functions have been developed with an hierarchical structure in mind: starting from a general overview to identify the most enriched categories (modified bar plot, bubble plot) to a more detailed one displaying different types of relevant information for the molecules in a given set of categories (circle plot, chord plot, cluster plot, Venn diagram, heatmap).