This package provides a Shiny app for visual exploration of omic datasets as compositions, and differential abundance analysis using ALDEx2. It is useful for exploring RNA-seq, meta-RNA-seq, and 16S rRNA gene sequencing data with visualizations such as principal component analysis biplots (coloured using metadata for visualizing each variable), dendrograms, stacked bar plots, and effect plots (ALDEx2). Input is a table of counts and a metadata file (if metadata exists), with options to filter data by count or by metadata to remove low counts, or to visualize select samples according to selected metadata.
SpotClean is a computational method to adjust for spot swapping in spatial transcriptomics data. Recent spatial transcriptomics experiments utilize slides containing thousands of spots with spot-specific barcodes that bind mRNA. Ideally, unique molecular identifiers at a spot measure spot-specific expression, but this is often not the case due to bleed from nearby spots, an artifact we refer to as spot swapping. SpotClean is able to estimate the contamination rate in observed data and decontaminate the spot swapping effect, thus increasing the sensitivity and precision of downstream analyses.
This package provides R bindings to the Automerge Conflict-free Replicated Data Type ('CRDT') library. Automerge enables automatic merging of concurrent changes without conflicts, making it ideal for distributed systems, collaborative applications, and offline-first architectures. The approach of local-first software was proposed in Kleppmann, M., Wiggins, A., van Hardenberg, P., McGranaghan, M. (2019) <doi:10.1145/3359591.3359737>. This package supports all Automerge data types (maps, lists, text, counters) and provides both low-level and high-level synchronization protocols for seamless interoperability with JavaScript and other Automerge implementations.
This package implements two complementary high-dimensional feature screening methods, Adaptive Iterative Ridge High-dimensional Ordinary Least-squares Projection (Air-HOLP, suitable when the number of predictors p is greater than or equal to the sample size n) and Adaptive Iterative Ridge Ordinary Least Squares (Air-OLS, for n greater than p). Also provides helper functions to generate compound-symmetry and AR(1) correlated data, plus a unified Air() front end and a summary method. For methodological details see Joudah, Muller and Zhu (2025) <doi:10.1007/s11222-025-10599-6>.
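The two correlation structures named above can be sketched in a few lines of Python. This is only an illustration of the standard definitions (AR(1): entry (i, j) equals rho^|i-j|; compound symmetry: every off-diagonal entry equals rho), not the package's own helpers; the function names `ar1_corr` and `compound_symmetry_corr` are hypothetical.

```python
def ar1_corr(p, rho):
    """AR(1) correlation matrix: entry (i, j) is rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]


def compound_symmetry_corr(p, rho):
    """Compound-symmetry matrix: 1 on the diagonal, rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


# Hypothetical example values: 3 predictors, correlation parameter 0.5.
ar1 = ar1_corr(3, 0.5)
cs = compound_symmetry_corr(3, 0.5)
```

Under AR(1) the correlation decays with the distance between predictor indices, whereas under compound symmetry every pair of predictors is equally correlated.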
Manage storage in Microsoft's Azure cloud: <https://azure.microsoft.com/en-us/products/category/storage/>. On the admin side, AzureStor includes features to create, modify and delete storage accounts. On the client side, it includes an interface to blob storage, file storage, and Azure Data Lake Storage Gen2: upload and download files and blobs; list containers and files/blobs; create containers; and so on. Authenticated access to storage is supported, via either a shared access key or a shared access signature (SAS). Part of the AzureR family of packages.
This package provides partial least squares regression and various regular, sparse, or kernel techniques for fitting Cox models to big data. It provides a Partial Least Squares (PLS) algorithm adapted to Cox proportional hazards models that works with bigmemory matrices without loading the entire dataset into memory. It also implements a gradient-descent based solver for Cox proportional hazards models that works directly on bigmemory matrices. Bertrand and Maumy (2023) <https://hal.science/hal-05352069> and <https://hal.science/hal-05352061> highlighted fitting and cross-validating PLS-based Cox models to censored big data.
Generates confidence intervals for standardized regression coefficients using delta method standard errors for models fitted by lm() as described in Yuan and Chan (2011) <doi:10.1007/s11336-011-9224-6> and Jones and Waller (2015) <doi:10.1007/s11336-013-9380-y>. The package can also be used to generate confidence intervals for differences of standardized regression coefficients and as a general approach to performing the delta method. A description of the package and code examples are presented in Pesigan, Sun, and Cheung (2023) <doi:10.1080/00273171.2023.2201277>.
Typical morphological profiling datasets have millions of cells and hundreds of features per cell. When working with this data, you must clean the data, normalize the features to make them comparable across experiments, transform the features, select features based on their quality, and aggregate the single-cell data, if needed. cytominer makes these steps fast and easy. Methods used in practice in the field are discussed in Caicedo (2017) <doi:10.1038/nmeth.4397>. An overview of the field is presented in Caicedo (2016) <doi:10.1016/j.copbio.2016.04.003>.
Comprehensive suite of Granger causality tests including standard Toda-Yamamoto (1995) <doi:10.1016/0304-4076(94)01616-8>, Fourier-based tests with single frequency (Enders and Jones, 2016) <doi:10.1515/snde-2014-0101> and cumulative frequencies (Nazlioglu et al., 2019) <doi:10.1080/1540496X.2018.1434072>, as well as quantile causality tests (Cai et al., 2023) <doi:10.1016/j.frl.2023.104327> and Bootstrap Fourier Granger Causality in Quantiles (Cheng et al., 2021) <doi:10.1007/s12076-020-00263-0>. All tests include bootstrap inference for robust p-values.
This package provides comprehensive software to estimate general K-stage dynamic treatment regimes (DTRs) from sequential multiple assignment randomized trials (SMARTs) with Q-learning and a variety of outcome-weighted learning methods. Penalizations are allowed for variable selection and model regularization. With the outcome-weighted learning scheme, different loss functions (SVM hinge loss, SVM ramp loss, binomial deviance loss, and L2 loss) are adopted to solve the weighted classification problem at each stage; augmentation in the outcomes is allowed to improve efficiency. The estimated DTR can be easily applied to a new sample for individualized treatment recommendations or DTR evaluation.
The epiworldR package provides a flexible framework for Agent-Based Models (ABM), with methods for prototyping disease outbreaks and transmission models using a fast C++ backend. It supports multiple epidemiological models, including the Susceptible-Infected-Susceptible (SIS), Susceptible-Infected-Removed (SIR), Susceptible-Exposed-Infected-Removed (SEIR), and others, involving arbitrary mitigation policies and multiple-disease models. Users can specify infectiousness/susceptibility rates as a function of agents' features, allowing for complex model dynamics. Furthermore, epiworldR is well suited to simulation studies featuring large populations.
This package provides a comprehensive suite of functions for processing and visualizing taxonomic data. It includes functionality to clean and transform taxonomic data, categorize it into hierarchical ranks (such as Phylum, Class, Order, Family, and Genus), and calculate the relative abundance of each category. The package also generates a color palette for visual representation of the taxonomic data, allowing users to easily identify and differentiate between various taxonomic groups. Additionally, it features a river plot visualization to display the distribution of individuals across different taxonomic ranks, facilitating insight into taxonomic composition.
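As a rough illustration of the relative abundance calculation mentioned above, the following Python sketch counts how often each category occurs at one rank and divides by the total. It is not the package's implementation; the function name `relative_abundance` and the sample genus labels are hypothetical.

```python
from collections import Counter


def relative_abundance(assignments):
    """Return each category's share of the total number of individuals."""
    counts = Counter(assignments)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical genus-level assignments for four individuals.
genera = ["Bacteroides", "Bacteroides", "Prevotella", "Bacteroides"]
shares = relative_abundance(genera)
```

The same function can be applied at any rank (Phylum, Class, and so on) by passing the labels for that rank.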
This package contains miscellaneous functions useful for managing NetCDF files (see <https://en.wikipedia.org/wiki/NetCDF>), getting the moon phase, sunrise and sunset times, and tide levels, analysing and reconstructing periodic time series of temperature with irregular sinusoidal patterns, showing scales and wind roses in plots with changes of text color, running the Metropolis-Hastings algorithm for Bayesian MCMC analysis, plotting graphs or boxplots with error bars, searching files on disk by their names or their content, and reading the contents of all files in a folder at one time.
Automate the detection of gaps and elevations in mapped sequencing read coverage using a 2D pattern-matching algorithm. ProActive detects, characterizes and visualizes read coverage patterns in both genomes and metagenomes. Optionally, users may provide gene annotations associated with their genome or metagenome in the form of a .gff file. In this case, ProActive will generate an additional output table containing the gene annotations found within the detected regions of gapped and elevated read coverage. Additionally, users can search for gene annotations of interest in the output read coverage plots.
Efficient Markov chain Monte Carlo (MCMC) algorithms for fully Bayesian estimation of time-varying parameter models with shrinkage priors, both dynamic and static. Details on the algorithms used are provided in Bitto and Frühwirth-Schnatter (2019) <doi:10.1016/j.jeconom.2018.11.006>, Cadonna et al. (2020) <doi:10.3390/econometrics8020020>, and Knaus and Frühwirth-Schnatter (2023) <doi:10.48550/arXiv.2312.10487>. For details on the package, please see Knaus et al. (2021) <doi:10.18637/jss.v100.i13>. For the multivariate extension, see the shrinkTVPVAR package.
Machine learning provides algorithms that can learn from data and make inferences or predictions. Stochastic automata are a class of input/output devices which can model components. This work provides an implementation of an inference algorithm for stochastic automata which is similar to the Viterbi algorithm. Moreover, we specify a learning algorithm using the expectation-maximization technique and provide a more efficient implementation of the Baum-Welch algorithm for stochastic automata. This work is based on "Inference and learning in stochastic automata" by Karl-Heinz Zimmermann (2017) <doi:10.12732/ijpam.v115i3.15>.
Data exploration and modelling is a process in which many data artifacts are produced, such as subsets, data aggregates, plots, statistical models, different versions of data sets and different versions of results. Archivist helps to store and manage artifacts created in R. It allows you to store selected artifacts as binary files together with their metadata and relations. Archivist allows sharing artifacts with others. It can look for already created artifacts by their class, name, date of creation or other properties. It also makes it easy to restore such artifacts.
Bioinformatics platform containing interactive plots and tables for differential gene and region expression studies. It allows expression data to be visualized in greater depth, interactively and more quickly. By changing the parameters, users can easily explore different parts of the data in ways that were not previously practical. Manually creating and looking at these plots takes time. With DEBrowser users can prepare plots without writing any code. Differential expression, PCA and clustering analyses are performed on site and the results are shown in various plots such as scatter, bar, box, volcano, and MA plots and heatmaps.
This package provides a simple tool to quantify the amount of transmission of an infectious disease of interest occurring within and between population groups. bumblebee uses counts of observed directed transmission pairs, identified phylogenetically from deep-sequence data or from epidemiological contacts, to quantify transmission flows within and between population groups accounting for sampling heterogeneity. Population groups might include: geographical areas (e.g. communities, regions), demographic groups (e.g. age, gender) or arms of a randomized clinical trial. See the bumblebee website for statistical theory, documentation and examples <https://magosil86.github.io/bumblebee/>.
Color values in R are often represented as strings of hexadecimal colors or named colors. This package offers fast conversion of these color representations to either an array of red/green/blue/alpha values or to the packed integer format used in native raster objects. Functions for conversion are also exported at the C level for use in other packages. This fast conversion of colors is implemented using an order-preserving minimal perfect hash derived from Majewski et al. (1996) "A Family of Perfect Hashing Methods" <doi:10.1093/comjnl/39.6.547>.
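The two target representations can be illustrated with a minimal Python sketch. It parses a hexadecimal color string into red/green/blue/alpha values and packs them into a single integer. The byte layout used here (red in the lowest byte, alpha in the highest) is an assumption about the native raster format, and the function names `hex_to_rgba` and `pack_rgba` are hypothetical; the sketch does not reproduce the package's perfect-hash lookup for named colors.

```python
def hex_to_rgba(color):
    """Parse '#RRGGBB' or '#RRGGBBAA' into an (r, g, b, a) tuple of 0-255 ints."""
    digits = color.lstrip("#")
    if len(digits) == 6:
        digits += "FF"  # no alpha given: treat the color as fully opaque
    if len(digits) != 8:
        raise ValueError(f"expected 6 or 8 hex digits, got {color!r}")
    return tuple(int(digits[i:i + 2], 16) for i in range(0, 8, 2))


def pack_rgba(r, g, b, a):
    """Pack four channels into one integer.

    Assumed layout: red in the lowest byte, alpha in the highest.
    The result is unsigned here; a 32-bit signed integer type would
    show the same bits as a negative number when alpha >= 128.
    """
    return r | (g << 8) | (b << 16) | (a << 24)


rgba = hex_to_rgba("#FF8000")
packed = pack_rgba(*rgba)
```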
This package provides fast moving-window ("focal") and buffer-based extraction for raster data using the terra package. Automatically selects between a C++ backend (via terra) and a Fast Fourier Transform (FFT) backend depending on problem size. The FFT backend supports sum and mean, while other statistics (e.g., median, min, max, standard deviation) are handled by the terra backend. Supports multiple kernel types (e.g., circle, rectangle, gaussian), with NA handling consistent with terra via na.rm and na.policy. Operates on SpatRaster objects and returns results with the same geometry.
Grey zones occur locally in an agreement table due to the subjective evaluation of raters, driven by factors such as a lack of uniform guidelines, differences between the raters' levels of expertise, or low variability among the levels of the categorical variable. It is important to detect grey zones since they cause a negative bias in the estimate of the agreement level. This package provides a function for detecting the existence of grey zones in two-way inter-rater agreement tables (Demirhan and Yilmaz (2023) <doi:10.1186/s12874-022-01759-7>).
Vitamin and mineral deficiencies continue to be a significant public health problem. This is particularly critical in developing countries where deficiencies in vitamin A, iron, iodine, and other micronutrients lead to adverse health consequences. Cross-sectional surveys are helpful in answering questions related to the magnitude and distribution of deficiencies of selected vitamins and minerals. This package provides tools for calculating and determining select vitamin and mineral deficiencies based on World Health Organization (WHO) guidelines found at <https://www.who.int/teams/nutrition-and-food-safety/databases/vitamin-and-mineral-nutrition-information-system>.
The SoundexBR package provides an algorithm for encoding names into phonetic codes, as pronounced in Portuguese. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. The resulting soundex code is a four-character string composed of one letter followed by three numerical digits: the letter is the first letter of the name, and the digits encode the remaining consonants.
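The shape of such a code can be illustrated with the classic American Soundex rules, sketched below in Python. This is not the Portuguese adaptation implemented by SoundexBR, whose letter groupings are not described here; the function name `soundex` and the consonant table are those of the standard English algorithm and serve only to show the "one letter plus three digits" structure.

```python
# Consonant groups of classic (American) Soundex; vowels, Y, H and W get no digit.
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in group:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character classic Soundex code of a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate equal neighbouring codes
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    # Pad with zeros or truncate so the result is one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so the two spellings would be matched.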