epidecodeR is a package for analysing the impact of the degree of DNA/RNA epigenetic chemical modification on the dysregulation of genes or proteins. The package integrates chemical modification data generated by epigenomic or epitranscriptomic techniques such as ChIP-seq, ATAC-seq and m6A-seq with dysregulated gene lists (differential gene expression, ribosome occupancy or differential protein translation) and quantifies the impact that varying degrees of chemical modification have on gene dysregulation. epidecodeR generates cumulative distribution function (CDF) plots showing shifts in the overall log2FC trend between groups of genes defined by the degree of modification associated with them, and it tests for significant differences in log2FC between these groups.
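A minimal sketch of the idea with simulated data (plain ggplot2 and base R, not the epidecodeR interface): genes are grouped by the number of modification peaks they carry, the empirical CDFs of their log2 fold changes are overlaid, and a two-group difference is tested.

## simulated genes: log2FC shifts upward with the number of modification peaks
set.seed(1)
genes <- data.frame(
  log2FC  = c(rnorm(300, 0), rnorm(300, 0.3), rnorm(300, 0.6)),
  n_peaks = rep(c(0, 1, 2), each = 300)
)
genes$group <- cut(genes$n_peaks, breaks = c(-Inf, 0, 1, Inf),
                   labels = c("0 peaks", "1 peak", "2+ peaks"))

library(ggplot2)
ggplot(genes, aes(log2FC, colour = group)) +
  stat_ecdf() +
  labs(y = "Cumulative fraction of genes")

## significance of the shift between two groups (here a Kolmogorov-Smirnov test)
ks.test(genes$log2FC[genes$group == "0 peaks"],
        genes$log2FC[genes$group == "2+ peaks"])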
The Datasaurus Dozen is a set of datasets with the same summary statistics. They retain the same summary statistics despite having radically different distributions. The datasets represent a larger and quirkier object lesson that is typically taught via Anscombe's Quartet (available in the 'datasets' package). Anscombe's Quartet contains four very different distributions with the same summary statistics and as such highlights the value of visualisation in understanding data, over and above summary statistics. As well as being an engaging variant on the Quartet, the data is generated in a novel way. The simulated annealing process used to derive datasets from the original Datasaurus is detailed in "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing" doi:10.1145/3025453.3025912.
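Assuming the datasaurus_dozen data frame shipped with the package (columns dataset, x, y), the near-identical summary statistics can be checked directly, while plotting reveals how different the distributions really are:

library(datasauRus)
library(dplyr)
library(ggplot2)

## per-dataset means, standard deviations and correlation are almost identical
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(mean_x = mean(x), mean_y = mean(y),
            sd_x = sd(x), sd_y = sd(y),
            corr = cor(x, y))

## the scatter plots, however, are radically different
ggplot(datasaurus_dozen, aes(x, y)) +
  geom_point() +
  facet_wrap(~ dataset)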
Given one or multiple paths to files produced by a PULSE multi-channel or a PULSE one-channel system (<https://electricblue.eu/pulse>) from a single experiment: [1] check PULSE files for inconsistencies and read/merge all data, [2] split across time windows, [3] interpolate and smooth to optimize the dataset, [4] compute the heart rate frequency for each channel/window, and [5] facilitate quality control, summarising and plotting. Heart rate frequency is calculated using the Automatic Multi-scale Peak Detection algorithm proposed by Felix Scholkmann and colleagues; for more details see Scholkmann et al. (2012) <doi:10.3390/a5040588>. The original code is available at <https://github.com/ig248/pyampd>. ElectricBlue is a non-profit technology-transfer startup creating research-oriented solutions for the scientific community (<https://electricblue.eu>).
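A simplified sketch of step [4] only (naive local-maximum detection rather than the package's AMPD implementation, and a hypothetical helper name): once peaks have been located in a smoothed signal, the heart rate frequency is the reciprocal of the mean peak-to-peak interval.

heart_rate_hz <- function(signal, time) {
  # naive local-maximum detection; the package relies on the more robust AMPD method
  is_peak <- c(FALSE, diff(sign(diff(signal))) == -2, FALSE)
  peak_times <- time[is_peak]
  1 / mean(diff(peak_times))        # beats per second
}

tm  <- seq(0, 10, by = 0.01)
sig <- sin(2 * pi * 1.5 * tm)       # synthetic 1.5 Hz "heartbeat"
heart_rate_hz(sig, tm)              # approximately 1.5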
This package provides a powerful and flexible tool for visualizing proportional data across spatially resolved contexts. By combining the concepts of scatter plots and stacked bar charts, scatterbar allows users to create scattered bar chart plots, which effectively display the proportions of different categories at each (x, y) location. This visualization is particularly useful for applications where understanding the distribution of categories across spatial coordinates is essential. This package features automatic determination of optimal scaling factors based on data, customizable scaling and padding options for both x and y axes, flexibility to specify custom colors for each category, options to customize the legend title, and integration with ggplot2 for robust and high-quality visualizations. For more details, see Velazquez et al. (2024) <doi:10.1101/2024.08.14.606810>.
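A hedged, generic illustration of the underlying idea (built from geom_rect(), not the scatterbar interface): at each (x, y) location the category proportions are drawn as a small stacked bar.

library(ggplot2)
library(dplyr)

set.seed(42)
spots <- expand.grid(x = 1:5, y = 1:5)
props <- data.frame(
  spot = rep(seq_len(nrow(spots)), each = 3),
  x = rep(spots$x, each = 3), y = rep(spots$y, each = 3),
  category = rep(c("A", "B", "C"), times = nrow(spots)),
  prop = as.vector(apply(matrix(runif(3 * nrow(spots)), nrow = 3), 2,
                         function(p) p / sum(p)))
)

bar_w <- 0.8; bar_h <- 0.8   # fixed scaling factors; scatterbar chooses these automatically
bars <- props %>%
  group_by(spot) %>%
  arrange(category, .by_group = TRUE) %>%
  mutate(ytop = cumsum(prop), ybottom = ytop - prop)

ggplot(bars) +
  geom_rect(aes(xmin = x - bar_w / 2, xmax = x + bar_w / 2,
                ymin = y - bar_h / 2 + ybottom * bar_h,
                ymax = y - bar_h / 2 + ytop * bar_h,
                fill = category))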
This package provides model data and functions for easily using machine learning models that classify cancer type and phenotype from DNA methylome data. The primary motivation for the development of this package is to abstract away the granular and accessibility-limiting code required to utilize machine learning models in R. The package provides this abstraction for RandomForest, e1071 Support Vector, Extreme Gradient Boosting, and Tensorflow models. This is paired with an ExperimentHub component, which contains models developed for epigenetic cancer classification and phenotype prediction, including CNS tumor classification, pan-cancer classification, race prediction, cell-of-origin classification, and subtype classification models. The package currently supports the HM450, EPIC, EPICv2, MSA, and MM285 platforms.
Space-filling designs have great impact in computer experiments. The most popularly used space-filling designs are uniform designs (UDs) and Latin hypercube designs (LHDs); for further reference see McKay (1979) <DOI:10.1080/00401706.1979.10489755> and Fang (1980) <https://cir.nii.ac.jp/crid/1570291225616774784>. This package provides algorithms to generate efficient LHDs and UDs. The generated LHDs are efficient in the sense that they possess low values of the MaxPro measure, the Phi_p value and the Maximum Absolute Correlation (MAC), according to the weight given to each criterion. The produced UDs have a good space-filling property, as they always attain the lower bound of the discrete discrepancy measure. Some further utility functions are also included in the package.
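A small sketch of two of the criteria mentioned above (hypothetical helper names random_lhd, phi_p and mac, not this package's functions): a random Latin hypercube design together with its Phi_p value and maximum absolute correlation.

random_lhd <- function(n, k) replicate(k, sample.int(n))   # random n-run, k-factor LHD

phi_p <- function(D, p = 15) {
  d <- as.matrix(dist(D))                 # pairwise Euclidean distances between runs
  sum(1 / d[upper.tri(d)]^p)^(1 / p)      # smaller is more space-filling
}

mac <- function(D) {
  r <- cor(D)
  max(abs(r[upper.tri(r)]))               # maximum absolute column correlation
}

set.seed(7)
D <- random_lhd(n = 10, k = 4)
phi_p(D); mac(D)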
This package provides a customisable R shiny app for immersively visualising, mapping and annotating panospheric (360 degree) imagery. The flexible interface allows annotation of any geocoded images using up to 4 user-specified dropdown menus. The app uses leaflet to render maps that display the geo-locations of images and Pannellum (<https://pannellum.org/>), a lightweight panorama viewer for the web, to render images in virtual 360 degree viewing mode. Key functions include the ability to draw on and export parts of 360 degree images for downstream applications. Users can also draw polygons and points on map imagery related to the panoramic images and export them for further analysis. Downstream applications include using annotations to train Artificial Intelligence/Machine Learning (AI/ML) models and geospatial modelling and analysis of camera-based survey data.
Unleash the power of time-series data visualization with ease using our package. Designed with simplicity in mind, it offers three key features through the shiny package output. The first tab shows time-series charts with forecasts, allowing users to visualize trends and changes effortlessly. The second one displays averages per country, presented in tables with accompanying sparklines, providing a quick and attractive overview of the data. The last tab presents a customizable world map colored based on user-defined variables for any chosen number of countries, offering an advanced visual approach to understanding geographical data distributions. This package operates with just a few simple arguments, enabling users to conduct sophisticated analyses without the need for complex programming skills. Transform your time-series data analysis experience with our user-friendly tool.
DNA methylation is generally considered to be associated with transcriptional silencing. However, comprehensive, genome-wide investigation of this relationship requires the evaluation of potentially millions of correlation values between the methylation of individual genomic loci and the expression of associated transcripts in a relatively large number of samples. Methodical makes this process quick and easy while keeping a low memory footprint. It also provides a novel method for identifying regions where a number of methylation sites are consistently strongly associated with transcriptional expression. In addition, Methodical enables housing DNA methylation data from diverse sources (e.g. WGBS, RRBS and methylation arrays) within a common framework, lifting over DNA methylation data between different genome builds and creating base-resolution plots of the association between DNA methylation and transcriptional activity at transcriptional start sites.
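A toy sketch of the core computation (simulated data and plain base R, not the Methodical interface): methylation at individual sites near a transcription start site is correlated with the expression of the associated transcript across samples.

set.seed(3)
n_samples <- 40
expr <- rnorm(n_samples)                          # transcript expression per sample
meth <- matrix(runif(20 * n_samples), nrow = 20)  # 20 methylation sites x samples

site_cor <- apply(meth, 1, function(m) {
  test <- cor.test(m, expr, method = "spearman")
  c(rho = unname(test$estimate), p = test$p.value)
})
t(site_cor)   # one correlation and p-value per methylation site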
The cyclotomic numbers are complex numbers that can be thought of as the rational numbers extended with the roots of unity. They are represented exactly, enabling exact computations. They contain the Gaussian rationals (complex numbers with rational real and imaginary parts) as well as the square roots of all rational numbers. They also contain the sine and cosine of all rational multiples of pi. The algorithms implemented in this package are taken from the Haskell package 'cyclotomic', whose algorithms are adapted from code by Martin Schoenert and Thomas Breuer in the GAP project (<https://www.gap-system.org/>). Cyclotomic numbers have applications in number theory, algebraic geometry, algebraic number theory, coding theory, and in the theory of graphs and combinatorics. They have connections to the theory of modular functions and modular curves.
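In symbols (a brief refresher, not taken from the package documentation), the n-th cyclotomic field is

\zeta_n = e^{2\pi i/n}, \qquad \mathbb{Q}(\zeta_n) = \Big\{ \textstyle\sum_{j=0}^{n-1} a_j \zeta_n^{\,j} : a_j \in \mathbb{Q} \Big\},

and the cyclotomic numbers are the union of these fields over all n. They contain, for example,

i = \zeta_4, \qquad \sqrt{2} = \zeta_8 + \zeta_8^{7}, \qquad \cos\tfrac{2\pi}{n} = \tfrac{1}{2}\big(\zeta_n + \zeta_n^{\,n-1}\big).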
Data analysis often requires coding, especially when data are collected through interviews, observations, or questionnaires. As a result, code counting and data preparation are essential steps in the analysis process. Analysts may need to count the codes in a text (tokenization, counting of pre-established codes, computing the co-occurrence matrix by line) and prepare the data (e.g., min-max normalization, Z-score, robust scaling, Box-Cox transformation, and non-parametric bootstrap). For the Box-Cox transformation (Box & Cox, 1964, <https://www.jstor.org/stable/2984418>), the optimal lambda is determined using the log-likelihood method. The non-parametric bootstrap involves randomly sampling data with replacement. Two random number generators are also integrated: a Lehmer congruential generator for the uniform distribution and a Box-Muller generator for the normal distribution. This package is intended for educational purposes.
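A compact sketch of the two generators described above (hypothetical function names, not this package's interface): a Lehmer multiplicative congruential generator for uniform variates, feeding a Box-Muller transform for normal variates.

lehmer <- function(n, seed = 123, m = 2^31 - 1, a = 48271) {
  # classic Lehmer / MINSTD multiplicative congruential generator
  x <- numeric(n); state <- seed
  for (i in seq_len(n)) { state <- (a * state) %% m; x[i] <- state / m }
  x
}

box_muller <- function(n, seed = 123) {
  # pairs of uniforms mapped to pairs of independent standard normals
  u <- lehmer(2 * ceiling(n / 2), seed)
  u1 <- u[c(TRUE, FALSE)]; u2 <- u[c(FALSE, TRUE)]
  z <- c(sqrt(-2 * log(u1)) * cos(2 * pi * u2),
         sqrt(-2 * log(u1)) * sin(2 * pi * u2))
  z[seq_len(n)]
}

hist(box_muller(10000))   # approximately standard normal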
Calculates exact hypothesis tests to compare a treatment and a reference group with respect to multiple binary endpoints. The tested null hypothesis is an identical multidimensional distribution of successes and failures in both groups. The alternative hypothesis is a larger success proportion in the treatment group in at least one endpoint. The tests are based on the multivariate permutation distribution of subjects between the two groups. For this permutation distribution, rejection regions are calculated that satisfy one of different possible optimization criteria. In particular, regions with maximal exhaustion of the nominal significance level, maximal power under a specified alternative or maximal number of elements can be found. Optimization is achieved by a branch-and-bound algorithm. By application of the closed testing principle, the global hypothesis tests are extended to multiple testing procedures.
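The package works with the exact multivariate permutation distribution and branch-and-bound optimization of the rejection region; the following is only a Monte Carlo approximation of the same permutation idea, on hypothetical data, using the maximum difference in success proportions across endpoints as the global statistic.

set.seed(11)
n_trt <- 20; n_ref <- 20
endpoint1 <- c(rbinom(n_trt, 1, 0.7), rbinom(n_ref, 1, 0.4))
endpoint2 <- c(rbinom(n_trt, 1, 0.6), rbinom(n_ref, 1, 0.5))
group <- rep(c("trt", "ref"), c(n_trt, n_ref))
y <- cbind(endpoint1, endpoint2)

stat <- function(y, group) {
  max(colMeans(y[group == "trt", , drop = FALSE]) -
      colMeans(y[group == "ref", , drop = FALSE]))
}
obs  <- stat(y, group)
perm <- replicate(5000, stat(y, sample(group)))   # permute subjects between groups
mean(perm >= obs)                                 # one-sided permutation p-value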
This package performs multi-omic differential network analysis by revealing differential interactions between molecular entities (genes, proteins, transcription factors, or other biomolecules) across the omic datasets provided. For each omic dataset, a differential network is constructed where links represent statistically significant differential interactions between entities. These networks are then integrated into a comprehensive visualization using distinct colors to distinguish interactions from different omic layers. This unified display allows interactive exploration of cross-omic patterns, such as differential interactions present at both transcript and protein levels. For each link, users can access differential statistical significance metrics (p values or adjusted p values, calculated via robust or traditional linear regression with interaction term) and differential regression plots. The methods implemented in this package are described in Sciacca et al. (2023) <doi:10.1093/bioinformatics/btad192>.
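A minimal sketch of what a differential-interaction test looks like in general (plain lm(), not this package's interface): the interaction term of a regression asks whether the association between two entities differs between conditions; a robust fit such as MASS::rlm() could be substituted for lm().

set.seed(5)
condition <- rep(c("control", "case"), each = 50)
x <- rnorm(100)                                   # entity 1 (e.g. a transcript)
y <- ifelse(condition == "control", 0.8 * x, 0.1 * x) + rnorm(100, sd = 0.5)  # entity 2

fit <- lm(y ~ x * condition)
summary(fit)$coefficients["x:conditioncontrol", ]  # the differential-interaction term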
Fast randomization-based two-sample tests. Testing the hypothesis that two samples come from the same distribution, using randomization to create p-values. Included tests are: Kolmogorov-Smirnov, Kuiper, Cramer-von Mises, Anderson-Darling, Wasserstein, and DTS. The default test (two_sample) is based on the DTS test statistic, as it is the most powerful and thus most useful to most users. The DTS test statistic builds on the Wasserstein distance by using a weighting scheme like that of Anderson-Darling. See the companion paper at <arXiv:2007.01360> or <https://codowd.com/public/DTS.pdf> for details of that test statistic and for non-standard uses of the package (parallelization for big N, weighted observations, one-sample tests, etc.). We also include the permutation scheme to make test building simple for others.
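Assuming the package's two_sample() and ks_test() functions take the two samples as their first arguments, a basic comparison looks like this:

library(twosamples)
set.seed(1)
a <- rnorm(100)
b <- rnorm(100, mean = 0.5)
two_sample(a, b)   # DTS statistic with a randomization p-value
ks_test(a, b)      # same style of call for the Kolmogorov-Smirnov variant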
In the context of paid research studies and clinical trials, budget considerations and patient sampling from available populations are subject to inherent constraints. We introduce the CDsampling package, which integrates optimal design theory within the framework of constrained sampling. This package makes it possible to find both D-optimal approximate and exact allocations for sampling with or without constraints. Additionally, it provides functions to find constrained uniform sampling as a robust sampling strategy with limited model information. Our package offers functions for the computation of the Fisher information matrix under generalized linear models (including the regular linear regression model) and multinomial logistic models. To demonstrate the applications, we also provide a simulated dataset and a real dataset embedded in the package. Yifei Huang, Liping Tong, and Jie Yang (2025) <doi:10.5705/ss.202022.0414>.
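A small reminder of the quantity involved (generic code with a hypothetical helper name, not the CDsampling API): for a logistic model with design matrix X and parameter beta, the Fisher information is X' W X with W = diag(p_i (1 - p_i)).

fisher_info_logistic <- function(X, beta) {
  p <- plogis(X %*% beta)               # fitted success probabilities
  W <- diag(as.vector(p * (1 - p)))     # GLM weights for the logistic link
  t(X) %*% W %*% X
}

X <- cbind(1, c(0, 0, 1, 1), c(0, 1, 0, 1))   # intercept plus two binary covariates
fisher_info_logistic(X, beta = c(-0.5, 1, 0.3))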
This package provides decorators for R functions. A decorator is a function that receives a function, extends its behaviour, and returns the altered function. Any caller that uses the decorated function uses the same interface as if it were the original, undecorated function. Decorators serve two primary uses: (1) enhancing the response of a function as it sends data to a second component; (2) supporting multiple optional behaviours. An example of the first use is a timer decorator that runs a function, outputs its execution time on the console, and returns the original function's result. An example of the second use is an input-type validation decorator that tests at run time whether the caller has passed input arguments of a particular class. Decorators can reduce execution time, say by memoization, or reduce bugs by adding defensive programming routines.
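A minimal timer decorator in plain R, illustrating the first use described above (generic code, not tied to this package's helpers):

timed <- function(f) {
  function(...) {
    started <- Sys.time()
    result <- f(...)
    message("elapsed: ", format(Sys.time() - started))
    result              # the caller still gets the undecorated function's result
  }
}

slow_sum <- function(x) { Sys.sleep(0.2); sum(x) }
timed_sum <- timed(slow_sum)
timed_sum(1:10)         # prints the elapsed time, returns 55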
An implementation of logistic normal multinomial (LNM) clustering. It is an extension of the LNM mixture model proposed by Fang and Subedi (2020) <arXiv:2011.06682>, designed for clustering compositional data. The package includes 3 extended models: the LNM Factor Analyzer (LNM-FA), the LNM Bicluster Mixture Model (LNM-BMM) and the Penalized LNM Factor Analyzer (LNM-FA). LNM models have several advantages: 1. the LNM provides a more flexible covariance structure; 2. the factor analyzer reduces the number of parameters to estimate; 3. biclustering simultaneously clusters subjects and taxa, providing significant biological insights; 4. the penalty term allows sparse estimation of the covariance matrix. Details of the model assumptions and interpretation can be found in Tu and Subedi (2021) <arXiv:2101.01871> and Tu and Subedi (2022) <doi:10.1002/sam.11555>.
Systematic reviews should be described with a high degree of methodological detail. The PRISMA Statement calls for a high level of reporting detail in systematic reviews and meta-analyses. An integral part of the methodological description of a review is a flow diagram. This package produces an interactive flow diagram that conforms to the PRISMA2020 preprint. When made interactive, the reader/user can click on each box and be directed to another website or file online (e.g. a detailed description of the screening methods, or a list of excluded full texts), with a mouse-over tooltip that describes the linked information in more detail. Interactive versions can be saved as HTML files, whilst static versions for inclusion in manuscripts can be saved as HTML, PDF, PNG, SVG, PS or WEBP files.
The package encompasses functions to find potential guide RNAs for CRISPR-based genome-editing systems, including the Base Editors and the Prime Editors, when supplied with target sequences as input. Users have the flexibility to filter resulting guide RNAs based on parameters such as the absence of restriction enzyme cut sites or the lack of paired guide RNAs. The package also facilitates genome-wide exploration for off-targets, offering features to score and rank off-targets, retrieve flanking sequences, and indicate whether the hits are located within exon regions. All detected guide RNAs are annotated with the cumulative scores of the top 5 and top N off-targets, together with detailed information such as mismatch sites and restriction enzyme cut sites. The package also outputs INDELs and their frequencies for Cas9-targeted sites.
flowcatchR is a set of tools to analyze in vivo microscopy imaging data, focused on tracking flowing blood cells. It guides the steps from segmentation to the calculation of features, filtering out particles not of interest, and also provides a set of utilities to help check the quality of the performed operations (e.g. how good the segmentation was). It allows investigating the tracking of flowing cells, such as in blood vessels, in order to categorize the particles as flowing, rolling or adherent. This classification is applied in the study of phenomena such as hemostasis and thrombosis development. Moreover, flowcatchR presents an integrated workflow solution, based on the integration with a Shiny app and Jupyter notebooks, which is delivered alongside the package and can enable fully reproducible bioimage analysis in the R environment.
Implementation of a probabilistic method to calculate nicheROVER (niche region and niche overlap) metrics using multidimensional niche indicator data (e.g., stable isotopes, environmental variables, etc.). The niche region is defined as the joint probability density function of the multidimensional niche indicators at a user-defined probability alpha (e.g., 95%). Uncertainty is accounted for in a Bayesian framework, and the method can be extended to three or more indicator dimensions. It provides directional estimates of niche overlap, accounts for species-specific distributions in multivariate niche space, and produces unique and consistent bivariate projections of the multivariate niche region. The article by Swanson et al. (2015) <doi:10.1890/14-0235.1> provides a detailed description of the methodology. See the package vignette for a worked example using fish stable isotope data.
This package provides several functions to explore miRNA sponge (also called ceRNA or miRNA decoy) regulation from putative miRNA-target interactions and/or transcriptomics data (including bulk, single-cell and spatial gene expression data). It provides eight popular methods for identifying miRNA sponge interactions and an integrative method to combine miRNA sponge interactions from different methods, as well as functions to validate miRNA sponge interactions, infer miRNA sponge modules, conduct enrichment analysis of miRNA sponge modules, and conduct survival analysis of miRNA sponge modules. By using a sample-control-variable strategy, it provides a function to infer sample-specific miRNA sponge interactions. For sample-specific miRNA sponge interactions, it implements three similarity methods to construct a sample-sample correlation network.
Rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. Rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted. Finally, rdiff-backup is easy to use and settings have sensible defaults.
This package provides a function to calculate multiple performance metrics for actual and predicted values. In total, eight metrics are calculated for a given pair of actual and predicted series. This helps to describe a statistical model's performance in prediction and to compare the performance of different models. The metrics are Root Mean Squared Error (RMSE), Relative Root Mean Squared Error (RRMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Absolute Scaled Error (MASE), Nash-Sutcliffe Efficiency (NSE), Willmott's Index (WI), and the Legates and McCabe Index (LME). Among them, the first five are better when smaller, whereas the last three are better when larger. More details can be found in Garai and Paul (2023) <doi:10.1016/j.iswa.2023.200202> and Garai et al. (2024) <doi:10.1007/s11063-024-11552-w>.
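A hedged sketch of a few of these metrics in plain R (standard formulas, not this package's function):

set.seed(9)
actual    <- rnorm(50, mean = 10)
predicted <- actual + rnorm(50, sd = 0.5)

rmse  <- sqrt(mean((actual - predicted)^2))                               # RMSE
rrmse <- rmse / mean(actual) * 100                                        # relative RMSE (%)
mae   <- mean(abs(actual - predicted))                                    # MAE
mape  <- mean(abs((actual - predicted) / actual)) * 100                   # MAPE (%)
nse   <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2) # NSE
c(RMSE = rmse, RRMSE = rrmse, MAE = mae, MAPE = mape, NSE = nse)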