systemPipeTools package extends the widely used systemPipeR (SPR) workflow environment with an enhanced toolkit for data visualization, including utilities to automate the data visualizaton for analysis of differentially expressed genes (DEGs). systemPipeTools provides data transformation and data exploration functions via scatterplots, hierarchical clustering heatMaps, principal component analysis, multidimensional scaling, generalized principal components, t-Distributed Stochastic Neighbor embedding (t-SNE), and MA and volcano plots. All these utilities can be integrated with the modular design of the systemPipeR environment that allows users to easily substitute any of these features and/or custom with alternatives.
Discrete event simulation using both R and C++ (Karlsson et al 2016; <doi:10.1109/eScience.2016.7870915>). The C++ code is adapted from the SSIM library <https://www.inf.usi.ch/carzaniga/ssim/>, allowing for event-oriented simulation. The code includes a SummaryReport class for reporting events and costs by age and other covariates. The C++ code is available as a static library for linking to other packages. A priority queue implementation is given in C++ together with an S3 closure and a reference class implementation. Finally, some tools are provided for cost-effectiveness analysis.
Takes the outputs of a caret confusion matrix and allows for the quick conversion of these list items to lists. The intended usage is to allow the tool to work with the outputs of machine learning classification models. This tool works with classification problems for binary and multi-classification problems and allows for the record level conversion of the confusion matrix outputs. This is useful, as it allows quick conversion of these objects for storage in database systems and to track ML model performance over time. Traditionally, this approach has been used for highlighting model representation and feature slippage.
Precompiled and processed miRNA-overexpression fold-changes from 84 Gene Expression Omnibus (GEO) series corresponding to 6 platforms, 77 human cells or tissues, and 113 distinct miRNAs. Accompanied with the data, we also included in this package the sequence feature scores from TargetScanHuman 6.1 including the context+ score and the probabilities of conserved targeting for each miRNA-mRNA interaction. Thus, the user can use these static sequence-based scores together with user-supplied tissue/cell-specific fold-change due to miRNA overexpression to predict miRNA targets using the package TargetScore (download separately).
The normal process of creating clinical study slides is that a statistician manually type in the numbers from outputs and a separate statistician to double check the typed in numbers. This process is time consuming, resource intensive, and error prone. Automatic slide generation is a solution to address these issues. It reduces the amount of work and the required time when creating slides, and reduces the risk of errors from manually typing or copying numbers from the output to slides. It also helps users to avoid unnecessary stress when creating large amounts of slide decks in a short time window.
This package provides a framework for scalable statistical computing on large on-disk matrices stored in HDF5 files. It provides efficient block-wise implementations of core linear-algebra operations (matrix multiplication, SVD, PCA, QR decomposition, and canonical correlation analysis) written in C++ and R. These building blocks are designed not only for direct use, but also as foundational components for developing new statistical methods that must operate on datasets too large to fit in memory. The package supports data provided either as HDF5 files or standard R objects, and is intended for high-dimensional applications such as omics and precision-medicine research.
Machine learning algorithms for predictor variables that are compositional data and the response variable is either continuous or categorical. Specifically, the Boruta variable selection algorithm, random forest, support vector machines and projection pursuit regression are included. Relevant papers include: Tsagris M.T., Preston S. and Wood A.T.A. (2011). "A data-based power transformation for compositional data". Fourth International International Workshop on Compositional Data Analysis. <doi:10.48550/arXiv.1106.1451> and Alenazi, A. (2023). "A review of compositional data analysis and recent advances". Communications in Statistics--Theory and Methods, 52(16): 5535--5567. <doi:10.1080/03610926.2021.2014890>.
Convert one biological ID to another of rice (Oryza sativa). Rice(Oryza sativa) has more than one form gene ID for the genome. The two main gene ID for rice genome are the RAP (The Rice Annotation Project, <https://rapdb.dna.affrc.go.jp/>, and the MSU(The Rice Genome Annotation Project, <http://rice.plantbiology.msu.edu/>. All RAP rice gene IDs are of the form Os##g####### as explained on the website <https://rapdb.dna.affrc.go.jp/>. All MSU rice gene IDs are of the form LOC_Os##g##### as explained on the website <http://rice.plantbiology.msu.edu/analyses_nomenclature.shtml>. All SYMBOL rice gene IDs are the unique name on the NCBI(National Center for Biotechnology Information, <https://www.ncbi.nlm.nih.gov/>. The TRANSCRIPTID, is the transcript id of rice, are of the form Os##t#######. The researchers usually need to converter between various IDs. Such as converter RAP to SYMBOLS for function searching on NCBI. There are a lot of websites with the function for converting RAP to MSU or MSU to RA, such as ID Converter <https://rapdb.dna.affrc.go.jp/tools/converter>. But it is difficult to convert super multiple IDs on these websites. The package can convert all IDs between the three IDs (RAP, MSU and SYMBOL) regardless of the number.
The goal of automatedRecLin is to perform record linkage (also known as entity resolution) in unsupervised or supervised settings. It compares pairs of records from two datasets using selected comparison functions to estimate the probability or density ratio between matched and non-matched records. Based on these estimates, it predicts a set of matches that maximizes entropy. For details see: Lee et al. (2022) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2022001/article/00007-eng.htm>, Vo et al. (2023) <https://ideas.repec.org/a/eee/csdana/v179y2023ics0167947322002365.html>, Sugiyama et al. (2008) <doi:10.1007/s10463-008-0197-x>.
Perform fast and memory efficient time-weighted averaging of values measured over intervals into new arbitrary intervals. This package is useful in the context of data measured or represented as constant values over intervals on a one-dimensional discrete axis (e.g. time-integrated averages of a curve over defined periods). This package was written specifically to deal with air pollution data recorded or predicted as averages over sampling periods. Data in this format often needs to be shifted to non-aligned periods or averaged up to periods of longer duration (e.g. averaging data measured over sequential non-overlapping periods to calendar years).
An implementation of the additive (Gurevitch et al., 2000 <doi:10.1086/303337>) and multiplicative (Lajeunesse, 2011 <doi:10.1890/11-0423.1>) factorial null models for multiple stressor data (Burgess et al., 2021 <doi:10.1101/2021.07.21.453207>). Effect sizes are able to be calculated for either null model, and subsequently classified into one of four different interaction classifications (e.g., antagonistic or synergistic interactions). Analyses can be conducted on data for single experiments through to large meta-analytical datasets. Minimal input (or statistical knowledge) is required, with any output easily understood. Summary figures are also able to be easily generated.
The price action at any given time is determined by investor sentiment and market conditions. Although there is no established principle, over a long period of time, things often move with a certain periodicity. This is sometimes referred to as anomaly. The seasonPlot() function in this package calculates and visualizes the average value of price movements over a year for any given period. In addition, the monthly increase or decrease in price movement is represented with a colored background. This seasonPlot() function can use the same symbols as the quantmod package (e.g. ^IXIC, ^DJI, SPY, BTC-USD, and ETH-USD etc).
This package performs sensitivity analysis for publication bias in meta-analyses (per Mathur & VanderWeele, 2020 [<doi:10.31219/osf.io/s9dp6>]). These analyses enable statements such as: "For publication bias to shift the observed point estimate to the null, significant results would need to be at least 30-fold more likely to be published than negative or nonsignificant results." Comparable statements can be made regarding shifting to a chosen non-null value or shifting the confidence interval. Provides a worst-case meta-analytic point estimate under maximal publication bias obtained simply by conducting a standard meta-analysis of only the negative and "nonsignificant" studies.
This package provides a system for batch-marking data analysis to estimate survival probabilities, capture probabilities, and enumerate the population abundance for both marked and unmarked individuals. The estimation of only marked individuals can be achieved through the batchMarkOptim() function. Similarly, the combined marked and unmarked can be achieved through the batchMarkUnmarkOptim() function. The algorithm was also implemented for the hidden Markov model encapsulated in batchMarkUnmarkOptim() to estimate the abundance of both marked and unmarked individuals in the population. The package is based on the paper: "Hidden Markov Models for Extended Batch Data" of Cowen et al. (2017) <doi:10.1111/biom.12701>.
This package provides a collection of tools for the calculation of freewater metabolism from in situ time series of dissolved oxygen, water temperature, and, optionally, additional environmental variables. LakeMetabolizer implements 5 different metabolism models with diverse statistical underpinnings: bookkeeping, ordinary least squares, maximum likelihood, Kalman filter, and Bayesian. Each of these 5 metabolism models can be combined with 1 of 7 models for computing the coefficient of gas exchange across the airĂ¢ water interface (k). LakeMetabolizer also features a variety of supporting functions that compute conversions and implement calculations commonly applied to raw data prior to estimating metabolism (e.g., oxygen saturation and optical conversion models).
This package provides functions to analyze coarse data. Specifically, it contains functions to (1) fit parametric accelerated failure time models to interval-censored survival time data, and (2) estimate the case-fatality ratio in scenarios with under-reporting. This package's development was motivated by applications to infectious disease: in particular, problems with estimating the incubation period and the case fatality ratio of a given disease. Sample data files are included in the package. See Reich et al. (2009) <doi:10.1002/sim.3659>, Reich et al. (2012) <doi:10.1111/j.1541-0420.2011.01709.x>, and Lessler et al. (2009) <doi:10.1016/S1473-3099(09)70069-6>.
This package provides a library of density, distribution function, quantile function, (bounded) raw moments and random generation for a collection of distributions relevant for the firm size literature. Additionally, the package contains tools to fit these distributions using maximum likelihood and evaluate these distributions based on (i) log-likelihood ratio and (ii) deviations between the empirical and parametrically implied moments of the distributions. We add flexibility by allowing the considered distributions to be combined into piecewise composite or finite mixture distributions, as well as to be used when truncated. See Dewitte (2020) <https://hdl.handle.net/1854/LU-8644700> for a description and application of methods available in this package.
This package defines a BigMatrix ReferenceClass which adds safety and convenience features to the filebacked.big.matrix class from the bigmemory package. BigMatrix protects against segfaults by monitoring and gracefully restoring the connection to on-disk data and it also protects against accidental data modification with a file-system-based permissions system. Utilities are provided for using BigMatrix-derived classes as assayData matrices within the Biobase package's eSet family of classes. BigMatrix provides some optimizations related to attaching to, and indexing into, file-backed matrices with dimnames. Additionally, the package provides a BigMatrixFactor class, a file-backed matrix with factor properties.
As different antipsychotic medications have different potencies, the doses of different medications cannot be directly compared. Various strategies are used to convert doses into a common reference so that comparison is meaningful. Chlorpromazine (CPZ) has historically been used as a reference medication into which other antipsychotic doses can be converted, as "chlorpromazine-equivalent doses". Using conversion keys generated from widely-cited scientific papers, e.g. Gardner et. al 2010 <doi:10.1176/appi.ajp.2009.09060802> and Leucht et al. 2016 <doi:10.1093/schbul/sbv167>, antipsychotic doses are converted to CPZ (or any specified antipsychotic) equivalents. The use of the package is described in the included vignette. Not for clinical use.
The Core Microbiome refers to the group of microorganisms that are consistently present in a particular environment, habitat, or host species. These microorganisms play a crucial role in the functioning and stability of that ecosystem. Identifying these microorganisms can contribute to the emerging field of personalized medicine. The CoreMicrobiomeR is designed to facilitate the identification, statistical testing, and visualization of this group of microorganisms.This package offers three key functions to analyze and visualize microbial community data. This package has been developed based on the research papers published by Pereira et al.(2018) <doi:10.1186/s12864-018-4637-6> and Beule L, Karlovsky P. (2020) <doi:10.7717/peerj.9593>.
The komacv-rg bundle provides packages that aid in creating CVs based on the komacv class and creating related documents, such as cover letters and cover sheets for job applications.
Concretely, the bundle consists of three packages: komacv-addons, komacv-lco, and komacv-multilang. komacv-addons is a small collection of add-ons and fixes for the komacv class; komacv-lco enables the use of letter class options from scrlttr2 also in komacv-based and other non-scrlttr2-based documents; komacv-multilang enables the provisioning of CVs in multiple languages and the selection of a language via Babel or Polyglossia.
Generalized Additive Model for Location, Scale and Shape (GAMLSS) with zero inflated beta (BEZI) family for analysis of microbiome relative abundance data (with various options for data transformation/normalization to address compositional effects) and random effects meta-analysis models for meta-analysis pooling estimates across microbiome studies are implemented. Random Forest model to predict microbiome age based on relative abundances of shared bacterial genera with the Bangladesh data (Subramanian et al 2014), comparison of multiple diversity indexes using linear/linear mixed effect models and some data display/visualization are also implemented. The reference paper is published by Ho NT, Li F, Wang S, Kuhn L (2019) <doi:10.1186/s12859-019-2744-2> .
This package provides a fast implementation with additional experimental features for testing, monitoring and dating structural changes in (linear) regression models. strucchangeRcpp features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g. cumulative/moving sum, recursive/moving estimates) and F statistics, respectively. These methods are described in Zeileis et al. (2002) <doi:10.18637/jss.v007.i02>. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals, and their magnitude as well as the model fit can be evaluated using a variety of statistical measures.
multiHiCcompare provides functions for joint normalization and difference detection in multiple Hi-C datasets. This extension of the original HiCcompare package now allows for Hi-C experiments with more than 2 groups and multiple samples per group. multiHiCcompare operates on processed Hi-C data in the form of sparse upper triangular matrices. It accepts four column (chromosome, region1, region2, IF) tab-separated text files storing chromatin interaction matrices. multiHiCcompare provides cyclic loess and fast loess (fastlo) methods adapted to jointly normalizing Hi-C data. Additionally, it provides a general linear model (GLM) framework adapting the edgeR package to detect differences in Hi-C data in a distance dependent manner.