This is an R implementation of a constrained l1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models (SIMULE). The SIMULE algorithm can be used to estimate multiple related precision matrices. For instance, it can identify context-specific gene networks from multi-context gene expression datasets. By performing data-driven network inference from high-dimensional and heterogeneous data sets, this tool can help users effectively translate aggregated data into knowledge that takes the form of graphs among entities. Please run demo(simuleDemo) to learn the basic functions provided by this package. For further details, please read the original paper: Beilun Wang, Ritambhara Singh, Yanjun Qi (2017) <DOI:10.1007/s10994-017-5635-7>.
A process-oriented and trajectory-based Discrete-Event Simulation (DES) package for R. It is designed as a generic yet powerful framework. The architecture encloses a robust and fast simulation core written in C++ with automatic monitoring capabilities. It provides a rich and flexible R API that revolves around the concept of trajectory, a common path in the simulation model for entities of the same type. Documentation about simmer is provided by several vignettes included in this package, via the paper by Ucar, Smeets & Azcorra (2019, <doi:10.18637/jss.v090.i02>), and the paper by Ucar, Hernández, Serrano & Azcorra (2018, <doi:10.1109/MCOM.2018.1700960>); see citation("simmer") for details.
The Langmuir and Freundlich adsorption isotherms are pivotal in characterizing adsorption processes, essential across various scientific disciplines. Proper interpretation of adsorption isotherms involves robust fitting of data to the models, accurate estimation of parameters, and efficiency evaluation of the models, both in linear and non-linear forms. For researchers and practitioners in the fields of chemistry, environmental science, soil science, and engineering, a comprehensive package that satisfies all these requirements would be ideal for accurate and efficient analysis of adsorption data, precise model selection and validation for rigorous scientific inquiry and real-world applications. Details can be found in Langmuir (1918) <doi:10.1021/ja02242a004> and Giles (1973) <doi:10.1111/j.1478-4408.1973.tb03158.x>.
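As a language-agnostic illustration of the linear forms mentioned above (written in Python, not the package's R interface), the following sketch fits one common linearization of each isotherm by ordinary least squares. The function names and the synthetic data are hypothetical, and the choice of the Ce/qe versus Ce form for Langmuir is an assumption; other linearizations exist.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_langmuir_linear(ce, qe):
    """Fit Ce/qe = Ce/q_max + 1/(K_L * q_max); return (q_max, K_L)."""
    slope, intercept = linear_fit(ce, [c / q for c, q in zip(ce, qe)])
    # slope = 1/q_max and intercept = 1/(K_L * q_max), so K_L = slope/intercept.
    return 1.0 / slope, slope / intercept

def fit_freundlich_linear(ce, qe):
    """Fit ln(qe) = ln(K_F) + (1/n) * ln(Ce); return (K_F, n)."""
    slope, intercept = linear_fit([math.log(c) for c in ce],
                                  [math.log(q) for q in qe])
    return math.exp(intercept), 1.0 / slope

# Synthetic, noise-free data (illustrative values only, not measurements).
ce = [0.5, 1.0, 2.0, 4.0, 8.0]
qe_langmuir = [10.0 * 0.8 * c / (1 + 0.8 * c) for c in ce]
qe_freundlich = [2.0 * c ** (1 / 3) for c in ce]

q_max, k_l = fit_langmuir_linear(ce, qe_langmuir)
k_f, n_f = fit_freundlich_linear(ce, qe_freundlich)
```

On noise-free data the linear fits recover the generating parameters. With real measurements the linearized and non-linear fits generally differ, which is why evaluating both forms is useful.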
This package provides a model building procedure to build a parsimonious geoadditive model from a large number of covariates. Continuous, binary and ordered categorical responses are supported. The model building is based on component-wise gradient boosting with linear effects, smoothing splines and a smooth spatial surface to model spatial autocorrelation. The resulting covariate set after gradient boosting is further reduced through backward elimination and aggregation of factor levels. The package provides a model-based bootstrap method to simulate prediction intervals for point predictions. A test data set of a soil mapping case study in Berne (Switzerland) is provided. Nussbaum, M., Walthert, L., Fraefel, M., Greiner, L., and Papritz, A. (2017) <doi:10.5194/soil-3-191-2017>.
This package provides an interface to the Maxar Geospatial Platform (MGP) Application Programming Interface <https://www.maxar.com/maxar-geospatial-platform>. It facilitates imagery searches using the MGP Streaming Application Programming Interface via the Web Feature Service (WFS) method, and supports image downloads through Web Map Service (WMS) and Web Map Tile Service (WMTS) Open Geospatial Consortium (OGC) methods. Additionally, it integrates with the Maxar Geospatial Platform Basemaps Application Programming Interface for accessing Maxar basemaps imagery and seamlines. The package also offers seamless integration with the Maxar Geospatial Platform Discovery Application Programming Interface, allowing users to search, filter, and sort Maxar content, while retrieving detailed metadata in formats like SpatioTemporal Asset Catalog (STAC) and GeoJSON.
Medication adherence, defined as medication-taking behavior that aligns with the agreed-upon treatment protocol, is critical for realizing the benefits of prescription medications. Medication adherence can be assessed using electronic adherence monitoring devices (EAMDs), pill bottles or boxes that contain a computer chip that records the date and time of each opening (or "actuation"). Before researchers can use EAMD data, they must apply a series of decision rules to transform actuation data into adherence data. The purpose of this R package ('oncmap') is to transform EAMD actuations in the form of a raw .csv file, information about the patient, regimen, and non-monitored periods into two daily adherence values -- Dose Taken and Correct Dose Taken.
cogena is a workflow for co-expressed gene-set enrichment analysis. It aims to discover smaller-scale but highly correlated cellular events that may be of great biological relevance. A novel pipeline for drug discovery and drug repositioning based on the cogena workflow is proposed. In particular, candidate drugs can be predicted based on the gene expression of disease-related data, or other similar drugs can be identified based on the gene expression of drug-related data. Moreover, the drug mode of action can be disclosed by the associated pathway analysis. In summary, cogena is a flexible workflow for various gene set enrichment analyses of co-expressed genes, with a focus on pathway/GO analysis and drug repositioning.
This package provides a RangedSummarizedExperiment object of read counts in genes for an RNA-Seq experiment on four human airway smooth muscle cell lines treated with dexamethasone. Details on the gene model and read counting procedure are provided in the package vignette. The citation for the experiment is: Himes BE, Jiang X, Wagner P, Hu R, Wang Q, Klanderman B, Whitaker RM, Duan Q, Lasky-Su J, Nikolos C, Jester W, Johnson M, Panettieri R Jr, Tantisira KG, Weiss ST, Lu Q. RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells. PLoS One. 2014 Jun 13;9(6):e99625. PMID: 24926665. GEO: GSE52778.
Implementation of network integration approaches comprising unweighted and weighted integration methods. Unweighted integration is performed considering the average, per-edge average, maximum and minimum of network edges. Weighted integration takes into account a weight for each network during the fusion process, where the weights express the predictive strength of each network for a specific predictive task. Weights can be learned using a machine learning algorithm that associates each weight with the assessed accuracy of the learning algorithm trained on the network itself. The implemented methods can be applied to effectively integrate different biological networks modelling a wide range of problems in bioinformatics (e.g. disease gene prioritization, protein function prediction, drug repurposing, clinical outcome prediction).
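The integration schemes listed above can be sketched edge by edge on small adjacency matrices. This is a minimal Python illustration under stated assumptions, not the package's R interface: the function name `integrate` is hypothetical, "per-edge average" is interpreted here as averaging only over the networks in which the edge is present (non-zero), and the weighted scheme is taken to be a normalized weighted mean.

```python
def integrate(networks, method="average", weights=None):
    """Combine same-sized adjacency matrices (lists of lists) edge by edge."""
    n = len(networks[0])
    fused = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            edge = [net[i][j] for net in networks]
            if method == "average":
                fused[i][j] = sum(edge) / len(edge)
            elif method == "per_edge_average":
                # Assumption: average only over networks where the edge exists.
                present = [w for w in edge if w != 0]
                fused[i][j] = sum(present) / len(present) if present else 0.0
            elif method == "max":
                fused[i][j] = max(edge)
            elif method == "min":
                fused[i][j] = min(edge)
            elif method == "weighted":
                # Assumption: weights are non-negative with a positive sum.
                total = sum(weights)
                fused[i][j] = sum(w * e for w, e in zip(weights, edge)) / total
            else:
                raise ValueError(f"unknown method: {method}")
    return fused

# Three toy two-node networks (illustrative edge weights).
a = [[0.0, 0.8], [0.8, 0.0]]
b = [[0.0, 0.0], [0.0, 0.0]]
c = [[0.0, 0.4], [0.4, 0.0]]

avg = integrate([a, b, c], "average")
pea = integrate([a, b, c], "per_edge_average")
wgt = integrate([a, b, c], "weighted", weights=[3, 0, 1])
```

In this toy case the plain average is pulled down by the empty network `b`, while the per-edge average ignores it. A weight of zero in the weighted scheme has the same effect of excluding a network.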
The nonparametric methods for estimating copula entropy, transfer entropy, and the statistics for multivariate normality test and two-sample test are implemented. The methods for estimating transfer entropy and the statistics for multivariate normality test and two-sample test are based on the method for estimating copula entropy. The method for change point detection with copula entropy based two-sample test is also implemented. Please refer to Ma and Sun (2011) <doi:10.1016/S1007-0214(11)70008-6>, Ma (2019) <doi:10.48550/arXiv.1910.04375>, Ma (2022) <doi:10.48550/arXiv.2206.05956>, Ma (2023) <doi:10.48550/arXiv.2307.07247>, and Ma (2024) <doi:10.48550/arXiv.2403.07892> for more information.
This is an R implementation of Fast and Scalable Learning of Sparse Changes in High-Dimensional Gaussian Graphical Model Structure (DIFFEE). The DIFFEE algorithm can be used to quickly estimate the differential network between two related datasets. For instance, it can identify a differential gene network from case and control datasets. By performing data-driven network inference from two high-dimensional data sets, this tool can help users effectively translate two aggregated data blocks into knowledge of the changes among entities between two Gaussian Graphical Models. Please run demo(diffeeDemo) to learn the basic functions provided by this package. For further details, please read the original paper: Beilun Wang, Arshdeep Sekhon, Yanjun Qi (2018) <arXiv:1710.11223>.
This package provides a user-friendly interface, using Shiny, to analyse glucose-stimulated insulin secretion (GSIS) assays in pancreatic beta cells or islets. The package allows the user to import several sets of experiments from different spreadsheets and to perform subsequent steps: summarise in a tidy format, visualise data quality and compare experimental conditions while accounting for technical confounders such as the date of the experiment or the technician. Together, insane is a comprehensive method that optimises pre-processing and analyses of GSIS experiments in a user-friendly interface. The Shiny App was initially designed for the EndoC-betaH1 cell line, following the method described in Ndiaye et al., 2017 (<doi:10.1016/j.molmet.2017.03.011>).
This package provides a routine to partial out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (glm). The package is based on the algorithm described in Stammann (2018) <arXiv:1707.01815> and is restricted to glms that are nonlinear and based on maximum likelihood estimation. It also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine and includes robust and multi-way clustered standard errors. Further, the package provides analytical bias corrections for binary choice models derived by Fernandez-Val and Weidner (2016) <doi:10.1016/j.jeconom.2015.12.014> and Hinz, Stammann, and Wanner (2020) <arXiv:2004.12655>.
The Clinical Trials Network (CTN) of the U.S. National Institute on Drug Abuse sponsored the CTN-0094 research team to harmonize data sets from three nationally-representative clinical trials for opioid use disorder (OUD). The CTN-0094 team herein provides a coded collection of trial outcomes and endpoints used in various OUD clinical trials over the past 50 years. These coded outcome functions are used to contrast and cluster different clinical outcome functions based on daily or weekly patient urine screenings. Note that we abbreviate urine drug screen as "UDS" and urine opioid screen as "UOS". For the example data sets (based on clinical trials data harmonized by the CTN-0094 research team), UDS and UOS are largely interchangeable.
Check your R code for some of the most common layout flaws. Many tried to teach us how to write less dreadful code, be it implicitly as B. W. Kernighan and D. M. Ritchie (1988) <ISBN:0-13-110362-8> in The C Programming Language did, be it explicitly as R.C. Martin (2008) <ISBN:0-13-235088-2> in Clean Code: A Handbook of Agile Software Craftsmanship did. So we should check our code for files too long or wide, functions with too many lines, too wide lines, too many arguments or too many levels of nesting. Note: This is not a static code analyzer like pylint or the like. Check out <https://cran.r-project.org/package=lintr> instead.
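To make two of the layout checks above concrete (files that are too long and lines that are too wide), here is a minimal Python sketch that works on source text as a plain string. It is not the package's R interface: the function name `check_layout` and both default limits are assumptions chosen for illustration.

```python
def check_layout(source, max_line_width=80, max_lines=300):
    """Return a list of human-readable layout complaints for a source string."""
    problems = []
    lines = source.splitlines()
    if len(lines) > max_lines:
        problems.append(f"file has {len(lines)} lines (limit {max_lines})")
    for number, line in enumerate(lines, start=1):
        if len(line) > max_line_width:
            problems.append(
                f"line {number} is {len(line)} characters wide "
                f"(limit {max_line_width})"
            )
    return problems

# A tiny two-line "file": the second line is deliberately too wide.
sample = "x <- 1\n" + "y" * 100
problems = check_layout(sample)
```

Checks such as argument counts or nesting depth need the parsed code rather than raw text, so they are left out of this sketch.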
Utilities to read and write files in the FITS (Flexible Image Transport System) format, a standard format in astronomy (see e.g. <https://en.wikipedia.org/wiki/FITS> for more information). Present low-level routines allow: reading, parsing, and modifying FITS headers; reading FITS images (multi-dimensional arrays); reading FITS binary and ASCII tables; and writing FITS images (multi-dimensional arrays). Higher-level functions allow: reading files composed of one or more headers and a single (perhaps multidimensional) image or single table; reading tables into data frames; generating vectors for image array axes; scaling and writing images as 16-bit integers. Known incompletenesses are reading random group extensions, as well as complex and array descriptor data types in binary tables.
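For readers unfamiliar with FITS headers, the sketch below parses the fixed 80-character header cards: keyword in columns 1 to 8, the value indicator "= " in columns 9 and 10, then a value and an optional comment after a slash. It is a simplified Python illustration, not the package's R functions. The names `parse_card` and `parse_header` are hypothetical, and continuation cards and complex values are not handled.

```python
def parse_card(card):
    """Parse one 80-character FITS header card into (keyword, value, comment)."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY) and END carry no value.
        return keyword, None, card[8:].strip()
    body = card[10:]
    if body.lstrip().startswith("'"):
        start = body.index("'") + 1
        end = start
        # A doubled quote ('') inside a string is an escaped quote: skip it.
        while body[end:end + 2] == "''" or body[end] != "'":
            end += 2 if body[end] == "'" else 1
        value = body[start:end].replace("''", "'").rstrip()
        comment = body[end + 1:].partition("/")[2].strip()
        return keyword, value, comment
    raw, _, comment = body.partition("/")
    raw = raw.strip()
    if not raw:
        return keyword, None, comment.strip()
    if raw in ("T", "F"):
        value = raw == "T"
    else:
        try:
            value = int(raw)
        except ValueError:
            value = float(raw.replace("D", "E"))  # Fortran-style exponent
    return keyword, value, comment.strip()

def parse_header(block):
    """Collect keyword/value pairs from consecutive cards up to the END card."""
    header = {}
    for pos in range(0, len(block), 80):
        keyword, value, _ = parse_card(block[pos:pos + 80])
        if keyword == "END":
            break
        if value is not None:
            header[keyword] = value
    return header

# A hand-written example header (illustrative values).
cards = [
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                   16 / bits per pixel",
    "NAXIS   =                    2",
    "OBJECT  = 'M31     '           / target name",
    "END",
]
block = "".join(card.ljust(80) for card in cards)
header = parse_header(block)
```

String values are handled before the slash split so that a slash inside a quoted string is not mistaken for the start of a comment.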
Models for non-linear time series analysis and causality detection. The main functionalities of this package consist of an implementation of the classical causality test (C.W.J. Granger 1980) <doi:10.1016/0165-1889(80)90069-X>, and a non-linear version of it based on feed-forward neural networks. This package also contains an implementation of the Transfer Entropy <doi:10.1103/PhysRevLett.85.461>, and the continuous Transfer Entropy using an approximation based on the k-nearest neighbors <doi:10.1103/PhysRevE.69.066138>. There are also some other useful tools, like the VARNN (Vector Auto-Regressive Neural Network) prediction model, the Augmented test of stationarity, and the discrete and continuous entropy and mutual information.
Fits, by Approximate Bayesian Computation (ABC), the parameters of a stochastic process modelling the phylogeny and the evolution of a suite of traits following the tree. The user may define an arbitrary Markov process for the trait and phylogeny. Importantly, trait-dependent speciation models are handled and fitted to data. See K. Bartoszek, P. Lio (2019) <doi:10.5506/APhysPolBSupp.12.25>. The suggested geiger package can be obtained from CRAN's archive <https://cran.r-project.org/src/contrib/Archive/geiger/>; we suggest taking the latest version. Otherwise its required code is present in the pcmabc package. The suggested distory package can be obtained from CRAN's archive <https://cran.r-project.org/src/contrib/Archive/distory/>; again, we suggest taking the latest version.
The creation of effective visualizations is a fundamental component of data analysis. In biomedical research, new challenges are emerging to visualize multi-dimensional data in a 2D space, but current data visualization tools have limited capabilities. To address this problem, we leverage Gestalt principles to improve the design and interpretability of multi-dimensional data in 2D data visualizations, layering aesthetics to display multiple variables. The proposed visualization can be applied to spatially-resolved transcriptomics data, but also broadly to data visualized in 2D space, such as embedding visualizations. We provide this open source R package escheR, which is built off of the state-of-the-art ggplot2 visualization framework and can be seamlessly integrated into genomics toolboxes and workflows.
Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.
Supervised learning from a source distribution (with known segmentation into cell sub-populations) to fit a target distribution with unknown segmentation. It relies on regularized optimal transport to directly estimate the different cell population proportions from a biological sample characterized with flow cytometry measurements. It is based on the regularized Wasserstein metric to compare cytometry measurements from different samples, thus accounting for possible mis-alignment of a given cell population across samples (due to technical variability from the technology of measurements). The supervised learning technique is based on the Wasserstein metric, which is used to estimate an optimal re-weighting of class proportions in a mixture model. Details are presented in Freulon P, Bigot J and Hejblum BP (2023) <doi:10.1214/22-AOAS1660>.
Inference using a class of Hidden Markov models (HMMs) called 'oHMMed' (ordered HMM with emission densities <doi:10.1186/s12859-024-05751-4>): The oHMMed algorithms identify the number of comparably homogeneous regions within observed sequences with autocorrelation patterns. These are modelled as discrete hidden states; the observed data points are then realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting only neighbouring states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are then inferred. Relevant for application to genomic sequences, time series, or any other sequence data with serial autocorrelation.
This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are portable across platforms with hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. It is also allowed to read a GDS file in parallel with multiple R processes supported by the package parallel.
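To illustrate why sub-byte integers matter for genotype data, the sketch below packs 2-bit values four to a byte and reads them back by index. It is a conceptual Python example only: the function names are hypothetical, the bit order and the use of 3 as a missing-value code are assumptions, and it does not reproduce the actual GDS on-disk layout.

```python
def pack_2bit(values):
    """Pack integers in 0..3 into bytes, four values per byte (low bits first)."""
    packed = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        if not 0 <= v <= 3:
            raise ValueError("each value must fit in 2 bits")
        packed[i // 4] |= v << (2 * (i % 4))
    return bytes(packed)

def unpack_2bit(packed, count):
    """Recover `count` 2-bit integers from bytes produced by pack_2bit."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]

# Nine SNP genotypes coded 0, 1, 2, with 3 as an illustrative missing code.
genotypes = [0, 1, 2, 3, 2, 2, 0, 1, 1]
packed = pack_2bit(genotypes)
restored = unpack_2bit(packed, len(genotypes))
```

Nine genotypes fit in three bytes instead of nine, and any single value can be read by computing its byte index and bit offset, which is what makes random access cheap.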
Estimating and analyzing autoregressive integrated moving average (ARIMA) models. The primary function in this package is arima(), which fits an ARIMA model to univariate time series data using a random restart algorithm. This approach frequently leads to models that have model likelihood greater than or equal to that of the likelihood obtained by fitting the same model using the arima() function from the stats package. This package enables proper optimization of model likelihoods, which is a necessary condition for performing likelihood ratio tests. This package relies heavily on the source code of the arima() function of the stats package. For more information, please see Jesse Wheeler and Edward L. Ionides (2023) <doi:10.48550/arXiv.2310.01198>.
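The general idea of random restarts, and why the result can only match or beat a single-start fit, can be shown on a toy problem. The Python sketch below is not the package's algorithm: the one-dimensional objective `toy_loglik`, the simple `hill_climb` optimizer, and all parameter values are assumptions made for illustration. The only property it demonstrates is that starting from the default point first and keeping the best result guarantees a value at least as high as the single-start run.

```python
import math
import random

def hill_climb(f, x, step=0.5, tol=1e-8):
    """Simple 1-D local maximizer: move uphill, halving the step when stuck."""
    best = f(x)
    while step > tol:
        for candidate in (x + step, x - step):
            value = f(candidate)
            if value > best:
                x, best = candidate, value
                break
        else:
            step /= 2
    return x, best

def random_restart(f, default_start, lower, upper, restarts=20, seed=1):
    """Run hill_climb from the default start, then from random starts."""
    rng = random.Random(seed)
    best_x, best_val = hill_climb(f, default_start)
    for _ in range(restarts):
        x, val = hill_climb(f, rng.uniform(lower, upper))
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

def toy_loglik(x):
    # Two local maxima: a small bump near x = -2 and the global one near x = 3.
    return math.exp(-(x + 2) ** 2) + 2 * math.exp(-(x - 3) ** 2)

single = hill_climb(toy_loglik, -2.5)
multi = random_restart(toy_loglik, -2.5, lower=-5, upper=5)
```

Here the single-start run gets trapped on the smaller bump near -2, while the restarts find the higher maximum near 3. A likelihood ratio test built on the trapped value would compare against the wrong maximum.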