This package provides a two-stage regression method that can be used when various input data types are correlated, for example gene expression and methylation in drug response prediction. In the first stage it uses the upstream features (such as methylation) to predict the response variable (such as drug response), and in the second stage it uses the downstream features (such as gene expression) to predict the residuals of the first stage. In our manuscript (Aben et al., 2016, <doi:10.1093/bioinformatics/btw449>), we show that using TANDEM prevents the model from being dominated by gene expression and that the features selected by TANDEM are more interpretable.
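A hedged sketch of how such a two-stage fit might be called, assuming the package's tandem() function with arguments x (feature matrix covering both data types), y (response) and upstream (logical vector flagging the upstream columns); these names, and the predict() and relative.contributions() calls, are assumptions for illustration rather than a verified interface.

    library(TANDEM)
    # x: n-by-p feature matrix; y: drug response; upstream: logical, TRUE for
    # upstream columns such as methylation (assumed layout)
    fit  <- tandem(x, y, upstream)                    # stage 1 upstream, stage 2 downstream
    yhat <- predict(fit, newx = x)                    # predictions from the combined model
    rc   <- relative.contributions(fit, x, upstream)  # share of signal per data type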
Interacts with a suite of web application programming interfaces (APIs) for taxonomic tasks, such as getting database-specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more. Some of the services supported include NCBI E-utilities (<https://www.ncbi.nlm.nih.gov/books/NBK25501/>), Encyclopedia of Life (<https://eol.org/docs/what-is-eol/data-services>), Global Biodiversity Information Facility (<https://techdocs.gbif.org/en/openapi/>), and many more. Links to the API documentation for other supported services are available in the documentation for their respective functions in this package.
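A minimal sketch of typical lookups, assuming this description corresponds to the taxize package on CRAN; some services prompt for disambiguation or require an API key.

    library(taxize)
    uid <- get_uid("Helianthus annuus")           # NCBI taxonomic identifier
    classification(uid)                            # full taxonomic hierarchy
    sci2comm("Helianthus annuus", db = "ncbi")     # scientific -> common name
    synonyms("Poa annua", db = "itis")             # taxonomic synonyms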
cosmiq is a tool for the preprocessing of liquid- or gas-chromatography mass spectrometry (LCMS/GCMS) data with a focus on metabolomics or lipidomics applications. To improve the detection of low-abundance signals, cosmiq generates master maps of the mZ/RT space from all acquired runs before a peak detection algorithm is applied. The result is a more robust identification and quantification of low-intensity MS signals compared to conventional approaches, where peak picking is performed in each LCMS/GCMS file separately. The cosmiq package builds on the xcmsSet object structure and can therefore be integrated well with the package xcms as an alternative preprocessing step.
The purpose of this package is to provide a lightweight and unified Future API for sequential and parallel processing of R expressions via futures. This package implements sequential, multicore, multisession, and cluster futures. With these, R expressions can be evaluated on the local machine, in parallel on a set of local machines, or distributed on a mix of local and remote machines. Extensions to this package implement additional backends for processing futures via compute cluster schedulers, etc. Because of its unified API, there is no need to modify any code in order to switch from sequential processing on the local machine to, say, distributed processing on a remote compute cluster.
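A minimal sketch of the unified API, assuming this corresponds to the future package on CRAN: the expression being evaluated stays the same while the backend is switched via plan().

    library(future)
    plan(multisession)             # evaluate futures in parallel background R sessions
    f <- future({
      Sys.getpid()                 # any R expression; runs in a worker process
    })
    value(f)                       # blocks until the result is available

    plan(sequential)               # switch backend; no change to the expression itself
    value(future(Sys.getpid()))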
Provides classes, methods, and functions to deal with financial networks. Users can easily store information about both physical and legal persons by using pre-made classes designed for integration with scraping packages such as 'rvest' and 'RSelenium'. Moreover, the package assists in creating various types of financial networks depending on the relation under scrutiny (ownership, board interlocks, etc.) and the desired tie type (valued or binary), and renders them in the most common formats (adjacency matrix, incidence matrix, edge list, 'igraph', 'network'). There are also ad-hoc functions for the Fiedler value, global network efficiency, and cascade-failure analysis.
Efficient algorithms for fitting generalized linear and additive models with group elastic net penalties as described in Helwig (2025) <doi:10.1080/10618600.2024.2362232>. Implements group LASSO, group MCP, and group SCAD with an optional group ridge penalty. Computes the regularization path for linear regression (gaussian), multivariate regression (multigaussian), Huberized support vector machines (hsvm), logistic regression (binomial), multinomial logistic regression (multinomial), log-linear count regression (poisson and negative.binomial), and log-linear continuous regression (gamma and inverse gaussian). Supports default and formula methods for model specification, k-fold cross-validation for tuning the regularization parameters, and nonparametric regression via tensor product reproducing kernel (smoothing spline) basis function expansion.
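A hedged sketch of fitting a group-penalized model, assuming a grpnet()/cv.grpnet() interface with group, family and penalty arguments; the exact argument names are assumptions, and x, y, grp are placeholders.

    library(grpnet)
    # x: predictor matrix; y: response; grp: group label for each column of x
    fit <- grpnet(x, y, group = grp, family = "binomial", penalty = "LASSO")
    cv  <- cv.grpnet(x, y, group = grp, family = "binomial")  # k-fold CV over lambda
    plot(cv)                                                  # CV error along the path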
Functionality required to efficiently use R with IBM(R) Db2(R) Warehouse offerings (formerly IBM dashDB(R)) and IBM Db2 for z/OS(R) in conjunction with IBM Db2 Analytics Accelerator for z/OS. Many basic and complex R operations are pushed down into the database, which removes the main memory boundary of R and allows making full use of parallel processing in the underlying database. For executing R functions in parallel in a multi-node environment, the idaTApply() function requires the SparkR package (<https://spark.apache.org/docs/latest/sparkr.html>). The optional ggplot2 package is needed only for the plot.idaLm() function.
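A hedged sketch of the in-database workflow, assuming this description corresponds to the ibmdbR package and its idaConnect()/idaInit()/ida.data.frame()/idaLm() functions; the DSN, credentials and table name are placeholders.

    library(ibmdbR)
    con <- idaConnect("BLUDB", uid = "user", pwd = "secret")  # ODBC data source (placeholder)
    idaInit(con)
    tab <- ida.data.frame("SAMPLES.MYDATA")   # proxy for a remote table; no data pulled into R
    fit <- idaLm(Y ~ X1 + X2, tab)            # linear model computed in-database
    plot(fit)                                 # dispatches plot.idaLm(); needs ggplot2
    idaClose(con)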
This package provides functions for analysing eye tracking data, including event detection, visualizations and area of interest (AOI) based analyses. The package includes implementations of the I-VT, I-DT, adaptive velocity threshold, and identification by two-means clustering (I2MC) algorithms. See separate documentation for each function. The principles underlying the I-VT and I-DT algorithms are described in Salvucci & Goldberg (2000, <doi:10.1145/355017.355028>). Two-means clustering is described in Hessels et al. (2017, <doi:10.3758/s13428-016-0822-1>). The adaptive velocity threshold algorithm is described in Nyström & Holmqvist (2010, <doi:10.3758/BRM.42.1.188>). See a demonstration in the URL.
This package provides functions for efficiently fitting linear models with spatially correlated errors by robust (Kuensch et al. (2011) <doi:10.3929/ethz-a-009900710>) and Gaussian (Harville (1977) <doi:10.1080/01621459.1977.10480998>) (Restricted) Maximum Likelihood and for computing robust and customary point and block external-drift Kriging predictions (Cressie (1993) <doi:10.1002/9781119115151>), along with utility functions for variogram modelling in ad hoc geostatistical analyses, model building, model evaluation by cross-validation, (conditional) simulation of Gaussian processes (Davies and Bryant (2013) <doi:10.18637/jss.v055.i09>), and unbiased back-transformation of Kriging predictions of log-transformed data (Cressie (2006) <doi:10.1007/s11004-005-9022-8>).
Test for overall association between microbiome composition data and phenotypes via phylogenetic kernels. The phenotype can be univariate continuous or binary (Zhao et al. (2015) <doi:10.1016/j.ajhg.2015.04.003>), survival outcomes (Plantinga et al. (2017) <doi:10.1186/s40168-017-0239-9>), multivariate (Zhan et al. (2017) <doi:10.1002/gepi.22030>) and structured phenotypes (Zhan et al. (2017) <doi:10.1111/biom.12684>). The package can also use robust regression (unpublished work) and integrated quantile regression (Wang et al. (2021) <doi:10.1093/bioinformatics/btab668>). In each case, the microbiome community effect is modeled nonparametrically through a kernel function, which can incorporate phylogenetic tree information.
The Statistical Package for REliability Data Analysis (SPREDA) implements recently developed statistical methods for the analysis of reliability data. Modern technological developments, such as sensors and smart chips, allow us to dynamically track product/system usage as well as other environmental variables, such as temperature and humidity. We refer to these variables as dynamic covariates. The package contains functions for the analysis of time-to-event data with dynamic covariates and degradation data with dynamic covariates. The package also contains functions that can be used for analyzing time-to-event data with right censoring, and with left truncation and right censoring. Financial support from NSF and DuPont is acknowledged.
This package provides a graphical and automated pipeline for the analysis of short time-series in R ('santaR'). This approach is designed to accommodate asynchronous time sampling (i.e. different time points for different individuals), inter-individual variability, noisy measurements and large numbers of variables. Based on a smoothing splines functional model, santaR is able to detect variables highlighting significantly different temporal trajectories between study groups. Designed initially for metabolic phenotyping, santaR is also suited for other Systems Biology disciplines. Command line and graphical analysis (via a shiny application) enable fast and parallel automated analysis and reporting, intuitive visualisation and comprehensive plotting options for non-specialist users.
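A hedged sketch of the two entry points mentioned above; the function names santaR_start_GUI() and santaR_auto_fit() and their argument names follow the package vignettes but should be treated as assumptions.

    library(santaR)
    santaR_start_GUI()   # launch the graphical (shiny) interface

    # automated command-line analysis; one spline fit per variable, smoothness set by df
    res <- santaR_auto_fit(inputData = values, ind = ids, time = timepoints,
                           group = groups, df = 5, ncores = 2)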
Fully Bayesian Classification with a subset of high-dimensional features, such as expression levels of genes. The data are modeled with a hierarchical Bayesian model using heavy-tailed t distributions as priors. When a large number of features are available, one may like to select only a subset of features to use, typically those features strongly correlated with the response in training cases. Such a feature selection procedure is, however, invalid, since the relationship between the response and the features has been exaggerated by feature selection. This package provides a way to avoid this bias and yield better-calibrated predictions for future cases when one uses the F-statistic to select features.
Can be used to read and write a fixed-width file (fwf) with an accompanying Blaise datamodel. Blaise is the software suite built by Statistics Netherlands (CBS). It is essentially a way to write and collect surveys and perform statistical analysis on the data. It stores its data in fixed-width format with an accompanying metadata file; this is the Blaise format. The package automatically interprets this metadata and reads the file into an R dataframe. When supplying a datamodel for writing, the dataframe will be automatically converted to that format and checked for compatibility. Supports dataframes, tibbles and LaF objects. For more information about 'Blaise', see <https://blaise.com/products/general-information>.
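A hedged sketch of a read/write round trip, assuming functions named read_fwf_blaise() and write_fwf_blaise(); the file names are placeholders.

    library(blaise)
    # read a fixed-width data file together with its Blaise datamodel (.bla)
    df <- read_fwf_blaise("survey.asc", "survey.bla")
    # write a data frame back out; supplying a datamodel enforces and checks the layout
    write_fwf_blaise(df, "out.asc")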
Consider a goodness-of-fit (GOF) problem of testing whether a random sample comes from a one-sample location-scale model where the location and scale parameters are unknown. It is well known that the Khmaladze martingale transformation method, proposed by Khmaladze (1981) <DOI:10.1137/1126027>, provides an asymptotically distribution-free test for the GOF problem. This package contains one function: KhmaladzeTrans(). In this version, KhmaladzeTrans() provides the test statistic and critical value of the GOF test for normal, Cauchy, and logistic distributions. This package uses the main algorithm proposed by Kim (2020) <DOI:10.1007/s00180-020-00971-7>; tests for other distributions will be available in a later version.
An R implementation of the python program Metabolomics Peak Analysis Computational Tool ('MPACT') (Robert M. Samples, Sara P. Puckett, and Marcy J. Balunas (2023) <doi:10.1021/acs.analchem.2c04632>). Filters in the package serve to address common errors in tandem mass spectrometry preprocessing, including: (1) isotopic patterns that are incorrectly split during preprocessing, (2) features present in solvent blanks due to carryover between samples, (3) features whose abundance is greater than a user-defined abundance threshold in a specific group of samples, for example media blanks, (4) ions that are inconsistent between technical replicates, and (5) in-source fragment ions created during ionization before fragmentation in the tandem mass spectrometry workflow.
Analysis of multivariate data with a two-way completely randomized factorial design. The analysis is based on fully nonparametric, rank-based methods and uses test statistics based on Dempster's ANOVA, Wilks' Lambda, Lawley-Hotelling and Bartlett-Nanda-Pillai criteria. The multivariate response is allowed to be ordinal, quantitative, binary or a mixture of the different variable types. The package offers two functions performing the analysis, one for small and the other for large sample sizes. The underlying methodology is largely described in Bathke and Harrar (2016) <doi:10.1007/978-3-319-39065-9_7> and in Munzel and Brunner (2000) <doi:10.1016/S0378-3758(99)00212-8>.
Stochastic collapsed variational inference on a mixed-membership stochastic blockmodel for networks, incorporating node-level predictors of mixed-membership vectors, as well as dyad-level predictors. For networks observed over time, the model defines a hidden Markov process that allows the effects of node-level predictors to evolve in discrete, historical periods. In addition, the package offers a variety of utilities for exploring results of estimation, including tools for conducting posterior predictive checks of goodness-of-fit and several plotting functions. The package implements methods described in Olivella, Pratt and Imai (2019) 'Dynamic Stochastic Blockmodel Regression for Social Networks: Application to International Conflicts', available at <https://www.santiagoolivella.info/pdfs/socnet.pdf>.
Calculates various functions needed for the design and monitoring of survival trials, accounting for complex situations such as delayed treatment effect, treatment crossover, non-uniform accrual, and different censoring distributions between groups. The event time distribution is assumed to be a piecewise exponential (PWE) distribution and the entry time is assumed to follow a piecewise uniform distribution. As compared with version 1.2.1, two more types of hybrid crossover are added. A bug is corrected in the function "pwecx" that calculates the crossover-adjusted survival, distribution, density, hazard and cumulative hazard functions. Also, a more efficient algorithm is used to generate the crossover-adjusted event time random variable, and the output includes crossover indicators.
Resources, tutorials, and code snippets dedicated to exploring the intersection of quantum computing and artificial intelligence (AI) in the context of analyzing Cluster of Differentiation 4 (CD4) lymphocytes and optimizing antiretroviral therapy (ART) for human immunodeficiency virus (HIV). With the emergence of quantum artificial intelligence and the development of small-scale quantum computers, there is an unprecedented opportunity to revolutionize the understanding of HIV dynamics and treatment strategies. This project leverages the R package qsimulatR (Ostmeyer and Urbach, 2023, <https://CRAN.R-project.org/package=qsimulatR>), a quantum computer simulator, to explore applications of quantum computing techniques, addressing the challenges in studying CD4 lymphocytes and enhancing ART efficacy.
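A minimal sketch of the qsimulatR simulator this project builds on: prepare a two-qubit Bell state and sample a measurement (gate and function names as in the qsimulatR documentation; treat the details as an assumption).

    library(qsimulatR)
    psi <- qstate(nbits = 2)      # |00>
    psi <- H(1) * psi             # Hadamard on qubit 1
    psi <- CNOT(c(1, 2)) * psi    # entangle: (|00> + |11>)/sqrt(2)
    measure(psi)                  # sample a measurement of the register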
The SAVVY (Survival Analysis for AdVerse Events with VarYing Follow-Up Times) project is a consortium of academic and pharmaceutical industry partners that aims to improve the analyses of adverse event (AE) data in clinical trials through the use of survival techniques that appropriately deal with varying follow-up times and competing events; see Stegherr, Schmoor, Beyersmann, et al. (2021) <doi:10.1186/s13063-021-05354-x>. Although statistical methodologies have advanced, AE analyses often use the incidence proportion, the incidence density or a non-parametric Kaplan-Meier estimator, which ignore either censoring or competing events. This package contains functions to easily conduct the proposed improved AE analyses.
The CCREPE (Compositionality Corrected by REnormalization and PErmutation) package is designed to assess the significance of general similarity measures in compositional datasets. In microbial abundance data, for example, the total abundances of all microbes sum to one; CCREPE is designed to take this constraint into account when assigning p-values to similarity measures between the microbes. The package has two functions: ccrepe(), which calculates similarity measures, p-values and q-values for relative abundances of bugs in one or two body sites using bootstrap and permutation matrices of the data; and nc.score(), which calculates species-level co-variation and co-exclusion patterns based on an extension of the checkerboard score to ordinal data.
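A hedged sketch of the two functions named above; the argument names (x, iterations, sim.score) and the result fields follow the package vignette but are assumptions here.

    library(ccrepe)
    # rel_ab: samples-by-taxa matrix of relative abundances (rows sum to one)
    res <- ccrepe(x = rel_ab, iterations = 1000, sim.score = nc.score)
    res$sim.score   # similarity measures
    res$p.values    # permutation-based p-values
    res$q.values    # corresponding q-values

    nc.score(x = rel_ab)  # checkerboard-based co-variation/co-exclusion scores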
This package implements macroevolutionary analyses on phylogenetic trees. See Morlon et al. (2010) <DOI:10.1371/journal.pbio.1000493>, Morlon et al. (2011) <DOI:10.1073/pnas.1102543108>, Condamine et al. (2013) <DOI:10.1111/ele.12062>, Morlon et al. (2014) <DOI:10.1111/ele.12251>, Manceau et al. (2015) <DOI:10.1111/ele.12415>, Lewitus & Morlon (2016) <DOI:10.1093/sysbio/syv116>, Drury et al. (2016) <DOI:10.1093/sysbio/syw020>, Manceau et al. (2016) <DOI:10.1093/sysbio/syw115>, Morlon et al. (2016) <DOI:10.1111/2041-210X.12526>, Clavel & Morlon (2017) <DOI:10.1073/pnas.1606868114>, Drury et al. (2017) <DOI:10.1093/sysbio/syx079>, Lewitus & Morlon (2017) <DOI:10.1093/sysbio/syx095>, Drury et al. (2018) <DOI:10.1371/journal.pbio.2003563>, Clavel et al. (2019) <DOI:10.1093/sysbio/syy045>, Maliet et al. (2019) <DOI:10.1038/s41559-019-0908-0>, Billaud et al. (2019) <DOI:10.1093/sysbio/syz057>, Lewitus et al. (2019) <DOI:10.1093/sysbio/syz061>, Aristide & Morlon (2019) <DOI:10.1111/ele.13385>, Maliet et al. (2020) <DOI:10.1111/ele.13592>, Drury et al. (2021) <DOI:10.1371/journal.pbio.3001270>, Perez-Lamarque & Morlon (2022) <DOI:10.1111/mec.16478>, Perez-Lamarque et al. (2022) <DOI:10.1101/2021.08.30.458192>, Mazet et al. (2023) <DOI:10.1111/2041-210X.14195>, Drury et al. (2024) <DOI:10.1016/j.cub.2023.12.055>.
This package provides a collection of functions for computing centrographic statistics (e.g., standard distance, standard deviation ellipse, standard deviation box) for observations taken at point locations. Separate plotting functions have been developed for each measure. Users interested in writing results to ESRI shapefiles can do so by using results from aspace functions as inputs to the convert.to.shapefile() and write.shapefile() functions in the shapefiles library. We intend to provide terra integration for geographic data in a future release. The aspace package was originally conceived to aid in the analysis of spatial patterns of travel behaviour (see Buliung and Remmel 2008 <doi:10.1007/s10109-008-0063-7>).