Analysis of Ct values from high-throughput quantitative real-time PCR (qPCR) assays across multiple conditions or replicates. The input data can be from spatially defined formats such as ABI TaqMan Low Density Arrays or OpenArray; LightCycler from Roche Applied Science; CFX plates from Bio-Rad Laboratories; conventional 96- or 384-well plates; or microfluidic devices such as the Dynamic Arrays from Fluidigm Corporation. HTqPCR handles data loading, quality assessment, normalization, visualization and parametric or non-parametric testing for statistical significance in Ct values between features (e.g. genes, microRNAs).
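For orientation, a minimal analysis sketch, assuming HTqPCR's readCtData(), normalizeCtData() and limmaCtData() entry points; the file names and design below are hypothetical:

    # Minimal HTqPCR sketch; input files and sample layout are hypothetical.
    library(HTqPCR)
    files <- c("ctrl1.txt", "ctrl2.txt", "treat1.txt", "treat2.txt")
    raw   <- readCtData(files = files, path = "data", n.features = 384)
    # Quantile-normalize Ct values across samples.
    norm  <- normalizeCtData(raw, norm = "quantile")
    # Test features for differential Ct values between the two conditions.
    design <- model.matrix(~ 0 + factor(c("ctrl", "ctrl", "treat", "treat")))
    colnames(design) <- c("ctrl", "treat")
    contrasts <- limma::makeContrasts(treat - ctrl, levels = design)
    results   <- limmaCtData(norm, design = design, contrasts = contrasts)
    head(results[[1]])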
Alternating Manifold Proximal Gradient Method for Sparse PCA uses the Alternating Manifold Proximal Gradient (AManPG) method to find sparse principal components from a data or covariance matrix. Provides a novel algorithm for solving the sparse principal component analysis problem, with advantages over existing methods in terms of efficiency and convergence guarantees. Chen, S., Ma, S., Xue, L., & Zou, H. (2020) <doi:10.1287/ijoo.2019.0032>. Zou, H., Hastie, T., & Tibshirani, R. (2006) <doi:10.1198/106186006X113430>. Zou, H., & Xue, L. (2018) <doi:10.1109/JPROC.2018.2846588>.
This package provides functions for testing if the covariance structure of 2-dimensional data (e.g. samples of surfaces X_i = X_i(s,t)) is separable, i.e. if covariance(X) = C_1 x C_2. A complete description of the implemented tests can be found in the paper Aston, John A. D.; Pigoli, Davide; Tavakoli, Shahin. Tests for separability in nonparametric covariance operators of random surfaces. Ann. Statist. 45 (2017), no. 4, 1431--1461. <doi:10.1214/16-AOS1495> <https://projecteuclid.org/euclid.aos/1498636862> <arXiv:1505.02023>.
This project provides a group of new functions to calculate the outputs of the two main components of the Canadian Forest Fire Danger Rating System (CFFDRS; Van Wagner and Pickett (1985) <https://cfs.nrcan.gc.ca/publications?id=19973>) at various time scales: the Fire Weather Index (FWI) System (Van Wagner (1985) <https://cfs.nrcan.gc.ca/publications?id=19927>) and the Fire Behaviour Prediction (FBP) System (Forestry Canada Fire Danger Group (1992) <https://cfs.nrcan.gc.ca/pubwarehouse/pdfs/10068.pdf>). Some functions have two versions, table and raster based.
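As a rough usage sketch of the table-based FWI calculation (the fwi() call, its input column names, and the weather values below are assumptions for illustration):

    # Minimal FWI System sketch; interface details are assumptions.
    library(cffdrs)
    # One week of illustrative daily weather observations for a single station.
    wx <- data.frame(
      long = -105.0, lat = 55.0,
      yr = 2020, mon = 7, day = 1:7,
      temp = c(22, 25, 27, 30, 28, 24, 21),   # air temperature (C)
      rh   = c(40, 35, 30, 25, 33, 45, 55),   # relative humidity (%)
      ws   = c(10, 12, 15, 20, 18, 9, 7),     # wind speed (km/h)
      prec = c(0, 0, 0, 0, 2.5, 6.0, 0)       # 24-h precipitation (mm)
    )
    # Table-based Fire Weather Index outputs with default start-up moisture codes.
    fwi_out <- fwi(wx)
    head(fwi_out)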
Flexible univariate count models based on renewal processes. The models may include covariates and can be specified with familiar formula syntax as in glm() and package 'flexsurv'. The methodology is described by Kharrat et al. (2019) <doi:10.18637/jss.v090.i13> (included as vignette 'Countr_guide' in the package). If the suggested package 'pscl' is not available from CRAN, it can be installed with remotes::install_github("cran/pscl"). It is no longer used by the functions in this package but is needed for some of the extended examples.
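A minimal usage sketch, assuming the renewalCount() fitting function and its dist argument; the data are simulated purely for illustration:

    # Minimal Countr sketch; argument details are assumptions.
    library(Countr)
    set.seed(1)
    dat <- data.frame(x = rnorm(200))
    dat$y <- rpois(200, lambda = exp(0.5 + 0.8 * dat$x))
    # Renewal-process count regression with Weibull inter-arrival times,
    # specified with the same formula syntax as glm().
    fit <- renewalCount(formula = y ~ x, data = dat, dist = "weibull")
    summary(fit)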
Tool collection for common and not-so-common data science use cases. This includes custom-made algorithms for data management as well as value calculations that are hard to find elsewhere because of their specificity, but would nonetheless be a waste to lose. Currently available functionality: find sub-graphs in an edge list data.frame, find the mode or modes in a vector of values, extract (a) specific regular expression group(s), generate ISO time stamps that play well with file names, or generate URL parameter lists by expanding value combinations.
Conducts sensitivity analyses for unmeasured confounding, selection bias, and measurement error (individually or in combination; VanderWeele & Ding (2017) <doi:10.7326/M16-2607>; Smith & VanderWeele (2019) <doi:10.1097/EDE.0000000000001032>; VanderWeele & Li (2019) <doi:10.1093/aje/kwz133>; Smith & VanderWeele (2021) <arXiv:2005.02908>). Also conducts sensitivity analyses for unmeasured confounding in meta-analyses (Mathur & VanderWeele (2020a) <doi:10.1080/01621459.2018.1529598>; Mathur & VanderWeele (2020b) <doi:10.1097/EDE.0000000000001180>) and for additive measures of effect modification (Mathur et al., under review).
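A minimal sketch of the unmeasured-confounding analysis, assuming the evalues.RR() helper and an illustrative risk ratio of 1.8 (95% CI 1.2 to 2.7):

    # E-value sketch: how strong, on the risk-ratio scale, would an unmeasured
    # confounder have to be to explain away the observed association?
    library(EValue)
    evalues.RR(est = 1.8, lo = 1.2, hi = 2.7)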
Designed to carry out Mapper-based survival analysis with transcriptomics data. Mapper-based survival analysis is a modification of Progression Analysis of Disease (PAD) in which survival data are taken into account in the filtering function. More details in: J. Fores-Martos, B. Suay-Garcia, R. Bosch-Romeu, M.C. Sanfeliu-Alonso, A. Falco, J. Climent, "Progression Analysis of Disease with Survival (PAD-S) by SurvMap identifies different prognostic subgroups of breast cancer in a large combined set of transcriptomics and methylation studies" <doi:10.1101/2022.09.08.507080>.
Build a map of path-based geometry: a simple description of the number of parts in an object and their basic structure. Translation and restructuring operations for planar shapes and other hierarchical types require a data model with a record of the underlying relationships between elements. The gibble() function creates a geometry map, a simple record of the underlying structure in path-based hierarchical types. There are methods for the planar shape types in the sf and sp packages and for types in the trip and silicate packages.
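A minimal sketch, assuming the sf method for gibble() and using the North Carolina layer shipped with sf:

    # One row per path (ring/part), recording how coordinates group into
    # parts and objects.
    library(sf)
    library(gibble)
    nc <- read_sf(system.file("shape/nc.shp", package = "sf"))
    geom_map <- gibble(nc)
    head(geom_map)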
This package implements the GAMbag, GAMrsm and GAMens ensemble classifiers for binary classification (De Bock et al., 2010) <doi:10.1016/j.csda.2009.12.013>. The ensembles implement Bagging (Breiman, 1996) <doi:10.1023/A:1010933404324>, the Random Subspace Method (Ho, 1998) <doi:10.1109/34.709601>, or both, and use Hastie and Tibshirani's (1990, ISBN:978-0412343902) generalized additive models (GAMs) as base classifiers. Once an ensemble classifier has been trained, it can be used for predictions on new data. A function for cross validation is also included.
An implementation of classifier chains (CCs) for multi-label prediction. Users can employ an external package (e.g. 'randomForest', 'C50'), or supply their own. The package can train a single set of CCs or train an ensemble of CCs -- in parallel if running in a multi-core environment. New observations are classified using a Gibbs sampler since each unobserved label is conditioned on the others. The package includes methods for evaluating the predictions for accuracy and aggregating across iterations and models to produce binary or probabilistic classifications.
Three distinct methods are implemented for evaluating the sums of arbitrary negative binomial distributions. These methods are: Furman's exact probability mass function (Furman (2007) <doi:10.1016/j.spl.2006.06.007>), saddlepoint approximation, and a method of moments approximation. Functions are provided to calculate the density function, the distribution function and the quantile function of the convolutions in question given said evaluation methods. Functions for generating random deviates from negative binomial convolutions and for directly calculating the mean, variance, skewness, and excess kurtosis of said convolutions are also provided.
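To make the underlying problem concrete (a brute-force base-R illustration, not the package's exact methods), the PMF of a sum of two independent negative binomials can be obtained by discrete convolution of truncated component PMFs:

    # Brute-force convolution of two negative binomial PMFs on 0..xmax.
    sizes <- c(3, 5); probs <- c(0.4, 0.6)
    xmax  <- 200                                   # truncation point for the support
    pmf1  <- dnbinom(0:xmax, size = sizes[1], prob = probs[1])
    pmf2  <- dnbinom(0:xmax, size = sizes[2], prob = probs[2])
    # PMF of the sum S = X1 + X2 at each s in 0..xmax.
    pmf_sum <- sapply(0:xmax, function(s) sum(pmf1[1:(s + 1)] * rev(pmf2[1:(s + 1)])))
    sum(pmf_sum)            # close to 1 when the truncation point is large enough
    sum(0:xmax * pmf_sum)   # mean of the convolution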
An introduction to several novel predictive variable selection methods for random forests. They are based on various variable importance methods (averaged variable importance (AVI) and knowledge-informed AVI (KIAVI and KIAVI2)) and predictive accuracy in stepwise algorithms. For details of the variable selection methods, please see: Li, J., Siwabessy, J., Huang, Z. and Nichol, S. (2019) <doi:10.3390/geosciences9040180>; Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F., Nichol, S. (2017) <doi:10.13140/RG.2.2.27686.22085>.
Performance analysis workflow that combines the power of the R language (and the tidyverse realm) and many auxiliary tools to provide a consistent, flexible, extensible, fast, and versatile framework for the performance analysis of task-based applications that run on top of the StarPU runtime (with its MPI (Message Passing Interface) layer for multi-node support). Its goal is to provide a fruitful prototypical environment to conduct performance analysis hypothesis-checking for task-based applications that run on heterogeneous (multi-GPU, multi-core) multi-node HPC (High-performance computing) platforms.
This package provides a nonparametric method to estimate Toeplitz covariance matrices from a sample of n independently and identically distributed p-dimensional vectors with mean zero. The data is preprocessed with the discrete cosine matrix and a variance stabilization transformation to obtain an approximate Gaussian regression setting for the log-spectral density function. Estimates of the spectral density function and the inverse of the covariance matrix are provided as well. Functions for simulating data and a protein data example are included. For details see Klockmann and Krivobokova (2023) <arXiv:2303.10018>.
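The data setting can be illustrated in base R (this shows only the simulation setup and a naive sample-covariance benchmark, not the package's DCT-based estimator):

    # n i.i.d. mean-zero vectors with a p x p Toeplitz covariance matrix.
    set.seed(42)
    p <- 50; n <- 200
    acf_true <- 0.6^(0:(p - 1))            # AR(1)-type autocovariance sequence
    Sigma    <- toeplitz(acf_true)         # true Toeplitz covariance
    X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
    S_hat <- crossprod(X) / n              # naive estimate ignoring Toeplitz structure
    max(abs(S_hat - Sigma))                # elementwise error of the naive estimate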
Interface package for 'sala', the spatial network analysis library from the 'depthmapX' software application. The R parts of the code are based on the rdepthmap package. Allows for the analysis of urban and building-scale networks and provides metrics and methods usually found within the Space Syntax domain. Methods in this package are described by K. Al-Sayed, A. Turner, B. Hillier, S. Iida and A. Penn (2014) "Space Syntax methodology", and also by A. Turner (2004) <https://discovery.ucl.ac.uk/id/eprint/2651> "Depthmap 4: a researcher's handbook".
This package provides a tool for analyzing conjoint experiments using Bayesian Additive Regression Trees ('BART'), a machine learning method developed by Chipman, George and McCulloch (2010) <doi:10.1214/09-AOAS285>. This tool focuses specifically on estimating, identifying, and visualizing the heterogeneity within marginal component effects, at the observation- and individual-level. It uses a variable importance measure ('VIMP') with delete-d jackknife variance estimation, following Ishwaran and Lu (2019) <doi:10.1002/sim.7803>, to obtain bias-corrected estimates of which variables drive heterogeneity in the predicted individual-level effects.
Tool for the development of multi-linear QSPR/QSAR models (quantitative structure-property/activity relationships). These models are used in chemistry, biology and pharmacy to find a relationship between the structure of a molecule and its properties (such as activity or toxicology, but also physical properties). The various functions of this package allow: selection of descriptors based on variance, intercorrelation and user expertise; selection of the best multi-linear regression in terms of correlation and robustness; methods of internal validation (Leave-One-Out, Leave-Many-Out, Y-scrambling) and external validation using test sets.
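A generic base-R illustration of that workflow (descriptor filtering, multi-linear regression, leave-one-out Q2), not this package's own functions:

    # Simulated descriptors and activity for illustration only.
    set.seed(7)
    n <- 60
    descriptors <- as.data.frame(matrix(rnorm(n * 10), n, 10))
    names(descriptors) <- paste0("d", 1:10)
    activity <- 2 * descriptors$d1 - 1.5 * descriptors$d3 + rnorm(n, sd = 0.5)
    # 1. Drop near-constant descriptors, then one of each highly correlated pair.
    keep <- names(descriptors)[apply(descriptors, 2, var) > 1e-6]
    cors <- cor(descriptors[keep])
    drop <- keep[apply(upper.tri(cors) & abs(cors) > 0.9, 2, any)]
    keep <- setdiff(keep, drop)
    # 2. Multi-linear regression on the retained descriptors.
    dat <- cbind(activity = activity, descriptors[keep])
    fit <- lm(activity ~ ., data = dat)
    # 3. Leave-one-out cross-validated predictions and Q2.
    loo <- sapply(seq_len(n), function(i)
      predict(lm(activity ~ ., data = dat[-i, ]), newdata = dat[i, , drop = FALSE]))
    1 - sum((activity - loo)^2) / sum((activity - mean(activity))^2)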
An implementation of fast structural filtering with an L0 penalty. It includes an adaptive polynomial estimator obtained by minimizing the least squares error under constraints on the number of breaks in the (k + 1)-st discrete derivative of the fitted values, for a chosen integer k >= 0. It also includes a generalized structured sparsity constraint, i.e., graph trend filtering. This package is implemented via the primal-dual active set algorithm, which formulates estimates and residuals as primal and dual variables, and utilizes efficient active set selection strategies based on the properties of the primal and dual variables.
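As a small base-R illustration of the quantity being constrained (not this package's solver), the number of breaks in the (k + 1)-st discrete derivative of a piecewise-linear sequence can be counted with diff():

    k <- 1                                            # piecewise-linear case
    theta <- c(seq(0, 5, by = 1), seq(4.5, 1, by = -0.5))  # two linear segments
    d <- diff(theta, differences = k + 1)             # (k + 1)-st discrete derivative
    sum(abs(d) > 1e-8)                                # number of breaks (the L0 count)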
The algorithm Leabra (local error driven and associative biologically realistic algorithm) allows for the construction of artificial neural networks that are biologically realistic and balance supervised and unsupervised learning within a single framework. This package is based on the MATLAB version by Sergio Verduzco-Flores, which in turn was based on the description of the algorithm by Randall O'Reilly (1996) <ftp://grey.colorado.edu/pub/oreilly/thesis/oreilly_thesis.all.pdf>. For more general (not R specific) information on the algorithm Leabra see <https://grey.colorado.edu/emergent/index.php/Leabra>.
This package provides tools for motif analysis in multi-level networks. Multi-level networks combine multiple networks in one, e.g. social-ecological networks. Motifs are small configurations of nodes and edges (subgraphs) occurring in networks. motifr can visualize multi-level networks, count multi-level network motifs and compare motif occurrences to baseline models. It also identifies contributions of existing or potential edges to motifs to find critical or missing edges. The package is in many parts an R wrapper for the excellent SESMotifAnalyser Python package written by Tim Seppelt.
This package provides a flexible framework for power analysis using Monte Carlo simulation for settings in which considerations of the correlations between predictors are important. Users can set up a data generative model that preserves dependence structures among predictors given existing data (continuous, binary, or ordinal). Users can also generate power curves to assess the trade-offs between sample size, effect size, and power of a design. This package includes several statistical models common in environmental mixtures studies. For more details and tutorials, see Nguyen et al. (2022) <arXiv:2209.08036>.
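A generic base-R/MASS illustration of Monte Carlo power estimation with correlated predictors (not this package's interface):

    library(MASS)
    power_sim <- function(n, beta1 = 0.3, rho = 0.6, nsim = 1000, alpha = 0.05) {
      Sigma <- matrix(c(1, rho, rho, 1), 2, 2)          # correlation between x1 and x2
      rejections <- replicate(nsim, {
        X <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)    # correlated predictors
        y <- beta1 * X[, 1] + 0.2 * X[, 2] + rnorm(n)   # outcome model
        p <- summary(lm(y ~ X[, 1] + X[, 2]))$coefficients[2, 4]
        p < alpha                                        # reject H0: beta1 = 0?
      })
      mean(rejections)                                   # estimated power
    }
    # Power curve across sample sizes for a fixed effect size.
    sapply(c(50, 100, 200, 400), power_sim)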
A comprehensive, user-friendly package for label-free proteomics data analysis and machine learning-based modeling. Data generated from MaxQuant can be easily used to conduct differential expression analysis, build predictive models with top protein candidates, and assess model performance. promor includes a suite of tools for quality control, visualization, missing data imputation (Lazar et al. (2016) <doi:10.1021/acs.jproteome.5b00981>), differential expression analysis (Ritchie et al. (2015) <doi:10.1093/nar/gkv007>), and machine learning-based modeling (Kuhn (2008) <doi:10.18637/jss.v028.i05>).
The semiparametric accelerated failure time (AFT) model is an attractive alternative to the Cox proportional hazards model. This package provides a suite of functions for fitting one popular estimator of the semiparametric AFT model, the regularized Gehan estimator. Specifically, we provide functions for cross-validation, prediction, coefficient extraction, and visualizing both trace plots and cross-validation curves. For further details, please see Suder, P. M. and Molstad, A. J. (2022+) Scalable algorithms for semiparametric accelerated failure time models in high dimensions, to appear in Statistics in Medicine <doi:10.1002/sim.9264>.
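For reference, a rough base-R sketch of the regularized Gehan objective as it is commonly written for the semiparametric AFT model (an illustration of the loss, not this package's implementation):

    # Gehan loss over all (i, j) pairs plus a lasso penalty on beta.
    gehan_loss <- function(beta, X, logY, delta, lambda = 0) {
      e <- logY - X %*% beta                        # residuals on the log-time scale
      diffs <- outer(as.vector(e), as.vector(e),
                     FUN = function(ei, ej) pmax(ej - ei, 0))
      mean(delta * diffs) + lambda * sum(abs(beta))
    }
    # Illustrative right-censored data.
    set.seed(3)
    n <- 100; X <- matrix(rnorm(n * 2), n, 2)
    time  <- exp(X %*% c(1, -0.5) + rnorm(n))
    cens  <- rexp(n, rate = 0.1)
    logY  <- log(pmin(time, cens))
    delta <- as.numeric(time <= cens)               # 1 = observed, 0 = censored
    gehan_loss(c(1, -0.5), X, logY, delta, lambda = 0.01)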