This package provides an R API to the Open Source Geometry Engine (GEOS) library and a vector format with which to efficiently store GEOS geometries. High-performance functions to extract information from, calculate relationships between, and transform geometries are provided. Finally, facilities to import and export geometry vectors to other spatial formats are provided.
Content-preserving transformations of PDF files such as split, combine, and compress. This package interfaces directly to the qpdf C++ API and does not require any command line utilities. Note that qpdf does not read actual content from PDF files: to extract text and data you need the pdftools package.
This package is for genomic regions processing using command line tools such as BEDTools, BEDOPS and Tabix. These tools offer scalable and efficient utilities to perform genome arithmetic, e.g. indexing, formatting and merging. The bedr package's API enhances access to these tools and offers additional utilities for genomic regions processing.
Robust Clustering of Time Series (RCTS) has the functionality to cluster time series using both the classical and the robust interactive fixed effects framework. The classical framework is developed in Ando & Bai (2017) <doi:10.1080/01621459.2016.1195743>. The implementation within this package excludes the SCAD-penalty on the estimations of beta. The robust framework is developed in Boudt & Heyndels (2022) <doi:10.1016/j.ecosta.2022.01.002> and is made robust against different kinds of outliers. The algorithm iteratively updates beta (the coefficients of the observable variables), group membership, and the latent factors (which can be common and/or group-specific) along with their loadings. The number of groups and factors can be estimated if they are unknown.
This package provides a set of tools to streamline data analysis. Learning both R and introductory statistics at the same time can be challenging, and so we created rigr to facilitate common data analysis tasks and enable learners to focus on statistical concepts. We provide easy-to-use interfaces for descriptive statistics, one- and two-sample inference, and regression analyses. rigr output includes key information while omitting unnecessary details that can be confusing to beginners. Heteroscedasticity-robust ("sandwich") standard errors are returned by default, and multiple partial F-tests and tests for contrasts are easy to specify. A single regression function can fit both linear and generalized linear models, allowing students to more easily make connections between different classes of models.
This package provides a comprehensive pipeline for analyzing and interactively visualizing genomic profiles generated through commercial or custom aCGH arrays. As inputs, rCGH supports Agilent dual-color Feature Extraction files (.txt), from 44 to 400K, Affymetrix SNP6.0 and cytoScanHD probeset.txt, cychp.txt, and cnchp.txt files exported from ChAS or Affymetrix Power Tools. rCGH also supports custom arrays, provided data complies with the expected format. This package takes over all the steps required for individual genomic profiles analysis, from reading files to profiles segmentation and gene annotations. This package also provides several visualization functions (static or interactive) which facilitate individual profiles interpretation. Input files can be in compressed format, e.g. .bz2 or .gz.
Low-rank matrix decompositions are fundamental tools and widely used for data analysis, dimension reduction, and data compression. Classically, highly accurate deterministic matrix algorithms are used for this task. However, the emergence of large-scale data has severely challenged our computational ability to analyze big data. The concept of randomness has been demonstrated as an effective strategy to quickly produce approximate answers to familiar problems such as the singular value decomposition (SVD). This package provides several randomized matrix algorithms such as the randomized singular value decomposition (rsvd), randomized principal component analysis (rpca), randomized robust principal component analysis (rrpca), randomized interpolative decomposition (rid), and the randomized CUR decomposition (rcur). In addition, several plot functions are provided.
This package provides a collection of functions related to density estimation by using Chen's (2000) idea. Mean Squared Errors (MSE) are calculated for estimated curves. For this purpose, R functions allow the distribution to be Gamma, Exponential or Weibull. For details see Chen (2000), Scaillet (2004) <doi:10.1080/10485250310001624819> and Khan and Akbar.
Efficient implementations of cross-validation techniques for linear and ridge regression models, leveraging C++ code with the Rcpp, RcppParallel, and Eigen libraries. It supports leave-one-out, generalized, and K-fold cross-validation methods, utilizing Eigen matrices for high performance. Methodology references: Hastie, Tibshirani, and Friedman (2009) <doi:10.1007/978-0-387-84858-7>.
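Leave-one-out cross-validation for ordinary least squares does not require refitting the model n times: each held-out residual equals the in-sample residual divided by one minus that point's leverage. The following is a minimal Python sketch of that identity for a single predictor, checked against the brute-force refit. Whether the package uses this shortcut internally is an assumption, and all function names here are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def loocv_shortcut(x, y):
    """Mean squared LOO error from a single fit, using leverages h_ii."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = fit_line(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        leverage = 1.0 / n + (xi - xbar) ** 2 / sxx
        total += ((yi - b0 - b1 * xi) / (1.0 - leverage)) ** 2
    return total / n


def loocv_refit(x, y):
    """Mean squared LOO error by refitting with each point held out."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - b0 - b1 * x[i]) ** 2
    return total / len(x)


# Made-up illustrative data, not taken from the package.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.3, 5.8]
print(loocv_shortcut(x, y), loocv_refit(x, y))
```

The two functions agree up to floating-point error, while the shortcut needs only one fit.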
Latent process embedding for functional network data with the Functional Adjacency Spectral Embedding. Fits smooth latent processes based on cubic spline bases. Also generates functional network data from three models, and evaluates a network generalized cross-validation criterion for dimension selection. For more information, see MacDonald, Zhu and Levina (2022+) <arXiv:2210.07491>.
Several group factor analysis algorithms are implemented, including Canonical Correlation-based Estimation by Choi et al. (2021) <doi:10.1016/j.jeconom.2021.09.008>, Generalised Canonical Correlation Estimation by Lin and Shin (2023) <doi:10.2139/ssrn.4295429>, Circularly Projected Estimation by Chen (2022) <doi:10.1080/07350015.2022.2051520>, and the Aggregated Projection Method.
The half-weight index gregariousness (HWIG) is an association index used in social network analyses. It extends the half-weight association index (HWI), correcting for level of gregariousness in individuals. It is calculated using group by individual data according to methods described in Godde et al. (2013) <doi:10.1016/j.anbehav.2012.12.010>.
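To make the underlying index concrete, here is a minimal Python sketch of the plain half-weight index (HWI) computed from group-by-individual data. It assumes every observed group is a separate sampling occasion, so two individuals are never recorded in different groups at the same time and that term of the usual denominator is zero. The gregariousness correction of Godde et al. is not reproduced here, and the function name and data are hypothetical.

```python
from itertools import combinations


def half_weight_index(groups):
    """Return {(a, b): HWI} for every pair of individuals seen in `groups`.

    HWI = x / (x + 0.5 * (y_a + y_b)), where x counts groups containing
    both individuals and y_a, y_b count groups containing only one of them.
    """
    individuals = sorted(set().union(*groups))
    hwi = {}
    for a, b in combinations(individuals, 2):
        x = sum(1 for g in groups if a in g and b in g)
        y_a = sum(1 for g in groups if a in g and b not in g)
        y_b = sum(1 for g in groups if b in g and a not in g)
        denom = x + 0.5 * (y_a + y_b)
        hwi[(a, b)] = x / denom if denom else 0.0
    return hwi


# Made-up observations: each set is one observed group.
groups = [{"ann", "bob"}, {"ann", "bob", "cat"}, {"cat"}, {"ann"}]
for pair, value in half_weight_index(groups).items():
    print(pair, round(value, 3))
```

Pairs are keyed in alphabetical order, so each dyad appears once.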
Creating effective colour palettes for figures is challenging. This package generates and plots palettes of optimally distinct colours in perceptually uniform colour space, based on iwanthue <http://tools.medialab.sciences-po.fr/iwanthue/>. This is done through k-means clustering of CIE Lab colour space, according to user-selected constraints on hue, chroma, and lightness.
This package implements an efficient algorithm to fit and tune penalized quantile regression models using the generalized coordinate descent algorithm. It is designed to handle high-dimensional datasets effectively, with emphasis on precision and computational efficiency. This package implements the algorithms proposed in Tang, Q., Zhang, Y., & Wang, B. (2022) <https://openreview.net/pdf?id=RvwMTDYTOb>.
This package provides functions for dimension reduction, using MAVE (Minimum Average Variance Estimation), OPG (Outer Product of Gradient) and KSIR (sliced inverse regression of kernel version). Methods for selecting the best dimension are also included. Xia (2002) <doi:10.1111/1467-9868.03411>; Xia (2007) <doi:10.1214/009053607000000352>; Wang (2008) <doi:10.1198/016214508000000418>.
Calculates Model-Averaged Tail Area Wald (MATA-Wald) confidence intervals, and MATA-Wald confidence densities and distributions, which are constructed using single-model frequentist estimators and model weights. See Turek and Fletcher (2012) <doi:10.1016/j.csda.2012.03.002> and Fletcher et al. (2019) <doi:10.1007/s10651-019-00432-5> for details.
This package implements an MCMC sampler for the posterior distribution of arbitrary time-homogeneous multivariate stochastic differential equation (SDE) models with possibly latent components. The package provides a simple entry point to integrate user-defined models directly with the sampler's C++ code, and parallelizes large portions of the calculations when compiled with OpenMP.
Fits a non-linear transformation model ('nltm') for analyzing survival data, see Tsodikov (2003) <doi:10.1111/1467-9868.00414>. The class of nltm includes the following currently supported models: Cox proportional hazard, proportional hazard cure, proportional odds, proportional hazard - proportional hazard cure, proportional hazard - proportional odds cure, Gamma frailty, and proportional hazard - proportional odds.
The ntfy (pronounced "notify") service is a simple HTTP-based pub-sub notification service. It allows you to send notifications to your phone or desktop via scripts from any computer, entirely without signup, cost or setup. It's also open source if you want to run your own. Visit <https://ntfy.sh> for more details.
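Because the service is plain HTTP, publishing amounts to a POST whose body is the message and whose URL path is the topic. The sketch below, in Python rather than R, only builds such a request with the standard library and leaves sending it as an explicit opt-in step. The helper name and the topic `my-example-topic` are hypothetical, and this is not the R package's interface.

```python
import urllib.request


def build_ntfy_request(topic, message, title=None, server="https://ntfy.sh"):
    """Build (but do not send) a POST request that publishes `message` to `topic`."""
    headers = {"Title": title} if title else {}
    return urllib.request.Request(
        f"{server}/{topic}",
        data=message.encode("utf-8"),  # the request body is the notification text
        headers=headers,
        method="POST",
    )


req = build_ntfy_request("my-example-topic", "Model fit finished", title="R job")
print(req.get_method(), req.full_url)

# Uncomment to actually publish (requires network access):
# urllib.request.urlopen(req)
```

Anyone who knows a topic name can subscribe to it, so a hard-to-guess topic is advisable on the public server.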
Miscellaneous R functions developed as collateral damage over the course of work in statistical and scientific computing for research. These include, for example, utilities that supplement existing idiosyncrasies of the R language, extend existing plotting functionality and aesthetics, help prepare data objects for imputation, and extend access to command line tools and systems-level information.
Calculate the superior identification index and its extensions. Measure the performance of journals based on how well they could identify the top papers by any index (e.g. citation indices) according to Huang & Yang (2022) <doi:10.1007/s11192-022-04372-z>. These methods could be extended to evaluate other entities such as institutes, countries, etc.
The goal of SIHR is to provide inference procedures in the high-dimensional generalized linear regression setting for: (1) linear functionals <doi:10.48550/arXiv.1904.12891> <doi:10.48550/arXiv.2012.07133>, (2) conditional average treatment effects, (3) quadratic functionals <doi:10.48550/arXiv.1909.01503>, (4) inner product, (5) distance.
The zlib package for R aims to offer an R-based equivalent of Python's built-in zlib module for data compression and decompression. This package provides a suite of functions for working with zlib compression, including utilities for compressing and decompressing data streams, manipulating compressed files, and working with gzip, zlib, and deflate formats.
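For orientation, this is roughly what the Python module being mirrored looks like in use. The sketch shows streaming compression and decompression, with the `wbits` argument selecting the container format (31 for gzip, 15 for zlib, -15 for raw deflate). The helper names are hypothetical, and whether the R package exposes the same arguments is an assumption to verify against its documentation.

```python
import zlib


def compress_stream(data, wbits, chunk_size=256):
    """Compress `data` in chunks; `wbits` selects gzip, zlib, or raw deflate."""
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    parts = [comp.compress(data[i:i + chunk_size])
             for i in range(0, len(data), chunk_size)]
    parts.append(comp.flush())  # emit whatever is still buffered
    return b"".join(parts)


def decompress_stream(blob, wbits):
    """Inverse of compress_stream for the same `wbits` value."""
    decomp = zlib.decompressobj(wbits=wbits)
    return decomp.decompress(blob) + decomp.flush()


payload = b"spatial data " * 100  # made-up, highly repetitive input

for name, wbits in [("gzip", 31), ("zlib", 15), ("raw deflate", -15)]:
    blob = compress_stream(payload, wbits)
    restored = decompress_stream(blob, wbits)
    print(name, len(payload), "->", len(blob), restored == payload)
```

The three containers wrap the same deflate stream and differ only in header and checksum.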
Discovery of genome-wide variable alternative splicing events from short-read RNA-seq data and visualizations of gene splicing information for publication-quality multi-panel figures in a population. (Warning: The visualization function has been removed because its dependency, the Sushi package, is deprecated. To use it, revert to an older version of this package.)