tidyr is a reframing of the reshape2 package designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis. It is designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.
SPAMS (SPArse Modeling Software) is an optimization toolbox for solving various sparse estimation problems. It includes tools for the following problems:
Dictionary learning and matrix factorization (NMF, sparse principal component analysis (PCA), ...)
Solving sparse decomposition problems with LARS, coordinate descent, OMP, SOMP, proximal methods
Solving structured sparse decomposition problems (l1/l2, l1/linf, sparse group lasso, tree-structured regularization, structured sparsity with overlapping groups,...).
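As a small illustration of the proximal methods mentioned above, the following sketch shows the proximal operator of the l1 penalty (soft-thresholding), which is the basic building block of many sparse solvers. It is a generic textbook construction, not SPAMS code; the names `soft_threshold` and `prox_l1` are hypothetical.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x| for a scalar: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def prox_l1(v, lam):
    """Apply soft-thresholding to every coordinate of the vector v."""
    return [soft_threshold(x, lam) for x in v]


if __name__ == "__main__":
    # Coordinates with magnitude at most lam are set exactly to zero,
    # which is how the l1 penalty produces sparse solutions.
    print(prox_l1([3.0, -0.5, -2.0], 1.0))
```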
This package provides a collection of functions for structure learning of causal networks and estimation of joint causal effects from observational Gaussian data. The main algorithm consists of a Markov chain Monte Carlo scheme for posterior inference of causal structures, parameters and causal effects between variables. References: F. Castelletti and A. Mascaro (2021) <doi:10.1007/s10260-021-00579-1>, F. Castelletti and A. Mascaro (2022) <doi:10.48550/arXiv.2201.12003>.
Supplies higher-order coordinatized data specification and fluid transform operators that include pivot and anti-pivot as special cases. The methodology is described in Zumel, 2018, "Fluid data reshaping with cdata", <https://winvector.github.io/FluidData/FluidDataReshapingWithCdata.html>, <DOI:10.5281/zenodo.1173299>. This package introduces the idea of explicit control table specification of data transforms. Works on in-memory data or on remote data using rquery and SQL database interfaces.
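To make the pivot and anti-pivot special cases concrete, here is a minimal sketch on plain lists of dictionaries. It does not use or reproduce the cdata API or its control tables; the function names `unpivot` and `pivot` and the column names `key` and `value` are assumptions chosen for illustration.

```python
def unpivot(rows, id_col, value_cols, key_name="key", value_name="value"):
    """Anti-pivot (wide to long): one output row per (record, measured column)."""
    return [
        {id_col: row[id_col], key_name: col, value_name: row[col]}
        for row in rows
        for col in value_cols
    ]


def pivot(rows, id_col, key_name="key", value_name="value"):
    """Pivot (long to wide): collect the key/value pairs of each id into one row."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_col], {id_col: row[id_col]})
        record[row[key_name]] = row[value_name]
    return list(wide.values())


if __name__ == "__main__":
    wide_rows = [{"id": 1, "a": 10, "b": 20}, {"id": 2, "a": 30, "b": 40}]
    long_rows = unpivot(wide_rows, "id", ["a", "b"])
    print(long_rows)
    print(pivot(long_rows, "id"))
```

In this sketch the two operations are inverses of each other as long as each id and key combination appears once.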
Stan-based functions to estimate CAR-MM models. These models allow the estimation of Generalised Linear Models with CAR (conditional autoregressive) spatial random effects for spatially and temporally misaligned data, given a suitable Multiple Membership matrix. The main references are Gramatica, Liverani and Congdon (2023) <doi:10.1214/23-BA1370>, Petrof, Neyens, Nuyts, Nackaerts, Nemery and Faes (2020) <doi:10.1002/sim.8697> and Gramatica, Congdon and Liverani <doi:10.1111/rssc.12480>.
Most packages specify only the probability density function, cumulative distribution function, quantile function, and random number generation of probability distributions. The present package computes some important distributional properties, including the first four ordinary and central moments, Pearson's coefficient of skewness and kurtosis, the mean and variance, coefficient of variation, median, and quartile deviation at some parametric values of several well-known and extensively used probability distributions.
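The relationship between ordinary (raw) moments, central moments, and the derived coefficients can be sketched as follows, using the exponential distribution as a worked case. This is not the package's code. The function names are hypothetical, and the moment-based definitions of skewness and kurtosis used here are an assumption about which coefficients are meant.

```python
from math import factorial, sqrt


def exponential_raw_moments(rate, order=4):
    """Raw moments E[X^k] = k! / rate^k of an exponential distribution."""
    return [factorial(k) / rate**k for k in range(1, order + 1)]


def central_from_raw(m1, m2, m3, m4):
    """Second to fourth central moments expressed through raw moments."""
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu2, mu3, mu4


def exponential_summary(rate):
    """Mean, variance, coefficient of variation, skewness and kurtosis."""
    m1, m2, m3, m4 = exponential_raw_moments(rate)
    mu2, mu3, mu4 = central_from_raw(m1, m2, m3, m4)
    return {
        "mean": m1,
        "variance": mu2,
        "cv": sqrt(mu2) / m1,
        "skewness": mu3 / mu2**1.5,
        "kurtosis": mu4 / mu2**2,  # plain kurtosis, not excess kurtosis
    }


if __name__ == "__main__":
    print(exponential_summary(2.0))
```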
We offer an implementation of the series representation put forth in "A series representation for multidimensional Rayleigh distributions" by Wiegand and Nadarajah <DOI: 10.1002/dac.3510>. Furthermore we have implemented an integration approach proposed by Beaulieu et al. for 3 and 4-dimensional Rayleigh densities (Beaulieu, Zhang, "New simplest exact forms for the 3D and 4D multivariate Rayleigh PDFs with applications to antenna array geometrics", <DOI: 10.1109/TCOMM.2017.2709307>).
R package to build and simulate deterministic discrete-time compartmental models that can be non-Markov. Length of stay in each compartment can be defined to follow a parametric distribution (d_exponential(), d_gamma(), d_weibull(), d_lognormal()) or a non-parametric distribution (nonparametric()). Other supported types of transition from one compartment to another include fixed transition (constant()), multinomial (multinomial()), and fixed transition probability (transprob()).
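The idea of a deterministic discrete-time transition whose length of stay follows an arbitrary distribution can be sketched as below. This is not the package's implementation or interface: `simulate_dwell` and its arguments are hypothetical, and the model is reduced to a single transition between two compartments with no new arrivals.

```python
def simulate_dwell(initial, dwell_pmf, steps):
    """Deterministic flow out of one compartment into another.

    dwell_pmf[k] is the assumed probability of staying exactly k + 1 time
    steps, so the process is non-Markov unless the pmf is geometric.
    Returns the population of the source and sink compartments over time.
    """
    # scheduled[j] is the number of individuals leaving j + 1 steps from now.
    scheduled = [initial * p for p in dwell_pmf]
    source, sink = [float(initial)], [0.0]
    for _ in range(steps):
        leaving = scheduled.pop(0) if scheduled else 0.0
        source.append(source[-1] - leaving)
        sink.append(sink[-1] + leaving)
    return source, sink


if __name__ == "__main__":
    # Half leave after one step, 30% after two steps, 20% after three steps.
    source, sink = simulate_dwell(100, [0.5, 0.3, 0.2], steps=4)
    print(source)
    print(sink)
```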
Interface to the python package dgpsi for Gaussian process, deep Gaussian process, and linked deep Gaussian process emulations of computer models and networks using stochastic imputation (SI). The implementations follow Ming & Guillas (2021) <doi:10.1137/20M1323771> and Ming, Williamson, & Guillas (2023) <doi:10.1080/00401706.2022.2124311> and Ming & Williamson (2023) <doi:10.48550/arXiv.2306.01212>. To get started with the package, see <https://mingdeyu.github.io/dgpsi-R/>.
This package provides functions and methods for: splitting large raster objects into smaller chunks, transferring images from a binary format into raster layers, transferring raster layers into an RData file, calculating the maximum gap (amount of consecutive missing values) of a numeric vector, and fitting harmonic regression models to periodic time series. The homoscedastic harmonic regression model is based on G. Roerink, M. Menenti and W. Verhoef (2000) <doi:10.1080/014311600209814>.
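The maximum gap, that is, the longest run of consecutive missing values in a numeric vector, can be computed in a single pass. The sketch below is illustrative and is not the package's function: the name `max_gap` is hypothetical, and treating both `None` and NaN as missing is an assumption.

```python
import math


def max_gap(values):
    """Length of the longest run of consecutive missing values (None or NaN)."""
    longest = current = 0
    for v in values:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        # Extend the current run on a missing value, otherwise reset it.
        current = current + 1 if missing else 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    series = [1.0, None, None, 4.0, float("nan"), 6.0]
    print(max_gap(series))
```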
Statistical testing procedures for detecting GxE (gene-environment) interactions. The main focus lies on GRSxE interaction tests that aim at detecting GxE interactions through GRS (genetic risk scores). Moreover, a novel testing procedure based on bagging and OOB (out-of-bag) predictions is implemented for incorporating all available observations at both GRS construction and GxE testing (Lau et al., 2023, <doi:10.1038/s41598-023-28172-4>).
Generation of synthetic data from a real dataset using the combination of rank normal inverse transformation with the calculation of correlation matrix <doi:10.1055/a-2048-7692>. Completely artificial data may be generated through the use of Generalized Lambda Distribution and Generalized Poisson Distribution <doi:10.1201/9781420038040>. Quantitative, binary, ordinal categorical, and survival data may be simulated. Functionalities are offered to generate synthetic data sets according to user's needs.
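The rank-based inverse normal transformation step can be sketched as follows. This covers only the transformation, not the correlation-matrix step or the data generation, and it is not the package's code. The function name, the (rank - 0.5) / n offset, and breaking ties by position are all assumptions.

```python
from statistics import NormalDist


def rank_inverse_normal(values):
    """Map each value to the standard normal quantile of its rank.

    Ranks run from 1 to n and are converted to probabilities with the
    assumed offset (rank - 0.5) / n; ties are broken by original position.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    transformed = [0.0] * n
    for rank, index in enumerate(order, start=1):
        transformed[index] = standard_normal.inv_cdf((rank - 0.5) / n)
    return transformed


if __name__ == "__main__":
    print(rank_inverse_normal([10.0, 2.0, 7.0]))
```

The transformed values keep the ordering of the original data while following an approximately normal shape, which is what makes a correlation matrix computed on them convenient to simulate from.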
Partial informational correlation (PIC) is used to identify the meaningful predictors of the response from a large set of potential predictors. Details of the methodologies used in the package can be found in Sharma, A., Mehrotra, R. (2014). <doi:10.1002/2013WR013845>, Sharma, A., Mehrotra, R., Li, J., & Jha, S. (2016). <doi:10.1016/j.envsoft.2016.05.021>, and Mehrotra, R., & Sharma, A. (2006). <doi:10.1016/j.advwatres.2005.08.007>.
Download and analyze motor vehicle crash data released by the New Jersey Department of Transportation (NJDOT). The data in this package is collected through the filing of the NJTR-1 form by police officers, which provides a standardized way of documenting a motor vehicle crash that occurred in New Jersey. Three different data tables containing data on crashes, vehicles and pedestrians released from 2001 to the present can be downloaded and cleaned using this package.
This package provides a comprehensive suite of tools for analyzing omics data. It includes functionalities for alpha diversity analysis, beta diversity analysis, differential abundance analysis, community assembly analysis, visualization of phylogenetic tree, and functional enrichment analysis. With a progressive approach, the package offers a range of analysis methods to explore and understand the complex communities. It is designed to support researchers and practitioners in conducting in-depth and professional omics data analysis.
Calculates, via simulation, power and appropriate stopping alpha boundaries (and/or futility bounds) for sequential analyses (i.e., group sequential design) as well as for multiple hypotheses (multiple tests included in an analysis), given any specified global error rate. This enables the sequential use of practically any significance test, as long as the underlying data can be simulated in advance to a reasonable approximation. Lukács (2022) <doi:10.21105/joss.04643>.
An automatic cell type detection and assignment algorithm for single cell RNA-Seq and Cytof/FACS data. SCINA is capable of assigning cell type identities to a pool of cells profiled by scRNA-Seq or Cytof/FACS data with prior knowledge of markers, such as genes and protein symbols that are highly or lowly expressed in each category. See Zhang Z, et al (2019) <doi:10.3390/genes10070531> for more details.
Implementation of the SING algorithm to extract joint and individual non-Gaussian components from two datasets. SING uses an objective function that maximizes the skewness and kurtosis of latent components with a penalty to enhance the similarity between subject scores. Unlike other existing methods, SING does not use PCA for dimension reduction, but rather uses non-Gaussianity, which can improve feature extraction. Benjamin B. Risk, Irina Gaynanova (2021) <doi:10.1214/21-AOAS1466>.
Email Finder R Client Library. Search for emails based on a website: you give one domain name and it returns all the email addresses found on the internet. Email Finder generates or retrieves the most likely email address from a domain name, a first name and a last name. Email verify checks the deliverability of a given email address, verifies if it has been found in our database, and returns its sources.
Multi-label learning strategies and other procedures to support multi-label classification in R. The package provides a set of multi-label procedures such as sampling methods, transformation strategies, threshold functions, pre-processing techniques and evaluation metrics. A complete overview of the matter can be seen in Zhang, M. and Zhou, Z. (2014) <doi:10.1109/TKDE.2013.39> and Gibaja, E. and Ventura, S. (2015) A Tutorial on Multi-label Learning.
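Two of the building blocks named above, a transformation strategy and a threshold function, can be illustrated with a short sketch. It shows binary relevance, a common transformation that turns one multi-label problem into one binary problem per label, and a fixed score cut-off. Neither function comes from the package; the names and the default cut-off of 0.5 are assumptions.

```python
def binary_relevance(label_sets, labels):
    """Turn per-instance label sets into one 0/1 target vector per label."""
    return {
        label: [1 if label in assigned else 0 for assigned in label_sets]
        for label in labels
    }


def apply_threshold(scores, threshold=0.5):
    """Keep the labels whose predicted score reaches the cut-off."""
    return {label for label, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    label_sets = [{"sports"}, {"sports", "politics"}, set()]
    print(binary_relevance(label_sets, ["sports", "politics"]))
    print(apply_threshold({"sports": 0.8, "politics": 0.3}))
```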
Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.
scCB2 is an R package implementing CB2 for distinguishing real cells from empty droplets in droplet-based single cell RNA-seq experiments (especially for 10x Chromium). It is based on clustering similar barcodes and calculating a Monte-Carlo p-value for each cluster to test against the background distribution. This cluster-level test outperforms single-barcode-level tests in dealing with low count barcodes and homogeneous sequencing libraries, while keeping FDR well controlled.
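A Monte-Carlo p-value of the kind mentioned above is, in its generic form, the share of simulated background statistics at least as extreme as the observed one. The sketch below shows that generic estimator with the usual add-one correction. It is not the CB2 test itself: the function name is hypothetical, and the choice of test statistic and of how background draws are simulated is left out.

```python
def monte_carlo_p_value(observed, simulated):
    """Estimate P(statistic >= observed) from simulated background draws.

    The add-one correction counts the observed value as one of the draws,
    so the estimate is never exactly zero.
    """
    exceed = sum(1 for s in simulated if s >= observed)
    return (1 + exceed) / (1 + len(simulated))


if __name__ == "__main__":
    background = [1.0, 2.0, 3.0, 4.0]
    print(monte_carlo_p_value(3.0, background))
```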
The biodb package provides access to standard remote chemical and biological databases (ChEBI, KEGG, HMDB, ...), as well as to in-house local database files (CSV, SQLite), with easy retrieval of entries, access to web services, search of compounds by mass and/or name, and mass spectra matching for LCMS and MSMS. Its architecture as a development framework facilitates the development of new database connectors for local projects or inside separate published packages.
This package provides a unified and straightforward interface for performing a variety of meta-analysis methods directly from user data. Users can input a data frame, specify key parameters, and effortlessly execute and compare multiple common meta-analytic models. Designed for immediate usability, the package facilitates transparent, reproducible research without manual implementation of each analytical method. Ideal for researchers aiming for efficiency and reproducibility, it streamlines workflows from data preparation to results interpretation.