ATAC-seq, an assay for Transposase-Accessible Chromatin using sequencing, is a rapid and sensitive method for chromatin accessibility analysis. It was developed as an alternative method to MNase-seq, FAIRE-seq and DNase-seq. The ATACseqQC package was developed to help users quickly assess whether their ATAC-seq experiment is successful. It includes diagnostic plots of fragment size distribution, proportion of mitochondrial reads, nucleosome positioning pattern, and CTCF or other transcription factor footprints.
An interface to the AutoDesk API Platform including the Authentication API for obtaining authentication to the AutoDesk Forge Platform, Data Management API for managing data across the platform's cloud services, Design Automation API for performing automated tasks on design files in the cloud, Model Derivative API for translating design files into different formats, sending them to the viewer app, and extracting design data, and Viewer for rendering 2D and 3D models.
Indicators and measures by country and time describe what happens at economic and social levels. This package provides functions to calculate several measures of convergence after imputing missing values. The automated downloading of Eurostat data, followed by the production of country fiches and indicator fiches, makes it possible to produce automated reports. The Eurofound report (<doi:10.2806/68012>) "Upward convergence in the EU: Concepts, measurements and indicators", 2018, is a detailed presentation of convergence.
Rcpp implementation of the multivariate Kim filter, which combines the Kalman and Hamilton filters for state probability inference. The filter is designed for state space models and can handle missing values and exogenous data in the observation and state equations. Kim, Chang-Jin and Charles R. Nelson (1999) "State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications" <doi:10.7551/mitpress/6444.001.0001> <http://econ.korea.ac.kr/~cjkim/>.
Compute power and sample size for linear models of longitudinal data. Supported models include mixed-effects models and models fit by generalized least squares and generalized estimating equations. The package is described in Iddi and Donohue (2022) <DOI:10.32614/RJ-2022-022>. Relevant formulas are derived by Liu and Liang (1997) <DOI:10.2307/2533554>, Diggle et al. (2002) <ISBN:9780199676750>, and Lu, Luo, and Chen (2008) <DOI:10.2202/1557-4679.1098>.
Without imposing stringent distributional assumptions or shape restrictions, nonparametric estimation has been popular in economics and other social sciences for counterfactual analysis, program evaluation, and policy recommendations. This package implements a novel density (and derivatives) estimator based on local polynomial regressions, documented in Cattaneo, Jansson and Ma (2022) <doi:10.18637/jss.v101.i02>: lpdensity() constructs the local polynomial based density (and derivatives) estimator, and lpbwdensity() performs data-driven bandwidth selection.
Clean the MS/MS spectrum, calculate spectral entropy, unweighted entropy similarity, and entropy similarity for mass spectrometry data. The entropy similarity is a novel similarity measure for MS/MS spectra which outperforms the widely used dot product similarity in compound identification. For more details, please refer to the paper: Yuanyue Li et al. (2021) "Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification" <doi:10.1038/s41592-021-01331-z>.
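As a rough illustration of the quantities named above, the following Python sketch computes spectral entropy as the Shannon entropy of normalized peak intensities, and an unweighted entropy similarity from a 1:1 merge of two normalized spectra. It is not the package's implementation: the function names are hypothetical, spectra are represented as plain dictionaries, and peaks are assumed to be already cleaned and aligned to shared m/z keys (no tolerance-based matching is attempted).

```python
import math


def spectral_entropy(intensities):
    """Shannon entropy (natural log) of intensities normalized to sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum must contain positive intensity")
    probs = [x / total for x in intensities if x > 0]
    return -sum(p * math.log(p) for p in probs)


def unweighted_entropy_similarity(spec_a, spec_b):
    """Similarity in [0, 1] for two spectra given as {mz_key: intensity}.

    Assumption: peaks are pre-aligned, so equal keys mean matching peaks.
    """
    sum_a = sum(spec_a.values())
    sum_b = sum(spec_b.values())
    # Merge the two normalized spectra with equal weight (1:1 mixture).
    merged = {}
    for mz, intensity in spec_a.items():
        merged[mz] = merged.get(mz, 0.0) + 0.5 * intensity / sum_a
    for mz, intensity in spec_b.items():
        merged[mz] = merged.get(mz, 0.0) + 0.5 * intensity / sum_b
    s_a = spectral_entropy(list(spec_a.values()))
    s_b = spectral_entropy(list(spec_b.values()))
    s_ab = spectral_entropy(list(merged.values()))
    # Identical spectra give 1, spectra with no shared peaks give 0.
    return 1.0 - (2.0 * s_ab - s_a - s_b) / math.log(4.0)


if __name__ == "__main__":
    a = {100.0: 10.0, 150.0: 30.0, 200.0: 60.0}
    b = {100.0: 12.0, 150.0: 28.0, 250.0: 60.0}
    print(spectral_entropy(list(a.values())))
    print(unweighted_entropy_similarity(a, b))
```

The weighted variant described in the paper reweights intensities before this step and is omitted here.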
Data analysis based on panel partially-observed Markov process (PanelPOMP) models. To implement such models, simulate them and fit them to panel data, panelPomp extends some of the facilities provided for time series data by the pomp package. Implemented methods include filtering (panel particle filtering) and maximum likelihood estimation (Panel Iterated Filtering) as proposed in Breto, Ionides and King (2020) "Panel Data Analysis via Mechanistic Models" <doi:10.1080/01621459.2019.1604367>.
This package implements two differentially private algorithms for estimating L2-regularized logistic regression coefficients. A randomized algorithm F is epsilon-differentially private (C. Dwork, Differential Privacy, ICALP 2006 <DOI:10.1007/11681878_14>), if |log(P(F(D) in S)) - log(P(F(D') in S))| <= epsilon for any pair D, D' of datasets that differ in exactly one record, any measurable set S, and the randomness is taken over the choices F makes.
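The inequality in this definition can be checked by hand on a very small mechanism. The Python sketch below uses randomized response on a single bit, which is only an illustration of the definition and is not one of the two algorithms the package implements. The function names are hypothetical. The two "datasets" are the single records 0 and 1, and only singleton output sets are checked, which suffices for a finite output space.

```python
import math


def randomized_response_probs(true_bit, epsilon):
    """Output distribution of randomized response on one bit.

    The true bit is reported with probability e^eps / (1 + e^eps),
    and the flipped bit otherwise.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {true_bit: p_truth, 1 - true_bit: 1.0 - p_truth}


def max_abs_log_ratio(epsilon):
    """Largest |log P(F(D) = o) - log P(F(D') = o)| over outputs o.

    D and D' are the neighbouring one-record datasets {0} and {1}.
    """
    probs_d = randomized_response_probs(0, epsilon)
    probs_d_prime = randomized_response_probs(1, epsilon)
    return max(
        abs(math.log(probs_d[o]) - math.log(probs_d_prime[o]))
        for o in (0, 1)
    )


if __name__ == "__main__":
    for eps in (0.1, 0.5, 1.0):
        # The bound is met with equality for this mechanism.
        print(eps, max_abs_log_ratio(eps))
```

For this mechanism the bound is tight: the log-ratio equals epsilon exactly, up to floating-point error.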
This package provides a system to implement the Q-Q boxplot. It is implemented as an extension to ggplot2. The Q-Q boxplot is an amalgam of the boxplot and the Q-Q plot and allows the user to rapidly examine summary statistics and tail behavior for multiple distributions in the same pane. As an extension of the ggplot2 implementation of the boxplot, possible modifications to the boxplot extend to the Q-Q boxplot.
An easy-to-use tool that can compare splicing events in tumor and normal tissue samples using either a user-generated matrix or data from The Cancer Genome Atlas (TCGA). This package generates a matrix of splicing outliers that are significantly over- or underexpressed in tumor samples compared to normal samples, denoted by chromosome location. The package also calculates the splicing burden in each tumor and characterizes the types of splicing events that occur.
Implementations for several robust procedures that allow for (online) extraction of the signal of univariate or multivariate time series by applying robust regression techniques to a moving time window are provided. Included are univariate filtering procedures based on repeated-median regression as well as hybrid and trimmed filters derived from it; see Schettlinger et al. (2006) <doi:10.1515/BMT.2006.010>. The adaptive online repeated median by Schettlinger et al. (2010) <doi:10.1002/acs.1105> and the slope comparing adaptive repeated median by Borowski and Fried (2013) <doi:10.1007/s11222-013-9391-7> choose the width of the moving time window adaptively. Multivariate versions are also provided; see Borowski et al. (2009) <doi:10.1080/03610910802514972> for a multivariate online adaptive repeated median and Borowski (2012) <doi:10.17877/DE290R-14393> for a multivariate slope comparing adaptive repeated median. Furthermore, a repeated-median based filter with automatic outlier replacement and shift detection is provided; see Fried (2004) <doi:10.1080/10485250410001656444>.
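The core idea behind the repeated-median filters mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the package's code: the function names are hypothetical, the window width is fixed rather than chosen adaptively, and the level is estimated at the most recent point of each window, which is one common choice for online use.

```python
from statistics import median


def repeated_median_fit(x, y):
    """Repeated-median regression line through the points (x, y).

    slope     = med_i med_{j != i} (y_j - y_i) / (x_j - x_i)
    intercept = med_i (y_i - slope * x_i)
    Assumes at least two points and distinct x values.
    """
    n = len(x)
    inner_medians = []
    for i in range(n):
        slopes = [(y[j] - y[i]) / (x[j] - x[i]) for j in range(n) if j != i]
        inner_medians.append(median(slopes))
    slope = median(inner_medians)
    intercept = median(y[i] - slope * x[i] for i in range(n))
    return slope, intercept


def repeated_median_filter(series, width):
    """Online signal extraction with a fixed moving window.

    Returns one level estimate per complete window, evaluated at the
    window's last time point.
    """
    levels = []
    for end in range(width, len(series) + 1):
        window = series[end - width:end]
        times = list(range(width))
        slope, intercept = repeated_median_fit(times, window)
        levels.append(intercept + slope * (width - 1))
    return levels


if __name__ == "__main__":
    # Linear trend y = 2t + 1 with a single gross outlier at t = 3.
    data = [1.0, 3.0, 5.0, 100.0, 9.0, 11.0, 13.0]
    print(repeated_median_fit(list(range(7)), data))
    print(repeated_median_filter(data, width=5))
```

Because every pairwise slope passes through two medians, a single outlier in the window leaves the fitted line unchanged in this example.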
The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantics of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, and Hits) are implemented in the S4Vectors package itself.
repo2docker fetches a repository (from GitHub, GitLab, Zenodo, Figshare, Dataverse installations, a Git repository or a local directory) and builds a container image in which the code can be executed. The image build process is based on the configuration files found in the repository. repo2docker can be used to explore a repository locally by building and executing the constructed image of the repository, or as a means of building images that are pushed to a Docker registry.
Self-Attention algorithm helper functions and demonstration vignettes of increasing depth on how to construct the Self-Attention algorithm. This is based on Vaswani et al. (2017) <doi:10.48550/arXiv.1706.03762>, Dan Jurafsky and James H. Martin (2022, ISBN:978-0131873216) <https://web.stanford.edu/~jurafsky/slp3/> "Speech and Language Processing (3rd ed.)" and Alex Graves (2020) <https://www.youtube.com/watch?v=AIiwuClvH6k> "Attention and Memory in Deep Learning".
This package provides methods for assessing animal movement from telemetry and biologging data using non-parametric Bayesian methods. This includes features for pre-processing and analysis of data, as well as the visualization of results from the models. This framework does not rely on standard parametric density functions, which provides flexibility during model fitting. Further details regarding part of this framework can be found in Cullen et al. (2022) <doi:10.1111/2041-210X.13745>.
Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.
Several tools for handling block-matrix diagonals and similar constructs are implemented. Block-diagonal matrices can be extracted or removed using two small functions implemented here. In addition, non-square matrices are supported. Block diagonal matrices occur when two dimensions of a data set are combined along one edge of a matrix. For example, trade-flow data in the decompr and gvc packages have each country-industry combination occur along both edges of the matrix.
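The two operations described above, extracting and removing the block diagonal, can be sketched in Python as follows. The function names and the list-of-lists representation are illustrative assumptions, not the package's interface. Non-square blocks are handled by giving separate row and column sizes for each block.

```python
def block_mask(row_sizes, col_sizes):
    """Boolean mask that is True inside the diagonal blocks.

    Block k spans row_sizes[k] rows and col_sizes[k] columns, so the
    blocks (and the whole matrix) need not be square.
    """
    if len(row_sizes) != len(col_sizes):
        raise ValueError("need one row size and one column size per block")
    mask = [[False] * sum(col_sizes) for _ in range(sum(row_sizes))]
    row_start = col_start = 0
    for n_rows, n_cols in zip(row_sizes, col_sizes):
        for r in range(row_start, row_start + n_rows):
            for c in range(col_start, col_start + n_cols):
                mask[r][c] = True
        row_start += n_rows
        col_start += n_cols
    return mask


def _apply_mask(matrix, row_sizes, col_sizes, keep_blocks):
    mask = block_mask(row_sizes, col_sizes)
    if len(matrix) != len(mask) or any(len(row) != len(mask[0]) for row in matrix):
        raise ValueError("block sizes do not match the matrix dimensions")
    return [
        [value if mask[r][c] == keep_blocks else 0 for c, value in enumerate(row)]
        for r, row in enumerate(matrix)
    ]


def extract_block_diagonal(matrix, row_sizes, col_sizes):
    """Keep the diagonal blocks and zero everything else."""
    return _apply_mask(matrix, row_sizes, col_sizes, keep_blocks=True)


def remove_block_diagonal(matrix, row_sizes, col_sizes):
    """Zero the diagonal blocks and keep everything else."""
    return _apply_mask(matrix, row_sizes, col_sizes, keep_blocks=False)


if __name__ == "__main__":
    m = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    print(extract_block_diagonal(m, [1, 2], [2, 2]))
    print(remove_block_diagonal(m, [1, 2], [2, 2]))
```

In the trade-flow setting mentioned above, each block would correspond to one country's industries, so removing the block diagonal would leave only cross-country flows.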
This package provides a comprehensive collection of utility functions for data analysis and visualization in R. The package provides 55+ functions for data manipulation, file handling, color palette management, bioinformatics workflows, plotting, and package management. Features include void value handling, custom infix operators, flexible file I/O, and publication-ready visualizations with sensible defaults. Implementation follows tidyverse principles (Wickham et al. (2019) <doi:10.21105/joss.01686>) and incorporates best practices from the R community.
Identifies potential data outliers and their impact on estimates and analyses. Tool for evaluation of study credibility. Uses the forward search approach of Atkinson and Riani, "Robust Diagnostic Regression Analysis", 2000, <ISBN:0-387-95017-6> to prepare descriptive statistics of a dataset that is to be analyzed by functions lm (stats), glm (stats), nls (stats), lme (nlme), or coxph (survival), or their equivalent in another language. Includes graphics functions to display the descriptive statistics.
An implementation of a clustering algorithm for functional data based on the adaptive density peak detection technique, in which the density is estimated by functional k-nearest neighbor density estimation based on a proposed semi-metric between functions. The proposed functional data clustering algorithm is computationally fast since it does not need an iterative process. (Alex Rodriguez and Alessandro Laio (2014) <doi:10.1126/science.1242072>; Xiao-Feng Wang and Yifan Xu (2016) <doi:10.1177/0962280215609948>).
Implementation of the methodology proposed in 'Data-driven design of targeted gene panels for estimating immunotherapy biomarkers', Bradley and Cannings (2021) <arXiv:2102.04296>. This package allows the user to fit generative models of mutation from an annotated mutation dataset, and then further to produce tunable linear estimators of exome-wide biomarkers. It also contains functions to simulate mutation annotated format (MAF) data, as well as to analyse the output and performance of models.
Helper functions that interface with the system utilities to learn about the local build environment. Lets you explore make rules to test the local configuration, or query pkg-config to find compiler flags and libs needed for building packages with external dependencies. Also contains tools to analyze which libraries an installed R package is linked to by inspecting output from ldd in combination with information from your distribution package manager, e.g. rpm or dpkg.
This package provides a user-friendly tool for estimating both total and directional connectedness spillovers based on Diebold and Yilmaz (2009, 2012). It also provides the user with rolling estimation for total and net indices. Users can find both orthogonalized and generalized versions of each kind of measure. See Diebold and Yilmaz (2009, 2012) at <doi:10.1111/j.1468-0297.2008.02208.x> and <doi:10.1016/j.ijforecast.2011.02.006>.