Classical methods for combining summary data from genome-wide association studies (GWAS) use only marginal genetic effects, and power can be compromised in the presence of heterogeneity. subgxe is an R package that implements p-value assisted subset testing for association (pASTA), a method developed by Yu et al. (2019) <doi:10.1159/000496867>. pASTA generalizes association analysis based on subsets by incorporating gene-environment interactions into the testing procedure.
Sonification (or audification) is the process of representing data by sounds in the audible range. This package provides the R function sonify() that transforms univariate data, sampled at regular or irregular intervals, into a continuous sound with time-varying frequency. The ups and downs in frequency represent the ups and downs in the data. Sonify provides a substitute for R's plot function to simplify data analysis for the visually impaired.
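The mapping from data to pitch described above can be illustrated with a minimal Python sketch. It is not the sonify() implementation: the function names, the 220-880 Hz range, the sample rate, and the assumption of regularly spaced data are all hypothetical choices made for illustration.

```python
import math

def data_to_frequencies(values, f_low=220.0, f_high=880.0):
    """Linearly rescale data values to frequencies in [f_low, f_high] Hz (assumed range)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant data: use the middle of the range
        return [(f_low + f_high) / 2.0] * len(values)
    return [f_low + (v - lo) / (hi - lo) * (f_high - f_low) for v in values]

def synthesize(freqs, duration=2.0, rate=8000):
    """Return audio samples in [-1, 1] whose pitch glides through freqs.

    Assumes regularly spaced data points; the frequency is linearly
    interpolated between neighbouring points and the phase is accumulated
    so the tone stays continuous.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two data points")
    n = int(duration * rate)
    samples, phase = [], 0.0
    for i in range(n):
        pos = i / (n - 1) * (len(freqs) - 1)   # fractional index into freqs
        k = min(int(pos), len(freqs) - 2)
        frac = pos - k
        freq = freqs[k] * (1.0 - frac) + freqs[k + 1] * frac
        phase += 2.0 * math.pi * freq / rate
        samples.append(math.sin(phase))
    return samples

freqs = data_to_frequencies([0, 5, 10])
audio = synthesize(freqs, duration=0.5, rate=8000)
```

Accumulating the phase, rather than evaluating sin(2*pi*f*t) directly, avoids audible clicks when the frequency changes from one sample to the next.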
This package provides a set of commonly used distance measures and some additional functions which, although initially not designed for this purpose, can be used to measure the dissimilarity between time series. These measures can be used to perform clustering, classification or other data mining tasks which require the definition of a distance measure between time series. U. Mori, A. Mendiburu and J.A. Lozano (2016), <doi:10.32614/RJ-2016-058>.
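As an illustration of what a time series dissimilarity measure looks like, the sketch below implements two well-known examples in Python: the Euclidean distance and dynamic time warping (DTW). The function names are hypothetical, and the sketch does not reproduce this package's interface or claim to match its list of measures.

```python
import math

def euclidean_distance(a, b):
    """Point-by-point distance; requires series of equal length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(a, b):
    """Dynamic time warping with absolute difference as the local cost.

    cost[i][j] is the cheapest alignment of the first i points of a
    with the first j points of b.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Unlike the Euclidean distance, DTW accepts series of different lengths and tolerates local shifts in time, which is why both kinds of measure are useful for clustering and classification.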
This package implements core utilities for single-cell RNA-seq data analysis. Contained within are utility functions for working with DE matrices and count matrices, a collection of functions for manipulating and plotting data via ggplot2, and functions to work with cell graphs and cell embeddings. Graph-based methods include embedding kNN cell graphs into a UMAP, collapsing vertices of each cluster in the graph, and propagating graph labels.
Circle Manhattan Plot is an R package that lays out genome-wide association study P-value results as traditional rectangular Manhattan plots, QQ-plots, and novel circular plots. Combined in a single bull's-eye-style plot, association results from multiple traits can be compared interactively, revealing both similarities and differences between signals. Additional functions include highlighting signals or a group of SNPs, chromosome visualization, and displaying candidate genes around SNPs.
This is an alternative mechanism for importing objects from packages. The syntax allows for importing multiple objects from a package with a single command in an expressive way. The import package bridges some of the gap between using library (or require) and direct (single-object) imports. Furthermore, the imported objects are not placed in the current environment. It is also possible to import objects from stand-alone .R files.
This package implements anomaly detection as binary classification for cross-sectional data. Uses maximum likelihood estimates and normal probability functions to classify observations as anomalous. The method is presented in the following lecture from the Machine Learning course by Andrew Ng: <https://www.coursera.org/learn/machine-learning/lecture/C8IJp/algorithm/>, and is also described in: Aleksandar Lazarevic, Levent Ertoz, Vipin Kumar, Aysel Ozgur, Jaideep Srivastava (2003) <doi:10.1137/1.9781611972733.3>.
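The classification rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the package's code: features are treated as independent, each is fitted with a normal distribution by maximum likelihood, and an observation is flagged when its joint density falls below a threshold. The function names and the `epsilon` parameter are hypothetical.

```python
import math

def fit_gaussian(rows):
    """Per-feature maximum likelihood estimates (mean, variance with divisor n)."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    var = [sum((r[j] - mu[j]) ** 2 for r in rows) / n for j in range(d)]
    return mu, var

def density(x, mu, var):
    """Product of univariate normal densities (features assumed independent)."""
    p = 1.0
    for xj, m, v in zip(x, mu, var):
        p *= math.exp(-((xj - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
    return p

def is_anomaly(x, mu, var, epsilon):
    """Flag x as anomalous when its density is below the threshold epsilon."""
    return density(x, mu, var) < epsilon

mu, var = fit_gaussian([[1.0], [2.0], [3.0]])
print(is_anomaly([10.0], mu, var, epsilon=1e-3))
```

In practice the threshold would be tuned on labelled validation data, since it determines the trade-off between missed anomalies and false alarms.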
Impute the survival times for censored observations based on their conditional survival distributions derived from the Kaplan-Meier estimator. CondiS can replace the censored observations with the best approximations from the statistical model, allowing for direct application of machine learning-based methods. When covariates are available, CondiS is extended by incorporating the covariate information through machine learning-based regression modeling (CondiS_X), which can further improve the imputed survival time.
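One way to read "imputing from the conditional survival distribution" is to replace a censoring time c by the conditional expectation c + (integral of S(t) from c to tau) / S(c), where S is the Kaplan-Meier curve. The Python sketch below follows that reading. It is an assumption about the general idea rather than the CondiS algorithm itself; the function names and the truncation time `tau` (taken here as the largest observed time) are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time, in increasing order."""
    curve, surv = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def surv_at(curve, t):
    """Evaluate the right-continuous Kaplan-Meier step function at time t."""
    s = 1.0
    for u, su in curve:
        if u > t:
            break
        s = su
    return s

def impute_censored(c, curve, tau):
    """Conditional mean survival time given survival past c, truncated at tau."""
    s_c = surv_at(curve, c)
    if s_c == 0.0 or c >= tau:
        return c
    knots = [c] + [u for u, _ in curve if c < u < tau] + [tau]
    area = sum(surv_at(curve, a) * (b - a) for a, b in zip(knots, knots[1:]))
    return c + area / s_c

times, events = [1, 2, 3, 4], [1, 0, 1, 1]   # the subject at time 2 is censored
curve = kaplan_meier(times, events)
imputed = impute_censored(2, curve, tau=max(times))
```

In this toy data set, a subject still alive at time 2 fails at time 3 or at time 4 with equal estimated probability, so the imputed value lies halfway between them.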
Individual gene expression patterns are encoded into a series of eigenvector patterns (WGCNA package). Using the framework of linear model-based differential expression comparisons (limma package), time-course expression patterns for genes in different conditions are compared and analyzed for significant pattern changes. For reference, see: Greenham K, Sartor RC, Zorich S, Lou P, Mockler TC and McClung CR. eLife. 2020 Sep 30;9(4). <doi:10.7554/eLife.58993>.
This package implements a simple, likelihood-based estimation of the reproduction number (R0) using a branching process with a Poisson likelihood. This model requires knowledge of the serial interval distribution, and dates of symptom onsets. Infectiousness is determined by weighting R0 by the probability mass function of the serial interval on the corresponding day. It is a simplified version of the model introduced by Cori et al. (2013) <doi:10.1093/aje/kwt133>.
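Under the model described above, if the incidence on day t is Poisson with mean R0 times the infectiousness on that day (the serial-interval-weighted sum of past incidence), setting the derivative of the log-likelihood to zero gives a closed-form estimate: total incidence divided by total infectiousness. The Python sketch below illustrates this derivation. The function names are hypothetical, and the package itself may maximize the likelihood differently (for example, numerically).

```python
def infectiousness(incidence, serial_pmf):
    """lambda_t = sum over s >= 1 of w_s * I_{t-s}, where serial_pmf[s-1] = w_s."""
    lam = []
    for t in range(len(incidence)):
        max_lag = min(t, len(serial_pmf))
        lam.append(sum(serial_pmf[s - 1] * incidence[t - s]
                       for s in range(1, max_lag + 1)))
    return lam

def estimate_r0(incidence, serial_pmf):
    """Poisson maximum likelihood estimate of R0.

    Days with zero infectiousness (such as the first day, treated here
    as the seed of the outbreak) carry no information and are skipped.
    """
    lam = infectiousness(incidence, serial_pmf)
    pairs = [(i, l) for i, l in zip(incidence, lam) if l > 0]
    total_lam = sum(l for _, l in pairs)
    if total_lam == 0:
        raise ValueError("no day with positive infectiousness")
    return sum(i for i, _ in pairs) / total_lam

# Daily case counts that double each day, with a serial interval of exactly one day.
r0 = estimate_r0([1, 2, 4, 8], [1.0])
```

With a one-day serial interval and cases doubling daily, each case produces two new cases on average, so the estimate equals 2.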
We describe fifteen different splice site sequence encoding schemes that have been used in earlier studies for mapping of splice site sequences into numeric feature vectors. These encoding schemes will also be helpful for transforming other nucleotide sequences into numeric forms, provided they are of equal length. These encoding schemes will help the computational biologist working in the field of classification (binary or multiclass) or prediction involving nucleic acid sequences of equal length.
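To make the idea of mapping sequences to numeric feature vectors concrete, the Python sketch below shows one-hot (sparse) encoding, a widely used scheme in which each nucleotide becomes four binary indicators. It is offered only as a familiar example and is not claimed to match any of the fifteen schemes in the package; the function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"

def one_hot_encode(sequence):
    """Encode a nucleotide sequence as a flat 0/1 vector of length 4 * len(sequence)."""
    vector = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected character: {base!r}")
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector

def encode_all(sequences):
    """Encode several sequences; they must share one length so the feature vectors align."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("sequences must have equal length")
    return [one_hot_encode(s) for s in sequences]

features = encode_all(["GTAAGT", "GTGAGT"])
```

The equal-length requirement mentioned in the description shows up here directly: a classifier expects every feature vector to have the same number of columns.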
This package contains logic for computing sparse principal components via the EESPCA method, which is based on an approximation of the eigenvector/eigenvalue identity. Includes logic to support execution of the TPower and rifle sparse PCA methods, as well as logic to estimate the sparsity parameters used by EESPCA, TPower and rifle via cross-validation to minimize the out-of-sample reconstruction error. H. Robert Frost (2021) <doi:10.1080/10618600.2021.1987254>.
This package provides causal inference with interactive fixed-effect models. It imputes counterfactuals for each treated unit using control group information based on a linear interactive fixed effects model that incorporates unit-specific intercepts interacted with time-varying coefficients. This method generalizes the synthetic control method to the case of multiple treated units and variable treatment periods, and improves efficiency and interpretability. This version supports unbalanced panels and implements the matrix completion method.
Computes the sample probability value (p-value) for the estimated coefficient from a standard genome-wide univariate regression. It computes the exact finite-sample p-value under the assumption that the measured phenotype (the dependent variable in the regression) has a known Bernoulli-normal mixture distribution. Finite-sample genome-wide regression p-values (Gwrpv) with a non-normally distributed phenotype (Gregory Connor and Michael O'Neill, bioRxiv 204727 <doi:10.1101/204727>).
Implement a coherent and flexible protocol for animal color tagging. GenTag provides a simple computational routine with low CPU usage to create color sequences for animal tags. First, a single-color tag sequence is created from an algorithm selected by the user, followed by verification of the combination uniqueness. Three methods to produce color tag sequences are provided. Users can modify the main function core to allow a wide range of applications.
This package provides tools for the development of packages related to General Transit Feed Specification (GTFS) files. Establishes a standard for representing GTFS feeds using R data types. Provides fast and flexible functions to read and write GTFS feeds while sticking to this standard. Defines a basic gtfs class which is meant to be extended by packages that depend on it. Offers utility functions that support checking the structure of GTFS objects.
Semiparametric regression models on the cumulative incidence function for interval-censored competing risks data as described in Bakoyannis, Yu, & Yiannoutsos (2017) <doi:10.1002/sim.7350> and the models with missing event types as described in Park, Bakoyannis, Zhang, & Yiannoutsos (2021) <doi:10.1093/biostatistics/kxaa052>. Supported models include the proportional subdistribution hazards model (Fine-Gray model), the proportional odds model, and other models that belong to the class of semiparametric generalized odds rate transformation models.
This package provides tools to create an interactive web-based visualization of a topic model that has been fit to a corpus of text data using Latent Dirichlet Allocation (LDA). Given the estimated parameters of the topic model, it computes various summary statistics as input to an interactive visualization built with D3.js that is accessed via a browser. The goal is to help users interpret the topics in their LDA topic model.
This package provides functions implementing multivariate state space models for purposes of time series analysis and forecasting. The focus of the package is on multivariate models, such as Vector Exponential Smoothing, Vector ETS (Error-Trend-Seasonal model) etc. It currently includes Vector Exponential Smoothing (VES, de Silva et al., 2010, <doi:10.1177/1471082X0901000401>), Vector ETS (Svetunkov et al., 2023, <doi:10.1016/j.ejor.2022.04.040>) and a simulation function for VES.
This package provides a framework which should improve reproducibility and transparency in data processing. It provides functionality such as automatic metadata creation and management, rudimentary quality management, data caching, workflow management and data aggregation. * The title is a wish, not a promise. By no means do we expect this package to deliver everything that is needed to achieve full reproducibility and transparency, but we believe that it supports efforts in this direction.
Probabilistic Regression Trees (PRTree). Functions for fitting and predicting PRTree models with some adaptations to handle missing values. The main calculations are performed in FORTRAN, resulting in highly efficient algorithms. This package's implementation is based on the PRTree methodology described in Alkhoury, S.; Devijver, E.; Clausel, M.; Tami, M.; Gaussier, E.; Oppenheim, G. (2020) - "Smooth And Consistent Probabilistic Regression Trees" <https://proceedings.neurips.cc/paper_files/paper/2020/file/8289889263db4a40463e3f358bb7c7a1-Paper.pdf>.
Helper functions for the MASCOTNUM algorithm template, for the practice of design of numerical experiments: an algorithm template parser to support the MASCOTNUM specification <https://www.gdr-mascotnum.fr/template.html>; ask & tell decoupling injection (inspired by <https://search.r-project.org/CRAN/refmans/sensitivity/html/decoupling.html>) to use "crimped" algorithms (like uniroot(), optim(), ...) from outside R; and basic template examples: the Brent algorithm for one-dimensional root finding and L-BFGS-B from base optim().
This package provides a framework for estimating difference-in-differences with unpoolable data, based on Karim, Webb, Austin, and Strumpf (2024) <doi:10.48550/arXiv.2403.15910>. Supports common or staggered adoption, multiple groups, and the inclusion of covariates. Also computes p-values for the aggregate average treatment effect on the treated via the randomization inference procedure described in MacKinnon and Webb (2020) <doi:10.1016/j.jeconom.2020.04.024>.
This package provides tools for the analysis of complex survey samples. The provided features include: summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples; variances by Taylor series linearisation or replicate weights; post-stratification, calibration, and raking; two-phase subsampling designs; graphics; PPS sampling without replacement; principal components, and factor analysis.