Consolidated data simulation, sample size calculation and analysis functions for several snSMART
(small sample sequential, multiple assignment, randomized trial) designs under one library. See Wei, B., Braun, T.M., Tamura, R.N. and Kidwell, K.M. "A Bayesian analysis of small n sequential multiple assignment randomized trials (snSMARTs
)." (2018) Statistics in medicine, 37(26), pp.3723-3732 <doi:10.1002/sim.7900>.
To date, thousands of single nucleotide polymorphisms (SNPs) have been found to be associated with complex traits and diseases. However, the vast majority of these disease-associated SNPs lie in the non-coding part of the genome, and are likely to affect regulatory elements, such as enhancers and promoters, rather than function of a protein. Thus, to understand the molecular mechanisms underlying genetic traits and diseases, it becomes increasingly important to study the effect of a SNP on nearby molecular traits such as chromatin environment or transcription factor (TF) binding. Towards this aim, we developed SNPhood, a user-friendly *Bioconductor* R package to investigate and visualize the local neighborhood of a set of SNPs of interest for NGS data such as chromatin marks or transcription factor binding sites from ChIP-Seq
or RNA- Seq experiments. SNPhood comprises a set of easy-to-use functions to extract, normalize and summarize reads for a genomic region, perform various data quality checks, normalize read counts using additional input files, and to cluster and visualize the regions according to the binding pattern. The regions around each SNP can be binned in a user-defined fashion to allow for analysis of very broad patterns as well as a detailed investigation of specific binding shapes. Furthermore, SNPhood supports the integration with genotype information to investigate and visualize genotype-specific binding patterns. Finally, SNPhood can be employed for determining, investigating, and visualizing allele-specific binding patterns around the SNPs of interest.
Three functions to clean, summarize and prepare genomic datasets to Genome Selection and Genome Association analysis and to estimate population genetic parameters.
This package provides classes and statistical methods for large single-nucleotide polymorphism (SNP) association studies. This extends the earlier snpMatrix package, allowing for uncertainty in genotypes.
This package provides historical datasets related to John Snow's 1854 cholera outbreak study in London. Includes data on cholera cases, water pump locations, and the street layout, enabling analysis and visualisation of the outbreak.
This package provides functions for analysis of network objects, which are imported or simulated by the package. The non-parametric methods of analysis center on snowball and bootstrap sampling for estimating functions of network degree distribution. For other parameters of interest, see, e.g., bootnet package.
This package provides functions for reading and writing Gadget N-body snapshots. The Gadget code is popular in astronomy for running N-body / hydrodynamical cosmological and merger simulations. To find out more about Gadget see the main distribution page at www.mpa-garching.mpg.de/gadget/.
Population genetics package for designing diagnostic panels. Candidate markers, marker combinations, and different panel sizes are assessed for how well they can predict the source population of known samples. Requires a genotype file of candidate markers in STRUCTURE format. Methods for population cross-validation are described in Jombart (2008) <doi:10.1093/bioinformatics/btn129>.
Generate knockoffs for genetic data and hidden Markov models. For more information, see the website below and the accompanying papers: "Gene hunting with hidden Markov model knockoffs", Sesia et al., Biometrika, 2019, (<doi:10.1093/biomet/asy033>). "Multi-resolution localization of causal variants across the genome", Sesia et al., bioRxiv
, 2019, (<doi:10.1101/631390>).
This package is a usability wrapper around snow for easier development of parallel R programs. This package offers e.g. extended error checks, and additional functions. All functions work in sequential mode, too, if no cluster is present or wished. The package is also designed as connector to the cluster management tool sfCluster
, but can also used without it.
SNPediaR
provides some tools for downloading and parsing data from the SNPedia web site <http://www.snpedia.com>. The implemented functions allow users to import the wiki text available in SNPedia pages and to extract the most relevant information out of them. If some information in the downloaded pages is not automatically processed by the library functions, users can easily implement their own parsers to access it in an efficient way.
Because your linear models deserve better than console output. A sleek color palette and kable styling to make your regression results look sharper than they are. Includes support for Partial Least Squares (PLS) regression via both the SVD and NIPALS algorithms, along with a unified interface for model fitting and fabulous LaTeX
and console output formatting. See the package manual at <https://github.com/JesusButForGayPeople/snazzieR/releases/download/v0.1.1/snazzieR_0.1.1.pdf>
.
Geostatistical modeling and kriging with gridded data using spatially separable covariance functions (Kronecker covariances). Kronecker products in these models provide shortcuts for solving large matrix problems in likelihood and conditional mean, making snapKrig
computationally efficient with large grids. The package supplies its own S3 grid object class, and a host of methods including plot, print, Ops, square bracket replace/assign, and more. Our computational methods are described in Koch, Lele, Lewis (2020) <doi:10.7939/r3-g6qb-bq70>.
This package provides a fast and accurate analysis toolkit for single cell ATAC-seq (Assay for transposase-accessible chromatin using sequencing). Single cell ATAC-seq can resolve the heterogeneity of a complex tissue and reveal cell-type specific regulatory landscapes. However, the exceeding data sparsity has posed unique challenges for the data analysis. This package r-snapatac
is an end-to-end bioinformatics pipeline for analyzing large- scale single cell ATAC-seq data which includes quality control, normalization, clustering analysis, differential analysis, motif inference and exploration of single cell ATAC-seq sequencing data.
This package provides functions to perform most of the common analysis in genome association studies are implemented. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation test and related tests (sum statistic and truncated product) are also implemented. Max-statistic and genetic risk-allele score exact distributions are also possible to be estimated. The methods are described in Gonzalez JR et al., 2007 <doi: 10.1093/bioinformatics/btm025>.
The methods discussed in this package are new non-parametric methods based on sequential normal scores SNS (Conover et al (2017) <doi:10.1080/07474946.2017.1360091>), designed for sequences of observations, usually time series data, which may occur singly or in batches, and may be univariate or multivariate. These methods are designed to detect changes in the process, which may occur as changes in location (mean or median), changes in scale (standard deviation, or variance), or other changes of interest in the distribution of the observations, over the time observed. They usually apply to large data sets, so computations need to be simple enough to be done in a reasonable time on a computer, and easily updated as each new observation (or batch of observations) becomes available. Some examples and more detail in SNS is presented in the work by Conover et al (2019) <arXiv:1901.04443>
.
This package implements asymptotic methods related to maximally selected statistics, with applications to single-nucleotide polymorphism (SNP) data.
This package provides a consistent, flexible and easy to use tool to parse and convert strings into cases like snake or camel among others.
This package provides a wrapper allowing SQL queries to be run on a Snowflake instance directly from an R script, by using the snowflake-connector-python package in the background.
snapcount is a client interface to the Snaptron webservices which support querying by gene name or genomic region. Results include raw expression counts derived from alignment of RNA-seq samples and/or various summarized measures of expression across one or more regions/genes per-sample (e.g. percent spliced in).
This package provides an R interface to the C libstemmer
library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
RStudio addin which provides a GUI to visualize and analyse networks. After finishing a session, the code to produce the plot is inserted in the current script. Alternatively, the function SNAhelperGadget()
can be used directly from the console. Additional addins include the Netreader()
for reading network files, Netbuilder()
to create small networks via point and click, and the Componentlayouter()
to layout networks with many components manually.
This package contains functions to perform various models and methods for test equating (Kolen and Brennan, 2014 <doi:10.1007/978-1-4939-0317-7> ; Gonzalez and Wiberg, 2017 <doi:10.1007/978-3-319-51824-4> ; von Davier et. al, 2004 <doi:10.1007/b97446>). It currently implements the traditional mean, linear and equipercentile equating methods. Both IRT observed-score and true-score equating are also supported, as well as the mean-mean, mean-sigma, Haebara and Stocking-Lord IRT linking methods. It also supports newest methods such that local equating, kernel equating (using Gaussian, logistic, Epanechnikov, uniform and adaptive kernels) with presmoothing, and IRT parameter linking methods based on asymmetric item characteristic functions. Functions to obtain both standard error of equating (SEE) and standard error of equating differences between two equating functions (SEED) are also implemented for the kernel method of equating.
Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. The R package SNPRelate provides a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers the efficient operations specifically designed for integers with two bits, since a SNP could occupy only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphism (indel) and structural variation calls in whole-genome and whole-exome variant data.