Cross-Species Investigation and Analysis (CoSIA) is a package that provides researchers with an alternative methodology for comparing gene expression across species and tissues using normal wild-type RNA-Seq gene expression data from Bgee. Using these data, CoSIA provides multiple visualization tools to explore transcriptome diversity and variation across genes, tissues, and species. CoSIA uses the coefficient of variation, Shannon entropy, and specificity to quantify transcriptome diversity and variation. CoSIA also provides additional conversion tools and utilities for a streamlined cross-species comparison workflow.
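The two diversity measures named above can be illustrated with a minimal Python sketch. It assumes the coefficient of variation is the sample standard deviation divided by the mean, and that Shannon entropy is computed in bits over one gene's expression normalized to proportions across tissues; CoSIA's own definitions (log base, normalization, the specificity score) may differ, and the expression values below are made up.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumes a nonzero mean)."""
    return statistics.stdev(values) / statistics.mean(values)


def shannon_entropy(values):
    """Shannon entropy in bits of expression treated as a distribution over tissues."""
    total = sum(values)
    probs = [v / total for v in values if v > 0]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical expression of one gene (e.g. TPM) across four tissues.
uniform = [10.0, 10.0, 10.0, 10.0]   # expressed equally everywhere
specific = [40.0, 0.0, 0.0, 0.0]     # expressed in a single tissue

print(coefficient_of_variation(uniform), shannon_entropy(uniform))
print(coefficient_of_variation(specific), shannon_entropy(specific))
```

Evenly expressed genes give a low coefficient of variation and maximal entropy (2 bits for four tissues), while a tissue-specific gene gives the opposite pattern.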
mbQTL is a statistical R package for simultaneous 16S rRNA/16S rDNA (microbial) and variant, SNP, SNV (host) relationship, correlation, and regression studies. We apply linear, logistic, and correlation-based statistics to identify the relationships between taxa, genera, or species and variants, SNPs, or SNVs in the infected host. We produce various statistical significance measures, such as P values, FDR, BC, and probability estimation, to show the significance of these relationships. Further, we provide various visualization functions to make the results of these analyses easier to interpret. The package is compatible with data frame, MRexperiment, and text formats.
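FDR values like those mentioned above are commonly obtained with the Benjamini-Hochberg procedure. The description does not say which FDR method mbQTL uses, so the following Python sketch is only an illustration of Benjamini-Hochberg adjustment; the function name and the p-values (standing in for taxon-by-SNP tests) are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four taxon-by-variant association tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```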
RtAudio is a set of C++ classes that provides a common API for real-time audio input/output. It was designed with the following objectives:
object-oriented C++ design
simple, common API across all supported platforms
only one source and one header file for easy inclusion in programming projects
allow simultaneous multi-API support
support dynamic connection of devices
provide extensive audio device parameter control
allow audio device capability probing
automatic internal conversion for data format, channel number compensation, (de)interleaving, and byte-swapping
This package provides tools to directly model underlying population dynamics using date datasets (radiocarbon and other) with a Continuous Piecewise Linear (CPL) model framework. Various other model types are included. Taphonomic loss can optionally be included as a power function. A model comparison framework using BIC is provided. The package also calibrates 14C samples, generates Summed Probability Distributions (SPD), and performs SPD simulation analysis to generate a goodness-of-fit test for the best selected model. Details about the method can be found in Timpson A., Barberena R., Thomas M. G., Mendez C., Manning K. (2020) <doi:10.1098/rstb.2019.0723>.
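The idea of a CPL model can be sketched as a probability density defined by hinge points joined by straight lines and normalized to unit area. This Python sketch is an assumption-based illustration, not the package's parameterization: the function name, hinge dates, and relative heights are hypothetical, and calibration uncertainty and taphonomic loss are ignored.

```python
def cpl_density(t, hinge_times, hinge_heights):
    """Density at time t of a continuous piecewise linear (CPL) model.

    Hinges (hinge_times ascending, hinge_heights relative) are joined by straight
    lines, and the curve is normalized so that its total area is 1.
    Returns 0 outside [hinge_times[0], hinge_times[-1]].
    """
    # Total area under the unnormalized curve: a sum of trapezoids.
    area = sum(
        (hinge_times[k + 1] - hinge_times[k]) * (hinge_heights[k] + hinge_heights[k + 1]) / 2
        for k in range(len(hinge_times) - 1)
    )
    if t < hinge_times[0] or t > hinge_times[-1]:
        return 0.0
    for k in range(len(hinge_times) - 1):
        t0, t1 = hinge_times[k], hinge_times[k + 1]
        if t0 <= t <= t1:
            # Linear interpolation between the two neighbouring hinges.
            frac = (t - t0) / (t1 - t0)
            height = hinge_heights[k] + frac * (hinge_heights[k + 1] - hinge_heights[k])
            return height / area
    return 0.0


times = [0.0, 1000.0, 2000.0]   # hypothetical hinge dates (years)
heights = [1.0, 3.0, 1.0]       # hypothetical relative population sizes
print(cpl_density(500.0, times, heights))  # 0.0005
```

Under this assumption, the log-likelihood of a set of (point) dates would be the sum of the log densities at those dates, which is what a BIC-based comparison between models with different numbers of hinges would use.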
The Biomarker Optimal Segmentation System R package, bossR, is designed for precision medicine, helping to identify individual traits using biomarkers. It focuses on determining the most effective cutoff value for a continuous biomarker, which is crucial for categorizing patients into two groups with distinctly different clinical outcomes. The package simultaneously finds the optimal cutoff from given candidate values and tests its significance. Simulation studies demonstrate that bossR offers statistical power and false positive control non-inferior to the permutation approach (considered the gold standard in this field), while being hundreds of times faster.
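For context, the permutation approach that the description uses as a benchmark can be sketched in Python: scan the candidate cutoffs, keep the one with the largest group difference, and compare that maximum against maxima obtained after shuffling outcomes. This is not bossR's faster method; the statistic (absolute difference in mean outcome), function names, and data are hypothetical.

```python
import random
import statistics


def mean_difference_stat(marker, outcome, cutoff):
    """Absolute difference in mean outcome between marker > cutoff and marker <= cutoff."""
    high = [y for x, y in zip(marker, outcome) if x > cutoff]
    low = [y for x, y in zip(marker, outcome) if x <= cutoff]
    if not high or not low:
        return 0.0
    return abs(statistics.mean(high) - statistics.mean(low))


def best_cutoff(marker, outcome, candidates):
    """Return (cutoff, statistic) maximizing the statistic over the candidate cutoffs."""
    return max(((c, mean_difference_stat(marker, outcome, c)) for c in candidates),
               key=lambda pair: pair[1])


def permutation_pvalue(marker, outcome, candidates, n_perm=999, seed=0):
    """Permutation p-value of the maximal statistic, which accounts for the cutoff search."""
    rng = random.Random(seed)
    _, observed = best_cutoff(marker, outcome, candidates)
    shuffled = list(outcome)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_cutoff(marker, shuffled, candidates)[1] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: patients with marker >= 10 have outcome 1.
marker = list(range(20))
outcome = [0] * 10 + [1] * 10
candidates = [2.5, 5.5, 9.5, 12.5, 15.5]
print(best_cutoff(marker, outcome, candidates))
print(permutation_pvalue(marker, outcome, candidates))
```

Each permutation repeats the full cutoff search, which is why this benchmark becomes slow for large data sets and many candidates.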
Inference functionalities for distributed-lag linear structural equation models (DLSEMs). DLSEMs are Markovian structural causal models where each factor of the joint probability distribution is a distributed-lag linear regression with constrained lag shapes (Magrini, 2018 <doi:10.2478/bile-2018-0012>; Magrini et al., 2019 <doi:10.1007/s11135-019-00855-z>). DLSEMs account for temporal delays in the dependence relationships among the variables through a single parameter per covariate, which makes dynamic causal inference feasible. Endpoint-constrained quadratic, quadratic decreasing, linearly decreasing, and gamma lag shapes are available.
Simultaneously detect the number and locations of change points in piecewise linear models under stationary Gaussian noise, allowing for autocorrelated random noise. The core idea is to transform the problem of detecting change points into the detection of local extrema (local maxima and local minima) through kernel smoothing and differentiation of the data sequence; see Cheng et al. (2020) <doi:10.1214/20-EJS1751>. A fast, computationally inexpensive algorithm called dSTEM is introduced to detect change points based on the STEM algorithm in D. Cheng and A. Schwartzman (2017) <doi:10.1214/16-AOS1458>.
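The core idea (smooth, differentiate, look for local extrema) can be sketched in Python for a noise-free piecewise linear signal, where a slope change appears as a peak in the smoothed second derivative. This is a simplified illustration under stated assumptions: the function names, bandwidth, and fixed threshold are hypothetical, and the fixed threshold replaces the multiple-testing step that STEM/dSTEM use to decide which extrema are significant.

```python
import math


def gaussian_smooth(y, bandwidth):
    """Gaussian kernel (Nadaraya-Watson) smoothing of an equally spaced sequence."""
    n = len(y)
    out = []
    for i in range(n):
        w = [math.exp(-0.5 * ((i - j) / bandwidth) ** 2) for j in range(n)]
        out.append(sum(wj * yj for wj, yj in zip(w, y)) / sum(w))
    return out


def change_point_candidates(y, bandwidth=3.0, threshold=0.05):
    """Indices where |second difference of the smoothed sequence| has a local maximum.

    Points within 3 bandwidths of either end are skipped because kernel
    smoothing is biased near the boundaries.
    """
    s = gaussian_smooth(y, bandwidth)
    # d2[i] approximates the second derivative at index i (defined for 1..n-2).
    d2 = {i: s[i + 1] - 2 * s[i] + s[i - 1] for i in range(1, len(s) - 1)}
    margin = max(2, int(3 * bandwidth))
    peaks = []
    for i in range(margin, len(s) - margin):
        a = abs(d2[i])
        if a > threshold and a >= abs(d2[i - 1]) and a > abs(d2[i + 1]):
            peaks.append(i)
    return peaks


# Hypothetical signal: flat until index 50, then slope 1.
signal = [max(0.0, i - 50.0) for i in range(100)]
print(change_point_candidates(signal))
```

With real, noisy data the threshold would instead come from a significance test on the peak heights, which is what lets the method control false detections.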
With the functions in this package you can check the validity of the following financial instrument identifiers: FIGI (Financial Instrument Global Identifier <https://www.openfigi.com/about/figi>), CUSIP (Committee on Uniform Security Identification Procedures <https://www.cusip.com/identifiers.html#/CUSIP>), ISIN (International Securities Identification Number <https://www.cusip.com/identifiers.html#/ISIN>), SEDOL (Stock Exchange Daily Official List <https://www2.lseg.com/SEDOL-masterfile-service-tech-guide-v8.6>). You can also calculate the FIGI checksum of 11-character strings, which can be useful if you want to create your own FIGI identifiers.
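The check-digit rules behind two of these identifiers are public and can be sketched in Python. This sketch follows the published FIGI rule (characters mapped to values, every second value doubled, digits summed) and the standard Luhn check for ISINs; it is not the package's own code, and the function names are hypothetical. CUSIP and SEDOL use different weightings and are not shown.

```python
def _char_value(ch):
    """Digits map to 0-9; letters A-Z map to 10-35."""
    if ch.isdigit():
        return int(ch)
    return ord(ch.upper()) - ord("A") + 10


def figi_check_digit(first11):
    """Check digit for the first 11 characters of a FIGI (modified Luhn)."""
    if len(first11) != 11:
        raise ValueError("expected 11 characters")
    total = 0
    for pos, ch in enumerate(first11, start=1):
        value = _char_value(ch)
        if pos % 2 == 0:  # every second character's value is doubled
            value *= 2
        total += sum(int(d) for d in str(value))  # add the digits of the value
    return (10 - total % 10) % 10


def is_valid_figi(figi):
    return len(figi) == 12 and figi[-1].isdigit() and figi_check_digit(figi[:11]) == int(figi[-1])


def is_valid_isin(isin):
    """Luhn check over the ISIN with letters expanded to two-digit numbers."""
    if len(isin) != 12 or not isin.isalnum():
        return False
    digits = "".join(str(_char_value(ch)) for ch in isin)
    total = 0
    for i, d in enumerate(reversed(digits)):
        n = int(d)
        if i % 2 == 1:  # double every second digit counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


print(figi_check_digit("BBG000BLNNH"))  # 6
print(is_valid_figi("BBG000BLNNH6"))    # True
print(is_valid_isin("US0378331005"))    # True
```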
Fit penalized multivariable linear mixed models with a single random effect to control for population structure in genetic association studies. The goal is to fit many genetic variants simultaneously in order to select markers that are independently associated with the response. It can also handle prior annotation information, for example, rare variants, in the form of variable weights. For more information, see the accompanying paper: Bhatnagar et al., "Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models", 2020, <DOI:10.1371/journal.pgen.1008766>.
This is the central location for data and tools for the development, maintenance, analysis, and deployment of the International Soil Radiocarbon Database (ISRaD). ISRaD was developed as a collaboration between the U.S. Geological Survey Powell Center and the Max Planck Institute for Biogeochemistry. This R package provides tools for accessing and manipulating ISRaD data, compiling local data using the ISRaD data structure, and simple query and reporting functions for ISRaD. For more detailed information visit the ISRaD website at: <https://soilradiocarbon.org/>.
Reproduces the harmonized database of the ESTAT survey of the same name. The survey data are distributed as separate spreadsheets with noticeable differences in the collected attributes. The tool presented here carries out a series of steps that harmonize the attributes in terms of name, meaning, and occurrence, while also introducing a series of new variables that add value to the product. Outputs include one harmonized table covering all years and three separate geometries, corresponding to the theoretical point, the GPS location where the measurement was made, and the 250 m east-facing transect.
MACER is designed to assist biological researchers in assembling taxonomically focused and marker-focused molecular sequence data sets. MACER accepts a list of genera as user input and uses NCBI-GenBank and BOLD as resources to download molecular sequence data. These data are then assembled by marker, aligned, trimmed, and cleaned. The use of this package allows the specific parameters used to be published, ensuring reproducibility. The MACER package has four core functions, and an example run-through using all of these functions can be found in the associated repository <https://github.com/rgyoung6/MACER_example>.
An implementation of the nodiv algorithm for phylogenetic analysis of species distributions; see Borregaard, M.K., Rahbek, C., Fjeldsaa, J., Parra, J.L., Whittaker, R.J. & Graham, C.H. 2014. Node-based analysis of species distributions. Methods in Ecology and Evolution 5(11): 1225-1235. <DOI:10.1111/2041-210X.12283>. The main function goes through each node in the phylogeny, compares the distributions of the two descendant nodes, and compares the result to a null model. This highlights nodes where major distributional divergence has occurred, and the distributional divergence for these nodes is mapped.
This package provides an implementation of a rare variant association test that utilizes protein tertiary structure to increase signal and to identify likely causal variants. It performs structure-guided collapsing, which leads to local tests that borrow information from neighboring variants on a protein and that provide association information on a variant-specific level. For details of the implemented method see West, R. M., Lu, W., Rotroff, D. M., Kuenemann, M., Chang, S-M., Wagner M. J., Buse, J. B., Motsinger-Reif, A., Fourches, D., and Tzeng, J-Y. (2019) <doi:10.1371/journal.pcbi.1006722>.
Combines a generalized linear model with an additional tree part on the same scale. A four-step procedure is proposed to fit the model and test the joint effect of the selected tree part while adjusting for confounding factors. We also propose an ensemble procedure based on bagging to improve prediction accuracy, and compute several importance scores for variable selection. See Cyprien Mbogning et al. (2014) <doi:10.1186/2043-9113-4-6> and Cyprien Mbogning et al. (2015) <doi:10.1159/000380850> for an overview of all the methods implemented in this package.
This package provides a utility to quickly obtain clean and tidy men's basketball play-by-play data. It provides functions to access live play-by-play and box score data from ESPN <https://www.espn.com>, with shot locations when available. It is also a full wrapper for the NBA Stats API <https://www.nba.com/stats/>, as well as a scraping and aggregating interface for Ken Pomeroy's men's college basketball statistics website <https://kenpom.com>. Users with an active subscription can scrape the website tables and analyze the data for themselves.
Combines Latent Dirichlet Allocation (LDA) and Bayesian multinomial time series methods in a two-stage analysis to quantify dynamics in high-dimensional temporal data. LDA decomposes multivariate data into lower-dimensional latent groupings, whose relative proportions are modeled using generalized Bayesian time series models that include abrupt changepoints and smooth dynamics. The methods are described in Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993>, Western and Kleykamp (2004) <doi:10.1093/pan/mph023>, Venables and Ripley (2002, ISBN-13:978-0387954578), and Christensen et al. (2018) <doi:10.1002/ecy.2373>.
Create and manipulate numeric list ('nlist') objects. An nlist is an S3 list of uniquely named numeric objects. A numeric object is an integer or double vector, matrix, or array. An nlists object is an S3 class list of nlist objects with the same names, dimensionalities, and typeofs. Numeric list objects are of interest because they are the raw data inputs for analytic engines such as 'JAGS', 'STAN' and 'TMB'. Numeric list objects, which are useful for storing multiple realizations of simulated data sets, can be converted to coda::mcmc and coda::mcmc.list objects.
It includes functions implementing the methodologies for single-process kinetic analysis of solid-state processes that were recently summarized and described in the Recommendation of the ICTAC Kinetics Committee. These methods work with the basic kinetic equation. The methodologies included are Avrami, Friedman, Kissinger, Ozawa, OFM, Mo, Starink, and the isoconversional methodology (Vyazovkin), following the ICTAC Kinetics Committee recommendations as reported in Vyazovkin S, Chrissafis K, Di Lorenzo ML, et al. ICTAC Kinetics Committee recommendations for collecting experimental thermal analysis data for kinetic computations. Thermochim Acta. 2014;590:1-23. <doi:10.1016/J.TCA.2014.05.036>.
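As an example of one listed method, the Kissinger method fits the linear relation ln(beta/Tp^2) = ln(A*R/Ea) - Ea/(R*Tp), where beta is the heating rate and Tp the peak temperature, so the slope of ln(beta/Tp^2) against 1/Tp equals -Ea/R. The Python sketch below is not the package's API: the function name is hypothetical, and the heating rates are generated synthetically from an assumed Ea and A, so the fit simply recovers the assumed activation energy.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def kissinger_activation_energy(heating_rates, peak_temps):
    """Estimate Ea (J/mol) from the least-squares slope of ln(beta/Tp^2) versus 1/Tp."""
    xs = [1.0 / t for t in peak_temps]
    ys = [math.log(b / t**2) for b, t in zip(heating_rates, peak_temps)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope * R  # slope = -Ea / R


# Synthetic data from assumed Ea = 150 kJ/mol and A = 1e12 1/s.
Ea_true, A = 150e3, 1e12
peaks = [500.0, 510.0, 520.0, 530.0]  # hypothetical peak temperatures, K
rates = [t**2 * (A * R / Ea_true) * math.exp(-Ea_true / (R * t)) for t in peaks]
print(kissinger_activation_energy(rates, peaks))
```

With measured data the points scatter around the line, and the quality of the linear fit is part of judging whether the single-process assumption holds.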
Implementation of "light" stemmers for French, German, Italian, Spanish, Portuguese, Finnish, and Swedish. They are based on the same work as the "light" stemmers found in Solr <https://lucene.apache.org/solr/> or Elasticsearch <https://www.elastic.co/fr/products/elasticsearch>. A "light" stemmer removes inflections only from nouns and adjectives, because indexing verbs in these languages is of lesser importance than indexing nouns and adjectives. The stemming procedure for French is described in (Savoy, 1999) <doi:10.1002/(SICI)1097-4571(1999)50:10%3C944::AID-ASI9%3E3.3.CO;2-H>.
DMCFB is a pipeline for identifying differentially methylated cytosines using a Bayesian functional regression model in bisulfite sequencing data. By using a functional regression data model, it tries to capture position-specific, group-specific, and other covariate-specific methylation patterns, as well as spatial correlation patterns and unknown underlying models of methylation data. It is robust and flexible with respect to the true underlying models and the inclusion of any covariates, and missing values are imputed using the spatial correlation between positions and samples. A Bayesian approach is adopted for estimation and inference in the proposed method.
Single-cell RNA-seq technologies enable high-throughput gene expression measurement of individual cells and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical for the identification, visualization, and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers, and dropouts. We develop a novel similarity-learning framework, SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), which learns an appropriate distance metric from the data for dimension reduction, clustering, and visualization.
The sqldf function is typically passed a single argument, an SQL SELECT statement in which the table names are ordinary R data frame names. sqldf transparently sets up a database, imports the data frames into that database, performs the SQL statement, and returns the result, using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf or read.csv.sql functions can also be used to read filtered files into R even if the original files are larger than R itself can handle.
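The workflow described above (create a database, import the data frames, run the statement, return the result) can be mimicked in Python with the standard-library sqlite3 module. This is a hypothetical analogue, not the R package: "data frames" are represented as dicts of equal-length column lists, and the column-class heuristic is not reproduced.

```python
import sqlite3


def sqldf(query, tables):
    """Run an SQL query against in-memory copies of column-oriented tables.

    `tables` maps a table name to a dict of equal-length column lists.
    The result is returned as a dict of column lists.
    """
    con = sqlite3.connect(":memory:")
    try:
        for name, columns in tables.items():
            col_names = list(columns)
            quoted = ", ".join(f'"{c}"' for c in col_names)
            con.execute(f'CREATE TABLE "{name}" ({quoted})')  # untyped columns
            rows = list(zip(*(columns[c] for c in col_names)))
            placeholders = ", ".join("?" for _ in col_names)
            con.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)
        cur = con.execute(query)
        out_names = [d[0] for d in cur.description]
        result = {c: [] for c in out_names}
        for row in cur.fetchall():
            for c, v in zip(out_names, row):
                result[c].append(v)
        return result
    finally:
        con.close()


flowers = {"species": ["setosa", "setosa", "virginica"], "petal_len": [1.4, 1.3, 5.5]}
res = sqldf(
    "SELECT species, AVG(petal_len) AS mean_len FROM flowers GROUP BY species ORDER BY species",
    {"flowers": flowers},
)
print(res)
```

Because the database lives only for the duration of the call, the caller never manages connections or tables, which is the convenience the R function provides.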
This package provides functions to specify, fit, and visualize nested partially-latent class models (Wu, Deloria-Knoll, Hammitt, and Zeger (2016) <doi:10.1111/rssc.12101>; Wu, Deloria-Knoll, and Zeger (2017) <doi:10.1093/biostatistics/kxw037>; Wu and Chen (2021) <doi:10.1002/sim.8804>) for inference of population disease etiology and individual diagnosis. In the motivating Pneumonia Etiology Research for Child Health (PERCH) study, because both quantities of interest sum to one hundred percent, the PERCH scientists frequently refer to them as the population etiology pie and the individual etiology pie, hence the name of the package.