Reads/writes binary genotype files compatible with PLINK <https://www.cog-genomics.org/plink/1.9/input#bed> into/from an R matrix; traverses genotype data one window of variants at a time, like apply() or a for loop; reads/writes genotype relatedness/kinship matrices created by PLINK <https://www.cog-genomics.org/plink/1.9/distance#make_rel> or GCTA <https://cnsgenomics.com/software/gcta/#MakingaGRM> into/from a square R matrix. It is best used for bringing data produced by PLINK and GCTA into an R workflow.
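A minimal sketch of the intended round trip; the function names readBED() and readGRM() are assumptions for illustration and may not match the package's actual exports:

    # read genotypes from prefix.bed/.bim/.fam into a dosage matrix
    geno <- readBED("prefix")        # hypothetical reader
    # read a PLINK/GCTA relatedness matrix into a square R matrix
    grm  <- readGRM("prefix")        # hypothetical reader
    stopifnot(is.matrix(geno), isSymmetric(grm))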
Given a sample with additive measurement error, the package estimates the deconvolution density, that is, the density of the underlying distribution of the sample without measurement error. The method maximises the log-likelihood of the estimated density, penalised by a quadratic smoothness term. The distribution of the measurement error can either belong to a known family or be estimated from a "pure error" sample. For known error distributions, the package supports Normal, Laplace or Beta distributed errors. For an unknown error distribution, a pure error sample independent of the data is used.
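One common way to write such an estimator (a sketch only; the package's exact objective and penalty may differ): with observations w_i = x_i + u_i, error density g, and candidate density f for the error-free variable, choose f to maximise

    sum_i log[ integral f(x) g(w_i - x) dx ]  -  lambda * integral (f''(x))^2 dx

where the first term is the log-likelihood of the observed data under the convolution of f with the error density, and the second is the quadratic smoothness penalty with tuning parameter lambda.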
Quickly and flexibly calculates weights for survey data, in order to correct for survey non-response or other sampling issues. Uses rake weighting, a common technique also known as rim weighting or iterative proportional fitting. This technique allows for weighting on multiple variables, even when the interlocked distribution of those variables is not known. Interacts with Thomas Lumley's survey package, as described in Lumley, Thomas (2011, ISBN:978-1-118-21093-2). Adds additional functionality, more adaptable syntax, and error-checking to the base weighting functionality in survey.
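As a hedged illustration of the underlying raking step only, using the survey package's own svydesign() and rake() (the weighting package described here layers its own syntax and error-checking on top; its function names are not shown):

    library(survey)
    # illustrative respondent data with two weighting variables
    resp <- data.frame(sex = sample(c("m", "f"), 200, replace = TRUE),
                       age = sample(c("18-34", "35-54", "55+"), 200, replace = TRUE))
    dsn <- svydesign(ids = ~1, data = resp)
    # known population margins for each variable separately
    # (the joint distribution of sex by age is not required)
    sex.pop <- data.frame(sex = c("m", "f"), Freq = c(98, 102))
    age.pop <- data.frame(age = c("18-34", "35-54", "55+"), Freq = c(60, 70, 70))
    raked <- rake(dsn, sample.margins = list(~sex, ~age),
                  population.margins = list(sex.pop, age.pop))
    summary(weights(raked))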
This package provides functions for the evaluation of surrogate endpoints when both the surrogate and the true endpoint are failure time variables. The approaches implemented are: (1) the two-step approach (Burzykowski et al., 2001) <DOI:10.1111/1467-9876.00244> with a copula model (Clayton, Plackett, Hougaard) at the first step and a linear regression of log-hazard ratios at the second step (either adjusted or not for measurement error); (2) mixed proportional hazards models estimated via mixed Poisson GLM (Rotolo et al., 2017) <DOI:10.1177/0962280217718582>.
Statistics students often have problems understanding the relation between a random variable's true scale and its z-values. To allow instructors to better visualize histograms for these students, the package provides histograms with two horizontal axes containing z-values and the true scale of the variable. The function TeachHistDens() provides a density histogram with two axes. TeachHistCounts() and TeachHistRelFreq() are variations for count and relative frequency histograms, respectively. TeachConfInterv() and TeachHypTest() help instructors to visualize confidence levels and the results of hypothesis tests.
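A minimal usage sketch, assuming the plotting functions can be called with their defaults (arguments for the variable's mean, standard deviation and test settings exist but are omitted here):

    library(TeachHist)
    TeachHistDens()      # density histogram with both a z-value axis and a true-scale axis
    TeachHistCounts()    # count version
    TeachHistRelFreq()   # relative-frequency version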
This package provides a collection of functions to make R a more effective viewscape analysis tool, calculating viewscape metrics by computing the viewable area for a given viewpoint (or multiple viewpoints) and a digital elevation model. The viewscape metrics implemented in this package are based on the work of Tabrizian et al. (2020) <doi:10.1016/j.landurbplan.2019.103704>. The viewshed-computation algorithm is based on the work of Franklin & Ray (1994) <https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=555780f6f5d7e537eb1edb28862c86d1519af2be>.
This package is intended to fill the role of conventional cytometry pre-processing software, for spectral decomposition, transformation, visualization and cleanup, and to aid further downstream analyses, such as with DepecheR, by enabling transformation of flowFrames and flowSets to dataframes. Functions for flowCore-compliant automatic 1D-gating/filtering are in the pipeline. The package name has been chosen both as it will deal with spectral cytometry and as it will hopefully give the user a nice pair of spectacles through which to view their data.
scDDboost is an R package to analyze changes in the distribution of single-cell expression data between two experimental conditions. Compared to other methods that assess differential expression, scDDboost benefits uniquely from information conveyed by the clustering of cells into cellular subtypes. Through a novel empirical Bayesian formulation it calculates gene-specific posterior probabilities that the marginal expression distribution is the same (or different) between the two conditions. The implementation in scDDboost treats gene-level expression data within each condition as a mixture of negative binomial distributions.
The googleVis package provides an interface between R and the Google Charts API. Google Charts offer interactive charts which can be embedded into web pages. The functions of the googleVis package allow the user to visualise data stored in R data frames with Google Charts without uploading the data to Google. The output of a googleVis function is HTML code that contains the data and references to JavaScript functions hosted by Google. googleVis makes use of the internal R HTTP server to display the output locally.
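A minimal sketch (the data frame is invented for illustration; gvisColumnChart() is one of several gvis* chart constructors):

    library(googleVis)
    df <- data.frame(country = c("US", "GB", "BR"), sales = c(10, 13, 14))
    chart <- gvisColumnChart(df, xvar = "country", yvar = "sales")
    plot(chart)    # opens the chart in a browser via R's internal HTTP server
    # print(chart) writes the generated HTML/JavaScript to the console instead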
This package provides an R client for the Europe PubMed Central RESTful Web Service. It gives access to both metadata on life science literature and open access full texts. Europe PMC indexes all PubMed content and other literature sources, including Agricola, a bibliographic database of citations to the agricultural literature, and Biological Patents. In addition to bibliographic metadata, the client allows users to fetch citations and reference lists. Links between life-science literature and other EBI databases, including ENA, PDB or ChEMBL, are also accessible.
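For example, assuming this is the europepmc client (argument names are given from memory and may differ slightly; the identifier is illustrative):

    library(europepmc)
    hits <- epmc_search(query = "malaria AND vaccine", limit = 25)  # bibliographic metadata
    # citations of, and references cited by, a PubMed record
    cites <- epmc_citations(ext_id = "25838375", data_src = "med")
    refs  <- epmc_refs(ext_id = "25838375", data_src = "med")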
This package implements a method for identifying and removing the cell-cycle effect from scRNA-Seq data. The method is described in Barron M. and Li J. (2016), "Identifying and removing the cell-cycle effect from single-cell RNA-Sequencing data" <doi:10.1038/srep33892>. Different from previous methods, ccRemover implements a mechanism that formally tests whether a component is cell-cycle related or not, and thus while it often thoroughly removes the cell-cycle effect, it preserves other features/signals of interest in the data.
Alpha and beta diversity for taxonomic (TD), functional (FD), and phylogenetic (PD) dimensions based on rasters. Spatial and temporal beta diversity can be partitioned into replacement and richness difference components. It also calculates the standardized effect size for FD and PD alpha diversity and the average individual traits across multilayer rasters. The layers of the raster represent species, while the cells represent communities. Details of the methods can be found in Cardoso et al. (2022) <https://CRAN.R-project.org/package=BAT> and Heming et al. (2023) <https://CRAN.R-project.org/package=SESraster>.
Data quality assessments guided by a data quality framework introduced by Schmidt and colleagues, 2021 <doi:10.1186/s12874-021-01252-7> target the data quality dimensions integrity, completeness, consistency, and accuracy. The scope of applicable functions rests on the availability of extensive metadata which can be provided in spreadsheet tables. Either standardized (e.g. as html5 reports) or individually tailored reports can be generated. For an introduction into the specification of corresponding metadata, please refer to the package website <https://dataquality.qihs.uni-greifswald.de/VIN_Annotation_of_Metadata.html>.
This package implements the methods of McGrath et al. (2020) <doi:10.1177/0962280219889080> and Cai et al. (2021) <doi:10.1177/09622802211047348> for estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis. These methods can be applied to studies that report the sample median, sample size, and one or both of (i) the sample minimum and maximum values and (ii) the first and third quartiles. The corresponding standard error estimators described by McGrath et al. (2023) <doi:10.1177/09622802221139233> are also included.
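A sketch in the style of the estmeansd conventions; the function name qe.mean.sd(), its argument names and the returned fields are assumptions here, and the five-number summary is invented:

    # a study reporting median, quartiles, extremes and sample size
    est <- qe.mean.sd(min.val = 1.1, q1.val = 2.3, med.val = 3.0,
                      q3.val = 4.2, max.val = 6.8, n = 50)   # hypothetical call
    est$est.mean   # estimated sample mean (field name assumed)
    est$est.sd     # estimated sample standard deviation (field name assumed)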
Kiener distributions K1, K2, K3, K4 and K7 to characterize distributions with left and right, symmetric or asymmetric fat tails in finance, neuroscience and other disciplines. Two algorithms to estimate the distribution parameters, quantiles, value-at-risk and expected shortfall. IMPORTANT: Standardization has been changed in versions >= 2.0.0 to get sd = 1 when kappa = Inf rather than 2*pi/sqrt(3) in versions <= 1.8.6. This affects parameter g (other parameters stay unchanged). Do not update if you need consistent comparisons with previous results for the g parameter.
Make R scripts reproducible by ensuring that every time a given script is run, the same versions of the packages it uses are loaded (instead of whichever version the user running the script happens to have installed). This is achieved by using the command groundhog.library() instead of the base command library(), and including a date in the call. The date is used to call on the same version of the package every time (the most recent version available at that date). Load packages from CRAN, GitHub, or GitLab.
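For example (the package names and date below are illustrative):

    library(groundhog)
    # instead of library(dplyr): load the dplyr version current on CRAN at this date
    groundhog.library("dplyr", "2023-06-01")
    # pinning a second package to the same date keeps the script consistent
    groundhog.library("data.table", "2023-06-01")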
This package provides a collection of functions for working with time series data, including functions for drawing, decomposing, and forecasting. It includes capabilities to compare multiple series and fit both additive and multiplicative models. It is used by iNZight, a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions. Implemented methods include Holt (1957) <doi:10.1016/j.ijforecast.2003.09.015>, Winters (1960) <doi:10.1287/mnsc.6.3.324>, and Cleveland, Cleveland, & Terpenning (1990) "STL: A Seasonal-Trend Decomposition Procedure Based on Loess".
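As a hedged illustration of the Holt-Winters methods cited above, using base R's HoltWinters() rather than this package's own interface:

    # additive and multiplicative seasonal models for a built-in monthly series
    fit.add  <- HoltWinters(AirPassengers, seasonal = "additive")
    fit.mult <- HoltWinters(AirPassengers, seasonal = "multiplicative")
    plot(fit.mult)                      # fitted values against the observed series
    predict(fit.mult, n.ahead = 24)     # two years of forecasts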
Quantify the causal effect of a binary exposure on a binary outcome with adjustment for multiple biases. The functions can simultaneously adjust for any combination of uncontrolled confounding, exposure/outcome misclassification, and selection bias. The underlying method generalizes the concept of combining inverse probability of selection weighting with predictive value weighting. Simultaneous multi-bias analysis can be used to enhance the validity and transparency of real-world evidence obtained from observational, longitudinal studies. Based on the work of Paul Brendel, Aracelis Torres, and Onyebuchi Arah (2023) <doi:10.1093/ije/dyad001>.
Reads, plots, and manipulates large taxonomic data sets, like those generated from modern high-throughput sequencing, such as metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc.). It provides a tree-based visualization called "heat trees" used to depict statistics for every taxon in a taxonomy using color and size. It also provides various functions to do common tasks in microbiome bioinformatics on data in the taxmap format defined by the taxa package. The metacoder package is described in the publication by Foster et al. (2017) <doi:10.1371/journal.pcbi.1005404>.
Optimal scaling of a data vector, relative to a set of targets, is obtained through a least-squares transformation subject to appropriate measurement constraints. The targets are usually predicted values from a statistical model. If the data are nominal level, then the transformation must be identity-preserving. If the data are ordinal level, then the transformation must be monotonic. If the data are discrete, then tied data values must remain tied in the optimal transformation. If the data are continuous, then tied data values can be untied in the optimal transformation.
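As a small illustration of the ordinal case only, using base R's isoreg() rather than this package's own functions: with the data already sorted, the optimally scaled values are the non-decreasing sequence closest in least squares to the model predictions.

    # model-predicted targets for six cases whose data values are already in
    # non-decreasing order; the target sequence is not itself monotone
    pred <- c(0.3, 0.9, 0.7, 1.4, 1.2, 2.0)
    fit <- isoreg(pred)   # pool-adjacent-violators: least-squares non-decreasing fit
    fit$yf                # 0.3, 0.8, 0.8, 1.3, 1.3, 2.0 (ties where monotonicity forced pooling)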
Computation and visualization of Taxicab Correspondence Analysis, Choulakian (2006) <doi:10.1007/s11336-004-1231-4>. Classical correspondence analysis (CA) is a statistical method to analyse 2-dimensional tables of positive numbers and is typically applied to contingency tables (Benzecri, J.-P. (1973). L'Analyse des Donnees. Volume II. L'Analyse des Correspondances. Paris, France: Dunod). Classical CA is based on the Euclidean distance. Taxicab CA is like classical CA but is based on the Taxicab or Manhattan distance. For some tables, Taxicab CA gives more informative results than classical CA.
Chromatin segmentation analysis transforms ChIP-seq data into signals over the genome. These signals represent the observed states in a multivariate Markov model used to predict the chromatin's underlying states. ChromHMM, written in Java, integrates histone modification datasets to learn the chromatin states de novo. The goal of this package is to call ChromHMM from within R, capture the output files in an S4 object and interface with other relevant Bioconductor analysis tools. In addition, segmenter provides functions to test, select and visualize the output of the segmentation.
TEKRABber is made to provide a user-friendly pipeline for comparing orthologs and transposable elements (TEs) between two species. It considers the orthology confidence between two species from BioMart to normalize expression counts and detect differentially expressed orthologs/TEs. It then provides one-to-one correlation analysis for desired orthologs and TEs. There is also an app function to get a first insight into the results. Users can prepare orthologs/TEs RNA-seq expression data according to their own preference to run TEKRABber, following the data structure mentioned in the vignettes.
This package provides methods to infer clonal tree configuration for a population of cells using single-cell RNA-seq data (scRNA-seq), and possibly other data modalities. Methods are also provided to assign cells to inferred clones and explore differences in gene expression between clones. These methods can flexibly integrate information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. A flexible beta-binomial error model that accounts for stochastic dropout events as well as systematic allelic imbalance is used.