This package implements an approach for scanning the genome to detect and perform accurate inference on differentially methylated regions from Whole Genome Bisulfite Sequencing data. The method is based on comparing detected regions to a pooled null distribution, that can be implemented even when as few as two samples per population are available. Region-level statistics are obtained by fitting a generalized least squares (GLS) regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions.
This package provides a one-to-one mapping from gene to "best" probe set for four Affymetrix human gene expression microarrays: hgu95av2, hgu133a, hgu133plus2, and u133x3p. On Affymetrix gene expression microarrays, a single gene may be measured by multiple probe sets. This can present a mild conundrum when attempting to evaluate a gene "signature" that is defined by gene names rather than by specific probe sets. This package also includes the pre-calculated probe set quality scores that were used to define the mapping.
This is a package for fast and user-friendly estimation of econometric models with multiple fixed-effects. It includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018). It further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.
Select hits from synthetic lethal RNAi screen data. For example, there are two identical celllines except one gene is knocked-down in one cellline. The interest is to find genes that lead to stronger lethal effect when they are knocked-down further by siRNA. Quality control and various visualisation tools are implemented. Four different algorithms could be used to pick up the interesting hits. This package is designed based on 384 wells plates, but may apply to other platforms with proper configuration.
This package provides a systematic biology tool was developed to identify cell infiltration via Individualized Cell-Cell interaction network. CITMIC first constructed a weighted cell interaction network through integrating Cell-target interaction information, molecular function data from Gene Ontology (GO) database and gene transcriptomic data in specific sample, and then, it used a network propagation algorithm on the network to identify cell infiltration for the sample. Ultimately, cell infiltration in the patient dataset was obtained by normalizing the centrality scores of the cells.
Saturation of ionic substances in urine is calculated based on sodium, potassium, calcium, magnesium, ammonia, chloride, phosphate, sulfate, oxalate, citrate, ph, and urate. This program is intended for research use, only. The code within is translated from EQUIL2 Visual Basic code based on Werness, et al (1985) "EQUIL2: a BASIC computer program for the calculation of urinary saturation" <doi:10.1016/s0022-5347(17)47703-2> to R. The Visual Basic code was kindly provided by Dr. John Lieske of the Mayo Clinic.
The propensity score is one of the most widely used tools in studying the causal effect of a treatment, intervention, or policy. Given that the propensity score is usually unknown, it has to be estimated, implying that the reliability of many treatment effect estimators depends on the correct specification of the (parametric) propensity score. This package implements the data-driven nonparametric diagnostic tools for detecting propensity score misspecification proposed by Sant'Anna and Song (2019) <doi:10.1016/j.jeconom.2019.02.002>.
This package provides methods for estimation of mean- and quantile-optimal treatment regimes from censored data. Specifically, we have developed distinct functions for three types of right censoring for static treatment using quantile criterion: (1) independent/random censoring, (2) treatment-dependent random censoring, and (3) covariates-dependent random censoring. It also includes a function to estimate quantile-optimal dynamic treatment regimes for independent censored data. Finally, this package also includes a simulation data generative model of a dynamic treatment experiment proposed in literature.
Assists in analyzing the lying behavior of cows from raw data recorded with a triaxial accelerometer attached to the hind leg of a cow. Allows the determination of common measures for lying behavior including total lying duration, the number of lying bouts, and the mean duration of lying bouts. Further capabilities are the description of lying laterality and the calculation of proxies for the level of physical activity of the cow. Reference: Simmler M., Brouwers S. P. (2024) <doi:10.7717/peerj.17036>.
Implement the Tariff algorithm for coding cause-of-death from verbal autopsies. The Tariff method was originally proposed in James et al (2011) <DOI:10.1186/1478-7954-9-31> and later refined as Tariff 2.0 in Serina, et al. (2015) <DOI:10.1186/s12916-015-0527-9>. Note that this package was not developed by authors affiliated with the Institute for Health Metrics and Evaluation and thus unintentional discrepancies may exist between the this implementation and the implementation available from IHME.
Fast algorithms for fitting Bayesian variable selection models and computing Bayes factors, in which the outcome (or response variable) is modeled using a linear regression or a logistic regression. The algorithms are based on the variational approximations described in "Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies" (P. Carbonetto & M. Stephens, 2012, <DOI:10.1214/12-BA703>). This software has been applied to large data sets with over a million variables and thousands of samples.
Traces information spread through interactions between features, utilising information theory measures and a higher-order generalisation of the concept of widest paths in graphs. In particular, vistla can be used to better understand the results of high-throughput biomedical experiments, by organising the effects of the investigated intervention in a tree-like hierarchy from direct to indirect ones, following the plausible information relay circuits. Due to its higher-order nature, vistla can handle multi-modality and assign multiple roles to a single feature.
This package provides a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. vtreat prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems vtreat defends against: Inf', NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.
Imports WhatsApp chat logs and parses them into a usable dataframe object. The parser works on chats exported from Android or iOS phones and on Linux, macOS and Windows. The parser has multiple options for extracting smileys and emojis from the messages, extracting URLs and domains from the messages, extracting names and types of sent media files from the messages, extracting timestamps from messages, extracting and anonymizing author names from messages. Can be used to create anonymized versions of data.
This package provides a general routine, envMU, which allows estimation of the M envelope of span(U) given root n consistent estimators of M and U. The routine envMU does not presume a model. This package implements response envelopes, partial response envelopes, envelopes in the predictor space, heteroscedastic envelopes, simultaneous envelopes, scaled response envelopes, scaled envelopes in the predictor space, groupwise envelopes, weighted envelopes, envelopes in logistic regression, envelopes in Poisson regression envelopes in function-on-function linear regression, envelope-based Partial Partial Least Squares, envelopes with non-constant error covariance, envelopes with t-distributed errors, reduced rank envelopes and reduced rank envelopes with non-constant error covariance. For each of these model-based routines the package provides inference tools including bootstrap, cross validation, estimation and prediction, hypothesis testing on coefficients are included except for weighted envelopes. Tools for selection of dimension include AIC, BIC and likelihood ratio testing. Background is available at Cook, R. D., Forzani, L. and Su, Z. (2016) <doi:10.1016/j.jmva.2016.05.006>. Optimization is based on a clockwise coordinate descent algorithm.
MesKit provides commonly used analysis and visualization modules based on mutational data generated by multi-region sequencing (MRS). This package allows to depict mutational profiles, measure heterogeneity within or between tumors from the same patient, track evolutionary dynamics, as well as characterize mutational patterns on different levels. Shiny application was also developed for a need of GUI-based analysis. As a handy tool, MesKit can facilitate the interpretation of tumor heterogeneity and the understanding of evolutionary relationship between regions in MRS study.
This package provides an association test that is capable of dealing with very rare and even private variants. This is accomplished by a kernel-based approach that takes the positions of the variants into account. The test can be used for pre-processed matrix data, but also directly for variant data stored in VCF files. Association testing can be performed whole-genome, whole-exome, or restricted to pre-defined regions of interest. The test is complemented by tools for analyzing and visualizing the results.
Vulcan (VirtUaL ChIP-Seq Analysis through Networks) is a package that interrogates gene regulatory networks to infer cofactors significantly enriched in a differential binding signature coming from ChIP-Seq data. In order to do so, our package combines strategies from different BioConductor packages: DESeq for data normalization, ChIPpeakAnno and DiffBind for annotation and definition of ChIP-Seq genomic peaks, csaw to define optimal peak width and viper for applying a regulatory network over a differential binding signature.
Laplace approximations and penalized B-splines are combined for fast Bayesian inference in latent Gaussian models. The routines can be used to fit survival models, especially proportional hazards and promotion time cure models (Gressani, O. and Lambert, P. (2018) <doi:10.1016/j.csda.2018.02.007>). The Laplace-P-spline methodology can also be implemented for inference in (generalized) additive models (Gressani, O. and Lambert, P. (2021) <doi:10.1016/j.csda.2020.107088>). See the associated website for more information and examples.
This package provides two main functions, il() and fil(). The il() function implements the EM algorithm developed by Ibrahim and Lipsitz (1996) <DOI:10.2307/2533068> to estimate the parameters of a logistic regression model with the missing response when the missing data mechanism is nonignorable. The fil() function implements the algorithm proposed by Maity et. al. (2017+) <https://github.com/arnabkrmaity/brlrmr> to reduce the bias produced by the method of Ibrahim and Lipsitz (1996) <DOI:10.2307/2533068>.
Consider an at-most-K-stage group sequential design with only an upper bound for the last analysis and non-binding lower bounds.With binary endpoint, two kinds of test can be applied, asymptotic test based on normal distribution and exact test based on binomial distribution. This package supports the computation of boundaries and conditional power for single-arm group sequential test with binary endpoint, via either asymptotic or exact test. The package also provides functions to obtain boundary crossing probabilities given the design.
This package provides functions for implementing cmenet - a bi-level variable selection method for conditional main effects (see Mak and Wu (2018) <doi:10.1080/01621459.2018.1448828>). CMEs are reparametrized interaction effects which capture the conditional impact of a factor at a fixed level of another factor. Compared to traditional two-factor interactions, CMEs can quantify more interpretable interaction effects in many problems. The current implementation performs variable selection on only binary CMEs; we are working on an extension for the continuous setting.
Makes univariate, multivariate, or random fields simulations precise and simple. Just select the desired time series or random fieldsâ properties and it will do the rest. CoSMoS is based on the framework described in Papalexiou (2018, <doi:10.1016/j.advwatres.2018.02.013>), extended for random fields in Papalexiou and Serinaldi (2020, <doi:10.1029/2019WR026331>), and further advanced in Papalexiou et al. (2021, <doi:10.1029/2020WR029466>) to allow fine-scale space-time simulation of storms (or even cyclone-mimicking fields).
This package provides a wrapper on top of the Domino Command-Line Client'. It lets you run Domino commands (e.g., "run", "upload", "download") directly from your R environment. Under the hood, it uses R's system function to run the Domino executable, which must be installed as a prerequisite. Domino is a service that makes it easy to run your code on scalable hardware, with integrated version control and collaboration features designed for analytical workflows (see <http://www.dominodatalab.com> for more information).