Fast algorithms for fitting Bayesian variable selection models and computing Bayes factors, in which the outcome (or response variable) is modeled using a linear regression or a logistic regression. The algorithms are based on the variational approximations described in "Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies" (P. Carbonetto & M. Stephens, 2012, <DOI:10.1214/12-BA703>). This software has been applied to large data sets with over a million variables and thousands of samples.
This package provides a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. vtreat prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems vtreat defends against: Inf, NA, too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.
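The defenses listed above can be illustrated with a minimal Python sketch. It is not vtreat's API or its actual encoding scheme (vtreat is an R package and does considerably more); the function names, the `min_count` threshold, and the column naming are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def design_treatment(train_values, min_count=2):
    """Learn which levels of a categorical variable are common enough to keep.

    Hypothetical helper: levels seen fewer than `min_count` times in training
    are treated as rare and pooled with levels never seen in training.
    """
    counts = Counter(v for v in train_values if v is not None)
    kept = sorted(level for level, n in counts.items() if n >= min_count)
    return {"levels": kept}


def prepare_value(plan, value):
    """Encode one value as indicator columns plus rare/novel and missing flags."""
    row = {f"lev_{level}": 0 for level in plan["levels"]}
    row["lev_rare_or_novel"] = 0
    row["is_missing"] = 0
    if value is None:
        row["is_missing"] = 1
    elif value in plan["levels"]:
        row[f"lev_{value}"] = 1
    else:
        # Rare in training, or first seen at application time.
        row["lev_rare_or_novel"] = 1
    return row


def clean_numeric(x, fill=0.0):
    """Replace missing, NaN, or infinite values with `fill` and flag the change."""
    bad = x is None or math.isnan(x) or math.isinf(x)
    return (fill if bad else x, int(bad))
```

With a plan learned from `["a", "a", "b", "b", "c", None]`, the level `"c"` (rare) and an unseen level such as `"z"` both map to the pooled indicator, so a downstream model never receives a column layout it was not trained on.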
Imports WhatsApp chat logs and parses them into a usable dataframe object. The parser works on chats exported from Android or iOS phones and on Linux, macOS and Windows. The parser has multiple options for extracting smileys and emojis, URLs and domains, names and types of sent media files, and timestamps from the messages, and for extracting and anonymizing author names. It can be used to create anonymized versions of data.
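As a rough illustration of the parsing steps described above (timestamps, author anonymization, URL extraction), here is a minimal Python sketch. The line format `dd.mm.yy, HH:MM - Author: message` is an assumption; real exports differ by platform and locale, and this is not the package's implementation. The names `parse_chat`, `LINE` and `URL` are hypothetical.

```python
import re
from datetime import datetime

# Assumed export format: "01.02.21, 10:15 - Alice: hello"
LINE = re.compile(r"^(\d{2}\.\d{2}\.\d{2}), (\d{2}:\d{2}) - ([^:]+): (.*)$")
URL = re.compile(r"https?://[^\s]+")


def parse_chat(text):
    """Parse an exported chat into a list of row dictionaries."""
    rows = []
    aliases = {}  # real author name -> anonymized label
    for line in text.splitlines():
        m = LINE.match(line)
        if m is None:
            # No timestamp prefix: treat as continuation of the previous message.
            if rows:
                rows[-1]["message"] += "\n" + line
            continue
        date, time, author, message = m.groups()
        alias = aliases.setdefault(author, f"Person_{len(aliases) + 1}")
        rows.append({
            "time": datetime.strptime(f"{date} {time}", "%d.%m.%y %H:%M"),
            "author": alias,
            "message": message,
        })
    # Extract URLs once all multi-line messages have been joined.
    for row in rows:
        row["urls"] = URL.findall(row["message"])
    return rows
```

Anonymized labels are assigned in order of first appearance, so the same author always receives the same label within one chat.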
Select hits from synthetic lethal RNAi screen data. For example, there are two identical cell lines, except that one gene is knocked down in one of them. The interest is to find genes that lead to a stronger lethal effect when they are knocked down further by siRNA. Quality control and various visualisation tools are implemented. Four different algorithms can be used to pick the interesting hits. This package is designed for 384-well plates, but may apply to other platforms with proper configuration.
This tool supports analyses on massive phylogenies comprising up to millions of tips. Functions include pruning, rerooting, calculation of most-recent common ancestors, calculating distances from the tree root and calculating pairwise distances. In addition, this tool takes care of calculation of phylogenetic signal and mean trait depth (trait conservatism), ancestral state reconstruction and hidden character prediction of discrete characters, simulating and fitting models of trait evolution, fitting and simulating diversification models, dating trees, comparing trees, and reading/writing trees in Newick format.
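Three of the operations named above (distance from the root, most-recent common ancestor, pairwise distance) can be sketched on a toy tree as follows. This is a naive Python illustration using a parent-pointer dictionary, which is an assumed representation; it says nothing about how the package itself achieves its performance on trees with millions of tips, and the function names are hypothetical.

```python
def root_distance(parent, edge_length, node):
    """Sum the edge lengths on the path from `node` up to the root."""
    d = 0.0
    while parent[node] is not None:
        d += edge_length[node]  # length of the edge leading into `node`
        node = parent[node]
    return d


def mrca(parent, a, b):
    """Most-recent common ancestor of nodes `a` and `b`."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent[a]
    # Walk up from b until we reach a node that is also above a.
    while b not in ancestors:
        b = parent[b]
    return b


def pairwise_distance(parent, edge_length, a, b):
    """Path length between `a` and `b`, passing through their MRCA."""
    m = mrca(parent, a, b)
    return (root_distance(parent, edge_length, a)
            + root_distance(parent, edge_length, b)
            - 2.0 * root_distance(parent, edge_length, m))


# Toy tree, in Newick notation: ((A:1,B:2)X:0.5,C:3)R;
parent = {"R": None, "X": "R", "C": "R", "A": "X", "B": "X"}
edge_length = {"X": 0.5, "C": 3.0, "A": 1.0, "B": 2.0}
```

The pairwise distance uses the identity d(a, b) = d(root, a) + d(root, b) - 2 d(root, MRCA(a, b)).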
Consider an at-most-K-stage group sequential design with only an upper bound for the last analysis and non-binding lower bounds. With a binary endpoint, two kinds of test can be applied: an asymptotic test based on the normal distribution and an exact test based on the binomial distribution. This package supports the computation of boundaries and conditional power for single-arm group sequential tests with a binary endpoint, via either the asymptotic or the exact test. The package also provides functions to obtain boundary crossing probabilities given the design.
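The difference between the two kinds of test can be sketched for a single analysis of a single-arm trial. The Python below is a minimal illustration only: it covers one look rather than a K-stage design, the score-type z statistic is an assumption about the form of the asymptotic test, and the function names are hypothetical rather than taken from the package.

```python
import math


def exact_upper_pvalue(x, n, p0):
    """Exact test: P(X >= x) for X ~ Binomial(n, p0)."""
    return sum(
        math.comb(n, k) * p0**k * (1.0 - p0) ** (n - k)
        for k in range(x, n + 1)
    )


def asymptotic_upper_pvalue(x, n, p0):
    """Asymptotic test: upper-tail normal probability of the z statistic."""
    z = (x - n * p0) / math.sqrt(n * p0 * (1.0 - p0))
    # Upper tail of the standard normal, 1 - Phi(z), written via erfc.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For example, with `n = 10` and `p0 = 0.5`, observing all 10 responses gives an exact upper-tail probability of `0.5 ** 10`, while the normal approximation yields a different (smaller) value; such discrepancies at small sample sizes are one reason to offer both tests.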
This package provides two main functions, il() and fil(). The il() function implements the EM algorithm developed by Ibrahim and Lipsitz (1996) <DOI:10.2307/2533068> to estimate the parameters of a logistic regression model with a missing response when the missing data mechanism is nonignorable. The fil() function implements the algorithm proposed by Maity et al. (2017+) <https://github.com/arnabkrmaity/brlrmr> to reduce the bias produced by the method of Ibrahim and Lipsitz (1996) <DOI:10.2307/2533068>.
Laplace approximations and penalized B-splines are combined for fast Bayesian inference in latent Gaussian models. The routines can be used to fit survival models, especially proportional hazards and promotion time cure models (Gressani, O. and Lambert, P. (2018) <doi:10.1016/j.csda.2018.02.007>). The Laplace-P-spline methodology can also be implemented for inference in (generalized) additive models (Gressani, O. and Lambert, P. (2021) <doi:10.1016/j.csda.2020.107088>). See the associated website for more information and examples.
Makes univariate, multivariate, or random fields simulations precise and simple. Just select the desired time series or random fields' properties and it will do the rest. CoSMoS is based on the framework described in Papalexiou (2018, <doi:10.1016/j.advwatres.2018.02.013>), extended for random fields in Papalexiou and Serinaldi (2020, <doi:10.1029/2019WR026331>), and further advanced in Papalexiou et al. (2021, <doi:10.1029/2020WR029466>) to allow fine-scale space-time simulation of storms (or even cyclone-mimicking fields).
This package provides functions for implementing cmenet - a bi-level variable selection method for conditional main effects (see Mak and Wu (2018) <doi:10.1080/01621459.2018.1448828>). CMEs are reparametrized interaction effects which capture the conditional impact of a factor at a fixed level of another factor. Compared to traditional two-factor interactions, CMEs can quantify more interpretable interaction effects in many problems. The current implementation performs variable selection on only binary CMEs; we are working on an extension for the continuous setting.
Differential geometric least angle regression method for fitting sparse generalized linear models. In this version of the package, the user can fit models specifying Gaussian, Poisson, Binomial, Gamma and Inverse Gaussian family. Furthermore, several link functions can be used to model the relationship between the conditional expected value of the response variable and the linear predictor. The solution curve can be computed using an efficient predictor-corrector or a cyclic coordinate descent algorithm, as described in the paper linked to via the URL below.
This package provides a wrapper on top of the Domino Command-Line Client. It lets you run Domino commands (e.g., "run", "upload", "download") directly from your R environment. Under the hood, it uses R's system function to run the Domino executable, which must be installed as a prerequisite. Domino is a service that makes it easy to run your code on scalable hardware, with integrated version control and collaboration features designed for analytical workflows (see <http://www.dominodatalab.com> for more information).
This package provides a set of functions for inferring, visualizing, and analyzing B cell phylogenetic trees. Provides methods to 1) reconstruct unmutated ancestral sequences, 2) build B cell phylogenetic trees using multiple methods, 3) visualize trees with metadata at the tips, 4) reconstruct intermediate sequences, 5) detect biased ancestor-descendant relationships among metadata types. Workflow examples are available at the documentation site (see URL). Citations: Hoehn et al (2022) <doi:10.1371/journal.pcbi.1009885>, Hoehn et al (2021) <doi:10.1101/2021.01.06.425648>.
This package provides a pipe-style interface for data.table. The package preserves all data.table features without significant impact on performance. The let and take functions are simplified interfaces for the most common data manipulation tasks. For example, you can write take(mtcars, mean(mpg), by = am) for aggregation or let(mtcars, hp_wt = hp/wt, hp_wt_mpg = hp_wt/mpg) for modification. Use take_if/let_if for conditional aggregation/modification. Additionally, there are some conveniences such as automatic data.frame conversion to data.table.
This package provides functions to estimate the size-controlled phenotypic integration index, a novel method by Torices & Méndez (2014) to solve problems due to individual size when estimating integration (namely, larger individuals have larger components, which will drive a correlation between components only due to resource availability that might obscure the observed measures of integration). In addition, the package also provides the classical estimation by Wagner (1984), bootstrapping and jackknife methods to calculate confidence intervals and a significance test for both integration indices.
Generates cell-level cytokine activity estimates using relevant information from gene sets constructed with the CytoSig and the Reactome databases and scored using the modified Variance-adjusted Mahalanobis (VAM) framework for single-cell RNA-sequencing (scRNA-seq) data. The CytoSig database is described in: Jiang et al., (2021) <doi:10.1038/s41592-021-01274-5>. The Reactome database is described in: Gillespie et al., (2021) <doi:10.1093/nar/gkab1028>. The VAM method is outlined in: Frost (2020) <doi:10.1093/nar/gkaa582>.
Computes the probability of a set of species abundances of a single or multiple samples of individuals with one or more guilds under a mainland-island model. One must specify the mainland (metacommunity) model and the island (local) community model. It assumes that species fluctuate independently. The package also contains functions to simulate under this model. See Haegeman, B. & R.S. Etienne (2017). A general sampling formula for community structure data. Methods in Ecology & Evolution 8: 1506-1519 <doi:10.1111/2041-210X.12807>.
This package provides tools for processing and evaluating seasonal weather forecasts, with an emphasis on tercile forecasts. We follow the World Meteorological Organization's "Guidance on Verification of Operational Seasonal Climate Forecasts", S.J.Mason (2018, ISBN: 978-92-63-11220-0, URL: <https://library.wmo.int/idurl/4/56227>). The development was supported by the European Union's Horizon 2020 research and innovation programme under grant agreement no. 869730 (CONFER). A comprehensive online tutorial is available at <https://seasonalforecastingengine.github.io/SeaValDoc/>.
This package provides a method that inherits the standard gene set variation analysis (GSVA) method and also provides the option to use summary statistics from any analysis (disease vs healthy, lesional side vs nonlesional side, etc.) as input to define the direction of gene sets used for directional gene set score calculation for a given disease. Note that to use this package, GSVA (>= 1.52.1) must be pre-installed. Hanzelmann, S., Castelo, R., and Guinney, J. (2013) <doi:10.1186/1471-2105-14-7>.
This package provides methods for constructing and maintaining a database of presentations in R. The presentations are either ones that the user gives or gave or presentations at a particular event or event series. The package also provides a plot method for the interactive mapping of the presentations using leaflet by grouping them according to country, city, year and other presentation attributes. The markers on the map come with popups providing presentation details (title, institution, event, links to materials and events, and so on).
This package provides tools to perform multiple comparison analyses, based on the well-known Tukey's "Honestly Significant Difference" (HSD) test. In models involving interactions, TukeyC stands out from other R packages by implementing intuitive and easy-to-use functions. In addition to accommodating traditional R methods such as lm() and aov(), it has also been extended to objects of the lmer() class, that is, mixed models with fixed effects. For more details see Tukey (1949) <doi:10.2307/3001913>.
MesKit provides commonly used analysis and visualization modules based on mutational data generated by multi-region sequencing (MRS). This package can depict mutational profiles, measure heterogeneity within or between tumors from the same patient, track evolutionary dynamics, and characterize mutational patterns on different levels. A Shiny application is also provided for GUI-based analysis. As a handy tool, MesKit can facilitate the interpretation of tumor heterogeneity and the understanding of evolutionary relationships between regions in MRS studies.
This package provides an association test that is capable of dealing with very rare and even private variants. This is accomplished by a kernel-based approach that takes the positions of the variants into account. The test can be used for pre-processed matrix data, but also directly for variant data stored in VCF files. Association testing can be performed whole-genome, whole-exome, or restricted to pre-defined regions of interest. The test is complemented by tools for analyzing and visualizing the results.
Vulcan (VirtUaL ChIP-Seq Analysis through Networks) is a package that interrogates gene regulatory networks to infer cofactors significantly enriched in a differential binding signature coming from ChIP-Seq data. In order to do so, our package combines strategies from different Bioconductor packages: DESeq for data normalization, ChIPpeakAnno and DiffBind for annotation and definition of ChIP-Seq genomic peaks, csaw to define optimal peak width and viper for applying a regulatory network over a differential binding signature.