Traces information spread through interactions between features, utilising information theory measures and a higher-order generalisation of the concept of widest paths in graphs. In particular, vistla can be used to better understand the results of high-throughput biomedical experiments, by organising the effects of the investigated intervention in a tree-like hierarchy from direct to indirect ones, following the plausible information relay circuits. Due to its higher-order nature, vistla can handle multi-modality and assign multiple roles to a single feature.
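vistla generalises widest paths to a higher-order setting. For orientation, here is a minimal sketch of the ordinary pairwise widest-path (maximum-bottleneck) problem it starts from. It is solved with a Dijkstra-style search, where a path's width is its weakest edge. The graph and its edge weights (imagined here as mutual-information scores between features) are hypothetical, and this is not vistla's algorithm or API.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]

def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency list from (u, v, weight) triples."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g

def widest_paths(graph: Graph, source: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Return the bottleneck width of the widest path from source to every
    reachable node, plus a parent map that encodes the resulting tree."""
    width: Dict[str, float] = {source: float("inf")}
    parent: Dict[str, str] = {}
    heap = [(-float("inf"), source)]  # max-heap emulated with negated widths
    done = set()
    while heap:
        neg_w, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, w in graph.get(node, []):
            cand = min(-neg_w, w)  # a path is only as wide as its weakest edge
            if cand > width.get(nbr, float("-inf")):
                width[nbr] = cand
                parent[nbr] = node
                heapq.heappush(heap, (-cand, nbr))
    return width, parent
```

With the hypothetical edges Y-A 0.9, Y-B 0.2, A-B 0.5, B-C 0.7 and A-C 0.1, the tree rooted at Y runs Y → A → B → C. This orders features from direct to indirect, even though B and C are also joined to Y or A by weaker direct edges.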
Fast algorithms for fitting Bayesian variable selection models and computing Bayes factors, in which the outcome (or response variable) is modeled using a linear regression or a logistic regression. The algorithms are based on the variational approximations described in "Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies" (P. Carbonetto & M. Stephens, 2012, <DOI:10.1214/12-BA703>). This software has been applied to large data sets with over a million variables and thousands of samples.
This package provides a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. vtreat prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems vtreat defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.
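The problems listed above can be illustrated with a deliberately simplified sketch in plain Python. It is not vtreat's API and omits the statistical safeguards vtreat adds. The function names and the `_rare_` token are hypothetical. The sketch shows two ideas:

* Non-finite numeric values are replaced by the training mean, and an indicator column records where this happened.
* Rare or unseen categorical levels are collapsed into a single token.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Set, Tuple

RARE = "_rare_"  # hypothetical placeholder for rare or novel levels

def fit_numeric(values: Sequence[Optional[float]]) -> float:
    """Learn a replacement value: the mean of the finite training values."""
    finite = [v for v in values if v is not None and math.isfinite(v)]
    return sum(finite) / len(finite) if finite else 0.0

def apply_numeric(values: Sequence[Optional[float]], fill: float) -> Tuple[List[float], List[int]]:
    """Replace None/NaN/Inf with `fill` and return an is_bad indicator column."""
    cleaned, is_bad = [], []
    for v in values:
        bad = v is None or not math.isfinite(v)
        cleaned.append(fill if bad else v)
        is_bad.append(int(bad))
    return cleaned, is_bad

def fit_categorical(levels: Sequence[str], min_count: int = 2) -> Set[str]:
    """Keep only levels seen at least `min_count` times during training."""
    return {lvl for lvl, n in Counter(levels).items() if n >= min_count}

def apply_categorical(levels: Sequence[str], kept: Set[str]) -> List[str]:
    """Map rare and never-seen levels to a single shared token."""
    return [lvl if lvl in kept else RARE for lvl in levels]
```

Fitting on training data and applying to new data keeps the two phases separate. This separation is what lets levels that appear only at application time be handled safely.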
Imports WhatsApp chat logs and parses them into a usable dataframe object. The parser works on chats exported from Android or iOS phones and runs on Linux, macOS and Windows. It has multiple options for extracting smileys and emojis, URLs and domains, names and types of sent media files, and timestamps from the messages, as well as for extracting and anonymizing author names. It can be used to create anonymized versions of the data.
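The parsing and anonymization steps described above can be sketched in Python. The line layout assumed here (`DD/MM/YYYY, HH:MM - Author: text`) is only one possible export format, since real exports vary by phone, operating system and locale. The sketch is not the package's implementation. It shows three things:

* Lines without a header are treated as continuations of a multi-line message.
* Authors are replaced by stable aliases such as `Person1`.
* URLs are pulled out with a simple regular expression.

```python
import re
from datetime import datetime
from typing import Dict, List

# Assumed export line format (real formats differ across phones and locales):
# "31/12/2020, 23:59 - Alice: message text"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4}), (\d{2}:\d{2}) - ([^:]+): (.*)$")
URL_RE = re.compile(r"https?://\S+")

def parse_chat(lines: List[str]) -> List[Dict]:
    rows: List[Dict] = []
    aliases: Dict[str, str] = {}  # real name -> anonymized alias
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            # No timestamp header: the line continues the previous message.
            if rows:
                rows[-1]["text"] += "\n" + line
            continue
        date, time, author, text = m.groups()
        alias = aliases.setdefault(author, f"Person{len(aliases) + 1}")
        rows.append({
            "timestamp": datetime.strptime(f"{date} {time}", "%d/%m/%Y %H:%M"),
            "author": alias,
            "text": text,
        })
    for row in rows:
        row["urls"] = URL_RE.findall(row["text"])
    return rows
```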
This package implements an approach for scanning the genome to detect and perform accurate inference on differentially methylated regions from Whole Genome Bisulfite Sequencing data. The method compares detected regions to a pooled null distribution, which can be used even when as few as two samples per population are available. Region-level statistics are obtained by fitting a generalized least squares (GLS) regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions.
This package provides a one-to-one mapping from gene to "best" probe set for four Affymetrix human gene expression microarrays: hgu95av2, hgu133a, hgu133plus2, and u133x3p. On Affymetrix gene expression microarrays, a single gene may be measured by multiple probe sets. This can present a mild conundrum when attempting to evaluate a gene "signature" that is defined by gene names rather than by specific probe sets. This package also includes the pre-calculated probe set quality scores that were used to define the mapping.
This is a package for fast and user-friendly estimation of econometric models with multiple fixed effects. It includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial model. The core of the package is based on optimized parallel C++ code, which scales especially well for large data sets. The method used to obtain the fixed-effects coefficients is based on Berge (2018). The package also provides tools to export and view the results of several estimations, with an intuitive interface for clustering the standard errors.
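The idea of absorbing several fixed effects before estimating a slope can be shown with the textbook alternating-projections approach. In each pass, the current group means are subtracted for every fixed-effect dimension until nothing changes. This is a swapped-in illustration. Berge (2018) describes a different, faster algorithm, and none of the names below belong to the package.

```python
from collections import defaultdict
from typing import List, Sequence

def demean(x: Sequence[float], groups: Sequence[Sequence], tol: float = 1e-12,
           max_iter: int = 1000) -> List[float]:
    """Remove all fixed effects from x by alternately subtracting the group
    means of each fixed-effect dimension (alternating projections)."""
    x = list(x)
    for _ in range(max_iter):
        max_shift = 0.0
        for g in groups:
            sums, counts = defaultdict(float), defaultdict(int)
            for xi, gi in zip(x, g):
                sums[gi] += xi
                counts[gi] += 1
            for i, gi in enumerate(g):
                m = sums[gi] / counts[gi]
                x[i] -= m
                max_shift = max(max_shift, abs(m))
        if max_shift < tol:  # every group mean is already ~0
            break
    return x

def fe_slope(y: Sequence[float], x: Sequence[float], groups: Sequence[Sequence]) -> float:
    """OLS slope of y on one regressor after absorbing all fixed effects."""
    yd, xd = demean(y, groups), demean(x, groups)
    return sum(a * b for a, b in zip(xd, yd)) / sum(b * b for b in xd)
```

The test below uses hypothetical noise-free data of the form y = 2x + firm effect + year effect. Because both effects are swept out before the regression, the slope comes out as 2.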
This package provides a general routine, envMU, which allows estimation of the M envelope of span(U) given root-n consistent estimators of M and U. The routine envMU does not presume a model. This package implements response envelopes, partial response envelopes, envelopes in the predictor space, heteroscedastic envelopes, simultaneous envelopes, scaled response envelopes, scaled envelopes in the predictor space, groupwise envelopes, weighted envelopes, envelopes in logistic regression, envelopes in Poisson regression, envelopes in function-on-function linear regression, envelope-based Partial Partial Least Squares, envelopes with non-constant error covariance, envelopes with t-distributed errors, reduced rank envelopes and reduced rank envelopes with non-constant error covariance. For each of these model-based routines, the package provides inference tools including bootstrap, cross-validation, estimation and prediction; hypothesis tests on the coefficients are included for all routines except weighted envelopes. Tools for dimension selection include AIC, BIC and likelihood ratio testing. Background is available in Cook, R. D., Forzani, L. and Su, Z. (2016) <doi:10.1016/j.jmva.2016.05.006>. Optimization is based on a clockwise coordinate descent algorithm.
Consider an at-most-K-stage group sequential design with only an upper bound for the last analysis and non-binding lower bounds. With a binary endpoint, two kinds of tests can be applied: an asymptotic test based on the normal distribution and an exact test based on the binomial distribution. This package supports the computation of boundaries and conditional power for a single-arm group sequential test with a binary endpoint, via either the asymptotic or the exact test. The package also provides functions to obtain boundary crossing probabilities given the design.
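For the exact test, a boundary crossing probability can be computed by tracking the distribution of cumulative successes from stage to stage. The minimal sketch below assumes two simple rules. At each interim analysis, the trial stops for futility if cumulative successes are at or below a lower bound. At the final analysis, H0 is rejected if cumulative successes reach an upper bound `r`. The design numbers and function names are hypothetical and do not reproduce the package's API. Because the lower bounds are non-binding, type I error is conventionally computed with them ignored (`apply_futility=False`).

```python
from math import comb
from typing import Dict, List

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def reject_prob(p: float, n: List[int], futility: List[int], r: int,
                apply_futility: bool = True) -> float:
    """P(reach the final analysis with cumulative successes >= r).

    n: cumulative sample sizes at each of the K analyses.
    futility: lower bounds for the first K-1 analyses (stop if successes <= bound).
    """
    dist: Dict[int, float] = {0: 1.0}  # distribution of cumulative successes
    prev_n = 0
    for k, nk in enumerate(n):
        m = nk - prev_n  # patients added at this stage
        new: Dict[int, float] = {}
        for s, ps in dist.items():
            for x in range(m + 1):
                new[s + x] = new.get(s + x, 0.0) + ps * binom_pmf(x, m, p)
        dist = new
        prev_n = nk
        if apply_futility and k < len(n) - 1:
            # Paths at or below the interim lower bound stop here.
            dist = {s: ps for s, ps in dist.items() if s > futility[k]}
    return sum(ps for s, ps in dist.items() if s >= r)
```

For example, consider a hypothetical design with 5 then 10 patients, an interim bound of 3 and a final bound of 8, evaluated at p = 0.5. Without the futility stop, the rejection probability is 56/1024. Applying the stop removes the paths with exactly 3 interim successes, which lowers it to 46/1024.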
This package provides two main functions, il() and fil(). The il() function implements the EM algorithm developed by Ibrahim and Lipsitz (1996) <DOI:10.2307/2533068> to estimate the parameters of a logistic regression model with missing responses when the missing data mechanism is nonignorable. The fil() function implements the algorithm proposed by Maity et al. (2017+) <https://github.com/arnabkrmaity/brlrmr> to reduce the bias produced by the method of Ibrahim and Lipsitz (1996) <DOI:10.2307/2533068>.
Laplace approximations and penalized B-splines are combined for fast Bayesian inference in latent Gaussian models. The routines can be used to fit survival models, especially proportional hazards and promotion time cure models (Gressani, O. and Lambert, P. (2018) <doi:10.1016/j.csda.2018.02.007>). The Laplace-P-spline methodology can also be implemented for inference in (generalized) additive models (Gressani, O. and Lambert, P. (2021) <doi:10.1016/j.csda.2020.107088>). See the associated website for more information and examples.
This package provides functions for implementing cmenet, a bi-level variable selection method for conditional main effects (CMEs; see Mak and Wu (2018) <doi:10.1080/01621459.2018.1448828>). CMEs are reparametrized interaction effects that capture the conditional impact of a factor at a fixed level of another factor. Compared to traditional two-factor interactions, CMEs can quantify more interpretable interaction effects in many problems. The current implementation performs variable selection only on binary CMEs; we are working on an extension to the continuous setting.
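The reparametrization behind binary CMEs can be made concrete, assuming the usual ±1 coding of two-level factors A and B:

* The column A|B+ equals A where B = +1 and 0 elsewhere.
* The column A|B- equals A where B = -1 and 0 elsewhere.

The two CMEs then sum to the main effect A, and their difference is the classical AB interaction. The sketch only builds these columns and does not implement cmenet's bi-level selection.

```python
from itertools import product
from typing import List, Sequence, Tuple

def cme_columns(a: Sequence[int], b: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return the CME columns A|B+ and A|B- for +/-1 coded factors A and B."""
    a_given_b_plus = [ai if bi == 1 else 0 for ai, bi in zip(a, b)]
    a_given_b_minus = [ai if bi == -1 else 0 for ai, bi in zip(a, b)]
    return a_given_b_plus, a_given_b_minus

# Full 2x2 factorial in +/-1 coding: a = (-1, -1, 1, 1), b = (-1, 1, -1, 1)
a, b = zip(*product([-1, 1], repeat=2))
plus, minus = cme_columns(a, b)
```

Each CME column is zero on half of the runs. This is what makes it read as "the effect of A when B is held at one level", rather than as a symmetric interaction.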
Makes univariate, multivariate, or random field simulations precise and simple. Just select the desired properties of the time series or random field, and it will do the rest. CoSMoS is based on the framework described in Papalexiou (2018, <doi:10.1016/j.advwatres.2018.02.013>), extended to random fields in Papalexiou and Serinaldi (2020, <doi:10.1029/2019WR026331>), and further advanced in Papalexiou et al. (2021, <doi:10.1029/2020WR029466>) to allow fine-scale space-time simulation of storms (or even cyclone-mimicking fields).
This package provides a wrapper on top of the 'Domino Command-Line Client'. It lets you run Domino commands (e.g., "run", "upload", "download") directly from your R environment. Under the hood, it uses R's system() function to run the Domino executable, which must be installed as a prerequisite. Domino is a service that makes it easy to run your code on scalable hardware, with integrated version control and collaboration features designed for analytical workflows (see <http://www.dominodatalab.com> for more information).
Differential geometric least angle regression method for fitting sparse generalized linear models. In this version of the package, the user can fit models with the Gaussian, Poisson, Binomial, Gamma and Inverse Gaussian families. Furthermore, several link functions can be used to model the relationship between the conditional expected value of the response variable and the linear predictor. The solution curve can be computed using an efficient predictor-corrector or a cyclic coordinate descent algorithm, as described in the paper linked to via the URL below.
This package provides a set of functions for inferring, visualizing, and analyzing B cell phylogenetic trees. It provides methods to 1) reconstruct unmutated ancestral sequences, 2) build B cell phylogenetic trees using multiple methods, 3) visualize trees with metadata at the tips, 4) reconstruct intermediate sequences, and 5) detect biased ancestor-descendant relationships among metadata types. Workflow examples are available at the documentation site (see URL). Citations: Hoehn et al. (2022) <doi:10.1371/journal.pcbi.1009885>, Hoehn et al. (2021) <doi:10.1101/2021.01.06.425648>.
This package provides a pipe-style interface for 'data.table'. The package preserves all 'data.table' features without significant impact on performance. The let and take functions are simplified interfaces for the most common data manipulation tasks. For example, you can write take(mtcars, mean(mpg), by = am) for aggregation or let(mtcars, hp_wt = hp/wt, hp_wt_mpg = hp_wt/mpg) for modification. Use take_if/let_if for conditional aggregation/modification. Additionally, there are some conveniences, such as automatic conversion of data.frame to 'data.table'.
Computes the probability of a set of species abundances of a single or multiple samples of individuals with one or more guilds under a mainland-island model. One must specify the mainland (metacommunity) model and the island (local) community model. It assumes that species fluctuate independently. The package also contains functions to simulate under this model. See Haegeman, B. & R.S. Etienne (2017). A general sampling formula for community structure data. Methods in Ecology & Evolution 8: 1506-1519 <doi:10.1111/2041-210X.12807>.
This package provides tools for processing and evaluating seasonal weather forecasts, with an emphasis on tercile forecasts. We follow the World Meteorological Organization's "Guidance on Verification of Operational Seasonal Climate Forecasts", S. J. Mason (2018, ISBN: 978-92-63-11220-0, URL: <https://library.wmo.int/idurl/4/56227>). The development was supported by the European Union's Horizon 2020 research and innovation programme under grant agreement no. 869730 (CONFER). A comprehensive online tutorial is available at <https://seasonalforecastingengine.github.io/SeaValDoc/>.
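A common score for tercile forecasts is the ranked probability score (RPS). It sums the squared differences between the cumulative forecast probabilities and the cumulative observed outcome across the ordered categories (below, near, above normal). The sketch below uses this unnormalized textbook form. Some references divide by the number of categories minus one, and neither this function nor its name is taken from the package.

```python
from typing import Sequence

def rps(probs: Sequence[float], observed: int) -> float:
    """Ranked probability score of one categorical forecast.

    probs: forecast probabilities for ordered categories, e.g. (below, normal, above).
    observed: index of the category that occurred. 0 is a perfect score; lower is better.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    score, cum_f, cum_o = 0.0, 0.0, 0.0
    for k, p in enumerate(probs):
        cum_f += p  # cumulative forecast probability up to category k
        cum_o += 1.0 if k == observed else 0.0  # cumulative observation (0 or 1)
        score += (cum_f - cum_o) ** 2
    return score
```

A forecast of (0.2, 0.3, 0.5) that verifies in the upper tercile scores 0.29. A climatological forecast of (1/3, 1/3, 1/3) that verifies in the lower tercile scores 5/9.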
Generates cell-level cytokine activity estimates using relevant information from gene sets constructed with the CytoSig and Reactome databases and scored using the modified Variance-adjusted Mahalanobis (VAM) framework for single-cell RNA-sequencing (scRNA-seq) data. The CytoSig database is described in Jiang et al. (2021) <doi:10.1038/s41592-021-01274-5>. The Reactome database is described in Gillespie et al. (2021) <doi:10.1093/nar/gkab1028>. The VAM method is outlined in Frost (2020) <doi:10.1093/nar/gkaa582>.
This package provides a method that extends the standard gene set variation analysis (GSVA) method. It also offers the option to use summary statistics from any analysis (e.g., disease vs. healthy, lesional vs. non-lesional side) as input to define the direction of the gene sets used for directional gene set score calculation for a given disease. Note that GSVA (>= 1.52.1) must be installed before using this package. Reference: Hanzelmann, S., Castelo, R., and Guinney, J. (2013) <doi:10.1186/1471-2105-14-7>.
This package provides methods for constructing and maintaining a database of presentations in R. The presentations are either ones that the user gives or gave or presentations at a particular event or event series. The package also provides a plot method for the interactive mapping of the presentations using leaflet by grouping them according to country, city, year and other presentation attributes. The markers on the map come with popups providing presentation details (title, institution, event, links to materials and events, and so on).
This package provides tools to perform multiple comparison analyses based on Tukey's well-known "Honestly Significant Difference" (HSD) test. For models involving interactions, TukeyC stands out from other R packages by providing intuitive, easy-to-use functions. In addition to supporting models fitted with traditional R functions such as lm() and aov(), it has also been extended to objects returned by lmer(), that is, mixed-effects models. For more details see Tukey (1949) <doi:10.2307/3001913>.
Diagnostic tools based on two-way ANOVA and median-polish residual plots for bicluster output obtained from the packages "biclust" by Kaiser et al. (2008), "isa2" by Csardi et al. (2010) and "fabia" by Hochreiter et al. (2010). Moreover, it provides visualization tools for bicluster output and for the corresponding non-bicluster rows or columns. It also extends the idea of Kaiser et al. (2008), namely extracting bicluster output in a text format, by adding two bicluster methods from the fabia and isa2 R packages.
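Median-polish residuals come from Tukey's median polish, which fits an additive two-way decomposition to a block of data. Here the block is a bicluster, with rows and columns. The decomposition is value = overall + row effect + column effect + residual, fitted by repeatedly sweeping out row and column medians. A minimal Python sketch of the generic procedure is shown below. It follows the same sweep order as R's stats::medpolish but uses a simpler stopping rule. It is not the package's diagnostic or plotting code.

```python
from statistics import median
from typing import List

def median_polish(x: List[List[float]], n_iter: int = 10, eps: float = 1e-9):
    """Return (overall, row_effects, col_effects, residuals) such that
    x[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j]."""
    r = [list(row) for row in x]  # residuals, updated in place
    nr, nc = len(r), len(r[0])
    overall, row_eff, col_eff = 0.0, [0.0] * nr, [0.0] * nc
    for _ in range(n_iter):
        # Sweep row medians from the residuals into the row effects.
        rmed = [median(row) for row in r]
        for i in range(nr):
            r[i] = [v - rmed[i] for v in r[i]]
            row_eff[i] += rmed[i]
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep column medians from the residuals into the column effects.
        cmed = [median(r[i][j] for i in range(nr)) for j in range(nc)]
        for i in range(nr):
            r[i] = [v - cmed[j] for j, v in enumerate(r[i])]
        col_eff = [c + cm for c, cm in zip(col_eff, cmed)]
        m = median(row_eff)
        row_eff = [e - m for e in row_eff]
        overall += m
        if max(abs(v) for v in rmed + cmed) < eps:
            break  # nothing left to sweep
    return overall, row_eff, col_eff, r
```

For a perfectly additive block, all residuals come out as zero. Large or patterned residuals therefore suggest that a bicluster is not well described by row and column effects alone.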