Estimate the receiver operating characteristic (ROC) curve, area under the curve (AUC) and optimal cut-off points for individual classification taking into account complex sampling designs when working with complex survey data. Methods implemented in this package are described in: A. Iparragirre, I. Barrio, I. Arostegui (2024) <doi:10.1002/sta4.635>; A. Iparragirre, I. Barrio, J. Aramendi, I. Arostegui (2022) <doi:10.2436/20.8080.02.121>; A. Iparragirre, I. Barrio (2024) <doi:10.1007/978-3-031-65723-8_7>.
Calculates total survey error (TSE) for one or more surveys, using both scale-dependent and scale-independent metrics. Package works directly from the data set, with no hand calculations required: just upload a properly structured data set (see TESTIND and its documentation), properly input column names (see functions documentation), and run your functions. For more on TSE, see: Weisberg, Herbert (2005, ISBN:0-226-89128-3); Biemer, Paul (2010) <doi:10.1093/poq/nfq058>; Biemer, Paul et.al. (2017, ISBN:9781119041672); etc.
In diagnostic contexts, individuals are often assessed using multiple tests that measure the same latent variable (e.g., intelligence). These test scores are typically not exactly identical. Simple averaging neglects the correlation between tests and the reduced variance of their combination. The unifyR package provides functions to compute statistically accurate unified scores, reliabilities and validities of multiple tests. The underlying algorithms build on and extend the method proposed by Evans (1996, <DOI:10.3758/BF03204767>) and have been validated through simulations.
This package implements functions for varying coefficient meta-analysis methods. These methods do not assume effect size homogeneity. Subgroup effect size comparisons, general linear effect size contrasts, and linear models of effect sizes based on varying coefficient methods can be used to describe effect size heterogeneity. Varying coefficient meta-analysis methods do not require the unrealistic assumptions of the traditional fixed-effect and random-effects meta-analysis methods. For details see: Statistical Methods for Psychologists, Volume 5, <https://dgbonett.sites.ucsc.edu/>.
Vector binary tree provides a new data structure, to make your data visiting and management more efficient. If the data has structured column names, it can read these names and factorize them through specific split pattern, then build the mappings within double list, vector binary tree, array and tensor mutually, through which the batched data processing is achievable easily. The methods of array and tensor are also applicable. Detailed methods are described in Chen Zhang et al. (2020) <doi:10.35566/isdsa2019c8>.
Enables interaction with the National Weather Service application programming web-interface for fetching of real-time and forecast meteorological data. Users can provide latitude and longitude, Automated Surface Observing System identifier, or Automated Weather Observing System identifier to fetch recent weather observations and recent forecasts for the given location or station. Additionally, auxiliary functions exist to identify stations nearest to a point, convert wind direction from character to degrees, and fetch active warnings. Results are returned as simple feature objects whenever possible.
This package provides probe-level data for 20 HGU133A and 20 HGU133B arrays which are a subset of arrays from a large ALL study. The data is for the MLL arrays. This data was published in Mary E. Ross, Xiaodong Zhou, Guangchun Song, Sheila A. Shurtleff, Kevin Girtman, W. Kent Williams, Hsi-Che Liu, Rami Mahfouz, Susana C. Raimondi, Noel Lenny, Anami Patel, and James R. Downing (2003) Classification of pediatric acute lymphoblastic leukemia by gene expression profiling Blood 102: 2951-2959.
CIMICE is a tool in the field of tumor phylogenetics and its goal is to build a Markov Chain (called Cancer Progression Markov Chain, CPMC) in order to model tumor subtypes evolution. The input of CIMICE is a Mutational Matrix, so a boolean matrix representing altered genes in a collection of samples. These samples are assumed to be obtained with single-cell DNA analysis techniques and the tool is specifically written to use the peculiarities of this data for the CMPC construction.
This tool supports analyses on massive phylogenies comprising up to millions of tips. Functions include pruning, rerooting, calculation of most-recent common ancestors, calculating distances from the tree root and calculating pairwise distances. In addition, this tool takes care of calculation of phylogenetic signal and mean trait depth (trait conservatism), ancestral state reconstruction and hidden character prediction of discrete characters, simulating and fitting models of trait evolution, fitting and simulating diversification models, dating trees, comparing trees, and reading/writing trees in Newick format.
An interface for performing all stages of ADMIXTOOLS analyses (<https://reich.hms.harvard.edu/software>) entirely from R. Wrapper functions (D, f4, f3, etc.) completely automate the generation of intermediate configuration files, run ADMIXTOOLS programs on the command-line, and parse output files to extract values of interest. This allows users to focus on the analysis itself instead of worrying about low-level technical details. A set of complementary functions for processing and filtering of data in the EIGENSTRAT format is also provided.
This package provides a systematic biology tool was developed to identify cell infiltration via Individualized Cell-Cell interaction network. CITMIC first constructed a weighted cell interaction network through integrating Cell-target interaction information, molecular function data from Gene Ontology (GO) database and gene transcriptomic data in specific sample, and then, it used a network propagation algorithm on the network to identify cell infiltration for the sample. Ultimately, cell infiltration in the patient dataset was obtained by normalizing the centrality scores of the cells.
Saturation of ionic substances in urine is calculated based on sodium, potassium, calcium, magnesium, ammonia, chloride, phosphate, sulfate, oxalate, citrate, ph, and urate. This program is intended for research use, only. The code within is translated from EQUIL2 Visual Basic code based on Werness, et al (1985) "EQUIL2: a BASIC computer program for the calculation of urinary saturation" <doi:10.1016/s0022-5347(17)47703-2> to R. The Visual Basic code was kindly provided by Dr. John Lieske of the Mayo Clinic.
The propensity score is one of the most widely used tools in studying the causal effect of a treatment, intervention, or policy. Given that the propensity score is usually unknown, it has to be estimated, implying that the reliability of many treatment effect estimators depends on the correct specification of the (parametric) propensity score. This package implements the data-driven nonparametric diagnostic tools for detecting propensity score misspecification proposed by Sant'Anna and Song (2019) <doi:10.1016/j.jeconom.2019.02.002>.
This package provides methods for estimation of mean- and quantile-optimal treatment regimes from censored data. Specifically, we have developed distinct functions for three types of right censoring for static treatment using quantile criterion: (1) independent/random censoring, (2) treatment-dependent random censoring, and (3) covariates-dependent random censoring. It also includes a function to estimate quantile-optimal dynamic treatment regimes for independent censored data. Finally, this package also includes a simulation data generative model of a dynamic treatment experiment proposed in literature.
Implement the Tariff algorithm for coding cause-of-death from verbal autopsies. The Tariff method was originally proposed in James et al (2011) <DOI:10.1186/1478-7954-9-31> and later refined as Tariff 2.0 in Serina, et al. (2015) <DOI:10.1186/s12916-015-0527-9>. Note that this package was not developed by authors affiliated with the Institute for Health Metrics and Evaluation and thus unintentional discrepancies may exist between the this implementation and the implementation available from IHME.
Assists in analyzing the lying behavior of cows from raw data recorded with a triaxial accelerometer attached to the hind leg of a cow. Allows the determination of common measures for lying behavior including total lying duration, the number of lying bouts, and the mean duration of lying bouts. Further capabilities are the description of lying laterality and the calculation of proxies for the level of physical activity of the cow. Reference: Simmler M., Brouwers S. P. (2024) <doi:10.7717/peerj.17036>.
Fast algorithms for fitting Bayesian variable selection models and computing Bayes factors, in which the outcome (or response variable) is modeled using a linear regression or a logistic regression. The algorithms are based on the variational approximations described in "Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies" (P. Carbonetto & M. Stephens, 2012, <DOI:10.1214/12-BA703>). This software has been applied to large data sets with over a million variables and thousands of samples.
This package provides a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. vtreat prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems vtreat defends against: Inf', NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.
Traces information spread through interactions between features, utilising information theory measures and a higher-order generalisation of the concept of widest paths in graphs. In particular, vistla can be used to better understand the results of high-throughput biomedical experiments, by organising the effects of the investigated intervention in a tree-like hierarchy from direct to indirect ones, following the plausible information relay circuits. Due to its higher-order nature, vistla can handle multi-modality and assign multiple roles to a single feature.
Imports WhatsApp chat logs and parses them into a usable dataframe object. The parser works on chats exported from Android or iOS phones and on Linux, macOS and Windows. The parser has multiple options for extracting smileys and emojis from the messages, extracting URLs and domains from the messages, extracting names and types of sent media files from the messages, extracting timestamps from messages, extracting and anonymizing author names from messages. Can be used to create anonymized versions of data.
Select hits from synthetic lethal RNAi screen data. For example, there are two identical celllines except one gene is knocked-down in one cellline. The interest is to find genes that lead to stronger lethal effect when they are knocked-down further by siRNA. Quality control and various visualisation tools are implemented. Four different algorithms could be used to pick up the interesting hits. This package is designed based on 384 wells plates, but may apply to other platforms with proper configuration.
This package implements an approach for scanning the genome to detect and perform accurate inference on differentially methylated regions from Whole Genome Bisulfite Sequencing data. The method is based on comparing detected regions to a pooled null distribution, that can be implemented even when as few as two samples per population are available. Region-level statistics are obtained by fitting a generalized least squares (GLS) regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions.
This package provides a one-to-one mapping from gene to "best" probe set for four Affymetrix human gene expression microarrays: hgu95av2, hgu133a, hgu133plus2, and u133x3p. On Affymetrix gene expression microarrays, a single gene may be measured by multiple probe sets. This can present a mild conundrum when attempting to evaluate a gene "signature" that is defined by gene names rather than by specific probe sets. This package also includes the pre-calculated probe set quality scores that were used to define the mapping.
This is a package for fast and user-friendly estimation of econometric models with multiple fixed-effects. It includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018). It further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.