This package provides raw files recorded on different Liquid Chromatography Mass Spectrometry (LC-MS) instruments. All included MS instruments are manufactured by Thermo Fisher Scientific and belong to the Orbitrap Tribrid or Q Exactive Orbitrap family of instruments. Despite their common origin and shared hardware components, e.g., Orbitrap mass analyser, the above instruments tend to write data in different "dialects" in a shared binary file format (.raw). The intention behind tartare is to provide complex but slim real-world files that can be used to make code robust with respect to this diversity. In other words, it is intended for enhanced unit testing. The package is considered to be used with the rawrr package and the Spectra MsBackends.
For spatial data analysis; provides exploratory spatial analysis tools, spatial regression, spatial econometric, and disease mapping models, model diagnostics, and special methods for inference with small area survey data (e.g., the America Community Survey (ACS)) and censored population health monitoring data. Models are pre-specified using the Stan programming language, a platform for Bayesian inference using Markov chain Monte Carlo (MCMC). References: Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>; Donegan (2021) <doi:10.31219/osf.io/3ey65>; Donegan (2022) <doi:10.21105/joss.04716>; Donegan, Chun and Hughes (2020) <doi:10.1016/j.spasta.2020.100450>; Donegan, Chun and Griffith (2021) <doi:10.3390/ijerph18136856>; Morris et al. (2019) <doi:10.1016/j.sste.2019.100301>.
This package implements Latent Unknown Clusters By Integrating Multi-omics Data (LUCID; Peng (2019) <doi:10.1093/bioinformatics/btz667>) for integrative clustering with exposures, multi-omics data, and health outcomes. Supports three integration strategies: early, parallel, and serial. Provides model fitting and tuning, lasso-type regularization for exposure and omics feature selection, handling of missing data, including both sporadic and complete-case patterns, prediction, and g-computation for estimating causal effects of exposures, bootstrap inference for uncertainty estimation, and S3 summary and plot methods. For the multi-omics integration framework, see Jia (2024) <https://journal.r-project.org/articles/RJ-2024-012/RJ-2024-012.pdf>. For the missing-data imputation mechanism, see Jia (2024) <doi:10.1093/bioadv/vbae123>.
SEM Trees and SEM Forests -- an extension of model-based decision trees and forests to Structural Equation Models (SEM). SEM trees hierarchically split empirical data into homogeneous groups each sharing similar data patterns with respect to a SEM by recursively selecting optimal predictors of these differences. SEM forests are an extension of SEM trees. They are ensembles of SEM trees each built on a random sample of the original data. By aggregating over a forest, we obtain measures of variable importance that are more robust than measures from single trees. A description of the method was published by Brandmaier, von Oertzen, McArdle, & Lindenberger (2013) <doi:10.1037/a0030001> and Arnold, Voelkle, & Brandmaier (2020) <doi:10.3389/fpsyg.2020.564403>.
This package provides a collection of functions for estimating spatial and spatio-temporal regression models. Moran eigenvectors are used as spatial basis functions to efficiently approximate spatially dependent Gaussian processes (i.e., random effects eigenvector spatial filtering; see Murakami and Griffith 2015 <doi: 10.1007/s10109-015-0213-7>). The implemented models include linear regression with residual spatial dependence, spatially/spatio-temporally varying coefficient models (Murakami et al., 2017, 2024; <doi:10.1016/j.spasta.2016.12.001>,<doi:10.48550/arXiv.2410.07229>), spatially filtered unconditional quantile regression (Murakami and Seya, 2019 <doi:10.1002/env.2556>), Gaussian and non-Gaussian spatial mixed models through compositionally-warping (Murakami et al. 2021, <doi:10.1016/j.spasta.2021.100520>).
This package provides functions for building cognitive maps based on qualitative data. Inputs are textual sources (articles, transcription of qualitative interviews of agents,...). These sources have been coded using relations and are linked to (i) a table describing the variables (or concepts) used for the coding and (ii) a table describing the sources (typology of agents, ...). Main outputs are Individual Cognitive Maps (ICM), Social Cognitive Maps (all sources or group of sources) and a list of quotes linked to relations. This package is linked to the work done during the PhD of Frederic M. Vanwindekens (CRA-W / UCL) hold the 13 of May 2014 at University of Louvain in collaboration with the Walloon Agricultural Research Centre (project MIMOSA, MOERMAN fund).
"Evolutionary Virtual Education" - evolved - provides multiple tools to help educators (especially at the graduate level or in advanced undergraduate level courses) apply inquiry-based learning in general evolution classes. In particular, the tools provided include functions that simulate evolutionary processes (e.g., genetic drift, natural selection within a single locus) or concepts (e.g. Hardy-Weinberg equilibrium, phylogenetic distribution of traits). More than only simulating, the package also provides tools for students to analyze (e.g., measuring, testing, visualizing) datasets with characteristics that are common to many fields related to evolutionary biology. Importantly, the package is heavily oriented towards providing tools for inquiry-based learning - where students follow scientific practices to actively construct knowledge. For additional details, see package's vignettes.
Deconvolution of thermal decay curves allows you to quantify proportions of biomass components in plant litter. Thermal decay curves derived from thermogravimetric analysis (TGA) are imported, modified, and then modelled in a three- or four- part mixture model using the Fraser-Suzuki function. The output is estimates for weights of pseudo-components corresponding to hemicellulose, cellulose, and lignin. For more information see: Müller-Hagedorn, M. and Bockhorn, H. (2007) <doi:10.1016/j.jaap.2006.12.008>, à rfão, J. J. M. and Figueiredo, J. L. (2001) <doi:10.1016/S0040-6031(01)00634-7>, and Yang, H. and Yan, R. and Chen, H. and Zheng, C. and Lee, D. H. and Liang, D. T. (2006) <doi:10.1021/ef0580117>.
Genotyping arrays enable the direct measurement of an individuals genotype at thousands of markers. plinkQC facilitates genotype quality control for genetic association studies as described by Anderson and colleagues (2010) <doi:10.1038/nprot.2010.116>. It makes PLINK basic statistics (e.g. missing genotyping rates per individual, allele frequencies per genetic marker) and relationship functions accessible from R and generates a per-individual and per-marker quality control report. Individuals and markers that fail the quality control can subsequently be removed to generate a new, clean dataset. Removal of individuals based on relationship status is optimised to retain as many individuals as possible in the study. Additionally, there is a trained classifier to predict genomic ancestry of human samples.
Computes sample size and power for causal inference studies that use propensity score (PS) weighting. Supports continuous, binary, and time-to-event (survival) outcomes under four estimands: average treatment effect (ATE), average treatment effect on the treated (ATT), average treatment effect on the controls (ATC), and average treatment effect on the overlap population (ATO). For continuous and binary outcomes, the asymptotic variance of the Hajek inverse probability weighting estimator is derived under a logit-normal propensity score model, approximated by a Beta distribution matched through the Bhattacharyya overlap coefficient. For survival outcomes, the asymptotic variance of the propensity-score- weighted partial likelihood estimator is used for randomized trials and observational studies. The Schoenfeld formula is also available for randomized trial settings.
This package provides a framework of tools to summarise, visualise, and explore longitudinal data. It builds upon the tidy time series data frames used in the tsibble package, and is designed to integrate within the tidyverse', and tidyverts (for time series) ecosystems. The methods implemented include calculating features for understanding longitudinal data, including calculating summary statistics such as quantiles, medians, and numeric ranges, sampling individual series, identifying individual series representative of a group, and extending the facet system in ggplot2 to facilitate exploration of samples of data. These methods are fully described in the paper "brolgar: An R package to Browse Over Longitudinal Data Graphically and Analytically in R", Nicholas Tierney, Dianne Cook, Tania Prvan (2020) <doi:10.32614/RJ-2022-023>.
The main purpose of this package is to make it easy for userR's to interact with jMetrik an open source application for psychometric analysis. For example it allows useR's to write data frames to file in a format that can be used by jMetrik'. It also allows useR's to read *.jmetrik files (e.g. output from an analysis) for follow-up analysis in R. The *.jmetrik format is a flat file that includes a multiline header and the data as comma separated values. The header includes metadata about the file and one row per variable with the following information in each row: variable name, data type, item scoring, special data codes, and variable label.
In the generalized Roy model, the marginal treatment effect (MTE) can be used as a building block for constructing conventional causal parameters such as the average treatment effect (ATE) and the average treatment effect on the treated (ATT). Given a treatment selection equation and an outcome equation, the function mte() estimates the MTE via the semiparametric local instrumental variables method or the normal selection model. The function mte_at() evaluates MTE at different values of the latent resistance u with a given X = x, and the function mte_tilde_at() evaluates MTE projected onto the estimated propensity score. The function ace() estimates population-level average causal effects such as ATE, ATT, or the marginal policy relevant treatment effect.
Identification of the most appropriate pharmacotherapy for each patient based on genomic alterations is a major challenge in personalized oncology. PANACEA is a collection of personalized anti-cancer drug prioritization approaches utilizing network methods. The methods utilize personalized "driverness" scores from driveR to rank drugs, mapping these onto a protein-protein interaction network. The "distance-based" method scores each drug based on these scores and distances between drugs and genes to rank given drugs. The "RWR" method propagates these scores via a random-walk with restart framework to rank the drugs. The methods are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2023. PANACEA: network-based methods for pharmacotherapy prioritization in personalized oncology. Bioinformatics <doi:10.1093/bioinformatics/btad022>.
Cell differentiation processes are achieved through a continuum of hierarchical intermediate cell-states that might be captured by single-cell RNA seq. Existing computational approaches for the assessment of cell-state hierarchies from single-cell data might be formalized under a general workflow composed of i) a metric to assess cell-to-cell similarities (combined or not with a dimensionality reduction step), and ii) a graph-building algorithm (optionally making use of a cells-clustering step). Sincell R package implements a methodological toolbox allowing flexible workflows under such framework. Furthermore, Sincell contributes new algorithms to provide cell-state hierarchies with statistical support while accounting for stochastic factors in single-cell RNA seq. Graphical representations and functional association tests are provided to interpret hierarchies.
While data from randomized experiments remain the gold standard for causal inference, estimation of causal estimands from observational data is possible through various confounding adjustment methods. However, the challenge of unmeasured confounding remains a concern in causal inference, where failure to account for unmeasured confounders can lead to biased estimates of causal estimands. Sensitivity analysis within the framework of causal inference can help adjust for possible unmeasured confounding. In `causens`, three main methods are implemented: adjustment via sensitivity functions (Brumback, Hernán, Haneuse, and Robins (2004) <doi:10.1002/sim.1657> and Li, Shen, Wu, and Li (2011) <doi:10.1093/aje/kwr096>), Bayesian parametric modelling and Monte Carlo approaches (McCandless, Lawrence C and Gustafson, Paul (2017) <doi:10.1002/sim.7298>).
Techniques from a particular branch of spatial statistics,termed geographically-weighted (GW) models. GW models suit situations when data are not described well by some global model, but where there are spatial regions where a suitably localised calibration provides a better description. GWmodel includes functions to calibrate: GW summary statistics (Brunsdon et al., 2002)<doi: 10.1016/s0198-9715(01)00009-6>, GW principal components analysis (Harris et al., 2011)<doi: 10.1080/13658816.2011.554838>, GW discriminant analysis (Brunsdon et al., 2007)<doi: 10.1111/j.1538-4632.2007.00709.x> and various forms of GW regression (Brunsdon et al., 1996)<doi: 10.1111/j.1538-4632.1996.tb00936.x>; some of which are provided in basic and robust (outlier resistant) forms.
In gene-expression microarray studies, for example, one generally obtains a list of dozens or hundreds of genes that differ in expression between samples and then asks What does all of this mean biologically? Alternatively, gene lists can be derived conceptually in addition to experimentally. For instance, one might want to analyze a group of genes known as housekeeping genes. The work of the Gene Ontology (GO) Consortium <geneontology.org> provides a way to address that question. GO organizes genes into hierarchical categories based on biological process, molecular function and subcellular localization. The role of GoMiner is to automate the mapping between a list of genes and GO, and to provide a statistical summary of the results as well as a visualization.
This package provides a set of functions designed to quickly generate results of a multiple choice test. Generates detailed global results, lists for anonymous feedback and personalised result feedback (in LaTeX and/or PDF format), as well as item statistics like Cronbach's alpha or disciminatory power. klausuR also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package rkward cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from <https://rkward.kde.org> (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage.
Real-time quantitative polymerase chain reaction (qPCR) data sets by Lievens et al. (2012) <doi:10.1093/nar/gkr775>. Provides one single tabular tidy data set in long format, encompassing three dilution series, targeted against the soybean Lectin endogene. Each dilution series was assayed in one of the following PCR-efficiency-modifying conditions: no PCR inhibition, inhibition by isopropanol and inhibition by tannic acid. The inhibitors were co-diluted along with the dilution series. The co-dilution series consists of a five-point, five-fold serial dilution. For each concentration there are 18 replicates. Each amplification curve is 60 cycles long. Original raw data file is available at the Supplementary Data section at Nucleic Acids Research Online <doi:10.1093/nar/gkr775>.
An implementation of ranked sparsity methods, including penalized regression methods such as the sparsity-ranked lasso, its non-convex alternatives, and elastic net, as well as the sparsity-ranked Bayesian Information Criterion. As described in Peterson and Cavanaugh (2022) <doi:10.1007/s10182-021-00431-7>, ranked sparsity is a philosophy with methods primarily useful for variable selection in the presence of prior informational asymmetry, which occurs in the context of trying to perform variable selection in the presence of interactions and/or polynomials. Ultimately, this package attempts to facilitate dealing with cumbersome interactions and polynomials while not avoiding them entirely. Typically, models selected under ranked sparsity principles will also be more transparent, having fewer falsely selected interactions and polynomials than other methods.
This package provides a collection of datasets of human-computer interaction (HCI) experiments. Each dataset is from an HCI paper, with all fields described and the original publication linked. All paper authors of included data have consented to the inclusion of their data in this package. The datasets include data from a range of HCI studies, such as pointing tasks, user experience ratings, and steering tasks. Dataset sources: Bergström et al. (2022) <doi:10.1145/3490493>; Dalsgaard et al. (2021) <doi:10.1145/3489849.3489853>; Larsen et al. (2019) <doi:10.1145/3338286.3340115>; Lilija et al. (2019) <doi:10.1145/3290605.3300676>; Pohl and Murray-Smith (2013) <doi:10.1145/2470654.2481307>; Pohl and Mottelson (2022) <doi:10.3389/frvir.2022.719506>.
Facilitates spatial and general latent Gaussian modeling using integrated nested Laplace approximation via the INLA package (<https://www.r-inla.org>). Additionally, extends the GAM-like model class to more general nonlinear predictor expressions, and implements a log Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. Model components are specified with general inputs and mapping methods to the latent variables, and the predictors are specified via general R expressions, with separate expressions for each observation likelihood model in multi-likelihood models. A prediction method based on fast Monte Carlo sampling allows posterior prediction of general expressions of the latent variables. Ecology-focused introduction in Bachl, Lindgren, Borchers, and Illian (2019) <doi:10.1111/2041-210X.13168>.
This package provides a Non-Metric Space Library ('NMSLIB <https://github.com/nmslib/nmslib>) wrapper, which according to the authors "is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The goal of the NMSLIB <https://github.com/nmslib/nmslib> Library is to create an effective and comprehensive toolkit for searching in generic non-metric spaces. Being comprehensive is important, because no single method is likely to be sufficient in all cases. Also note that exact solutions are hardly efficient in high dimensions and/or non-metric spaces. Hence, the main focus is on approximate methods". The wrapper also includes Approximate Kernel k-Nearest-Neighbor functions based on the NMSLIB <https://github.com/nmslib/nmslib> Python Library.