EpiCompare is used to compare and analyse epigenetic datasets for quality control and benchmarking purposes. The package outputs an HTML report consisting of three sections: (1) General metrics: metrics on peaks (percentage of blacklisted and non-standard peaks, and peak widths) and fragments (duplication rate) of samples. (2) Peak overlap: percentage and statistical significance of overlapping and non-overlapping peaks, together with an upset plot. (3) Functional annotation: functional annotation (ChromHMM, ChIPseeker and enrichment analysis) of peaks, together with peak enrichment around TSS.
Provides access to the internal data required for the functioning of the easier package, together with an exemplary bladder cancer dataset containing both processed RNA-seq data and information on response to ICB therapy, generated by Mariathasan et al. "TGF-B attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells", published in Nature, 2018 [doi:10.1038/nature25501](https://doi.org/10.1038/nature25501). The data is made available via the [`IMvigor210CoreBiologies`](http://research-pub.gene.com/IMvigor210CoreBiologies/) package under the CC-BY license.
This package provides a pipeline with high specificity and sensitivity in extracting proteins from the RefSeq database (National Center for Biotechnology Information). Manual identification of gene families is highly time-consuming and laborious, requiring an iterative process of manual and computational analysis to identify members of a given family. The pipeline implements an automatic approach for the identification of gene families based on the conserved domains that specifically define that family. See Die et al. (2018) <doi:10.1101/436659> for more information and examples.
This package performs Modal Clustering (MAC) including Hierarchical Modal Clustering (HMAC) along with their parallel implementation (PHMAC) over several processors. These model-based non-parametric clustering techniques can extract clusters in very high dimensions with arbitrary density shapes. By default clustering is performed over several resolutions and the results are summarised as a hierarchical tree. Associated plot functions are also provided. There is a package vignette that provides many examples. This version adheres to the CRAN policy of not spawning more than two child processes by default.
This package provides a comprehensive toolkit for missing person identification combining genetic and non-genetic evidence within a Bayesian framework. Computes likelihood ratios (LRs) for DNA profiles, biological sex, age, hair color, and birthdate evidence. Provides decision analysis tools including optimal LR thresholds, error rate calculations, and ROC curve visualization. Includes interactive Shiny applications for exploring evidence combinations. For methodological details see Marsico et al. (2023) <doi:10.1016/j.fsigen.2023.102891> and Marsico, Vigeland et al. (2021) <doi:10.1016/j.fsigen.2021.102519>.
Implementation of T. Hailperin's procedure to calculate lower and upper bounds of the probability for a propositional-logic expression, given equality and inequality constraints on the probabilities for other expressions. Truth-valuation is included as a special case. Applications range from decision-making and probabilistic reasoning to pedagogical use in probability and logic courses. For more details see T. Hailperin (1965) <doi:10.1080/00029890.1965.11970533>, T. Hailperin (1996) "Sentential Probability Logic" ISBN:0-934223-45-9, and package documentation. Requires the lpSolve package.
Determine the chlorophyll a (Chl a) concentrations of different phytoplankton groups based on their pigment biomarkers. The method uses non-negative matrix factorisation and simulated annealing to minimise error between the observed and estimated values of pigment concentrations (Hayward et al. (2023) <doi:10.1002/lom3.10541>). The approach is similar to the widely used CHEMTAX program (Mackey et al. 1996) <doi:10.3354/meps144265>, but is more straightforward, accurate, and not reliant on initial guesses for the pigment to Chl a ratios for phytoplankton groups.
This package provides tools for generating and analyzing simulation studies. Users may easily specify all terms of a simulation study, often in a single line of code. Common univariate and bivariate methods, such as t tests, proportions tests, and chi squared tests, are integrated. Multivariate studies involving linear or logistic regression may also be specified with symbolic inputs. The simulation studies generate data for n observations in each of B experiments. Analyses of each experiment are integrated, and empirical results across the experiments are also provided.
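The "n observations in each of B experiments" structure described above can be sketched as follows. This is only an illustration in Python using the standard library; the function names `run_simulation_study` and `summarise` are hypothetical and are not part of the package's interface, and the single statistic analysed here (the sample mean of normal data) is an assumption chosen for brevity.

```python
import random
import statistics


def run_simulation_study(n, B, mean=0.0, sd=1.0, seed=0):
    """Generate B experiments of n normal observations each.

    Returns the sample mean of every experiment (one estimate per experiment).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(B):
        sample = [rng.gauss(mean, sd) for _ in range(n)]
        estimates.append(statistics.fmean(sample))
    return estimates


def summarise(estimates):
    """Empirical results across experiments: average estimate and its spread."""
    return {
        "mean_estimate": statistics.fmean(estimates),
        "empirical_se": statistics.stdev(estimates),
    }


# Usage: 200 experiments of 50 observations drawn from N(2, 1).
results = summarise(run_simulation_study(n=50, B=200, mean=2.0, sd=1.0, seed=1))
print(results)
```

The empirical standard error across experiments should be close to the theoretical value sd / sqrt(n), which gives a quick sanity check on the simulation.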
Implementation of the boosting procedure with the simulation and extrapolation approach to address variable selection and estimation for high-dimensional data subject to measurement error in predictors. It can be used to address generalized linear models (GLM) in Chen (2023) <doi:10.1007/s11222-023-10209-3> and the accelerated failure time (AFT) model in Chen and Qiu (2023) <doi:10.1111/biom.13898>. Some relevant references include Chen and Yi (2021) <doi:10.1111/biom.13331> and Hastie, Tibshirani, and Friedman (2008, ISBN:978-0387848570).
Scaffold an entire web-based report using template chunks, based on a small chapter overview and a dataset. Highly adaptable with prefixes, suffixes, translations, etc. Also contains tools for password-protecting, e.g. for each organization's report on a website. Developed for the common case of a survey across multiple organizations/sites where each organization wants to obtain results for their organization compared with everyone else. See saros (<https://CRAN.R-project.org/package=saros>) for tools used for authors in the drafted reports.
Fundamental time series forecasting models such as autoregressive integrated moving average (ARIMA), exponential smoothing, and simple moving average are included. For ARIMA models, the output follows the traditional parameterisation by Box and Jenkins (1970, ISBN: 0816210942, 9780816210947). Furthermore, there are functions for detailed time series exploration and for decomposition. All data and result visualisations are generated by ggplot2 instead of conventional R graphical output. For more details regarding the theoretical background of the models see Hyndman, R.J. and Athanasopoulos, G. (2021) <https://otexts.com/fpp3/>.
By creating crowd-sourcing tasks that can be easily posted and results retrieved using Amazon's Mechanical Turk (MTurk) API, researchers can use this solution to validate the quality of topics obtained from unsupervised or semi-supervised learning methods, and the relevance of topic labels assigned. This helps ensure that the topic modeling results are accurate and useful for research purposes. See Ying and others (2022) <doi:10.1101/2023.05.02.538599>. For more information, please visit <https://github.com/Triads-Developer/Topic_Model_Validation>.
This package provides a collection of functions dedicated to simulating staggered entry platform trials whereby the treatment under investigation is a combination of two active compounds. In order to obtain approval for this combination therapy, superiority of the combination over the two active compounds and superiority of the two active compounds over placebo need to be demonstrated. A more detailed description of the design can be found in Meyer et al. <DOI:10.1002/pst.2194> and a manual in Meyer et al. <arXiv:2202.02182>.
This package provides a tool to sample data with the desired properties. Samples can be drawn by purposive sampling with determining distributional conditions, such as deviation from normality (skewness and kurtosis), and sample size in quantitative research studies. For purposive sampling, a researcher has something in mind and participants that fit the purpose of the study are included (Etikan, Musa, & Alkassim, 2015) <doi:10.11648/j.ajtas.20160501.11>. Purposive sampling can be useful for answering many research questions (Klar & Leeper, 2019) <doi:10.1002/9781119083771.ch21>.
Likelihood-based approaches to estimate linear regression parameters and treatment effects in the presence of endogeneity. Specifically, this package includes James Heckman's classical simultaneous equation models: the sample selection model for outcome selection bias and the hybrid model with structural shift for endogenous treatment. For more information, see the seminal paper of Heckman (1978) <DOI:10.3386/w0177> in which the details of these models are provided. This package accommodates repeated measures on subjects with a working independence approach. The hybrid model further accommodates treatment effect modification.
This package implements a method of iteratively collapsing the rows of a contingency table, two at a time, by selecting the pair of categories whose combination yields a new table with the smallest loss of chi-squared, as described by Greenacre, M.J. (1988) <doi:10.1007/BF01901670>. The result is compatible with the class of object returned by the stats package's hclust() function and can be used similarly (plotted as a dendrogram, cut, etc.). Additional functions are provided for automatic cutting and diagnostic plotting.
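The collapsing procedure described above can be illustrated with a small greedy sketch. It is written in plain Python rather than R, the function names `chi_squared` and `collapse_rows` are hypothetical, and it assumes a table with a positive grand total; it is not the package's implementation and does not produce an hclust-compatible object.

```python
from itertools import combinations


def chi_squared(table):
    """Pearson chi-squared statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def collapse_rows(table, labels):
    """Repeatedly merge the two rows whose merge loses the least chi-squared.

    Returns the merge history as a list of (label_a, label_b, loss) tuples.
    """
    table = [list(row) for row in table]
    labels = list(labels)
    merges = []
    while len(table) > 1:
        current = chi_squared(table)
        best = None
        for a, b in combinations(range(len(table)), 2):
            merged = [x + y for x, y in zip(table[a], table[b])]
            rest = [row for k, row in enumerate(table) if k not in (a, b)]
            loss = current - chi_squared(rest + [merged])
            if best is None or loss < best[0]:
                best = (loss, a, b, merged)
        loss, a, b, merged = best
        merges.append((labels[a], labels[b], loss))
        new_label = f"({labels[a]}+{labels[b]})"
        table = [row for k, row in enumerate(table) if k not in (a, b)] + [merged]
        labels = [lab for k, lab in enumerate(labels) if k not in (a, b)] + [new_label]
    return merges


# Usage: rows A and B have identical profiles, so they are merged first.
history = collapse_rows([[10, 20], [20, 40], [30, 5]], ["A", "B", "C"])
for step in history:
    print(step)
```

Because the final merge leaves a single row with zero chi-squared, the losses recorded over all steps add up to the chi-squared statistic of the original table, which is what makes the merge history readable as dendrogram heights.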
Matrix eQTL is designed for fast eQTL analysis on large datasets. Matrix eQTL can test for association between genotype and gene expression using linear regression with either additive or ANOVA genotype effects. The models can include covariates to account for factors such as population stratification, gender, and clinical variables. It also supports models with heteroscedastic and/or correlated errors, false discovery rate estimation and separate treatment of local (cis) and distant (trans) eQTLs. For more details see Shabalin (2012) <doi:10.1093/bioinformatics/bts163>.
Several methods have been developed to integrate structural equation modeling techniques with network data analysis to examine the relationship between network and non-network data. Both node-based and edge-based information can be extracted from the network data to be used as observed variables in structural equation modeling. To facilitate the application of these methods, model specification can be performed in the familiar syntax of the lavaan package, ensuring ease of use for researchers. Technical details and examples can be found at <https://bigsem.psychstat.org>.
When people make decisions, they may do so using a wide variety of decision rules. The package allows users to easily create obfuscation games to test the obfuscation hypothesis. It provides an easy to use interface and multiple options designed to vary the difficulty of the game and tailor it to the user's needs. For more detail: Chorus et al., 2021, Obfuscation maximization-based decision-making: Theory, methodology and first empirical evidence, Mathematical Social Sciences, 109, 28-44, <doi:10.1016/j.mathsocsci.2020.10.002>.
This package implements conjugate power priors for efficient Bayesian analysis of normal data. Power priors allow principled incorporation of historical information while controlling the degree of borrowing through a discounting parameter (Ibrahim and Chen (2000) <doi:10.1214/ss/1009212519>). This package provides closed-form conjugate representations for both univariate and multivariate normal data using Normal-Inverse-Chi-squared and Normal-Inverse-Wishart distributions, eliminating the need for MCMC sampling. The conjugate framework builds upon standard Bayesian methods described in Gelman et al. (2013, ISBN:978-1439840955).
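To show how the discounting parameter controls borrowing, here is a deliberately simplified sketch: a normal mean with known variance and a flat initial prior, in which case the power prior posterior is available in closed form. This is an assumption made for illustration and is simpler than the Normal-Inverse-Chi-squared and Normal-Inverse-Wishart representations the package describes; the function name `power_prior_posterior` is hypothetical.

```python
from statistics import fmean


def power_prior_posterior(historical, current, sigma2, a0):
    """Posterior of a normal mean under a power prior (known variance sigma2).

    With a flat initial prior, raising the historical likelihood to the power
    a0 makes the n0 historical observations count as a0 * n0 observations.
    Returns (posterior_mean, posterior_variance).
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    n0, n = len(historical), len(current)
    effective_n = a0 * n0 + n
    mean = (a0 * n0 * fmean(historical) + n * fmean(current)) / effective_n
    variance = sigma2 / effective_n
    return mean, variance


# Usage: a0 = 0 ignores the historical data, a0 = 1 pools it fully.
historical = [1.0, 1.0, 1.0, 1.0]
current = [3.0, 3.0]
for a0 in (0.0, 0.5, 1.0):
    print(a0, power_prior_posterior(historical, current, sigma2=1.0, a0=a0))
```

Intermediate values of a0 move the posterior mean smoothly between the current-data mean and the pooled mean, while the posterior variance shrinks as more historical information is borrowed.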
This package provides functions to calculate exact critical values, statistical power, expected time to signal, and required sample sizes for performing exact sequential analysis. All these calculations can be done for either Poisson or binomial data, for continuous or group sequential analyses, and for different types of rejection boundaries. In the case of group sequential analyses, the group sizes do not have to be specified in advance and the alpha spending can be specified arbitrarily. For regression versions of the methods, Monte Carlo and asymptotic methods are used.
Representation-dependent gene-level operations for genetic and evolutionary algorithms with real-coded genes are collected in this package. The common feature of the gene operations is that all of them are useful for derivative-free optimization algorithms. At the moment the package implements initialization, mutation, crossover, and replication operations for differential evolution as described in Price, Kenneth V., Storn, Rainer M. and Lampinen, Jouni A. (2005) <doi:10.1007/3-540-31306-0>. In addition, several (more recent) methods for determining the scale factor are provided.
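For readers unfamiliar with differential evolution, the mutation and crossover operations mentioned above can be sketched as below. This is a generic illustration of the classic DE/rand/1/bin variant with greedy selection, written in Python; the function names `de_mutation`, `binomial_crossover` and `de_step` are hypothetical and do not correspond to the package's functions, and a fixed scale factor is assumed instead of the adaptive methods the package offers.

```python
import random


def de_mutation(population, target_index, scale_factor, rng):
    """DE/rand/1 mutation: a base vector plus a scaled difference of two others."""
    candidates = [i for i in range(len(population)) if i != target_index]
    r1, r2, r3 = rng.sample(candidates, 3)  # needs at least 4 individuals
    return [
        population[r1][k] + scale_factor * (population[r2][k] - population[r3][k])
        for k in range(len(population[r1]))
    ]


def binomial_crossover(target, mutant, crossover_rate, rng):
    """Take each gene from the mutant with probability crossover_rate.

    One randomly chosen gene always comes from the mutant, so the trial
    vector differs from the target in at least one position.
    """
    forced = rng.randrange(len(target))
    return [
        mutant[k] if (k == forced or rng.random() < crossover_rate) else target[k]
        for k in range(len(target))
    ]


def de_step(population, fitness, scale_factor=0.8, crossover_rate=0.9, rng=None):
    """One generation for minimisation: keep the trial only if it is no worse."""
    rng = rng or random.Random()
    next_population = []
    for i, target in enumerate(population):
        mutant = de_mutation(population, i, scale_factor, rng)
        trial = binomial_crossover(target, mutant, crossover_rate, rng)
        next_population.append(trial if fitness(trial) <= fitness(target) else target)
    return next_population


# Usage: minimise the sphere function in three dimensions.
rng = random.Random(42)
sphere = lambda v: sum(x * x for x in v)
population = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
for _ in range(50):
    population = de_step(population, sphere, rng=rng)
print(min(sphere(v) for v in population))
```

Only fitness evaluations are used, never gradients, which is why these operators suit black-box objectives. The scale factor controls the size of the difference step, so its choice strongly affects convergence.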
CAGE is a widely used high throughput assay for measuring transcription start site (TSS) activity. CAGEfightR is an R/Bioconductor package for performing a wide range of common data analysis tasks for CAGE and 5'-end data in general. Core functionality includes: import of CAGE TSSs (CTSSs), tag (or unidirectional) clustering for TSS identification, bidirectional clustering for enhancer identification, annotation with transcript and gene models, correlation of TSS and enhancer expression, calculation of TSS shapes, quantification of CAGE expression as expression matrices and genome browser visualization.
Are you spending too much time fetching and managing clinical trial data? Struggling with complex queries and bulk data extraction? What if you could simplify this process with just a few lines of code? Introducing clintrialx - Fetch clinical trial data from sources like ClinicalTrials.gov <https://clinicaltrials.gov/> and the Clinical Trials Transformation Initiative - Access to Aggregate Content of ClinicalTrials.gov database <https://aact.ctti-clinicaltrials.org/>, supporting pagination and bulk downloads. Also, you can generate HTML reports based on the data obtained from the sources!