Filtering of lowly expressed features (e.g. genes) is a common step before performing statistical analysis, but an arbitrary threshold is generally chosen. SeqGate implements a method that rationalize this step by the analysis of the distibution of counts in replicate samples. The gate is the threshold above which sequenced features can be considered as confidently quantified.
This package provides functions for Arps decline-curve analysis on oil and gas data. Includes exponential, hyperbolic, harmonic, and hyperbolic-to-exponential models as well as the preceding with initial curtailment or a period of linear rate buildup. Functions included for computing rate, cumulative production, instantaneous decline, EUR, time to economic limit, and performing least-squares best fits.
This package provides a framework for the replicable removal of personally identifiable data (PID) in data sets. The package implements a suite of methods to suit different data types based on the suggestions of Garfinkel (2015) <doi:10.6028/NIST.IR.8053> and the ICO "Guidelines on Anonymization" (2012) <https://ico.org.uk/media/1061/anonymisation-code.pdf>.
This package provides tools and methods to apply the model Geospatial Regression Equation for European Nutrient losses (GREEN); Grizzetti et al. (2005) <doi:10.1016/j.jhydrol.2004.07.036>; Grizzetti et al. (2008); Grizzetti et al. (2012) <doi:10.1111/j.1365-2486.2011.02576.x>; Grizzetti et al. (2021) <doi:10.1016/j.gloenvcha.2021.102281>.
Develops a General Equilibrium (GE) Model, which estimates key variables such as wages, the number of residents and workers, the prices of the floor space, and its distribution between commercial and residential use, as in Ahlfeldt et al., (2015) <doi:10.3982/ECTA10876>. By doing so, the model allows understanding the economic influence of different urban policies.
This package provides a comprehensive R interface to access data from the Kraken cryptocurrency exchange REST API <https://docs.kraken.com/api/>. It allows users to retrieve various market data, such as asset information, trading pairs, and price data. The package is designed to facilitate efficient data access for analysis, strategy development, and monitoring of cryptocurrency market trends.
Four measures of linkage disequilibrium are provided: the usual r^2 measure, the r^2_S measure (r^2 corrected by the structure sample), the r^2_V (r^2 corrected by the relatedness of genotyped individuals), the r^2_VS measure (r^2 corrected by both the relatedness of genotyped individuals and the structure of the sample).
This package provides functions to fit finite mixture of scale mixture of skew-normal (FM-SMSN) distributions, details in Prates, Lachos and Cabral (2013) <doi: 10.18637/jss.v054.i12>, Cabral, Lachos and Prates (2012) <doi:10.1016/j.csda.2011.06.026> and Basso, Lachos, Cabral and Ghosh (2010) <doi:10.1016/j.csda.2009.09.031>.
An R wrapper for pulling data from the National Public Transport Access Nodes ('NaPTAN') API (<https://www.api.gov.uk/dft/national-public-transport-access-nodes-naptan-api/#national-public-transport-access-nodes-naptan-api>). This allows users to download NaPTAN transport information, for the full dataset, by ATCO region code, or by name of region.
Computes effective population size (Ne) and the Ne/N ratio for stage-structured populations using the matrix population model framework of Yonezawa (2000) <doi:10.1111/j.0014-3820.2000.tb01244.x>. Functions are provided for sexually reproducing, clonally reproducing, and mixed (sexual + clonal) populations. Includes sensitivity and elasticity analyses for Ne/N with respect to vital rates.
An implementation of Simultaneous Truth and Performance Level Estimation (STAPLE) <doi:10.1109/TMI.2004.828354>. This method is used when there are multiple raters for an object, typically an image, and this method fuses these ratings into one rating. It uses an expectation-maximization method to estimate this rating and the individual specificity/sensitivity for each rater.
This dataset was collected using a new four-arm within-study comparison design. The study aimed to examine the impact of a mathematics training intervention and a vocabulary study session on post-test scores in mathematics and vocabulary, respectively. The innovative four-arm within-study comparison design facilitates both experimental and quasi-experimental identification of average causal effects.
Various tools for handling fuzzy measures, calculating Shapley value and interaction index, Choquet and Sugeno integrals, as well as fitting fuzzy measures to empirical data are provided. Construction of fuzzy measures from empirical data is done by solving a linear programming problem by using lpsolve package, whose source in C adapted to the R environment is included. The description of the basic theory of fuzzy measures is in the manual in the Doc folder in this package. Please refer to the following: [1] <https://personal-sites.deakin.edu.au/~gleb/fmtools.html> [2] G. Beliakov, H. Bustince, T. Calvo, A Practical Guide to Averaging', Springer, (2016, ISBN: 978-3-319-24753-3). [3] G. Beliakov, S. James, J-Z. Wu, Discrete Fuzzy Measures', Springer, (2020, ISBN: 978-3-030-15305-2).
For tree ensembles such as random forests, regularized random forests and gradient boosted trees, this package provides functions for: extracting, measuring and pruning rules; selecting a compact rule set; summarizing rules into a learner; calculating frequent variable interactions; formatting rules in latex code. Reference: Interpreting tree ensembles with inTrees (Houtao Deng, 2019, <doi:10.1007/s41060-018-0144-8>).
This package Provides a variety of functions for producing simple weighted statistics, such as weighted Pearson's correlations, partial correlations, Chi-Squared statistics, histograms, and t-tests. Also now includes some software for quickly recoding survey data and plotting point estimates from interaction terms in regressions (and multiply imputed regressions). NOTE: Weighted partial correlation calculations pulled to address a bug.
This package provides a port of the web-based software DAGitty for analyzing structural causal models (also known as directed acyclic graphs or DAGs). This package computes covariate adjustment sets for estimating causal effects, enumerates instrumental variables, derives testable implications (d-separation and vanishing tetrads), generates equivalent models, and includes a simple facility for data simulation.
After the clustering step of a single-cell RNAseq experiment, this package aims to suggest labels/cell types for the clusters, on the basis of similarity to a reference dataset. It requires a table of read counts per cell per gene, and a list of the cells belonging to each of the clusters, (for both test and reference data).
Messina is a collection of algorithms for constructing optimally robust single-gene classifiers, and for identifying differential expression in the presence of outliers or unknown sample subgroups. The methods have application in identifying lead features to develop into clinical tests (both diagnostic and prognostic), and in identifying differential expression when a fraction of samples show unusual patterns of expression.
The package generally provides methods for gene set enrichment analysis of high-throughput RNA-Seq data by integrating differential expression and splicing. It uses negative binomial distribution to model read count data, which accounts for sequencing biases and biological variation. Based on permutation tests, statistical significance can also be achieved regarding each gene's differential expression and splicing, respectively.
This package enables automated selection of group specific signature, especially for rare population. The package is developed for generating specifc lists of signature genes based on Term Frequency-Inverse Document Frequency (TF-IDF) modified methods. It can also be used as a new gene-set scoring method or data transformation method. Multiple visualization functions are implemented in this package.
This package provides a collection of Japanese text processing tools for filling Japanese iteration marks, Japanese character type conversions, segmentation by phrase, and text normalization which is based on rules for the Sudachi morphological analyzer and the NEologd (Neologism dictionary for MeCab'). These features are specific to Japanese and are not implemented in ICU (International Components for Unicode).
This wrapper package for mgcv makes it easier to create high-performing Generalized Additive Models (GAMs). With its central function autogam(), by entering just a dataset and the name of the outcome column as inputs, AutoGAM tries to automate the procedure of configuring a highly accurate GAM which performs at reasonably high speed, even for large datasets.
Computes the cosine-correlation coefficient for measuring the degree of linear dependence among variables in a multidimensional context. The package implements the generalized cosine-correlation theorem for p-1 variables, providing a quantitative assessment of interrelationships within experimental frameworks. This methodology extends classical correlation measures to higher-dimensional spaces using a dimensional exploration approach based on time scale calculus.
Package to fit diffusion-based IRT models to response and response time data. Models are fit using marginal maximum likelihood. Parameter restrictions (fixed value and equality constraints) are possible. In addition, factor scores (person drift rate and person boundary separation) can be estimated. Model fit assessment tools are also available. The traditional diffusion model can be estimated as well.