The kallisto | bustools pipeline is a fast and modular set of tools to convert single-cell RNA-seq reads in fastq files into gene count or transcript compatibility count (TCC) matrices for downstream analysis. Central to this pipeline is the barcode, UMI, and set (BUS) file format. This package serves the following purposes: First, it allows users to manipulate BUS format files as data frames in R and then convert them into gene count or TCC matrices. Furthermore, since R and Rcpp code is easier to handle than pure C++ code, users are encouraged to tweak the source code of this package to experiment with new uses of the BUS format and different ways to convert BUS files into gene count matrices. Second, this package can conveniently generate the files required to produce gene count matrices for spliced and unspliced transcripts for RNA velocity; biotypes can be filtered, scaffolds and haplotypes can be removed, and the filtered transcriptome can be extracted and written to disk. Third, this package implements utility functions to get the transcripts and associated genes required to convert BUS files to gene count matrices, to write the transcript-to-gene information in the format required by bustools, and to read the output of bustools into R as sparse matrices.
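As a hedged illustration of the third purpose: the sketch below reads a bustools count matrix into R as a sparse matrix. read_count_output() is exported by this package, though the directory path and file prefix shown here are placeholders.

    library(BUSpaRse)
    library(Matrix)

    # Read the gene count matrix produced by bustools into a sparse
    # matrix; "./bus_output" and the "output" prefix are placeholders.
    res_mat <- read_count_output(dir = "./bus_output", name = "output")

    dim(res_mat)     # genes x barcodes
    class(res_mat)   # a sparse matrix class from the Matrix package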
Online, semi-online, and offline K-medians algorithms are given. In each case, the algorithm can be initialized randomly or with the help of a robust hierarchical clustering, and the number of clusters can be selected with the help of a penalized criterion. Functions for robust clustering are provided: gen_K() generates a sample of data following a contaminated Gaussian mixture, Kmedians() and Kmeans() implement K-medians and K-means algorithms, respectively, and Kplot() produces graphs for both methods. Cardot, H., Cenac, P. and Zitt, P-A. (2013). "Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm". Bernoulli, 19, 18-43 <doi:10.3150/11-BEJ390>. Cardot, H. and Godichon-Baggioni, A. (2017). "Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis". Test, 26(3), 461-480 <doi:10.1007/s11749-016-0519-x>. Godichon-Baggioni, A. and Surendran, S. "A penalized criterion for selecting the number of clusters for K-medians" <arXiv:2209.03597>.
Vardi, Y. and Zhang, C.-H. (2000). "The multivariate L1-median and associated data depth". Proc. Natl. Acad. Sci. USA, 97(4):1423-1426. <doi:10.1073/pnas.97.4.1423>.
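A minimal usage sketch of the functions named above; gen_K(), Kmedians(), Kmeans() and Kplot() are exported by the package, but the argument names shown here are assumptions for illustration, not verified signatures.

    library(Kmedians)

    # Generate a contaminated Gaussian mixture sample with gen_K();
    # the arguments shown (sample size, dimension) are assumed.
    X <- gen_K(n = 500, d = 5)

    # Robust clustering with K-medians; the number of clusters can be
    # selected via the penalized criterion.
    res <- Kmedians(X)

    # Compare with K-means and plot the results.
    res2 <- Kmeans(X)
    Kplot(res)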
Implement different Item Response Theory (IRT) based procedures for the development of static short test forms (STFs) from a test. Two main procedures are considered: the typical IRT-based procedure for the development of STFs, and a recently introduced procedure (Epifania, Anselmi & Robusto, 2022 <doi:10.1007/978-3-031-27781-8_7>). The procedures differ in how the most informative items are selected for inclusion in the STF, either by considering their item information functions without reference to any specific level of the latent trait (typical procedure) or by considering their informativeness with respect to specific levels of the latent trait, denoted as theta targets (the newly introduced procedure). For the latter procedure, three methods are implemented for the definition of the theta targets: (i) theta targets are defined by segmenting the latent trait into equal intervals and considering the midpoint of each interval (equal interval procedure, eip); (ii) by clustering the latent trait to obtain unequal intervals and considering the centroids of the clusters as the theta targets (unequal intervals procedure, uip); and (iii) by letting the user set the specific theta targets of interest (user-defined procedure, udp). For further details on the procedure, please refer to Epifania, Anselmi & Robusto (2022) <doi:10.1007/978-3-031-27781-8_7>.
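The eip and uip definitions of the theta targets can be made concrete with a short base-R sketch; this illustrates the idea only and does not call the package's own functions.

    # Simulated latent trait estimates for a sample of respondents.
    set.seed(1)
    theta <- rnorm(1000)

    # eip: segment the latent trait into equal intervals, take midpoints.
    n_int <- 5
    breaks <- seq(min(theta), max(theta), length.out = n_int + 1)
    eip_targets <- (head(breaks, -1) + tail(breaks, -1)) / 2

    # uip: cluster the latent trait and take the cluster centroids.
    uip_targets <- sort(kmeans(theta, centers = n_int)$centers)

    # udp: user-defined targets of interest.
    udp_targets <- c(-2, 0, 2)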
Local Individual Conditional Expectation ('localICE') is a local explanation approach from the field of eXplainable Artificial Intelligence (XAI). 'localICE' is a model-agnostic XAI approach which provides three-dimensional local explanations for particular data instances. The approach was proposed in the master's thesis of Martin Walter as an extension to ICE (see reference). The three dimensions are the two features at the horizontal and vertical axes as well as the target represented by different colors. The approach is applicable to classification and regression problems to explain interactions of two features towards the target. For classification models, the number of classes can be more than two and each class is added as a different color to the plot. The given instance is added to the plot as two dotted lines according to the feature values. The 'localICE' package can explain features of type factor and numeric of any machine learning model. Automatically supported machine learning packages are 'mlr', 'randomForest', 'caret' and all others with an S3 predict function. For further model types from other libraries, a predict function has to be provided as an argument in order to get access to the model. Reference to the ICE approach: Alex Goldstein, Adam Kapelner, Justin Bleich, Emil Pitkin (2013) <arXiv:1309.6392>.
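A hedged usage sketch, assuming the package's main function is localICE() and that it takes a fitted model, the data, the two feature names and the target; these argument names are illustrative assumptions, not verified signatures.

    library(localICE)
    library(randomForest)

    # Fit any supported model; randomForest has an S3 predict method.
    rf <- randomForest(Species ~ ., data = iris)

    # Hypothetical call: explain the interaction of two features towards
    # the target for a particular instance (argument names assumed).
    localICE(model = rf,
             data = iris,
             feature_1 = "Sepal.Length",
             feature_2 = "Petal.Width",
             target = "Species")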
This package provides a gradient descent algorithm to find a geodesic relationship between real-valued independent variables and a manifold-valued dependent variable (i.e., geodesic regression). Available manifolds are Euclidean space, the sphere, hyperbolic space, and Kendall's 2-dimensional shape space. Besides the standard least-squares loss, the least absolute deviations, Huber, and Tukey biweight loss functions can also be used to perform robust geodesic regression. Functions to help choose appropriate cutoff parameters to maintain high efficiency for the Huber and Tukey biweight estimators are included, as are functions for generating random tangent vectors from the Riemannian normal distributions on the sphere and hyperbolic space. The n-sphere is an n-dimensional manifold: we represent it as a sphere of radius 1 and center 0 embedded in (n+1)-dimensional space. Using the hyperboloid model of hyperbolic space, n-dimensional hyperbolic space is embedded in (n+1)-dimensional Minkowski space as the upper sheet of a hyperboloid of two sheets. Kendall's 2D shape space with K landmarks is of real dimension 2K-4; preshapes are represented as complex K-vectors with mean 0 and magnitude 1. Details are described in Shin, H.-Y. and Oh, H.-S. (2020) <arXiv:2007.04518>. Also see Fletcher, P. T. (2013) <doi:10.1007/s11263-012-0591-y>.
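For intuition about the sphere representation: the Riemannian exponential map at a point p, which carries a tangent vector v onto the sphere, has the closed form exp_p(v) = cos(||v||) p + sin(||v||) v/||v||. A base-R sketch of that formula, independent of the package's own implementation:

    # Exponential map on the unit n-sphere embedded in R^(n+1):
    # p is a unit vector, v a tangent vector at p (so sum(p * v) == 0).
    exp_sphere <- function(p, v) {
      nv <- sqrt(sum(v^2))
      if (nv < .Machine$double.eps) return(p)
      cos(nv) * p + sin(nv) * v / nv
    }

    p <- c(0, 0, 1)         # north pole of the 2-sphere
    v <- c(pi / 2, 0, 0)    # tangent vector at p
    exp_sphere(p, v)        # lands on the equator: (1, 0, 0)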
In empirical studies, instrumental variable (IV) regression is the signature method to solve the endogeneity problem. If we enforce the exogeneity condition of the IV, it is likely that we end up with a large set of IVs without knowing which ones are good. One could also face model uncertainty for the structural equation, as large micro datasets are commonly available nowadays. This package uses adaptive group lasso and B-spline methods to select the nonparametric components of the IV function, with the linear function being a special case (naivereg). The package also incorporates the two-stage least squares estimator (2SLS), generalized method of moments (GMM) and generalized empirical likelihood (GEL) methods post instrument selection, the logistic-regression instrumental variables estimator (LIVE, for dummy endogenous variable problems), the double-selection plus instrumental variable estimator (DS-IV) and the double-selection plus logistic regression instrumental variable estimator (DS-LIVE), where the double-selection methods are useful for high-dimensional structural equation models. The naivereg is a nonparametric version of 'ivregress' in Stata, with IV selection and high-dimensional features. The package is based on the paper by Q. Fan and W. Zhong, "Nonparametric Additive Instrumental Variable Estimator: A Group Shrinkage Estimation Perspective" (2018), Journal of Business & Economic Statistics <doi:10.1080/07350015.2016.1180991>, as well as a series of working papers led by the same authors.
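A hedged sketch of the main call; naivereg() is the package's namesake function, but the argument names shown (y for the outcome, x for the endogenous regressor, z for candidate instruments) are assumptions for illustration.

    library(naivereg)

    # Simulated data: many candidate instruments z, one endogenous x.
    set.seed(1)
    n <- 200
    z <- matrix(rnorm(n * 20), n, 20)     # candidate IVs
    x <- z[, 1] + z[, 2] + rnorm(n)       # endogenous regressor
    y <- 1 + 2 * x + rnorm(n)             # structural equation

    # Hypothetical call: select instruments via adaptive group lasso and
    # B-splines, then estimate the structural parameter (args assumed).
    fit <- naivereg(y = y, x = x, z = z)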
While it has been well established that drugs affect and help patients differently, personalized drug response predictions remain challenging. Solutions based on single omics measurements have been proposed, and networks provide means to incorporate molecular interactions into reasoning. However, how to integrate the wealth of information contained in multiple omics layers still poses a complex problem. We present a novel network analysis pipeline, 'DrDimont', Drug response prediction from Differential analysis of multi-omics networks. It allows for comparative conclusions between two conditions and translates them into differential drug response predictions. 'DrDimont' focuses on molecular interactions. It establishes condition-specific networks from correlation within an omics layer that are then reduced and combined into heterogeneous, multi-omics molecular networks. A novel semi-local, path-based integration step ensures integrative conclusions. Differential predictions are derived from comparing the condition-specific integrated networks. DrDimont's predictions are explainable, i.e., molecular differences that are the source of high differential drug scores can be retrieved. Our proposed pipeline leverages multi-omics data for differential predictions, e.g., on drug response, and includes prior information on interactions. The case study presented in the vignette uses data published by Krug (2020) <doi:10.1016/j.cell.2020.10.036>. The package license applies only to the software and explicitly not to the included data.
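The first pipeline step, building a condition-specific network from within-layer correlation and reducing it, can be sketched in base R; this illustrates the idea only and is not DrDimont's own API.

    # Toy omics layer: samples x molecules for one condition.
    set.seed(1)
    expr <- matrix(rnorm(50 * 10), nrow = 50,
                   dimnames = list(NULL, paste0("mol", 1:10)))

    # Condition-specific network: correlation between molecules,
    # reduced by keeping only sufficiently strong edges.
    cors <- cor(expr, method = "spearman")
    adj  <- abs(cors) > 0.3
    diag(adj) <- FALSE   # drop self-edges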
'IsoBayes' is a Bayesian method to perform inference on single protein isoforms. Our approach infers the presence/absence of protein isoforms and also estimates their abundance; additionally, it provides a measure of the uncertainty of these estimates via: i) the posterior probability that a protein isoform is present in the sample; ii) a posterior credible interval of its abundance. 'IsoBayes' inputs liquid chromatography mass spectrometry (MS) data and can work with both PSM counts and intensities. When available, transcript isoform abundances (i.e., TPMs) are also incorporated: TPMs are used to formulate an informative prior for the respective protein isoform relative abundance. We further identify isoforms where the relative abundance of proteins and transcripts significantly differ. We use a two-layer latent variable approach to model two sources of uncertainty typical of MS data: i) peptides may be erroneously detected (even when absent); ii) many peptides are compatible with multiple protein isoforms. In the first layer, we sample the presence/absence of each peptide based on its estimated probability of being mistakenly detected, also known as the PEP (posterior error probability). In the second layer, for peptides that were estimated as being present, we allocate their abundance across the protein isoforms they map to. These two steps allow us to recover the presence and abundance of each protein isoform.
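The first layer of the latent variable model can be sketched directly from this description: each detected peptide is retained with probability 1 - PEP. A conceptual base-R illustration, not the package's internal code:

    # Posterior error probabilities for five detected peptides.
    pep <- c(0.01, 0.05, 0.20, 0.50, 0.90)

    # Layer 1: sample presence/absence of each peptide; a peptide is
    # retained with probability 1 - PEP.
    set.seed(1)
    present <- rbinom(length(pep), size = 1, prob = 1 - pep) == 1

    # Layer 2 (conceptually): abundance of the retained peptides is then
    # allocated across the protein isoforms they map to.
    which(present)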
This package provides a multi-purpose and flexible k-meric enrichment analysis software. 'kmeRtone' measures the enrichment of k-mers by comparing the population of k-mers in the case loci with a carefully devised internal negative control group, consisting of k-mers from regions close to, yet sufficiently distant from, the case loci to mitigate any potential sequencing bias. This method effectively captures both the local sequencing variations and broader sequence influences, while also correcting for potential biases, thereby ensuring more accurate analysis. The core functionality of 'kmeRtone' is the SCORE() function, which calculates the susceptibility scores for k-mers in case and control regions. Case regions are defined by genomic coordinates provided in a file by the user; control regions can be constructed relative to the case regions or provided directly. The k-meric susceptibility scores are calculated using a one-proportion z-statistic. 'kmeRtone' is highly flexible, allowing users to also specify their own target k-mer patterns and quantify the corresponding k-mer enrichment scores in the context of these patterns, allowing for a more comprehensive approach to understanding the functional implications of specific DNA sequences on a genomic scale (e.g., CT motifs upon UV radiation damage). Adib A. Abdullah, Patrick Pflughaupt, Claudia Feng, Aleksandr B. Sahakyan (2024) Bioinformatics (submitted).
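The one-proportion z-statistic underlying the susceptibility scores can be written out explicitly; a conceptual base-R illustration of the statistic, not the package's internal code:

    # Counts of a given k-mer in case loci and in matched control regions.
    k_case <- 150; n_case <- 10000   # k-mer count and total k-mers, case
    k_ctrl <- 100; n_ctrl <- 10000   # k-mer count and total k-mers, control

    # Null proportion from the control regions; one-proportion z-statistic
    # for enrichment of the k-mer in the case loci.
    p0 <- k_ctrl / n_ctrl
    p1 <- k_case / n_case
    z  <- (p1 - p0) / sqrt(p0 * (1 - p0) / n_case)
    z   # a large positive z indicates enrichment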
Statistical hypothesis testing methods for inferring model-free functional dependency using asymptotic chi-squared or exact distributions. Functional test statistics are asymmetric and functionally optimal, unique from other related statistics. Tests in this package reveal evidence for causality based on the causality-by-functionality principle. They include asymptotic functional chi-squared tests (Zhang & Song 2013) <doi:10.48550/arXiv.1311.2707>, an adapted functional chi-squared test (Kumar & Song 2022) <doi:10.1093/bioinformatics/btac206>, and an exact functional test (Zhong & Song 2019) <doi:10.1109/TCBB.2018.2809743> (Nguyen et al. 2020) <doi:10.24963/ijcai.2020/372>. The normalized functional chi-squared test was used by Best Performer 'NMSUSongLab' in the HPN-DREAM (DREAM8) Breast Cancer Network Inference Challenges (Hill et al. 2016) <doi:10.1038/nmeth.3773>. A function index (Zhong & Song 2019) <doi:10.1186/s12920-019-0565-9> (Kumar et al. 2018) <doi:10.1109/BIBM.2018.8621502> derived from the functional test statistic offers a new effect size measure for the strength of functional dependency, a better alternative to conditional entropy in many aspects. For continuous data, these tests offer an advantage over regression analysis when a parametric functional form cannot be assumed; for categorical data, they provide a novel means to assess directional dependency not possible with symmetrical Pearson's chi-squared or Fisher's exact tests.
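A minimal usage sketch: fun.chisq.test() is the package's main exported test and accepts a contingency table; which dimension plays the role of the independent variable follows the package's convention, so check the documentation before interpreting directionality.

    library(FunChisq)

    # A contingency table where one variable nearly determines the
    # other (a functional pattern).
    tab <- matrix(c(20,  0,  0,
                     0, 20,  0,
                     0,  0, 20), nrow = 3, byrow = TRUE)

    # Asymptotic functional chi-squared test; see the documentation for
    # the exact and normalized variants.
    fun.chisq.test(tab)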
This package provides users with a simple and convenient mechanism to manage and query a Virtuoso database using the DBI (Data-Base Interface) compatible ODBC (Open Database Connectivity) interface. Virtuoso is a high-performance "universal server" which can act as a relational database, supporting standard Structured Query Language ('SQL') queries, while also supporting data following the Resource Description Framework ('RDF') model for Linked Data. RDF data can be queried using 'SPARQL' ('SPARQL' Protocol and RDF Query Language) queries, a graph-based query language that supports semantic reasoning. This allows users to leverage the performance of local or remote Virtuoso servers using popular R packages such as 'DBI' and 'dplyr', while also providing a high-performance solution for working with large RDF triplestores from R. The package also provides helper routines to install, launch, and manage a Virtuoso server locally on 'Mac', Windows and Linux platforms using the standard interactive installers from the R command line. By automatically handling these setup steps, the package can make using Virtuoso considerably faster and easier for most users to deploy in a local environment. Managing the bulk import of triples from common serializations with a single intuitive command is another key feature of this package. Bulk import performance can be tens to hundreds of times faster than comparable imports using existing R tools, including the 'rdflib' and 'redland' packages.
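A hedged sketch of the setup-and-query workflow; vos_install(), vos_start() and vos_connect() are helper routines I believe the package exports, but treat the exact names, and the SPARQL-via-SQL prefix syntax, as assumptions to verify against the documentation.

    library(virtuoso)
    library(DBI)

    # One-time local setup, then launch and connect (names assumed).
    vos_install()
    vos_start()
    con <- vos_connect()

    # SPARQL queries can be passed through the DBI/ODBC interface;
    # Virtuoso accepts them prefixed with the SPARQL keyword.
    df <- dbGetQuery(con, "SPARQL SELECT * WHERE { ?s ?p ?o } LIMIT 10")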
Tool to perform Bayesian inference of carcass processing/transport strategy and bone attrition from archaeofaunal skeletal profiles characterized by percentages of MAU (Minimum Anatomical Units). The approach is based on a generative model for skeletal profiles that replicates the two phases of formation of any faunal assemblage: initial accumulation as a function of human transport strategies and subsequent attrition. Two parameters define this model: 1) the transport preference (alpha), which can take any value between -1 (mostly axial contribution) and 1 (mostly appendicular contribution) following strategies constructed as a function of the butchering efficiency of different anatomical elements and the results of ethnographic studies, and 2) the degree of attrition (beta), which can vary between 0 (no attrition) and 10 (maximum attrition) and relates the survivorship of bone elements to their maximum bone density. Starting from uniform prior probability distribution functions of alpha and beta, Markov chain Monte Carlo sampling based on a random-walk Metropolis-Hastings algorithm is adopted to derive the posterior probability distribution functions, which are then available for interpretation. During this process, the likelihood of obtaining the observed percentages of MAU given a pair of parameter values is estimated by the inverse of the chi-squared statistic, multiplied by the proportion of elements within 1 percent of the observed value. See Ana B. Marin-Arroyo, David Ocio (2018) <doi:10.1080/08912963.2017.1336620>.
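The inference scheme just described can be sketched generically: a random-walk Metropolis-Hastings sampler over (alpha, beta), with a likelihood built from the inverse chi-squared distance between observed and simulated %MAU profiles. The sketch below assumes a user-supplied simulate_mau(alpha, beta) generative function and is purely illustrative, not the package's own code.

    # Inverse chi-squared pseudo-likelihood between observed and
    # simulated %MAU profiles (simulate_mau is hypothetical).
    likelihood <- function(obs, sim) 1 / sum((obs - sim)^2 / pmax(sim, 1e-6))

    mh_sample <- function(obs, simulate_mau, n_iter = 10000) {
      chain <- matrix(NA_real_, n_iter, 2,
                      dimnames = list(NULL, c("alpha", "beta")))
      cur <- c(0, 5)    # start mid-range of the uniform priors
      cur_lik <- likelihood(obs, simulate_mau(cur[1], cur[2]))
      for (i in seq_len(n_iter)) {
        prop <- cur + rnorm(2, sd = c(0.1, 0.5))   # random-walk proposal
        if (prop[1] >= -1 && prop[1] <= 1 && prop[2] >= 0 && prop[2] <= 10) {
          prop_lik <- likelihood(obs, simulate_mau(prop[1], prop[2]))
          if (runif(1) < prop_lik / cur_lik) {     # uniform priors cancel
            cur <- prop; cur_lik <- prop_lik
          }
        }
        chain[i, ] <- cur
      }
      chain
    }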
This package implements a class of univariate and multivariate spatial generalised linear mixed models for areal unit data, with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation with a single or multiple Markov chains. The response variable can be binomial, Gaussian, multinomial, Poisson or zero-inflated Poisson (ZIP), and spatial autocorrelation is modelled by a set of random effects that are assigned a conditional autoregressive (CAR) prior distribution. A number of different models are available for univariate spatial data, including models with no random effects as well as random effects modelled by different types of CAR prior, including the BYM model (Besag et al., 1991, <doi:10.1007/BF00116466>) and Leroux model (Leroux et al., 2000, <doi:10.1007/978-1-4612-1284-3_4>). Additionally, a multivariate CAR (MCAR) model for multivariate spatial data is available, as is a two-level hierarchical model for modelling data relating to individuals within areas. Full details are given in the vignette accompanying this package. The initial creation of this package was supported by the Economic and Social Research Council (ESRC) grant RES-000-22-4256, and on-going development has been supported by the Engineering and Physical Science Research Council (EPSRC) grant EP/J017442/1, ESRC grant ES/K006460/1, Innovate UK / Natural Environment Research Council (NERC) grant NE/N007352/1 and the TB Alliance.
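A hedged usage sketch for a univariate spatial model: S.CARleroux() is one of the model-fitting functions I believe the package exports, with W a symmetric binary neighbourhood matrix; treat the exact arguments as assumptions and see the vignette for the authoritative interface. The data objects dat and W are placeholders.

    library(CARBayes)

    # Hypothetical call: Poisson log-linear model with Leroux CAR random
    # effects; W is a K x K symmetric 0/1 neighbourhood matrix.
    model <- S.CARleroux(formula = observed ~ offset(log(expected)) + covariate,
                         family   = "poisson",
                         data     = dat,
                         W        = W,
                         burnin   = 20000,
                         n.sample = 120000)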
It can sometimes be difficult to ascertain when some events (such as property crime) occur because the victim is not present when the crime happens. As a result, police databases often record a start (or 'from') date and time, and an end (or 'to') date and time. The time span between these date/times can be minutes, hours, or sometimes days, hence the term 'aoristic'. Aoristic is one of the past tenses in Greek and represents an uncertain occurrence in time. For events with a location described by either a latitude/longitude or X,Y coordinate pair, and a start and end date/time, this package generates an aoristic data frame with aoristic weighted probability values for each hour of the week, for each observation. The coordinates are not necessary for the program to calculate aoristic weights; however, they are part of this package because a spatial component has been integral to aoristic analysis from the start. Dummy coordinates can be introduced if the user only has temporal data. Outputs include an aoristic data frame, as well as summary graphs and displays. For more information see: Ratcliffe, JH (2002) Aoristic signatures and the temporal analysis of high volume crime patterns, Journal of Quantitative Criminology 18(1): 23-43. Note: This package replaces an original aoristic package (version 0.6) by George Kikuchi that has been discontinued with his permission.
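The aoristic weighting itself is simple to express: an event's unit probability is spread over the hours spanned by its start and end date/times. A conceptual base-R sketch that splits the probability evenly across the hours touched (the package's own weighting may be finer-grained):

    # One event known only to have occurred between these date/times.
    start <- as.POSIXct("2024-01-01 22:30")
    end   <- as.POSIXct("2024-01-02 03:15")

    # Hours touched by the event span; each receives an equal share of
    # the event's total probability of 1.
    hours  <- seq(trunc(start, "hours"), trunc(end, "hours"), by = "hour")
    weight <- 1 / length(hours)
    data.frame(hour = format(hours, "%a %H:00"), weight = weight)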
The aim of this package is to offer new methodology for unit-level small area models under transformations and limited population auxiliary information. In addition to this new methodology, the widely used nested error regression model without transformations (see "An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data" by Battese, Harter and Fuller (1988) <doi:10.1080/01621459.1988.10478561>) and its well-known uncertainty estimate (see "The estimation of the mean squared error of small-area estimators" by Prasad and Rao (1990) <doi:10.1080/01621459.1995.10476570>) are provided. In this package, the log transformation and the data-driven log-shift transformation are provided. If a transformation is selected, an appropriate method is chosen depending on the respective input of the population data: individual population data (see "Empirical best prediction under a nested error model with log transformation" by Molina and Martín (2018) <doi:10.1214/17-aos1608>) as well as aggregated population data (see "Estimating regional income indicators under transformations and access to limited population auxiliary information" by Würz, Schmid and Tzavidis <unpublished>) can be entered. Especially under limited data access, new methodologies are provided in 'saeTrafo'. Several options are available to assess the used model and to judge, present and export its results. For a detailed description of the package and the methods used see the corresponding vignette.
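A hedged sketch of the main estimation call; I believe the central function is NER_Trafo(), but the argument names shown here are assumptions, and the vignette is the authoritative reference. All data objects are placeholders.

    library(saeTrafo)

    # Hypothetical call: nested error regression model under the
    # data-driven log-shift transformation with aggregated population
    # data (argument names assumed).
    est <- NER_Trafo(fixed          = income ~ educ + age,
                     smp_data       = smp_data,
                     smp_domains    = "district",
                     pop_area_size  = pop_area_size,
                     pop_mean       = pop_mean,
                     pop_cov        = pop_cov,
                     transformation = "log.shift")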
Pigna [_pìn'n'a_] is the Italian word for pine cone. In jargon, it is used to identify a task which is boring, banal, annoying, painful, frustrating and maybe even with a not so beautiful or rewarding result, just like the obstinate act of trying to challenge yourself in extracting pine nuts from a pine cone, provided that, in the end, you will find at least one inside it. Here you can find a backpack of functions to be used to solve small everyday problems of coding or analyzing (clinical) data, which would normally be solved using quick-and-dirty patches. You will be able to convert 'Hmisc' and 'rms' summary() objects into data frames ready to be rendered by 'pander' and 'knitr'. You can access easy-to-use wrappers to activate essential but useful progress bars (from 'progress') in your loops or functionals. You can easily set up and control Telegram bots (from 'telegram.bot') to send messages or to divert error messages to a Telegram chat. You also have some utilities helping you in the development of packages, like the activation of the same user interface of 'usethis' in your package, or polite functions to ask a user to install other packages. Finally, you find a set of thematic sets of packages you may use to set up new environments quickly, installing them in a single call.
Single Layer Feed-forward Neural networks (SLFNs) have many applications in various fields of statistical modelling, especially for time-series forecasting. However, there are some major disadvantages of training such networks via the widely accepted gradient-based backpropagation algorithm, such as convergence to local minima, dependence on the learning rate and long training times. These concerns were addressed by Huang et al. (2006) <doi:10.1016/j.neucom.2005.12.126>, wherein they introduced the Extreme Learning Machine (ELM), an extremely fast learning algorithm for SLFNs which randomly chooses the weights connecting input and hidden nodes and analytically determines the output weights of SLFNs. It shows good generalization performance but is still subject to a high degree of randomness. To mitigate this issue, this package uses a dimensionality reduction technique given in Hyvarinen (1999) <doi:10.1109/72.761722>, namely Independent Component Analysis (ICA), to determine the input-hidden connections and thus remove any sort of randomness from the algorithm. This leads to a robust, fast and stable ELM model. Using functions within this package, the proposed model can also be compared with an existing alternative based on the Principal Component Analysis (PCA) algorithm given by Pearson (1901) <doi:10.1080/14786440109462720>, i.e., the PCA-based ELM model given by Castano et al. (2013) <doi:10.1007/s11063-012-9253-x>, from which the implemented ICA-based algorithm is greatly inspired.
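The ELM step that the package builds on is compact enough to sketch in base R: random input-to-hidden weights, a nonlinear hidden layer, and output weights obtained analytically via the Moore-Penrose pseudoinverse. The package replaces the random input weights with ICA/PCA-derived ones; the sketch below shows only the generic ELM.

    library(MASS)   # for ginv(), the Moore-Penrose pseudoinverse

    # Generic ELM for one-dimensional regression.
    elm_fit <- function(X, y, n_hidden = 25) {
      W <- matrix(runif(ncol(X) * n_hidden, -1, 1), ncol(X), n_hidden)
      b <- runif(n_hidden, -1, 1)
      H <- 1 / (1 + exp(-(X %*% W + matrix(b, nrow(X), n_hidden, byrow = TRUE))))
      beta <- ginv(H) %*% y          # analytic output weights
      list(W = W, b = b, beta = beta)
    }

    elm_predict <- function(fit, X) {
      H <- 1 / (1 + exp(-(X %*% fit$W +
                          matrix(fit$b, nrow(X), length(fit$b), byrow = TRUE))))
      H %*% fit$beta
    }

    set.seed(1)
    X <- matrix(seq(-3, 3, length.out = 100))
    y <- sin(X) + rnorm(100, sd = 0.1)
    fit <- elm_fit(X, y)
    head(elm_predict(fit, X))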
Implements group response-adaptive randomization procedures, which also integrate standard non-group response-adaptive randomization methods as specialized instances. It is also uniquely capable of managing complex scenarios, including those with delayed and missing responses, thereby expanding its utility in real-world applications. This package offers 16 functions for simulating a variety of response-adaptive randomization procedures. These functions are essential for guiding the selection of statistical methods in clinical trials, providing a flexible and effective approach to trial design. For details of the methodologies and algorithms used in this package, please refer to the following references: Wei, L. J. (1979) <doi:10.1214/aos/1176344614>; Wei, L. J. and Durham, S. (1978) <doi:10.1080/01621459.1978.10480109>; Durham, S. D., Flournoy, N. and Li, W. (1998) <doi:10.2307/3315771>; Ivanova, A., Rosenberger, W. F., Durham, S. D. and Flournoy, N. (2000) <https://www.jstor.org/stable/25053121>; Bai, Z. D., Hu, F. and Shen, L. (2002) <doi:10.1006/jmva.2001.1987>; Ivanova, A. (2003) <doi:10.1007/s001840200220>; Hu, F. and Zhang, L. X. (2004) <doi:10.1214/aos/1079120137>; Hu, F. and Rosenberger, W. F. (2006, ISBN:978-0-471-65396-7); Zhang, L. X., Chan, W. S., Cheung, S. H. and Hu, F. (2007) <https://www.jstor.org/stable/26432528>; Zhang, L. and Rosenberger, W. F. (2006) <doi:10.1111/j.1541-0420.2005.00496.x>; Hu, F., Zhang, L. X., Cheung, S. H. and Chan, W. S. (2008) <doi:10.1002/cjs.5550360404>.
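As one concrete instance of the procedures cited above, the randomized play-the-winner rule of Wei and Durham (1978) can be sketched as an urn scheme; this is a conceptual base-R illustration, not this package's API.

    # Randomized play-the-winner: draw a ball to assign treatment A or B;
    # a success adds a ball of the same type, a failure adds a ball of
    # the opposite type.
    set.seed(1)
    urn <- c(A = 1, B = 1)              # initial urn composition
    p_success <- c(A = 0.7, B = 0.4)    # true response rates
    for (i in 1:100) {
      arm <- sample(names(urn), 1, prob = urn / sum(urn))
      success <- runif(1) < p_success[arm]
      reinforced <- if (success) arm else setdiff(names(urn), arm)
      urn[reinforced] <- urn[reinforced] + 1
    }
    urn / sum(urn)   # allocation drifts toward the better-performing arm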
This package provides a complete and dedicated analytical toolbox for quality control and diagnosis based on subject-related measurements of micro-RNA ('miRNA') expressions. The package consists of a set of functions that allow the user to train, optimize and use a Bayesian classifier that relies on multiplets of measured 'miRNA' expressions. The package also implements the quality control tools required to preprocess input datasets. In addition, the package provides a function to carry out a statistical analysis of 'miRNA' expressions, which can give insights to improve the classifier's performance. The method implemented in the package was first introduced in L. Ricci, V. Del Vescovo, C. Cantaloni, M. Grasso, M. Barbareschi and M. A. Denti, "Statistical analysis of a Bayesian classifier based on the expression of miRNAs", BMC Bioinformatics 16:287, 2015 <doi:10.1186/s12859-015-0715-9>. The package is thoroughly described in M. Castelluzzo, A. Perinelli, S. Detassis, M. A. Denti and L. Ricci, "MiRNA-QC-and-Diagnosis: An R package for diagnosis based on MiRNA expression", SoftwareX 12:100569, 2020 <doi:10.1016/j.softx.2020.100569>. Please cite both these works if you use the package for your analysis. DISCLAIMER: The software in this package is for general research purposes only and is thus provided WITHOUT ANY WARRANTY. It is NOT intended to form the basis of clinical decisions. Please refer to the GNU General Public License 3.0 (GPLv3) for further information.
Build and run spatially explicit agent-based models using only the R platform. 'NetLogoR' follows the same framework as the 'NetLogo' software (Wilensky (1999) <http://ccl.northwestern.edu/netlogo/>) and is a translation into R of the structure and functions of 'NetLogo'. 'NetLogoR' provides new R classes to define model agents and functions to implement spatially explicit agent-based models in the R environment. This package allows users to benefit from the fast and easy coding phase of the highly developed 'NetLogo' framework, coupled with the versatility, power and massive resources of the R software. Examples of two models from the 'NetLogo' software repository (Ants <http://ccl.northwestern.edu/netlogo/models/Ants> and Wolf-Sheep-Predation <http://ccl.northwestern.edu/netlogo/models/WolfSheepPredation>), and a third, Butterfly, from Railsback and Grimm (2012) <https://www.railsback-grimm-abm-book.com/>, all written using 'NetLogoR', are available. The 'NetLogo' code of the original version of these models is provided alongside. A programming guide inspired by the 'NetLogo' Programming Guide (<https://ccl.northwestern.edu/netlogo/docs/programming.html>) and a dictionary of 'NetLogo' primitive equivalences (<https://ccl.northwestern.edu/netlogo/docs/dictionary.html>) are also available. NOTE: To increment time, these functions can use a for loop or can be integrated with a discrete event simulator, such as 'SpaDES' (<https://cran.r-project.org/package=SpaDES>). The suggested package 'fastshp' can be installed with install.packages("fastshp", repos = "https://rforge.net", type = "source").
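A hedged hello-world sketch: createWorld(), createTurtles() and fd() are primitives I believe 'NetLogoR' provides as equivalences of NetLogo's world setup, create-turtles and fd, though the exact argument names and order should be checked against the package's dictionary.

    library(NetLogoR)

    # Create a small raster world and ten turtles at its center, then
    # move every turtle forward one patch (argument names assumed).
    world   <- createWorld(minPxcor = -10, maxPxcor = 10,
                           minPycor = -10, maxPycor = 10)
    turtles <- createTurtles(n = 10, coords = cbind(xcor = 0, ycor = 0))
    turtles <- fd(turtles, dist = 1, world = world)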
Set of tools for analyzing lactate thresholds from a step incremental test to exhaustion. Easily analyze the methods Log-log, Onset of Blood Lactate Accumulation (OBLA), Baseline plus (Bsln+), Dmax, Lactate Turning Point (LTP), and Lactate / Intensity ratio (LTratio) in cycling, running, or swimming. Beaver WL, Wasserman K, Whipp BJ (1985) <doi:10.1152/jappl.1985.59.6.1936>. Heck H, Mader A, Hess G, Mücke S, Müller R, Hollmann W (1985) <doi:10.1055/s-2008-1025824>. Kindermann W, Simon G, Keul J (1979) <doi:10.1007/BF00421101>. Skinner JS, Mclellan TH (1980) <doi:10.1080/02701367.1980.10609285>. Berg A, Jakob E, Lehmann M, Dickhuth HH, Huber G, Keul J (1990) PMID 2408033. Zoladz JA, Rademaker AC, Sargeant AJ (1995) <doi:10.1113/jphysiol.1995.sp020959>. Cheng B, Kuipers H, Snyder A, Keizer H, Jeukendrup A, Hesselink M (1992) <doi:10.1055/s-2007-1021309>. Bishop D, Jenkins DG, Mackinnon LT (1998) <doi:10.1097/00005768-199808000-00014>. Hughson RL, Weisiger KH, Swanson GD (1987) <doi:10.1152/jappl.1987.62.5.1975>. Jamnick NA, Botella J, Pyne DB, Bishop DJ (2018) <doi:10.1371/journal.pone.0199794>. Hofmann P, Tschakert G (2017) <doi:10.3389/fphys.2017.00337>. Hofmann P, Pokan R, von Duvillard SP, Seibert FJ, Zweiker R, Schmid P (1997) <doi:10.1097/00005768-199706000-00005>. Pokan R, Hofmann P, Von Duvillard SP, et al. (1997) <doi:10.1097/00005768-199708000-00009>. Dickhuth H-H, Yin L, Niess A, et al. (1999) <doi:10.1055/s-2007-971105>.
Epithelial-Mesenchymal transition ('EMT') is an important form of cellular plasticity that is fully or partially activated in several biological scenarios including development and disease progression. EMT involves altered expression of hundreds of protein-coding and non-protein-coding genes. Recent studies showed the prevalence of partial EMT in multiple processes such as various cancers and organ fibrosis, which necessitates rigorous quantification of the degree of 'EMT'. While traditional gene set scoring methods such as gene set variation analysis have been used to generate EMT scores from omics data, multiple EMT scoring algorithms and EMT gene sets have been used by different groups without standardization. Furthermore, comparisons of EMT scores computed from different methods and/or different EMT gene sets are generally difficult due to both the context-dependent nature of EMT and the lack of tools that comprehensively integrate varying components for EMT scoring. To address this problem, we have built a toolbox named EMTscore that enables users to select scoring methods from a list of previously used algorithms and EMT gene sets from a list of gene sets produced from different experiments. We provide several visualization methods for making publication-quality plots of EMT scores from omics data. Furthermore, we show a unique utility of a method based on principal component analysis for scoring divergent EMT processes from a single dataset. Overall, EMTscore provides an integrated solution for assessing the degree and complexity of EMT from omics data, and it paves the way for standardizing the comparison of EMT programs across multiple contexts.
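The PCA-based scoring idea mentioned above can be sketched independently of the package: project samples onto the first principal component of an EMT gene set's expression. This is an illustration only; the gene names are placeholders, not a real EMT signature.

    # Toy expression matrix: samples x genes, with a placeholder gene set.
    set.seed(1)
    expr <- matrix(rnorm(40 * 100), nrow = 40,
                   dimnames = list(NULL, paste0("gene", 1:100)))
    emt_genes <- paste0("gene", 1:15)   # placeholder EMT gene set

    # PCA-based EMT score: coordinates of each sample on the first
    # principal component of the EMT gene subset.
    pc <- prcomp(expr[, emt_genes], scale. = TRUE)
    emt_score <- pc$x[, 1]
    head(emt_score)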
This minimalist package is designed to quickly score raw data output from an Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998) <doi:10.1037/0022-3514.74.6.1464>. IAT scores are calculated as specified by Greenwald, Nosek, and Banaji (2003) <doi:10.1037/0022-3514.85.2.197>. Output values can be interpreted as effect sizes. The input function consists of three arguments. First, indicate the name of the dataset to be analyzed. This is the only required input. Second, indicate the number of trials in your entire IAT (the default is set to 219, which is typical for most IATs). Last, indicate whether congruent trials (e.g., flowers and pleasant) or incongruent trials (e.g., guns and pleasant) were presented first for this participant (the default is set to congruent). The script will tell you how long it took to run the code, the effect size for the participant, and whether that participant should be excluded based on the criteria outlined by Greenwald et al. (2003). Data files should consist of six columns organized in order as follows: block (0-6), trial (0-19 for training blocks, 0-39 for test blocks), category (dependent on your IAT), the type of item within that category (dependent on your IAT), a dummy variable indicating whether the participant was correct or incorrect on that trial (0 = correct, 1 = incorrect), and the participant's reaction time (in milliseconds). Three sample datasets are included in this package (labeled 'IAT', 'TooFastIAT', and 'BriefIAT') to practice with.
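The Greenwald et al. (2003) scoring algorithm at the heart of this package reduces to a D measure: the difference between mean incongruent and congruent latencies divided by their pooled standard deviation. A conceptual base-R sketch of that core quantity (the full algorithm also applies trial-level exclusions):

    # Reaction times (ms) from congruent and incongruent test blocks.
    rt_con <- c(650, 700, 720, 690, 710)
    rt_inc <- c(820, 870, 900, 850, 880)

    # D score: mean latency difference over the pooled SD of all trials.
    d_score <- (mean(rt_inc) - mean(rt_con)) / sd(c(rt_con, rt_inc))
    d_score   # interpretable as an effect size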
The main goal of this package is to present various fuzzy statistical tools. It intends to provide an implementation of the theoretical and empirical approaches presented in the book entitled "The signed distance measure in fuzzy statistical analysis. Some theoretical, empirical and programming advances" <doi:10.1007/978-3-030-76916-1>. For the theoretical approaches, see Berkachy R. and Donze L. (2019) <doi:10.1007/978-3-030-03368-2_1>. For the empirical approaches, see Berkachy R. and Donze L. (2016) <ISBN:978-989-758-201-1>. Important (non-exhaustive) implementation highlights of this package are as follows: (1) a numerical procedure to estimate the fuzzy difference and the fuzzy square; (2) two numerical methods of fuzzification; (3) a function implementing different distances, including the signed distance and the generalized signed distance, with all their properties; (4) numerical estimations of fuzzy statistical measures such as the variance, the moment, etc.; (5) two methods of estimating the bootstrap distribution of the likelihood ratio in the fuzzy context; (6) an estimation of a fuzzy confidence interval by the likelihood ratio method; (7) testing fuzzy hypotheses and/or fuzzy data by fuzzy confidence intervals in the Kwakernaak - Kruse and Meyer sense; (8) a general method to estimate the fuzzy p-value with fuzzy hypotheses and/or fuzzy data; (9) a method of estimating global and individual evaluations of linguistic questionnaires; (10) numerical estimations of multi-way analysis of variance models in the fuzzy context. Unbalanced designs are also supported.
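For intuition about the signed distance central to the book and this package: for a triangular fuzzy number with support endpoints p and q and core m, the signed distance to the fuzzy origin reduces to the classical closed form (p + 2m + q)/4, obtained by integrating the midpoints of the alpha-cuts. A base-R illustration of that formula; the package's own distance function is more general.

    # Signed distance of a triangular fuzzy number (p, m, q) from the
    # fuzzy origin: the alpha-cut integral gives (p + 2m + q) / 4.
    signed_distance_tri <- function(p, m, q) (p + 2 * m + q) / 4

    signed_distance_tri(1, 2, 4)   # 2.25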