SVP uses the distances between cells, between features, and between cells and features in MCA space to build a nearest-neighbor graph, then uses the random walk with restart algorithm to calculate activity scores for gene sets (such as cell marker genes, KEGG pathways, Gene Ontology terms, gene modules, transcription factor or miRNA target sets, Reactome pathways, ...), which are then further weighted using the hypergeometric test results from the original expression matrix. To accurately detect spatially variable or single-cell variable gene sets (or other features) and the spatial colocalization between features, SVP provides several global and local spatial autocorrelation methods. SVP is built on the SingleCellExperiment class, so it is interoperable with the existing computing ecosystem.
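The random walk with restart step can be illustrated with a small generic sketch. This is not SVP's implementation: the function name, the dense adjacency-list input, the restart probability of 0.5 and the toy graph are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic RWR on a small dense graph (illustrative, not SVP's code).

    adjacency: square list of lists with non-negative edge weights.
    seeds: indices of the nodes the walker restarts from (e.g. a gene set).
    """
    n = len(adjacency)
    # Column-normalise so each column holds transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Restart distribution: uniform over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        # p_next = (1 - restart) * W p + restart * p0
        p_next = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        change = sum(abs(x - y) for x, y in zip(p_next, p))
        p = p_next
        if change < tol:
            break
    return p


# Toy path graph 0 - 1 - 2 - 3 with the walker restarting at node 0.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seeds=[0])
```

Nodes closer to the seed receive higher steady-state scores, which is the property that lets such scores serve as an activity measure on a nearest-neighbor graph.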
Differential abundance testing in microbiome data challenges both parametric and non-parametric statistical methods because of the data's sparsity, high variability and compositional nature. Microbiome-specific statistical methods often assume classical distribution models or take into account compositional specifics. These methods produce results that vary across the specificity vs. sensitivity space in such a way that type I and type II errors are difficult to ascertain in real microbiome data when a single method is used. Recently, a consensus approach based on multiple differential abundance (DA) methods was suggested in order to increase robustness. With dar, you can use dplyr-like pipeable sequences of DA methods and then apply different consensus strategies. In this way we can obtain more reliable results in a fast, consistent and reproducible way.
This package provides a method of recovering the precision matrix for Gaussian graphical models efficiently. Our approach can be divided into three parts. First, we use Hard Graphical Thresholding for the best subset selection problem of Gaussian graphical models; the core concept of this method was proposed by Luo et al. (2014) <arXiv:1407.7819>. Second, a closed-form solution for the graphical lasso under an acyclic graph structure is implemented in our package (Fattahi and Sojoudi (2019) <https://jmlr.org/papers/v20/17-501.html>). Third, we implement a block coordinate descent algorithm to efficiently solve the covariance selection problem (Dempster (1972) <doi:10.2307/2528966>). Our package is computationally efficient and can solve ultra-high-dimensional problems, e.g. p > 10,000, in a few minutes.
The generalised lambda distribution, or Tukey lambda distribution, provides a wide variety of shapes with one functional form. This package provides random numbers, quantiles, probabilities, densities and density quantiles for four different types of the distribution, the FKML (Freimer et al 1988), RS (Ramberg and Schmeiser 1974), GPD (van Staden and Loots 2009) and FM5 - see documentation for details. It provides the density function, distribution function, and Quantile-Quantile plots. It implements a variety of estimation methods for the distribution, including diagnostic plots. Estimation methods include the starship (all 4 types), method of L-Moments for the GPD and FKML types, and a number of methods for only the FKML type. These include maximum likelihood, maximum product of spacings, Titterington's method, Moments, Trimmed L-Moments and Distributional Least Absolutes.
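As an illustration of how a single functional form yields many shapes, the sketch below evaluates the quantile function of the FKML parameterisation, Q(u) = lambda1 + [(u^lambda3 - 1)/lambda3 - ((1 - u)^lambda4 - 1)/lambda4] / lambda2, and draws random numbers by inverse transform sampling. The function names are hypothetical and are not the package's API.

```python
import math
import random


def fkml_quantile(u, lambda1, lambda2, lambda3, lambda4):
    """Quantile function of the FKML generalised lambda distribution."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    if lambda2 <= 0:
        raise ValueError("lambda2 (inverse scale) must be positive")
    # Each term takes its logarithmic limit when its shape parameter is 0.
    left = math.log(u) if lambda3 == 0 else (u ** lambda3 - 1) / lambda3
    right = math.log(1 - u) if lambda4 == 0 else ((1 - u) ** lambda4 - 1) / lambda4
    return lambda1 + (left - right) / lambda2


def fkml_sample(n, lambda1, lambda2, lambda3, lambda4, seed=None):
    """Inverse transform sampling: push uniform draws through Q."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # random() may return exactly 0.0, which Q cannot accept
            draws.append(fkml_quantile(u, lambda1, lambda2, lambda3, lambda4))
    return draws


# lambda3 = lambda4 = 1 gives a uniform shape; lambda3 = lambda4 = 0 a logistic one.
uniform_like = fkml_sample(5, 0.0, 2.0, 1.0, 1.0, seed=1)
logistic_median = fkml_quantile(0.5, 3.0, 1.0, 0.0, 0.0)
```

With lambda3 = lambda4 = 1 and lambda2 = 2 the quantile function reduces to u - 0.5, so every draw falls in the interval from -0.5 to 0.5.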
Calculate the prevalence of users of a product based on the prevalence of triers in the population. Measuring triers is relatively easy: it is just a question of whether a person has tried a product even once in their life. Measuring people who also adopt the product as part of their life is more complicated, since adopting an innovative product is a subjective view of the individual. Mickey Kislev and Shira Kislev developed a formula to calculate the prevalence of a product's users to overcome this difficulty. The current package assists in calculating the user prevalence of a product based on the prevalence of triers in the population. See Kislev, M. M., and S. Kislev (2020) <doi:10.5539/ijms.v12n4p63>.
This package provides comprehensive functionalities for causal modeling with Coincidence Analysis (CNA), which is a configurational comparative method of causal data analysis that was first introduced in Baumgartner (2009) <doi:10.1177/0049124109339369>, and generalized in Baumgartner & Ambuehl (2020) <doi:10.1017/psrm.2018.45>. CNA is designed to recover INUS-causation from data, which is particularly relevant for analyzing processes featuring conjunctural causation (component causation) and equifinality (alternative causation). CNA is currently the only method for INUS-discovery that allows for multiple effects (outcomes/endogenous factors), meaning it can analyze common-cause and causal chain structures. Moreover, as of version 4.0, it is the only method of its kind that provides measures for model evaluation and selection that are custom-made for the problem of INUS-discovery.
This package provides functions for testing affine hypotheses on the regression coefficient vector in regression models with heteroskedastic errors: (i) a function for computing various test statistics (in particular using HC0-HC4 covariance estimators based on unrestricted or restricted residuals); (ii) a function for numerically approximating the size of a test based on such test statistics and a user-supplied critical value; and, most importantly, (iii) a function for determining size-controlling critical values for such test statistics and a user-supplied significance level (also incorporating a check of conditions under which such a size-controlling critical value exists). The three functions are based on results in Poetscher and Preinerstorfer (2021) "Valid Heteroskedasticity Robust Testing" <doi:10.48550/arXiv.2104.12597>, which will appear as <doi:10.1017/S0266466623000269>.
Supports Bayesian models with full and partial (hence arbitrary) dependencies between random variables. Discrete and continuous variables are supported, and conditional joint probabilities and probability densities are estimated using Kernel Density Estimation (KDE). The full general form, which implements an extension to Bayes' theorem, as well as the simple form, which is just a Bayesian network, both support regression through segmentation and KDE and estimation of probability or relative likelihood of discrete or continuous target random variables. This package also provides true statistical distance measures based on Bayesian models. Furthermore, these measures can be used in neighborhood searches and to estimate the similarity and distance between data points. Related work is by Bayes (1763) <doi:10.1098/rstl.1763.0053> and by Scutari (2010) <doi:10.18637/jss.v035.i03>.
The REUSE tool helps you achieve and confirm license compliance with the REUSE specification, a set of recommendations for licensing Free Software projects. REUSE makes it easy to declare the licenses under which your works are released, especially when reusing software from different projects released under different licenses. It avoids reliance on fuzzy heuristics and allows both legal experts and computers to understand how your project is licensed. This allows generating a "bill of materials" for software.
This tool downloads full license texts, adds copyright and license information to file headers, and contains a linter to identify problems. There are other tools that have a lot more features and functionality surrounding the analysis and inspection of copyright and licenses in software projects. This one is designed to be simple.
This package provides tools to download and manipulate the Permanent Household Survey from Argentina (EPH is the Spanish acronym for Permanent Household Survey), e.g. get_microdata() for downloading the datasets, get_poverty_lines() for downloading the official poverty baskets, and calculate_poverty() for determining whether a household is in poverty, following the official methodology. organize_panels() is used to concatenate observations from different periods, and organize_labels() adds the official labels to the data. The implemented methods are based on INDEC (2016) <http://www.estadistica.ec.gba.gov.ar/dpe/images/SOCIEDAD/EPH_metodologia_22_pobreza.pdf>. As this package works with the Argentinian Permanent Household Survey and its main audience is from this country, the documentation was written in Spanish.
This package implements the Bayesian and likelihood methods proposed in Imai, Lu, and Strauss (2008 <doi:10.1093/pan/mpm017>) and (2011 <doi:10.18637/jss.v042.i05>) for ecological inference in 2 by 2 tables as well as the method of bounds introduced by Duncan and Davis (1953). The package fits both parametric and nonparametric models using either the Expectation-Maximization algorithms (for likelihood models) or the Markov chain Monte Carlo algorithms (for Bayesian models). For all models, the individual-level data can be directly incorporated into the estimation whenever such data are available. Along with in-sample and out-of-sample predictions, the package also provides a functionality which allows one to quantify the effect of data aggregation on parameter estimation and hypothesis testing under the parametric likelihood models.
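The method of bounds for a single 2 by 2 table can be sketched as follows. The sketch assumes the accounting identity Y = W1 * X + W2 * (1 - X), where X is the share of the unit's population in group 1, Y is the overall share with the outcome, and W1 and W2 are the unknown within-group outcome rates. The function name and the example numbers are hypothetical and do not come from the package.

```python
def duncan_davis_bounds(x, y):
    """Deterministic bounds on the within-group rates of a 2 by 2 table.

    x: share of the population in group 1 (strictly between 0 and 1).
    y: overall share of the population with the outcome (between 0 and 1).
    Returns ((w1_lower, w1_upper), (w2_lower, w2_upper)).
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie between 0 and 1")
    # Bounds follow from Y = W1*X + W2*(1 - X) with both rates kept in [0, 1].
    w1_bounds = (max(0.0, (y - (1.0 - x)) / x), min(1.0, y / x))
    w2_bounds = (max(0.0, (y - x) / (1.0 - x)), min(1.0, y / (1.0 - x)))
    return w1_bounds, w2_bounds


# Hypothetical unit: 60% of people are in group 1 and 80% have the outcome.
w1_bounds, w2_bounds = duncan_davis_bounds(0.6, 0.8)
```

In this example the group 1 rate must lie between 2/3 and 1, and the group 2 rate between 0.5 and 1, without any modelling assumption.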
Do algebraic operations on neural networks. We seek here to implement, in R, operations on neural networks and their resulting approximations. Our operations derive their descriptions mainly from Rafi S., Padgett, J.L., and Nakarmi, U. (2024), "Towards an Algebraic Framework For Approximating Functions Using Neural Network Polynomials", <doi:10.48550/arXiv.2402.01058>; Grohs P., Hornung, F., Jentzen, A. et al. (2023), "Space-time error estimates for deep neural network approximations for differential equations", <doi:10.1007/s10444-022-09970-2>; and Jentzen A., Kuckuck B., von Wurstemberger, P. (2023), "Mathematical Introduction to Deep Learning Methods, Implementations, and Theory" <doi:10.48550/arXiv.2310.20360>. Our implementation is meant mainly as a pedagogical tool and proof of concept. Faster implementations with deeper vectorizations may be made in future versions.
The 'TAD' package compiled an analytical framework based on an analysis of the shape of the trait abundance distributions to better understand community assembly processes, and predict community dynamics under environmental changes. This framework mobilized a study of the relationship between the moments describing the shape of the distributions: the skewness and the kurtosis (SKR). The SKR allows the identification of commonalities in the shape of trait distributions across contrasting communities. Derived from the SKR, we developed mathematical parameters that summarise the complex pattern of distributions by assessing (i) the R², (ii) the Y-intercept, (iii) the slope, (iv) the functional stability of community (TADstab), and, (v) the distance from specific distribution families (i.e., the distance from the skew-uniform family, a limit to the highest degree of evenness: TADeve).
Construction of the Total Operating Characteristic (TOC) Curve and the Receiver (aka Relative) Operating Characteristic (ROC) Curve for spatial and non-spatial data. The TOC method is a modification of the ROC method which measures the ability of an index variable to diagnose either presence or absence of a characteristic. The diagnosis depends on whether the value of an index variable is above a threshold. Each threshold generates a two-by-two contingency table, which contains four entries: hits (H), misses (M), false alarms (FA), and correct rejections (CR). While ROC shows for each threshold only two ratios, H/(H + M) and FA/(FA + CR), TOC reveals the size of every entry in the contingency table for each threshold (Pontius Jr., R.G., Si, K. 2014. <doi:10.1080/13658816.2013.862623>).
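The relationship between one threshold, its contingency table and the two ROC ratios can be sketched as below. The function names and the toy data are hypothetical and are not the package's API. The sketch diagnoses presence when the index value is strictly above the threshold, and assumes the data contain at least one presence and one absence so that both ratios are defined.

```python
def contingency_table(index_values, presence, threshold):
    """Count hits, misses, false alarms and correct rejections for one threshold."""
    table = {"H": 0, "M": 0, "FA": 0, "CR": 0}
    for value, present in zip(index_values, presence):
        diagnosed = value > threshold  # diagnose presence above the threshold
        if diagnosed and present:
            table["H"] += 1
        elif diagnosed:
            table["FA"] += 1
        elif present:
            table["M"] += 1
        else:
            table["CR"] += 1
    return table


def roc_point(table):
    """The two ratios ROC reports: H/(H + M) and FA/(FA + CR)."""
    hit_rate = table["H"] / (table["H"] + table["M"])
    false_alarm_rate = table["FA"] / (table["FA"] + table["CR"])
    return hit_rate, false_alarm_rate


# Toy data: four observations of an index variable and the observed presence.
index_values = [0.9, 0.8, 0.4, 0.2]
presence = [1, 0, 1, 0]
table = contingency_table(index_values, presence, threshold=0.5)
ratios = roc_point(table)
```

TOC keeps the four counts in `table` for every threshold, whereas ROC retains only the pair returned by `roc_point`, so the sizes of the entries cannot be recovered from the ROC curve alone.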
The Aligned Corpus Toolkit (act) is designed for linguists who work with time-aligned transcription data. It offers functions to import and export various annotation file formats (ELAN .eaf, EXMARaLDA .exb and Praat .TextGrid files), create print transcripts in the style of conversation analysis, search transcripts (span searches across multiple annotations, search in normalized annotations, make concordances etc.), export and re-import search results (.csv and Excel .xlsx format), create cuts for the search results (print transcripts, audio/video cuts using FFmpeg and video subtitles in SubRip .srt format), modify the data in a corpus (search/replace, delete, filter etc.), interact with Praat using Praat scripts, and exchange data with the rPraat package. The package is itself written in R and may be expanded by other users.
This package implements a tree-based method specifically designed for personalized medicine applications. By using genomic and mutational data, ODT efficiently identifies optimal drug recommendations tailored to individual patient profiles. The ODT algorithm constructs decision trees that bifurcate at each node, selecting the most relevant markers (discrete or continuous) and corresponding treatments, thus ensuring that recommendations are both personalized and statistically robust. This iterative approach enhances therapeutic decision-making by refining treatment suggestions until a predefined group size is achieved. Moreover, the simplicity and interpretability of the resulting trees make the method accessible to healthcare professionals. Includes functions for training the decision tree, making predictions on new samples or patients, and visualizing the resulting tree. For detailed insights into the methodology, please refer to Gimeno et al. (2023) <doi:10.1093/bib/bbad200>.
This package provides functions that implement Fieller's formula for calculating a confidence interval for a ratio of (commonly, correlated) means. See Fieller (1954) <doi:10.1111/j.2517-6161.1954.tb00159.x>. Here, the application of primary interest is to studies of insect mortality response to increasing doses of a fumigant, or, e.g., to time in coolstorage. The formula is used to calculate a confidence interval for the dose or time required to achieve a specified mortality proportion, commonly 0.5 or 0.99. Vignettes demonstrate link functions that may be considered, checks on fitted models, and alternative choices of error family. Note in particular the betabinomial error family. See also Maindonald, Waddell, and Petry (2001) <doi:10.1016/S0925-5214(01)00082-5>.
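A Fieller-type interval for a ratio a/b can be sketched by solving the quadratic inequality (a - theta*b)^2 <= t^2 * (var_a - 2*theta*cov_ab + theta^2*var_b) for theta. The function name, argument names and example numbers are hypothetical and are not the package's API. The critical value t_crit is assumed to be supplied by the caller, and the sketch only handles the case where the denominator estimate differs clearly from zero, so that the interval is bounded.

```python
import math


def fieller_interval(a, b, var_a, var_b, cov_ab, t_crit):
    """Fieller-type confidence limits for the ratio a / b.

    a, b: estimates of the numerator and denominator.
    var_a, var_b, cov_ab: their estimated variances and covariance.
    t_crit: critical value of the reference distribution (caller-supplied).
    """
    # Coefficients of q(theta) = A*theta^2 - 2*B*theta + C; the interval is q <= 0.
    A = b * b - t_crit ** 2 * var_b
    B = a * b - t_crit ** 2 * cov_ab
    C = a * a - t_crit ** 2 * var_a
    if A <= 0:
        raise ValueError("denominator not clearly non-zero: interval is unbounded")
    discriminant = B * B - A * C
    if discriminant < 0:
        raise ValueError("no real solution: check the covariance inputs")
    half_width = math.sqrt(discriminant)
    return (B - half_width) / A, (B + half_width) / A


# Hypothetical estimates: ratio 2 / 4 with uncorrelated errors and t_crit = 2.
lower, upper = fieller_interval(2.0, 4.0, 0.04, 0.04, 0.0, 2.0)
```

Unlike a delta-method interval, the limits are generally not symmetric about the point estimate a/b; here they are roughly 0.393 and 0.618 around 0.5.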
Biterm Topic Models find topics in collections of short texts. A Biterm Topic Model is a word co-occurrence based topic model that learns topics by modeling word-word co-occurrence patterns, which are called biterms. This is in contrast to traditional topic models like Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis, which are word-document co-occurrence topic models. A biterm consists of two words co-occurring in the same short text window. This context window can for example be a Twitter message, a short answer on a survey, a sentence of a text or a document identifier. The techniques are explained in detail in the paper 'A Biterm Topic Model For Short Text' by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng (2013) <https://github.com/xiaohuiyan/xiaohuiyan.github.io/blob/master/paper/BTM-WWW13.pdf>.
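To make the notion of a biterm concrete, the sketch below extracts all unordered word pairs from each short text and counts them across a toy corpus. It covers only biterm extraction, not the topic model itself. The function name and the example texts are hypothetical, and each whole text is treated as one context window.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of words co-occurring in one short text."""
    # Sorting each pair makes ("pie", "apple") and ("apple", "pie") identical.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Toy corpus of pre-tokenised short texts (one context window per text).
short_texts = [
    ["apple", "pie", "recipe"],
    ["apple", "pie"],
    ["phone", "battery"],
]

biterm_counts = Counter()
for tokens in short_texts:
    biterm_counts.update(extract_biterms(tokens))
```

A text with n words contributes n * (n - 1) / 2 biterms, so the pair ("apple", "pie") is counted twice in this corpus while every other pair is counted once.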
This package implements the Interpolate, Truncate, Project (ITP) root-finding algorithm developed by Oliveira and Takahashi (2021) <doi:10.1145/3423597>. The user provides the function, from the real numbers to the real numbers, and an interval with the property that the values of the function at its endpoints have different signs. If the function is continuous over this interval then the ITP method estimates the value at which the function is equal to zero. If the function is discontinuous then a point of discontinuity at which the function changes sign may be found. The function can be supplied using either an R function or an external pointer to a C++ function. Tuning parameters of the ITP algorithm can be set by the user. Default values are set based on arguments in Oliveira and Takahashi (2021).
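The interpolate, truncate and project steps can be sketched in a few lines. This is an illustrative reading of the algorithm, not the package's code: the function name is hypothetical, the tuning values k1 = 0.2/(b - a), k2 = 2 and n0 = 1 are assumptions chosen for the example, and the iteration cap is a safety net added here rather than part of the method.

```python
import math


def itp_root(f, a, b, eps=1e-10, k1=None, k2=2.0, n0=1, max_iter=200):
    """Sketch of ITP root finding on [a, b], where f(a) and f(b) differ in sign."""
    ya, yb = f(a), f(b)
    if ya == 0:
        return a
    if yb == 0:
        return b
    if ya * yb > 0:
        raise ValueError("f(a) and f(b) must have different signs")
    # Work with g = sign * f so that g(a) < 0 < g(b).
    sign = 1.0 if ya < 0 else -1.0
    ya, yb = sign * ya, sign * yb
    if k1 is None:
        k1 = 0.2 / (b - a)  # assumed tuning value for this sketch
    n_half = math.ceil(math.log2((b - a) / (2 * eps)))
    n_max = n_half + n0
    j = 0
    while b - a > 2 * eps and j < max_iter:
        x_half = (a + b) / 2
        r = eps * 2 ** (n_max - j) - (b - a) / 2
        delta = k1 * (b - a) ** k2
        # Interpolate: regula falsi estimate.
        x_f = (yb * a - ya * b) / (yb - ya)
        # Truncate: perturb the estimate towards the midpoint.
        sigma = (x_half > x_f) - (x_half < x_f)
        x_t = x_f + sigma * delta if delta <= abs(x_half - x_f) else x_half
        # Project: keep the estimate within r of the midpoint.
        x_itp = x_t if abs(x_t - x_half) <= r else x_half - sigma * r
        y_itp = sign * f(x_itp)
        if y_itp > 0:
            b, yb = x_itp, y_itp
        elif y_itp < 0:
            a, ya = x_itp, y_itp
        else:
            a = b = x_itp
        j += 1
    return (a + b) / 2


def cubic(x):
    return x ** 3 - x - 2


root = itp_root(cubic, 1.0, 2.0)
```

The projection step is what keeps the bracketing interval shrinking at least as fast as bisection would, up to the slack allowed by n0.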
Conducts Bayesian hypothesis tests of a point null hypothesis against a two-sided alternative using a Non-local Alternative Prior (NAP) for one- and two-sample z- and t-tests (Pramanik and Johnson, 2022). Under the alternative, the NAP is assumed on the standardized effect sizes in one-sample tests and on their differences in two-sample tests. The package considers two types of NAP densities: (1) the normal moment prior, and (2) the composite alternative. In fixed design tests, the functions calculate the Bayes factors and the expected weight of evidence for varied effect size and sample size. The package also provides a sequential testing framework using the Sequential Bayes Factor (SBF) design. The functions calculate the operating characteristics (OC) and the average sample number (ASN), and also conduct sequential tests for sequentially observed data.
This package provides a single, phenome-wide permutation of large-scale biobank data. When a large number of phenotypes are analyzed in parallel, a single permutation across all phenotypes followed by genetic association analyses of the permuted data enables estimation of false discovery rates (FDRs) across the phenome. These FDR estimates provide a significance criterion for interpreting genetic associations in a biobank context. For the basic permutation of unrelated samples, this package takes a sample-by-variable file with ID, genotypic covariates, phenotypic covariates, and phenotypes as input. For data with related samples, it also takes a file with sample pair-wise identity-by-descent information. The function outputs a permuted sample-by-variable file ready for genome-wide association analysis. See Annis et al. (2021) <doi:10.21203/rs.3.rs-873449/v1> for details.
It performs interlaboratory studies (ILS) to detect those laboratories that provide inconsistent results when compared to others. It allows working simultaneously with various testing materials, from standard univariate and functional data analysis (FDA) perspectives. The univariate approach, based on ASTM E691-08, consists of estimating Mandel's h and k statistics to identify those laboratories that provide significantly different results, and also tests for the presence of outliers with the Cochran and Grubbs tests. Analysis of variance (ANOVA) techniques are provided (F and Tukey tests) to test differences in means corresponding to different laboratories for each material. Taking into account the functional nature of data retrieved in analytical chemistry, applied physics and engineering (spectra, thermograms, etc.), the ILS package provides an FDA approach for finding the distribution of Mandel's k and h statistics by smoothed bootstrap resampling.
This package provides a single analysis path that includes distance-based ordination, global tests of any effect of the microbiome, and tests of the effects of individual taxa with false-discovery-rate (FDR) control. It accommodates both continuous and discrete covariates as well as interaction terms to be tested either singly or in combination, allows for adjustment of confounding covariates, and uses permutation-based p-values that can control for sample correlations. It can be applied to transformed data, and an omnibus test can combine results from analyses conducted on different transformation scales. It can also be used for testing presence-absence associations based on infinite number of rarefaction replicates, testing mediation effects of the microbiome, analyzing censored time-to-event outcomes, and for compositional analysis by fitting linear models to centered-log-ratio taxa count data.
This package provides functions for creating ensembles of optimal trees for regression, classification (Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2019) <doi:10.1007/s11634-019-00364-9>) and class membership probability estimation (Khan, Z, Gul, A, Mahmoud, O, Miftahuddin, M, Perperoglou, A, Adler, W & Lausen, B (2016) <doi:10.1007/978-3-319-25226-1_34>). A few trees are selected from an initial set of trees grown by random forest for the ensemble on the basis of their individual and collective performance. Three different methods of tree selection for the case of classification are given. The prediction functions return estimates of the test responses and their class membership probabilities. Unexplained variations, error rates, confusion matrix, Brier scores, etc. are also returned for the test data.