Create data summaries for quality control, extensive reports for exploring data, as well as publication-ready univariate or bivariate tables in several formats (plain text, HTML, LaTeX, PDF, Word or Excel). Create figures to quickly visualise the distribution of your data (boxplots, barplots, normality plots, etc.). Display statistics (mean, median, frequencies, incidences, etc.). Perform the appropriate tests (t-test, analysis of variance, Kruskal-Wallis, Fisher, log-rank, ...) depending on the nature of the described variable (normal, non-normal or qualitative). Summarize genetic data (Single Nucleotide Polymorphisms), displaying allele frequencies and performing Hardy-Weinberg equilibrium tests, among other typical statistics and tests for this kind of data.
The computational complexity of the implemented algorithm for Kendall's correlation is O(n log(n)), which is faster than the base R implementation with a computational complexity of O(n^2). For small vectors (i.e., fewer than 100 observations), the time difference is negligible. For larger vectors, however, the speed difference can be substantial while the numerical difference remains minimal. The references are Knight (1966) <doi:10.2307/2282833>, Abrevaya (1999) <doi:10.1016/S0165-1765(98)00255-9>, Christensen (2005) <doi:10.1007/BF02736122> and Emara (2024) <https://learningcpp.org/>. This implementation is described in Vargas Sepulveda (2024) <doi:10.48550/arXiv.2408.09618>.
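The O(n log(n)) idea can be illustrated with a minimal Python sketch. It is not the package's implementation: it assumes there are no ties in either vector (so it computes tau-a), and the function names `count_inversions` and `kendall_tau_no_ties` are made up for this example. The observations are sorted by x, and the discordant pairs are then counted as inversions in y with a merge sort.

```python
def count_inversions(values):
    """Return (sorted copy, number of inversions) using merge sort."""
    n = len(values)
    if n <= 1:
        return list(values), 0
    mid = n // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left,
            # so it forms one inversion with each of them.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau_no_ties(x, y):
    """Kendall's tau-a in O(n log n), assuming no ties in x or y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # Order y by increasing x; each inversion in y is a discordant pair.
    y_by_x = [yi for _, yi in sorted(zip(x, y))]
    _, discordant = count_inversions(y_by_x)
    total_pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * discordant / total_pairs
```

Handling ties, as a full implementation must, requires additional bookkeeping that this sketch leaves out.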
Stock, Options and Futures Trading Strategies for Traders and Investors with Bearish Outlook. The indicators, strategies, calculations, functions and all other discussions are for academic, research, and educational purposes only and should not be construed as investment advice and come with absolutely no Liability. Guy Cohen ("The Bible of Options Strategies (2nd ed.)", 2015, ISBN: 9780133964028). Juan A. Serur, Juan A. Serur ("151 Trading Strategies", 2018, ISBN: 9783030027919). Chartered Financial Analyst Institute ("Chartered Financial Analyst Program Curriculum 2020 Level I Volumes 1-6. (Vol. 5, pp. 385-453)", 2019, ISBN: 9781119593577). John C. Hull ("Options, Futures, and Other Derivatives (11th ed.)", 2022, ISBN: 9780136939979).
This package provides a function called COTUCKER3() (Co-Inertia Analysis + Tucker3 method) which performs a Co-Tucker3 analysis of two sequences of matrices, as well as other functions called PCA() (Principal Component Analysis) and BGA() (Between-Groups Analysis), which perform analysis of one matrix, COIA() (Co-Inertia Analysis), which performs analysis of two matrices, PTA() (Partial Triadic Analysis), STATIS(), STATISDUAL() and TUCKER3(), which perform analysis of a sequence of matrices, and BGCOIA() (Between-Groups Co-Inertia Analysis), STATICO() (STATIS method + Co-Inertia Analysis), COSTATIS() (Co-Inertia Analysis + STATIS method), which also perform analysis of two sequences of matrices.
Identification of equilibrium locations in location games (Hotelling (1929) <doi:10.2307/2224214>). In these games, two competing actors place customer-serving units in two locations simultaneously. Customers visit the location that is closest to them. The functions in this package include Prim's algorithm (Prim (1957) <doi:10.1002/j.1538-7305.1957.tb01515.x>) to find the minimum spanning tree connecting all network vertices, an implementation of Dijkstra's algorithm (Dijkstra (1959) <doi:10.1007/BF01386390>) to find the shortest distance and path between any two vertices, a self-developed algorithm using elimination of purely dominated strategies to find the equilibrium, and several plotting functions.
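As an illustration of the shortest-path step, here is a minimal Python sketch of Dijkstra's algorithm. It is not the package's code: the adjacency-dictionary representation, the function names `dijkstra` and `shortest_path`, and the small example network are all assumptions made for this example, and edge weights are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; graph maps vertex -> {neighbour: weight}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter route to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def shortest_path(prev, source, target):
    """Rebuild the path from source to target, or None if target is unreachable."""
    if target != source and target not in prev:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical undirected network with four vertices.
network = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
distances, predecessors = dijkstra(network, "A")
route = shortest_path(predecessors, "A", "D")
```

In a location game, such distances would let each customer vertex be assigned to the nearer of the two chosen locations.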
Detect Differential Clustering of Genomic Sites such as gene therapy integrations. The package provides some functions for exploring genomic insertion sites originating from two different sources. Possibly, the two sources are two different gene therapy vectors. Vectors are preferred that target sensitive regions less frequently, motivating the search for localized clusters of insertions and comparison of the clusters formed by integration of different vectors. Scan statistics allow the discovery of spatial differences in clustering and calculation of False Discovery Rates (FDRs) providing statistical methods for comparing retroviral vectors. A scan statistic for comparing two vectors using multiple window widths to detect clustering differentials and compute FDRs is implemented here.
This package provides a non-parametric test founded upon the principles of the Kolmogorov-Smirnov (KS) test, referred to as the KS Predictive Accuracy (KSPA) test. The KSPA test serves two distinct purposes. First, it determines whether there is a statistically significant difference between the distributions of forecast errors. Second, it exploits the principles of stochastic dominance to determine whether the forecasts with the lower error also report a stochastically smaller error than forecasts from a competing model, and thereby enables distinguishing between the predictive accuracy of forecasts. The KSPA test is described in Hassani and Silva (2015) <doi:10.3390/econometrics3030590>.
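The statistic underlying such a comparison can be sketched in a few lines of Python. This is only an illustration of the two-sample KS distance between empirical distribution functions, not the package's interface. Feeding it absolute forecast errors is an assumption of this example, the function names are hypothetical, and no p-values are computed.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, point):
    """Fraction of observations less than or equal to point."""
    return bisect_right(sorted_sample, point) / len(sorted_sample)


def ks_statistics(errors_a, errors_b):
    """Return (two-sided D, one-sided D+) for two samples of forecast errors.

    D  = max |F_a - F_b| measures any difference between the distributions.
    D+ = max (F_a - F_b) is large when sample a tends to be smaller than
    sample b, which is the direction relevant to stochastic dominance.
    """
    a = sorted(abs(e) for e in errors_a)  # absolute errors: an assumption here
    b = sorted(abs(e) for e in errors_b)
    points = sorted(set(a) | set(b))
    diffs = [empirical_cdf(a, p) - empirical_cdf(b, p) for p in points]
    two_sided = max(abs(d) for d in diffs)
    one_sided = max(diffs)
    return two_sided, one_sided
```

A complete test would compare these statistics against the appropriate KS null distributions to obtain p-values.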
There are various functions for managing and cleaning data before the application of different approaches. This includes identifying and erasing sudden jumps in dendrometer data not related to environmental change, identifying the time gaps of recordings, and changing the temporal resolution of data to different frequencies. Furthermore, the package calculates daily statistics of dendrometer data, including the daily amplitude of tree growth. Various approaches can be applied to separate radial growth from daily cyclic shrinkage and expansion due to uptake and loss of stem water. In addition, it identifies periods of consecutive days with user-defined climatic conditions in daily meteorological data, then checks what trees are doing during those periods.
Fit Bayesian time series models using Stan for full Bayesian inference. A wide range of distributions and models are supported, allowing users to fit Seasonal ARIMA, ARIMAX, Dynamic Harmonic Regression, GARCH, GARCH with Student-t innovations, asymmetric GARCH, Random Walks, and stochastic volatility models for univariate time series. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with typical visualization methods, information criteria such as loglik, AIC, BIC, WAIC, Bayes factor and leave-one-out cross-validation methods. References: Hyndman (2017) <doi:10.18637/jss.v027.i03>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.
This package provides a workflow based on scTenifoldNet to perform in-silico knockout experiments using single-cell RNA sequencing (scRNA-seq) data from wild-type (WT) control samples as input. First, the package constructs a single-cell gene regulatory network (scGRN) and knocks out a target gene from the adjacency matrix of the WT scGRN by setting the gene's outdegree edges to zero. Then, it compares the knocked out scGRN with the WT scGRN to identify differentially regulated genes, called virtual-knockout perturbed genes, which are used to assess the impact of the gene knockout and reveal the gene's function in the analyzed cells.
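The knockout step itself can be illustrated with a minimal Python sketch. It is not the package's code: it assumes that row i of the adjacency matrix holds the outgoing edges of gene i (the opposite orientation would require zeroing a column instead), and the function name, gene names and weights are made up for this example.

```python
def knock_out_gene(adjacency, genes, target):
    """Return a copy of the adjacency matrix with target's outgoing edges zeroed.

    Assumption: adjacency[i][j] is the weight of the edge from genes[i]
    to genes[j], so the target gene's outgoing edges form one row.
    """
    index = genes.index(target)
    knocked_out = [row[:] for row in adjacency]  # leave the WT network untouched
    knocked_out[index] = [0.0] * len(genes)
    return knocked_out


# Hypothetical three-gene wild-type network.
gene_names = ["geneA", "geneB", "geneC"]
wt_network = [
    [0.0, 0.8, 0.1],
    [0.3, 0.0, 0.5],
    [0.0, 0.2, 0.0],
]
ko_network = knock_out_gene(wt_network, gene_names, "geneB")
```

The comparison between the knocked-out and WT networks, which identifies the perturbed genes, is a separate step not shown here.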
Allows the user to apply the Bayes Linear approach to finite population with the Simple Random Sampling - BLE_SRS() - and the Stratified Simple Random Sampling design - BLE_SSRS() - (both without replacement), to the Ratio estimator (using auxiliary information) - BLE_Ratio() - and to categorical data - BLE_Categorical(). The Bayes linear estimation approach is applied to a general linear regression model for finite population prediction in BLE_Reg() and it is also possible to achieve the design-based estimators using vague prior distributions. Based on Gonçalves, K.C.M., Moura, F.A.S. and Migon, H.S. (2014) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X201400111886>.
These functions were developed to support statistical analysis on functional covariance operators. The package contains functions to: - compute 2-Wasserstein distances between Gaussian Processes as in Masarotto, Panaretos & Zemel (2019) <doi:10.1007/s13171-018-0130-1>; - compute the Wasserstein barycenter (Frechet mean) as in Masarotto, Panaretos & Zemel (2019) <doi:10.1007/s13171-018-0130-1>; - perform analysis of variance testing procedures for functional covariances and tangent space principal component analysis of covariance operators as in Masarotto, Panaretos & Zemel (2022) <arXiv:2212.04797>; - perform a soft-clustering based on the Wasserstein distance where functional data are classified based on their covariance structure as in Masarotto & Masarotto (2023) <doi:10.1111/sjos.12692>.
Reduction-based techniques for cost-sensitive multi-class classification, in which each observation has a different cost for classifying it into one class, and the goal is to predict the class with the minimum expected cost for each new observation. Implements Weighted All-Pairs (Beygelzimer, A., Langford, J., & Zadrozny, B., 2008, <doi:10.1007/978-0-387-79361-0_1>), Weighted One-Vs-Rest (Beygelzimer, A., Dani, V., Hayes, T., Langford, J., & Zadrozny, B., 2005, <https://dl.acm.org/citation.cfm?id=1102358>) and Regression One-Vs-Rest. Works with arbitrary classifiers taking observation weights, or with regressors. Also implements cost-proportionate rejection sampling for working with classifiers that don't accept observation weights.
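Cost-proportionate rejection sampling can be sketched in a few lines of Python. This is a minimal illustration rather than the package's interface: each example is kept with probability equal to its cost divided by a normalising constant, taken here to be the largest cost unless one is supplied. The function name and its parameters are hypothetical.

```python
import random


def cost_proportionate_rejection_sample(examples, costs, normaliser=None, rng=None):
    """Keep each example with probability cost / normaliser.

    The retained examples can then be passed, unweighted, to a classifier
    that does not accept observation weights.
    """
    if len(examples) != len(costs):
        raise ValueError("examples and costs must have the same length")
    rng = rng if rng is not None else random.Random()
    z = normaliser if normaliser is not None else max(costs)
    if z <= 0:
        raise ValueError("the normaliser must be positive")
    # rng.random() lies in [0, 1), so cost == z is always kept
    # and cost == 0 is never kept.
    return [ex for ex, c in zip(examples, costs) if rng.random() < c / z]


kept = cost_proportionate_rejection_sample(
    ["a", "b", "c"], [2.0, 0.0, 2.0], rng=random.Random(0)
)
```

Because the sample is random, repeating the draw and averaging the resulting classifiers is one way to reduce the variance this introduces.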
The geographical complexity of individual variables can be characterized by the differences in local attribute variables, while the common geographical complexity of multiple variables can be represented by fluctuations in the similarity of vectors composed of multiple variables. In spatial regression tasks, the goodness of fit can be improved by incorporating a geographical complexity representation vector during modeling, using a geographical complexity-weighted spatial weight matrix, or employing local geographical complexity kernel density. Similarly, in spatial sampling tasks, samples can be selected more effectively by using a method that weights based on geographical complexity. By optimizing performance in spatial regression and spatial sampling tasks, the spatial bias of the model can be effectively reduced.
This package provides functions for estimating uncertainty in the number of fatalities in the Uppsala Conflict Data Program (UCDP) data. The package implements a parametric reported-value Gumbel mixture distribution that accounts for the uncertainty in the number of fatalities in the UCDP data. The model is based on information from a survey on UCDP coders and how they view the uncertainty of the number of fatalities from UCDP events. The package provides functions for making random draws of fatalities from the mixture distribution, as well as to estimate percentiles, quantiles, means, and other statistics of the distribution. Full details on the survey and estimation procedure can be found in Vesco et al (2024).
Utilities designed to make the analysis of field trials easier and more accessible for everyone working in plant breeding. It provides a simple and intuitive interface for conducting single and multi-environmental trial analysis, with minimal coding required. Whether you're a beginner or an experienced user, agriutilities will help you quickly and easily carry out complex analyses with confidence. With built-in functions for fitting Linear Mixed Models, agriutilities is the ideal choice for anyone who wants to save time and focus on interpreting their results. Some of the functions require the R package asreml for the ASReml software, which can be obtained upon purchase from VSN International <https://vsni.co.uk/software/asreml-r/>.
Detection of a statistically significant trend in the data provided by the user. The method is a sign test based on the binomial distribution. The package returns a trend test value, T, and also a p-value. A T value close to 1 indicates a rising trend, whereas a T value close to -1 indicates a decreasing trend. A T value close to 0 indicates no trend. There is also a command to visualize the trend. A test data set called gtsa_data is also available, which has global mean temperatures for January, April, July, and October for the years 1851 to 2022. Reference: Walpole, Myers, Myers, Ye. (2007, ISBN: 0-13-187711-9).
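To make the idea of a binomial sign test for trend concrete, here is a minimal Python sketch. It is not necessarily the procedure the package uses: the pairing of each early observation with one half a series later (a Cox-Stuart-style scheme), the definition of T as the normalised difference between positive and negative signs, and the function name are all assumptions of this example.

```python
from math import comb


def sign_trend_test(series):
    """Return (T, two-sided p-value) from a sign test on paired differences.

    Each observation in the first half is paired with the observation half
    a series later; the middle value is skipped when the length is odd.
    T = (positives - negatives) / (positives + negatives), in [-1, 1].
    """
    n = len(series)
    offset = (n + 1) // 2
    diffs = [series[i + offset] - series[i] for i in range(n - offset)]
    positives = sum(1 for d in diffs if d > 0)
    negatives = sum(1 for d in diffs if d < 0)
    m = positives + negatives  # zero differences carry no sign information
    if m == 0:
        return 0.0, 1.0
    t_value = (positives - negatives) / m
    # Under no trend, the number of positive signs is Binomial(m, 0.5).
    k = min(positives, negatives)
    tail = sum(comb(m, i) for i in range(k + 1)) / 2 ** m
    return t_value, min(1.0, 2.0 * tail)
```

With ten strictly increasing values all five paired differences are positive, so T is 1 and the two-sided p-value is 2 * (1/2)^5 = 0.0625.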
The notion of power index has been widely used in literature to evaluate the influence of individual players (e.g., voters, political parties, nations, stockholders, etc.) involved in a collective decision situation like an electoral system, a parliament, a council, a management board, etc., where players may form coalitions. Traditionally this ranking is determined through numerical evaluation. More often than not, however, only ordinal data between coalitions is known. The package socialranking offers a set of solutions to rank players based on a transitive ranking between coalitions, including through CP-Majority, ordinal Banzhaf or lexicographic excellence solution summarized by Tahar Allouche, Bruno Escoffier, Stefano Moretti and Meltem Öztürk (2020, <doi:10.24963/ijcai.2020/3>).
An implementation of a method for building simultaneous confidence intervals for the probabilities of a multinomial distribution given a set of observations, proposed by Sison and Glaz in their paper: Sison, C.P and J. Glaz. Simultaneous confidence intervals and sample size determination for multinomial proportions. Journal of the American Statistical Association, 90:366-369 (1995). The method is an R translation of the SAS code implemented by May and Johnson in their paper: May, W.L. and W.D. Johnson. Constructing two-sided simultaneous confidence intervals for multinomial proportions for small counts in a large number of cells. Journal of Statistical Software 5(6) (2000). Paper and code available at <DOI:10.18637/jss.v005.i06>.
Designs guide sequences for CRISPR/Cas9 genome editing and provides information on sequence features pertinent to guide efficiency. Sequence features include annotated off-target predictions in a user-selected genome and a predicted efficiency score based on the model described in Doench et al. (2016) <doi:10.1038/nbt.3437>. Users are able to import additional genomes and genome annotation files to use when searching and annotating off-target hits. All guide sequences and off-target data can be generated through the R console with sgRNA_Design() or through crispRdesignR's user interface with crispRdesignRUI(). CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and the associated protein Cas9 refer to a technique used in genome editing.
Probabilistic distance clustering (PD-clustering) is an iterative, distribution-free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership under the constraint that the product of the probability and the distance of each point to any cluster center is a constant. PD-clustering is a flexible method that can be used with elliptical clusters, outliers, or noisy data. PDQ is an extension of the algorithm for clusters of different sizes. GPDC and TPDC use a dissimilarity measure based on densities. Factor PD-clustering (FPDC) is a factor clustering method that involves a linear transformation of variables and a cluster optimizing the PD-clustering criterion. It works on high-dimensional data sets.
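The constraint that probability times distance is constant across clusters determines the membership probabilities for a single point: each probability is proportional to the inverse of the distance to the corresponding center. The following Python sketch shows only that step for fixed centers, using Euclidean distance. It is not the package's code, the function name is hypothetical, and the iterative update of the centers is omitted.

```python
import math


def pd_membership_probabilities(point, centers):
    """Membership probabilities p_k with p_k * d_k constant over clusters k."""
    distances = [math.dist(point, center) for center in centers]
    zero_hits = [k for k, d in enumerate(distances) if d == 0.0]
    if zero_hits:
        # The point coincides with one or more centers: share the
        # probability equally among them.
        return [1.0 / len(zero_hits) if k in zero_hits else 0.0
                for k in range(len(centers))]
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [value / total for value in inverse]


# A point at distance 1 from the first center and 3 from the second.
probabilities = pd_membership_probabilities((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
```

Here the probabilities are 0.75 and 0.25, and the product of probability and distance is 0.75 for both clusters.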
Michel Rodange was a Luxembourgish writer and poet who lived in the 19th century. His most notable work is Rodange (1872, ISBN:1166177424), ("Renert oder de Fuuß am Frack an a Ma'nsgrëßt"), but he also wrote many more works, including Rodange, Tockert (1928) <https://www.autorenlexikon.lu/page/document/361/3614/1/FRE/index.html> ("D'Léierchen - Dem Léiweckerche säi Lidd") and Rodange, Welter (1929) <https://www.autorenlexikon.lu/page/document/361/3615/1/FRE/index.html> ("Dem Grow Sigfrid seng Goldkuommer"). This package contains three datasets, each made from the plain text versions of his works available on <https://data.public.lu/fr/datasets/the-works-in-luxembourguish-of-michel-rodange/>.
Extends the base classes and methods of EnsembleBase package for Principal-Components-Regression-based (PCR) integration of base learners. Default implementation uses cross-validation error to choose the optimal number of PC components for the final predictor. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. Users and developers can extend the package by extending the generic methods and classes provided in EnsembleBase package as well as this package.
Implementation of gene-level rare variant association tests targeting allelic series: genes where increasingly deleterious mutations have increasingly large phenotypic effects. The COding-variant Allelic Series Test (COAST) operates on the benign missense variants (BMVs), deleterious missense variants (DMVs), and protein truncating variants (PTVs) within a gene. COAST uses a set of adjustable weights that tailor the test towards rejecting the null hypothesis for genes where the average magnitude of effect increases monotonically from BMVs to DMVs to PTVs. See McCaw ZR, O'Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. (2023) "An allelic series rare variant association test for candidate gene discovery" <doi:10.1016/j.ajhg.2023.07.001>.