The geographical complexity of individual variables can be characterized by the differences in local attribute variables, while the common geographical complexity of multiple variables can be represented by fluctuations in the similarity of vectors composed of multiple variables. In spatial regression tasks, the goodness of fit can be improved by incorporating a geographical complexity representation vector during modeling, using a geographical complexity-weighted spatial weight matrix, or employing local geographical complexity kernel density. Similarly, in spatial sampling tasks, samples can be selected more effectively by using a method that weights based on geographical complexity. By optimizing performance in spatial regression and spatial sampling tasks, the spatial bias of the model can be effectively reduced.
This package provides functions for estimating uncertainty in the number of fatalities in the Uppsala Conflict Data Program (UCDP) data. The package implements a parametric reported-value Gumbel mixture distribution that accounts for this uncertainty. The model is based on information from a survey of UCDP coders and how they view the uncertainty of the number of fatalities from UCDP events. The package provides functions for making random draws of fatalities from the mixture distribution, as well as for estimating percentiles, quantiles, means, and other statistics of the distribution. Full details on the survey and estimation procedure can be found in Vesco et al. (2024).
Utilities designed to make the analysis of field trials easier and more accessible for everyone working in plant breeding. The package provides a simple and intuitive interface for conducting single and multi-environmental trial analysis, with minimal coding required. Whether you're a beginner or an experienced user, agriutilities will help you quickly and easily carry out complex analyses with confidence. With built-in functions for fitting Linear Mixed Models, agriutilities is the ideal choice for anyone who wants to save time and focus on interpreting their results. Some of the functions require the R package asreml for the ASReml software; this can be obtained upon purchase from VSN International <https://vsni.co.uk/software/asreml-r/>.
Detects a statistically significant trend in user-provided data. The test is a sign test based on the binomial distribution. The package returns a trend test value, T, and a p-value. A T value close to 1 indicates a rising trend, whereas a T value close to -1 indicates a decreasing trend. A T value close to 0 indicates no trend. There is also a command to visualize the trend. A test data set called gtsa_data is also available, which has global mean temperatures for January, April, July, and October for the years 1851 to 2022. Reference: Walpole, Myers, Myers, Ye. (2007, ISBN: 0-13-187711-9).
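A minimal Python sketch of a sign-based trend test of this kind. The description does not give the package's exact definition of T, so the function below (hypothetical name `sign_trend_test`) assumes T is the share of rising steps minus the share of falling steps between consecutive observations, and that the p-value comes from a two-sided binomial test with success probability 0.5. Ties are dropped.

```python
from math import comb

def sign_trend_test(values):
    """Sign test on consecutive differences (illustrative, not the package's code)."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    ups = sum(d > 0 for d in diffs)
    downs = sum(d < 0 for d in diffs)
    n = ups + downs  # ties carry no sign information and are dropped
    if n == 0:
        return 0.0, 1.0
    # Assumed definition of T: ranges from -1 (all falls) to 1 (all rises).
    t = (ups - downs) / n
    # Two-sided binomial p-value under H0: rises and falls are equally likely.
    k = min(ups, downs)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return t, p_value
```

Under these assumptions a strictly increasing series yields T = 1 with a small p-value, and a series that alternates up and down yields T = 0 with a p-value of 1.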
The notion of power index has been widely used in literature to evaluate the influence of individual players (e.g., voters, political parties, nations, stockholders, etc.) involved in a collective decision situation like an electoral system, a parliament, a council, a management board, etc., where players may form coalitions. Traditionally this ranking is determined through numerical evaluation. More often than not, however, only ordinal data between coalitions is known. The package socialranking offers a set of solutions to rank players based on a transitive ranking between coalitions, including through CP-Majority, ordinal Banzhaf or lexicographic excellence solution summarized by Tahar Allouche, Bruno Escoffier, Stefano Moretti and Meltem Öztürk (2020, <doi:10.24963/ijcai.2020/3>).
An implementation of a method for building simultaneous confidence intervals for the probabilities of a multinomial distribution given a set of observations, proposed by Sison and Glaz in their paper: Sison, C.P. and J. Glaz. Simultaneous confidence intervals and sample size determination for multinomial proportions. Journal of the American Statistical Association, 90:366-369 (1995). The method is an R translation of the SAS code implemented by May and Johnson in their paper: May, W.L. and W.D. Johnson. Constructing two-sided simultaneous confidence intervals for multinomial proportions for small counts in a large number of cells. Journal of Statistical Software 5(6) (2000). Paper and code available at <DOI:10.18637/jss.v005.i06>.
Designs guide sequences for CRISPR/Cas9 genome editing and provides information on sequence features pertinent to guide efficiency. Sequence features include annotated off-target predictions in a user-selected genome and a predicted efficiency score based on the model described in Doench et al. (2016) <doi:10.1038/nbt.3437>. Users are able to import additional genomes and genome annotation files to use when searching and annotating off-target hits. All guide sequences and off-target data can be generated through the R console with sgRNA_Design() or through crispRdesignR's user interface with crispRdesignRUI(). CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and the associated protein Cas9 refer to a technique used in genome editing.
Probabilistic distance clustering (PD-clustering) is an iterative, distribution-free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership under the constraint that the product of the probability and the distance of each point to any cluster center is a constant. PD-clustering is a flexible method that can be used with elliptical clusters, outliers, or noisy data. PDQ is an extension of the algorithm for clusters of different sizes. GPDC and TPDC use a dissimilarity measure based on densities. Factor PD-clustering (FPDC) is a factor clustering method that involves a linear transformation of variables and a cluster optimizing the PD-clustering criterion. It works on high-dimensional data sets.
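The constraint described above (the product of membership probability and distance is the same for every cluster) fixes the probabilities once the distances are known: with the probabilities summing to 1, each one is proportional to the inverse of its distance. A minimal Python sketch of that single step, with a hypothetical function name; it is not the package's implementation and leaves out the iterative update of the centers.

```python
def pd_membership(distances):
    """Membership probabilities p_k such that p_k * d_k is equal for all clusters."""
    zeros = [d == 0 for d in distances]
    if any(zeros):
        # A point lying exactly on a center belongs to it with certainty
        # (shared evenly if it coincides with several centers).
        return [z / sum(zeros) for z in zeros]
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]
```

For a point at distances 1 and 3 from two centers this gives probabilities 0.75 and 0.25, and both products of probability and distance equal 0.75.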
Michel Rodange was a Luxembourgish writer and poet who lived in the 19th century. His most notable work is Rodange (1872, ISBN:1166177424) ("Renert oder de Fuuß am Frack an a Ma'nsgrëßt"), but he also wrote many more works, including Rodange, Tockert (1928) <https://www.autorenlexikon.lu/page/document/361/3614/1/FRE/index.html> ("D'Léierchen - Dem Léiweckerche säi Lidd") and Rodange, Welter (1929) <https://www.autorenlexikon.lu/page/document/361/3615/1/FRE/index.html> ("Dem Grow Sigfrid seng Goldkuommer"). This package contains three datasets, each made from the plain text versions of his works available on <https://data.public.lu/fr/datasets/the-works-in-luxembourguish-of-michel-rodange/>.
This package implements a method that aims to identify enhancers on a large scale. The STARR-seq data consist of two sequencing datasets of the same targets in a specific genome. The input sequences show which regions were tested for enhancers. Significantly enriched peaks, i.e. regions with many more sequences than in the input, mark where enhancers lie in the genomic DNA. The approach is therefore to call as a peak every region in which there is significantly more STARR-seq signal than input signal (under a binomial model) and to propose an enhancer at that position. Enhancers are then called weak or strong depending on their degree of enrichment relative to the input.
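One way a binomial enrichment test for a single region could look, as a minimal Python sketch. The description does not specify the package's exact model, so everything here is an assumption: the function name `enrichment_p_value` is hypothetical, the reads in a region are treated as binomial trials, and `p_null` stands for the share of STARR-seq reads expected without enrichment (for example, the library-wide share).

```python
from math import comb

def enrichment_p_value(starr_reads, input_reads, p_null=0.5):
    """One-sided P(X >= starr_reads) for X ~ Binomial(starr_reads + input_reads, p_null)."""
    n = starr_reads + input_reads
    return sum(
        comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)
        for k in range(starr_reads, n + 1)
    )
```

A region would then be called a peak when this p-value falls below a chosen significance threshold, usually after correcting for the number of regions tested.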
This package implements easy-to-use functions to generate publication-quality Venn plots for 2-7 sets. ggVennDiagram plots Venn diagrams using a well-defined geometry dataset and ggplot2. The shapes for 2-4 sets use circles and ellipses, while the shapes for 4-7 sets use irregular polygons (4 sets have both forms), which are developed in and imported from another package, venn. We provide internal functions to integrate shape data with user-provided set data and calculate the geometry of every region/intersection, then plot the Venn diagram in three separate components: set edges, set labels, and regions. From version 1.0, it is possible to customize these components as needed using ordinary ggplot2 grammar.
Implementation of gene-level rare variant association tests targeting allelic series: genes where increasingly deleterious mutations have increasingly large phenotypic effects. The COding-variant Allelic Series Test (COAST) operates on the benign missense variants (BMVs), deleterious missense variants (DMVs), and protein truncating variants (PTVs) within a gene. COAST uses a set of adjustable weights that tailor the test towards rejecting the null hypothesis for genes where the average magnitude of effect increases monotonically from BMVs to DMVs to PTVs. See McCaw ZR, O'Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. (2023) "An allelic series rare variant association test for candidate gene discovery" <doi:10.1016/j.ajhg.2023.07.001>.
This package provides tools for classical parameter estimation of adsorption isotherm models, including both linear and nonlinear forms of the Freundlich, Langmuir, and Temkin isotherms. This package allows users to fit these models to experimental data, providing parameter estimates along with fit statistics such as Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Error metrics are computed to evaluate model performance, and the package produces model fit plots with bootstrapped 95% confidence intervals. Additionally, it generates residual plots for diagnostic assessment of the models. Researchers and engineers in material science, environmental engineering, and chemical engineering can rigorously analyze adsorption behavior in their systems using this straightforward, non-Bayesian approach. For more details, see Harding (1907) <doi:10.2307/2987516>.
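As an illustration of what fitting the linear form of an isotherm involves, here is a minimal Python sketch for the Freundlich model q = K_F * C^(1/n), which becomes a straight line after taking logarithms: ln(q) = ln(K_F) + (1/n) * ln(C). The function name and argument names are hypothetical and this is not the package's code; it is plain ordinary least squares on the log-transformed data.

```python
from math import exp, log

def fit_freundlich_linear(conc, uptake):
    """Least-squares fit of ln(q) = ln(K_F) + (1/n) * ln(C); returns (K_F, n)."""
    x = [log(c) for c in conc]      # ln of equilibrium concentration
    y = [log(q) for q in uptake]    # ln of amount adsorbed
    m = len(x)
    x_bar, y_bar = sum(x) / m, sum(y) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx               # estimates 1/n
    intercept = y_bar - slope * x_bar  # estimates ln(K_F)
    return exp(intercept), 1.0 / slope
```

The nonlinear forms mentioned above would instead minimize the residuals on the original scale, which generally gives different estimates because the log transform reweights the errors.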
Compares the goodness of fit of Benford's and Blondeau Da Silva's digit distributions in a dataset. The dat.distr() function checks whether the data distribution is consistent with the theoretical distributions highlighted by Blondeau Da Silva: the data must follow this ideal theoretical distribution at least approximately for the use of Blondeau Da Silva's model to be well-founded. The digit.ditr() function plots histograms of digit distributions, both observed in the dataset and given by the two theoretical approaches. Finally, the chi2() function quantifies the goodness of fit via Pearson's chi-squared test.
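For the Benford side of the comparison, the idea can be sketched in a few lines of Python. Benford's law gives the first digit d a probability of log10(1 + 1/d), and Pearson's statistic sums (observed - expected)^2 / expected over the nine digits. The function names below are hypothetical and do not mirror the package's R functions; Blondeau Da Silva's distribution is not reproduced here.

```python
from collections import Counter
from math import log10

def benford_expected():
    """First-digit probabilities under Benford's law."""
    return {d: log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading significant digit of a nonzero number."""
    return int(f"{abs(x):.15e}"[0])  # scientific notation starts with that digit

def benford_chi2(data):
    """Pearson chi-squared statistic of observed first digits against Benford's law."""
    digits = [first_digit(x) for x in data if x != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )
```

The statistic would be compared against a chi-squared distribution with 8 degrees of freedom, since there are nine digit classes and no estimated parameters.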
Perform model selection using distribution and probability-based methods, including standardized AIC, BIC, and AICc. These standardized information criteria allow one to perform model selection in a way similar to the prevalent "Rule of 2" method, but formalize the method to rely on probability theory. A novel goodness-of-fit procedure for assessing linear regression models is also available. This test relies on theoretical properties of the estimated error variance for a normal linear regression model, and employs a bootstrap procedure to assess the null hypothesis that the fitted model shows no lack of fit. For more information, see Koeneman and Cavanaugh (2023) <arXiv:2309.10614>. Functionality to perform all subsets linear or generalized linear regression is also available.
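For orientation, the three unstandardized criteria can be computed from a model's maximized log-likelihood as in the Python sketch below. The function name is hypothetical, and the standardized versions from Koeneman and Cavanaugh are not reproduced here; only the textbook definitions are shown.

```python
from math import log

def information_criteria(log_lik, k, n):
    """Textbook AIC, BIC, and AICc for k estimated parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_lik
    bic = k * log(n) - 2 * log_lik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    return {"AIC": aic, "BIC": bic, "AICc": aicc}
```

Under the informal "Rule of 2", candidate models whose criterion value lies within about 2 units of the best model are treated as comparably supported.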
Unsupervised text tokenizer that performs byte pair encoding and unigram modelling. Wraps the sentencepiece library <https://github.com/google/sentencepiece>, which provides a language-independent tokenizer to split text into words and smaller subword units. The techniques are explained in the paper "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing" by Taku Kudo and John Richardson (2018) <doi:10.18653/v1/D18-2012>. Also provides straightforward access to pretrained byte pair encoding models and subword embeddings trained on Wikipedia using word2vec, as described in "BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages" by Benjamin Heinzerling and Michael Strube (2018) <http://www.lrec-conf.org/proceedings/lrec2018/pdf/1049.pdf>.
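To make the byte pair encoding idea concrete, here is a toy Python sketch of the merge-learning loop: repeatedly find the most frequent adjacent symbol pair and fuse it into a new symbol. This is a simplified illustration with hypothetical function names; it does not use or mirror the sentencepiece API, which works on raw text and is far more elaborate.

```python
from collections import Counter

def most_frequent_pair(words):
    """Most frequent adjacent symbol pair, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single fused symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merges, starting from single characters."""
    words = {tuple(w): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # every word is already a single symbol
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words
```

The learned list of merges is the model: applying the same merges in the same order to new text splits it into the subword units seen most often during training.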
Estimates area and subarea level proportions using the Small Area Estimation (SAE) Twofold Subarea Model with a hierarchical Bayesian (HB) approach under Beta distribution. A number of simulated datasets generated for illustration purposes are also included. The rstan package is employed to estimate parameters via the Hamiltonian Monte Carlo and No U-Turn Sampler algorithm. The model-based estimators include the HB mean, the variation of the mean, and quantiles. For references, see Rao and Molina (2015) <doi:10.1002/9781118735855>, Torabi and Rao (2014) <doi:10.1016/j.jmva.2014.02.001>, Leyla Mohadjer et al. (2007) <http://www.asasrms.org/Proceedings/y2007/Files/JSM2007-000559.pdf>, and Erciulescu et al. (2019) <doi:10.1111/rssa.12390>.
Defines a major mode rjsx-mode based on js2-mode for editing JSX files. rjsx-mode extends the parser in js2-mode to support the full JSX syntax. This means you get all of the js2 features plus proper syntax checking and highlighting of JSX code blocks.
Some features that this mode adds to js2:
- Highlighting JSX tag names and attributes (using the rjsx-tag and rjsx-attr faces)
- Highlighting undeclared JSX components
- Parsing the spread operator: ...otherProps
- Parsing && and || in child expressions: cond && <BigComponent/>
- Parsing ternary expressions: toggle ? <ToggleOn /> : <ToggleOff />
Additionally, since rjsx-mode extends the js2 AST, utilities using the parse tree gain access to the JSX structure.
Monte Carlo simulation makes it possible to test different conditions applied to correctly specified structural equation models. This package runs Monte Carlo simulations under different conditions (such as sample size or normality of data). Within the package, data sets can be simulated and analyzed based on the given model. First, continuous and normal data sets are generated based on the given model. Then Fleishman's power method (1978) <DOI:10.1007/BF02293811> is used to add non-normality if requested. Once data generation is completed (or when generated data sets are supplied), the model test can also be run. Please cite as "Orçan, F. (2021). MonteCarloSEM: An R Package to Simulate Data for SEM. International Journal of Assessment Tools in Education, 8 (3), 704-713.".
This package provides a metapackage that brings together a curated collection of R packages containing domain-specific datasets. It includes time series data, educational metrics, crime records, medical datasets, and oncology research data. Designed to provide researchers, analysts, educators, and data scientists with centralized access to structured and well-documented datasets, this metapackage facilitates reproducible research, data exploration, and teaching applications across a wide range of domains. Included packages:
- timeSeriesDataSets: Time series data from economics, finance, energy, and healthcare.
- educationR: Datasets related to education, learning outcomes, and school metrics.
- crimedatasets: Datasets on global and local crime and criminal behavior.
- MedDataSets: Datasets related to medicine, public health, treatments, and clinical trials.
- OncoDataSets: Datasets focused on cancer research, survival, genetics, and biomarkers.
Bipartite graph-based hierarchical clustering, developed for pharmacogenomic datasets and datasets sharing the same data structure. The goal is to construct a hierarchical clustering of groups of samples based on association patterns between two sets of variables. In the context of pharmacogenomic datasets, the samples are cell lines, and the two sets of variables are typically expression levels and drug sensitivity values. For this method, sparse canonical correlation analysis from Lee, W., Lee, D., Lee, Y. and Pawitan, Y. (2011) <doi:10.2202/1544-6115.1638> is first applied to extract association patterns for each group of samples. Then, a nuclear norm-based dissimilarity measure is used to construct a dissimilarity matrix between groups based on the extracted associations. Finally, hierarchical clustering is applied.
Lattice-based space-filling designs with fill or separation distance properties including interleaved lattice-based minimax distance designs proposed in Xu He (2017) <doi:10.1093/biomet/asx036>, interleaved lattice-based maximin distance designs proposed in Xu He (2018) <doi:10.1093/biomet/asy069>, interleaved lattice-based designs with low fill and high separation distance properties proposed in Xu He (2024) <doi:10.1137/23M156940X>, rotated sphere packing designs proposed in Xu He (2017) <doi:10.1080/01621459.2016.1222289>, sliced rotated sphere packing designs proposed in Xu He (2019) <doi:10.1080/00401706.2018.1458655>, and densest packing-based maximum projections designs proposed in Xu He (2021) <doi:10.1093/biomet/asaa057> and Xu He (2018) <doi:10.48550/arXiv.1709.02062>.
R's sf package ships with self-contained GDAL executables, including a bare-bones interface to several GDAL-related utility programs collectively known as the GDAL utilities. For each of those utilities, this package provides an R wrapper whose formal arguments closely mirror those of the GDAL command line interface. The utilities operate on data stored in files and typically write their output to other files. Therefore, to process data stored in any of R's more common spatial formats (i.e. those supported by the sf and terra packages), first write them to disk, then process them with the package's wrapper functions before reading the results back into R. GDAL function arguments introduced in GDAL version 3.5.2 or earlier are supported.
This package provides several functions for area and subarea level small area estimation under the Twofold Subarea Level Model using a hierarchical Bayesian (HB) method with a Univariate Normal distribution for the variables of interest. Some simulated datasets are also provided. The rjags package is employed to obtain parameter estimates using the Gibbs Sampling algorithm. The model-based estimators are the HB estimators, which include the mean, the variation of the mean, and the quantiles. For references, see Rao and Molina (2015) <doi:10.1002/9781118735855>, Torabi and Rao (2014) <doi:10.1016/j.jmva.2014.02.001>, Leyla Mohadjer et al. (2007) <http://www.asasrms.org/Proceedings/y2007/Files/JSM2007-000559.pdf>, and Erciulescu et al. (2019) <doi:10.1111/rssa.12390>.