This package contains a function called gds() which accepts three input parameters: the lower limits, the upper limits, and the frequencies of the corresponding classes. The gds() function calculates and returns the values of the mean ('gmean'), median ('gmedian'), mode ('gmode'), variance ('gvar'), standard deviation ('gstdev'), coefficient of variation ('gcv'), quartiles ('gq1', 'gq2', 'gq3'), inter-quartile range ('gIQR'), skewness ('g1'), and kurtosis ('g2'), which facilitate effective data analysis. Skewness and kurtosis are calculated from moments.
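As a rough illustration of grouped-data statistics, the Python sketch below computes the mean, variance, standard deviation and median from class limits and frequencies. The function name grouped_stats, the use of class midpoints, the population (divide by n) variance and the linear interpolation for the median are assumptions made for this example, not a description of how gds() works internally.

```python
from math import sqrt


def grouped_stats(lower, upper, freq):
    """Mean, variance, standard deviation and median of grouped (binned) data.

    Hypothetical helper: each class is represented by its midpoint and the
    variance divides by n (population form); both choices are assumptions.
    """
    n = sum(freq)
    mids = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    mean = sum(f * m for f, m in zip(freq, mids)) / n
    var = sum(f * (m - mean) ** 2 for f, m in zip(freq, mids)) / n

    # Median: find the class where the cumulative frequency reaches n / 2
    # and interpolate linearly inside that class.
    median = None
    cumulative = 0
    for lo, up, f in zip(lower, upper, freq):
        if f > 0 and cumulative + f >= n / 2:
            median = lo + (n / 2 - cumulative) / f * (up - lo)
            break
        cumulative += f

    return {"mean": mean, "var": var, "stdev": sqrt(var), "median": median}


# Four classes: 0-10, 10-20, 20-30, 30-40 with their frequencies.
stats = grouped_stats([0, 10, 20, 30], [10, 20, 30, 40], [5, 15, 20, 10])
print(stats)
```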
The need for anonymization of individual survey responses often leads to many suppressed grid cells in a regular grid. Here we provide functionality for creating multi-resolution gridded data, respecting the confidentiality rules, such as a minimum number of units and dominance by one or more units for each grid cell. The functions also include the possibility for contextual suppression of data. For more details see Skoien et al. (2025) <doi:10.48550/arXiv.2410.17601>.
This package provides an efficient implementation of the Narrowest-Over-Threshold methodology for detecting an unknown number of change-points occurring at unknown locations in one-dimensional data following a deterministic signal + noise model. Currently implemented scenarios are: piecewise-constant signal, piecewise-constant signal with heavy-tailed noise, piecewise-linear signal, piecewise-quadratic signal, and piecewise-constant signal with piecewise-constant variance of the noise. For details, see Baranowski, Chen and Fryzlewicz (2019) <doi:10.1111/rssb.12322>.
Provides data generation and estimation tools for the truncated positive normal (tpn) model discussed in Gomez, Olmos, Varela and Bolfarine (2018) <doi:10.1007/s11766-018-3354-x>, the slash tpn distribution discussed in Gomez, Gallardo and Santoro (2021) <doi:10.3390/sym13112164>, the bimodal tpn distribution discussed in Gomez et al. (2022) <doi:10.3390/sym14040665>, the flexible tpn model <doi:10.3390/math11214431> and the unit tpn distribution <doi:10.1016/j.chemolab.2025.105322>.
Independent hypothesis weighting (IHW) is a multiple testing procedure that increases power compared to the method of Benjamini and Hochberg by assigning data-driven weights to each hypothesis. The input to IHW is a two-column table of p-values and covariates. The covariate can be any continuous-valued or categorical variable that is thought to be informative on the statistical properties of each hypothesis test, while it is independent of the p-value under the null hypothesis.
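The weighting idea can be sketched with a weighted Benjamini-Hochberg step in Python. This is only an illustration under stated assumptions: the weights are supplied by hand and must average to one, whereas IHW learns them from the covariate. The function name weighted_bh and its parameters are hypothetical and are not part of the IHW package.

```python
def weighted_bh(pvalues, weights, alpha=0.1):
    """Weighted Benjamini-Hochberg: apply BH to the weighted p-values p_i / w_i.

    Hypothetical helper; weights are assumed non-negative with mean one.
    """
    m = len(pvalues)
    if abs(sum(weights) / m - 1.0) > 1e-9:
        raise ValueError("weights must average to one")

    # A zero weight means the hypothesis can never be rejected.
    weighted = [p / w if w > 0 else float("inf") for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: weighted[i])

    # Largest rank k whose weighted p-value lies below alpha * k / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if weighted[i] <= alpha * rank / m:
            k = rank

    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


pvals = [0.03, 0.06, 0.9, 0.9]
uniform = weighted_bh(pvals, [1, 1, 1, 1])   # plain BH
informed = weighted_bh(pvals, [2, 2, 0, 0])  # weight moved to the first two tests
print(uniform, informed)
```

In this toy input plain BH rejects nothing, while shifting weight onto the first two hypotheses lets both be rejected at the same nominal level.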
This package provides a suite of functions to test for Functional Measurement Invariance (FMI) between two groups. Implements hierarchical permutation tests for configural, metric, and scalar invariance, adapting concepts from Multi-Group Confirmatory Factor Analysis (MGCFA) to functional data. Methods are based on concepts from: Meredith, W. (1993) <doi:10.1007/BF02294825>, Yao, F., Müller, H. G., & Wang, J. L. (2005) <doi:10.1198/016214504000001745>, and Lee, K. Y., & Li, L. (2022) <doi:10.1111/rssb.12471>.
The ggplot2 package is the state-of-the-art toolbox for creating and formatting graphs. However, it is easy to forget how certain formatting commands are named and sometimes users find themselves asking: How do you rotate the x-axis labels again? Or how do you hide the legend...? This package allows users to issue natural language commands related to theme-related styling of plots (colors, font size and such), which then are translated into valid ggplot2 commands.
This package provides functions for the creation, evaluation and testing of decision models based on Multi Attribute Utility Theory (MAUT). It can process and evaluate local risk aversion utilities for a set of indexes, compute utilities and weights for the whole decision tree defining the decision model, and simulate weights employing Dirichlet distributions under addition constraints in weights. It also includes other rating analysis methods, for example the Colley and Offensive-Defensive ratings and ranking aggregation with Borda count.
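Ranking aggregation with Borda count can be sketched in a few lines of Python. The function name borda_count, the scoring convention (n - 1 points for first place down to 0 for last) and the alphabetical tie-break are assumptions for this example and are not taken from the package.

```python
def borda_count(rankings):
    """Aggregate several rankings (best item first) with Borda count.

    Hypothetical helper: in a ranking of n items the item at position k
    (0-based) earns n - 1 - k points; ties are broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - 1 - position)
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return ordered, scores


aggregate, points = borda_count([
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
])
print(aggregate, points)
```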
This package provides simple crosstab output with optional statistics (e.g., Goodman-Kruskal Gamma, Somers' d, and Kendall's tau-b) as well as two-way and one-way tables. The package is used within the statistics component of the Masters of Science (MSc) in Social Science of the Internet at the Oxford Internet Institute (OII), University of Oxford, but the functions should be useful for general data analysis and especially for analysis of categorical and ordinal data.
Permutation (randomisation) test for single-case phase design data with two phases (e.g., pre- and post-treatment). Dependency between observations is corrected for by stepwise resampling of the time series while varying the distance between observations. The required distance (0, 1, 2, 3, ...) is determined by repeated dependency testing while the distance is increased stepwise. In preparation: Vroegindeweij et al. "A Permutation distancing test for single-case observational AB phase design data: A Monte Carlo simulation study".
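The two ingredients described above, thinning a series by a fixed distance and a permutation test on the phase difference, can be illustrated loosely in Python. This is a simplified sketch: the helper names thin and exact_permutation_pvalue are hypothetical, the test statistic (absolute difference in phase means) is an assumption, and the dependency testing that selects the distance is not shown.

```python
from itertools import combinations


def thin(series, distance):
    """Keep every (distance + 1)-th observation; distance 0 keeps everything."""
    return series[::distance + 1]


def exact_permutation_pvalue(phase_a, phase_b):
    """Exact two-sided permutation p-value for the difference in phase means.

    Enumerates every way of relabelling the pooled observations, so it is
    only practical for short series.
    """
    pooled = list(phase_a) + list(phase_b)
    n_a, n_b = len(phase_a), len(phase_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(phase_a))
    hits = count = 0
    for idx in combinations(range(len(pooled)), n_a):
        count += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            hits += 1
    return hits / count


pre = thin([1, 1, 2, 2, 3, 3], 1)   # thinned phase A
post = thin([7, 7, 8, 8, 9, 9], 1)  # thinned phase B
p_value = exact_permutation_pvalue(pre, post)
print(pre, post, p_value)
```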
Implementation of the wavelet-based spatial verification method of Buschow and Friederichs "SAD: Verifying the Scale, Anisotropy and Direction of precipitation forecasts" (2020, submitted to QJRMS). Forecasts and observations are transformed by a decimated or redundant dual-tree complex wavelet transform to analyze the spatial scale, degree of anisotropy and preferred direction in each field. These structural attributes are compared by a series of scores. An experimental algorithm for the correction of these errors is included as well.
Calculates Windowed Cross Correlation for pairs of time series. Provides support for surrogate analysis as a nonparametric test of significance. Calculates aggregate statistics over a range of parameter values. Plots the results as Windowed Cross Correlation plots and heat maps. The method is described in Boker, S. M., Rotondo, J. L., Xu, M., & King, K. (2002). "Windowed cross-correlation and peak picking for the analysis of variability in the association between behavioral time series". Psychological Methods, 7(3), 338.
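A minimal Python sketch of windowed cross-correlation follows. It assumes two series of equal length, a fixed window size, a symmetric lag range and a sign convention in which a positive lag means the second series trails the first. The function names and parameters are hypothetical and do not mirror the package interface.

```python
from math import sin, sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / sqrt(var_a * var_b)


def windowed_cross_correlation(x, y, window, max_lag, step=1):
    """One row per window start, one column per lag in -max_lag..max_lag.

    The entry for a given lag correlates x[start:start + window] with
    y[start + lag:start + lag + window].
    """
    rows = []
    for start in range(max_lag, len(x) - window - max_lag + 1, step):
        x_win = x[start:start + window]
        rows.append([
            pearson(x_win, y[start + lag:start + lag + window])
            for lag in range(-max_lag, max_lag + 1)
        ])
    return rows


# y is x delayed by two samples, so the column for lag +2 should be close to 1.
x = [sin(0.3 * t) for t in range(60)]
y = [sin(0.3 * (t - 2)) for t in range(60)]
wcc = windowed_cross_correlation(x, y, window=10, max_lag=3)
print(len(wcc), len(wcc[0]))
```

Each row of the result corresponds to one horizontal slice of a windowed cross-correlation heat map; peak picking along the lag axis would then locate the lag of strongest association in each window.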
This package provides tools for the visualization of missing and/or imputed values, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and explore the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods.
Nonparametric detection of nonuniformity and dependence with Binary Expansion Testing (BET). See Kai Zhang (2019) BET on Independence, Journal of the American Statistical Association, 114:528, 1620-1637, <DOI:10.1080/01621459.2018.1537921>, Kai Zhang, Wan Zhang, Zhigen Zhao, Wen Zhou. (2023). BEAUTY Powered BEAST, <doi:10.48550/arXiv.2103.00674> and Wan Zhang, Zhigen Zhao, Michael Baiocchi, Yao Li, Kai Zhang. (2023) SorBET: A Fast and Powerful Algorithm to Test Dependence of Variables, Technical report.
An ensemble method for the statistical detection of a rare class in two-class classification problems. The method uses an ensemble of classifiers where the constituent models of the ensemble use disjoint subsets (phalanxes) of explanatory variables. We provide an implementation of the phalanx-formation algorithm. Please see Tomal et al. (2015) <doi:10.1214/14-AOAS778>, Tomal et al. (2016) <doi:10.1021/acs.jcim.5b00663>, and Tomal et al. (2019) <arXiv:1706.06971> for more details.
This package provides a selection of 3 different inference rules (plus the clamped variants of these rules) and 4 threshold functions for obtaining the inference of an FCM (Fuzzy Cognitive Map). Moreover, the fcm package returns a data frame of the concept values at each state after the inference procedure. Fuzzy cognitive maps were introduced by Kosko (1986) <doi:10.1002/int.4550010405>, providing ideal causal cognition tools for modeling and simulating dynamic systems.
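One inference rule combined with one threshold function can be sketched in Python as follows. The example assumes a Kosko-style update, in which each concept becomes the thresholded weighted sum of the other concepts, together with a sigmoid threshold. The function names, the weights[j][i] layout and the stopping tolerance are hypothetical and are not taken from the fcm package.

```python
from math import exp


def sigmoid(x, lam=1.0):
    """Sigmoid threshold function with steepness lam."""
    return 1.0 / (1.0 + exp(-lam * x))


def fcm_infer(weights, state, steps=20, tol=1e-5):
    """Iterate a Kosko-style rule and return the concept values at each state.

    weights[j][i] is the assumed influence of concept j on concept i.
    Iteration stops after `steps` updates or once no concept changes by
    more than `tol`.
    """
    n = len(state)
    history = [list(state)]
    for _ in range(steps):
        new_state = [
            sigmoid(sum(weights[j][i] * state[j] for j in range(n) if j != i))
            for i in range(n)
        ]
        history.append(new_state)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            break
        state = new_state
    return history


# Three concepts: concept 0 excites concept 1, which in turn inhibits concept 2.
w = [
    [0.0, 0.8, 0.0],
    [0.0, 0.0, -0.6],
    [0.0, 0.0, 0.0],
]
trajectory = fcm_infer(w, [1.0, 0.0, 0.0])
print(len(trajectory), trajectory[-1])
```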
This package provides functions to download and tidy statistical data published by the Office for National Statistics <https://www.ons.gov.uk>. Covers GDP, inflation (CPI, CPIH, RPI), unemployment, employment, wages, trade, retail sales, house prices, productivity, population, and public sector finances. Most series are fetched from the ONS website using its CSV time series endpoint. House price data is sourced from HM Land Registry <https://www.gov.uk/government/organisations/land-registry>. Data is cached locally between sessions.
After developing an ODK <https://opendatakit.org/> frame, we can link the frame to Google Sheets <https://www.google.com/sheets/about/> and collect data through Android <https://www.android.com/>. The collected data are uploaded to Google Sheets. The odk2spss() function helps to convert the ODK frame into an SPSS <https://www.ibm.com/analytics/us/en/technology/spss/> frame. It can also add downloaded Google Sheets data, or read data from Google Sheets using the ODK frame's 'submission_url'.
This package produces quality scores for each of the US companies from the Russell 3000, following the approach described in "Quality Minus Junk" (Asness, Frazzini, & Pedersen, 2013) <http://www.aqr.com/library/working-papers/quality-minus-junk>. The package includes datasets for users who wish to view the most recently uploaded quality scores. It also provides tools to automatically gather relevant financials and stock price information, allowing users to update their data and customize their universe for further analysis.
We provide a suite of tools for estimating the sample complexity of a chosen model through theoretical bounds and simulation. The package incorporates methods for estimating the Vapnik-Chervonenkis dimension (VCD) of a chosen algorithm, which can be used to estimate its sample complexity. Alternatively, we provide simulation methods to estimate sample complexity directly. For more details, see Carter, P & Choi, D (2024). "Learning from Noise: Applying Sample Complexity for Political Science Research" <doi:10.31219/osf.io/evrcj>.
This package provides conditional maximum likelihood (CML) item parameter estimation of both sequential and cumulative deterministic multistage designs (Zwitser & Maris, 2015, <doi:10.1007/s11336-013-9369-6>) and probabilistic sequential and cumulative multistage designs (Steinfeld & Robitzsch, 2024, <doi:10.1007/s41237-024-00228-3>). Supports CML item parameter estimation of conventional linear designs and additional functions for the likelihood ratio test (Andersen, 1973, <doi:10.1007/BF02291180>) as well as functions for simulating various types of multistage designs.
Bindings to Uno (Unifying Nonlinear Optimization), a C++ solver for smooth nonlinearly constrained optimization. Uno unifies Lagrange-Newton methods, including sequential quadratic programming and interior-point methods, by decomposing them into interacting building blocks (constraint-relaxation, inequality-handling, Hessian, and globalization strategies) that can be freely combined, either through options or through presets that reproduce established solvers such as filterSQP and IPOPT. The framework is described in Vanaret and Leyffer (2024) <doi:10.48550/arXiv.2406.13454>.
Updated versions of the 1970s "US State Facts and Figures" objects from the datasets package included with R. The new data is compiled from a number of sources, primarily from the United States Census Bureau or the relevant federal agency. Modern tidy tibbles provide richer state-level data including identifiers, geography, capitals, demographics, and socioeconomic statistics. Convenience vectors parallel the base datasets state objects but extend coverage to all 51 jurisdictions: the 50 states and the District of Columbia.
Extract and process bird sightings records from eBird (<http://ebird.org>), an online tool for recording bird observations. Public access to the full eBird database is via the eBird Basic Dataset (EBD; see <http://ebird.org/ebird/data/download> for access), a downloadable text file. This package is an interface to AWK for extracting data from the EBD based on taxonomic, spatial, or temporal filters, to produce a manageable file size that can be imported into R.