This package provides functions for animations in statistics, covering topics in probability theory, mathematical statistics, multivariate statistics, non-parametric statistics, survey sampling, linear models, time series, computational statistics, data mining and machine learning. These functions may be helpful in teaching statistics and data analysis. The package also provides a series of functions to save animations to various formats, e.g. GIF, HTML pages, PDF, and videos. PDF animations can be inserted into Sweave / knitr documents easily.
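For instance, a minimal sketch of saving a base-R animation as a GIF with saveGIF() (this requires an external converter such as ImageMagick; the file name and interval below are illustrative):

    library(animation)
    saveGIF({
      for (i in 1:10) plot(cumsum(rnorm(100)), type = "l",
                           main = paste("Frame", i))
    }, movie.name = "random_walk.gif", interval = 0.5)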
This package provides a set of functions for organising and analysing datasets from experiments run using Eyelink eye-trackers. Organising functions help to clean and prepare eye-tracking datasets for analysis, and mark up key events such as display changes and responses made by participants. Analysing functions help to compute means for a wide range of standard measures (such as mean fixation durations), which can then be fed into the appropriate statistical analyses and graphing packages as necessary.
An implementation of the Fizz Buzz algorithm, as defined e.g. in <https://en.wikipedia.org/wiki/Fizz_buzz>. It provides the standard algorithm, with 3 replaced by Fizz and 5 replaced by Buzz, and options to specify the start and end numbers, the step size, and the numbers to be replaced by fizz and buzz, respectively. This package gives interviewees the option of answering "I use fizzbuzzR::fizzbuzz()" rather than having to write the algorithm themselves.
An implementation of the fair data adaptation with quantile preservation described in Plecko & Meinshausen (JMLR 2020, 21(242), 1-44). The adaptation procedure uses the specified causal graph to pre-process the given training and testing data in such a way as to remove the bias caused by the protected attribute. The procedure uses tree ensembles for quantile regression. Instructions for using the methods are further elaborated in the corresponding JSS manuscript, see <doi:10.18637/jss.v110.i04>.
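A minimal sketch of the interface described in the JSS manuscript (treat the exact signature as an assumption; the protected attribute, adjacency matrix, and data frames below are hypothetical):

    library(fairadapt)
    # adj.mat encodes the causal graph over the columns of the hypothetical
    # train/test data frames; "gender" is the protected attribute.
    fit <- fairadapt(outcome ~ ., prot.attr = "gender", adj.mat = adj.mat,
                     train.data = train, test.data = test)
    adapted <- adaptedData(fit)  # assumed accessor for the adapted data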
Tool to import and process data from the Lattes curriculum platform (<http://lattes.cnpq.br/>). The Brazilian government maintains an extensive base of curricula for academics from all over the country, with over 5 million registrations. The academic life of Brazilian researchers, or of researchers affiliated with Brazilian universities, is documented in Lattes. Information that can be obtained includes professional training, research areas, publications, academic advising, projects, etc. The getLattes package allows working with Lattes data exported to XML format.
Identifies chromatin interaction modules by constructing a Hi-C contact network based on statistically significant interactions, followed by network clustering. The method enables comparison of module connectivity across two Hi-C datasets and is capable of detecting cell-type-specific regulatory modules. By integrating network analysis with chromatin conformation data, this approach provides insights into the spatial organization of the genome and its functional implications in gene regulation. Author: Sora Yoon (2025) <https://github.com/ysora/HiCociety>.
This package provides tools for the estimation of Heckman selection models with robust variance-covariance matrices. It includes functions for computing the bread and meat matrices, as well as clustered standard errors for generalized Heckman models; see Fernando de Souza Bastos, Wagner Barreto-Souza, and Marc G. Genton (2022) <https://www.jstor.org/stable/27164235>. The package also offers cluster-robust inference with sandwich estimators, and tools for handling issues related to eigenvalues in covariance matrices.
This package provides tools for parsing NOAA Integrated Surface Data ('ISD') files, described at <https://www.ncdc.noaa.gov/isd>. Data includes, for example, wind speed and direction, temperature, cloud data, sea level pressure, and more. Includes data from approximately 35,000 stations worldwide, though coverage is best in North America, Europe, and Australia. Data is stored as variable-length ASCII character strings, with most fields optional. Included are tools for parsing entire files, or individual lines of data.
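For example, a minimal sketch of parsing a downloaded ISD file (the file path is hypothetical):

    library(isdparser)
    # Parse an entire ISD file into a data frame.
    obs <- isd_parse("data/104900-99999-2015.gz")
    # Parse a single raw fixed-width record.
    one <- isd_parse_line(readLines("data/104900-99999-2015.gz", n = 1))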
This package provides a hybrid of the K-means algorithm and a Majorization-Minimization method to introduce a robust clustering. The reference paper is: Julien Mairal (2015) <doi:10.1137/140957639>. The two most important functions in the MajKMeans package are cluster_km() and cluster_MajKm(): cluster_km() clusters data without Majorization-Minimization, and cluster_MajKm() clusters data with the Majorization-Minimization method. Both of these functions calculate the sum of squares (SS) of the clustering.
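A hedged usage sketch (the argument names X and k are assumptions inferred from the description, not verified against the package's documentation):

    # Two well-separated 2-D Gaussian clusters as toy data.
    X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
               matrix(rnorm(100, mean = 3), ncol = 2))
    res_km  <- MajKMeans::cluster_km(X, k = 2)      # K-means without MM
    res_maj <- MajKMeans::cluster_MajKm(X, k = 2)   # K-means with MM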
Introducing a novel and updated database showcasing Peru's endemic plants. This meticulously compiled and revised botanical collection encompasses a remarkable assemblage of over 7,898 distinct species. The data for this resource was sourced from the work of Govaerts, R., Nic Lughadha, E., Black, N. et al., titled 'The World Checklist of Vascular Plants: A continuously updated resource for exploring global plant diversity', published in Sci Data 8, 215 (2021) <doi:10.1038/s41597-021-00997-6>.
Generates interactive plots for analysing and visualising three-class high-dimensional data. It is particularly suited to visualising differences in continuous attributes, such as gene/protein/biomarker expression levels, between three groups. Differential gene/biomarker expression analysis between two classes is typically shown as a volcano plot; with three groups, however, this type of visualisation is particularly difficult to interpret. This package generates 3D volcano plots and 3-way polar plots for easier interpretation of three-class data.
This package provides functions for estimating tolerance limits (intervals) for various univariate distributions (binomial, Cauchy, discrete Pareto, exponential, two-parameter exponential, extreme value, hypergeometric, Laplace, logistic, negative binomial, negative hypergeometric, normal, Pareto, Poisson-Lindley, Poisson, uniform, and Zipf-Mandelbrot), Bayesian normal tolerance limits, multivariate normal tolerance regions, nonparametric tolerance intervals, tolerance bands for regression settings (linear regression, nonlinear regression, nonparametric regression, and multivariate regression), and analysis of variance tolerance intervals. Visualizations are also available for most of these settings.
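For example, a two-sided normal tolerance interval via normtol.int() on simulated data (95% confidence of covering at least 99% of the population):

    library(tolerance)
    set.seed(1)
    x <- rnorm(50, mean = 100, sd = 10)
    normtol.int(x, alpha = 0.05, P = 0.99, side = 2)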
This package provides a collection of functions to compute the Rao-Stirling diversity index (Porter and Rafols, 2009) <DOI:10.1007/s11192-008-2197-2> and its extension that acknowledges missing data (i.e., uncategorized references) by calculating its interval of uncertainty using mathematical optimization, as proposed in Calatrava et al. (2016) <DOI:10.1007/s11192-016-1842-4>. The Rao-Stirling diversity index is a well-established bibliometric indicator for measuring the interdisciplinarity of scientific publications. Apart from the obligatory dataset of publications with their respective references, a taxonomy of disciplines that categorizes the references, and a measure of similarity between the disciplines, the Rao-Stirling diversity index requires a complete categorization of all references of a publication into disciplines. Thus, it fails for an incomplete categorization; in this case, the robust extension has to be used, which encodes the uncertainty caused by missing bibliographic data as an uncertainty interval. Classification / ACM - 2012: Information systems ~ Similarity measures, Theory of computation ~ Quadratic programming, Applied computing ~ Digital libraries and archives.
Extends ACER ConQuest through a family of functions designed to improve graphical outputs and help with advanced analysis (e.g., differential item functioning). Allows R users to call ACER ConQuest from within R and read ACER ConQuest System Files (generated by the command `put`, <https://conquestmanual.acer.org/s4-00.html#put>). Requires ACER ConQuest version 5.40 or later. A demonstration version can be downloaded from <https://shop.acer.org/acer-conquest-5.html>.
This package provides R bindings to the dockview JavaScript library <https://dockview.dev/>. Create fully customizable grid layouts (docks) in seconds to include in interactive R reports with R Markdown or Quarto, or in shiny apps <https://shiny.posit.co/>. In shiny mode, modify docks by dynamically adding, removing or moving panels or groups of panels from the server function. Choose among 8 stunning themes (dark and light), and serialise the state of a dock to restore it later.
Using variational techniques, we address some epidemiological problems, such as the decomposition of the incidence curve by inverting the renewal equation as described in Alvarez et al. (2021) <doi:10.1073/pnas.2105112118> and Alvarez et al. (2022) <doi:10.3390/biology11040540>, or the estimation of the functional relationship between epidemiological indicators. We also propose a learning method for the short-term forecast of the trend of the incidence curve, as described in Morel et al. (2022) <doi:10.1101/2022.11.05.22281904>.
This package contains four main functions (i.e., four pieces of furniture): table1(), which produces a well-formatted table of descriptive statistics, common as Table 1 in research articles; tableC(), which produces a well-formatted table of correlations; tableF(), which provides frequency counts; and washer(), which is helpful in cleaning up the data. These furniture-themed functions are designed to simplify common tasks in quantitative analysis. Other data summary and cleaning tools are also available.
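A brief sketch with table1(), using mtcars as stand-in data (the splitby formula interface follows the package's documented usage, to the best of recollection):

    library(furniture)
    # Descriptive statistics for mpg and hp, split by number of cylinders.
    table1(mtcars, mpg, hp, splitby = ~cyl)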
When you prepare a presentation or a report, you often need to manage a large number of ggplot figures. You need to change figure sizes and modify titles, labels, themes, etc., and it is inconvenient to go back to the original code to make these changes. This package provides a simple way to manage ggplot figures. You can easily add figures to the database and update them later using a CLI (command line interface) or GUI (graphical user interface).
This package creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), or even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.
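For instance, a minimal sketch using the trial data bundled with the package:

    library(gtsummary)
    # Summarize by treatment arm, reporting mean (SD) for continuous variables.
    tbl_summary(trial, by = trt,
                statistic = all_continuous() ~ "{mean} ({sd})")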
Offers an easy and automated way to scale up individual-level space-use analysis to that of groups. Contains a function from the move package to calculate a dynamic Brownian bridge movement model from movement data for individual animals, as well as functions to visualize and quantify space use for individuals aggregated in groups. Originally written with passive acoustic telemetry in mind, this package also provides functionality to account for unbalanced acoustic receiver array designs, as well as for satellite tag data.
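As a hedged illustration of the underlying move-package step (trk is a hypothetical move object of one individual's relocations; the raster resolution and location error are placeholders):

    library(move)
    dbbmm <- brownian.bridge.dyn(trk, raster = 100, location.error = 20,
                                 margin = 11, window.size = 31)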
Bayesian inference analysis for bivariate meta-analysis of diagnostic test studies using integrated nested Laplace approximation with INLA. A purpose-built graphical user interface is available. The installation of the R package INLA is compulsory for successful usage. The INLA package can be obtained from <https://www.r-inla.org>. We recommend the testing version, which can be downloaded by running: install.packages("INLA", repos = c(getOption("repos"), INLA = "https://inla.r-inla-download.org/R/testing"), dep = TRUE).
This package provides tools for obtaining, processing, and visualizing spectral reflectance data for user-defined land or water surface classes, for visually exploring in which wavelengths the classes differ. Input should be a shapefile with polygons of surface classes (these might be different habitat types, crops, vegetation, etc.). Pixel data for the optical bands of the Sentinel-2 L2A satellite mission are obtained through the Google Earth Engine service (<https://earthengine.google.com/>) and used as the source of spectral data.
The `scorecard` package makes the development of credit risk scorecards easier and more efficient by providing functions for common tasks, such as data partition, variable selection, WOE binning, scorecard scaling, performance evaluation, and report generation. These functions can also be used in the development of machine learning models. References include: 1. Refaat, M. (2011, ISBN: 9781447511199), Credit Risk Scorecards: Development and Implementation Using SAS. 2. Siddiqi, N. (2006, ISBN: 9780471754510), Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring.
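A compact sketch of the typical workflow on the germancredit data bundled with the package (default scaling parameters are used):

    library(scorecard)
    data("germancredit")
    bins   <- woebin(germancredit, y = "creditability")   # WOE binning
    dt_woe <- woebin_ply(germancredit, bins)              # apply the bins
    m      <- glm(creditability ~ ., family = binomial(), data = dt_woe)
    card   <- scorecard(bins, m)                          # scale to points
    scores <- scorecard_ply(germancredit, card)           # score raw data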
This package provides a set of functions to build a scoring model from beginning to end, leading the user to follow an efficient and organized development process and significantly reducing the time spent on data exploration, variable selection, feature engineering, binning, and model selection, among other recurrent tasks. The package also incorporates monotonic and customized binning, scaling capabilities that transform logistic coefficients into points for better business understanding, and it calculates and visualizes classic performance metrics of a classification model.