The Pearson-ICA algorithm is a mutual information-based method for blind separation of statistically independent source signals. It has been shown that the minimization of mutual information leads to iterative use of score functions, i.e. derivatives of log densities. The Pearson system allows adaptive modeling of score functions. The flexibility of the Pearson system makes it possible to model a wide range of source distributions including asymmetric distributions. The algorithm is designed especially for problems with asymmetric sources but it works for symmetric sources as well.
Universal and robust algorithm for solving the total alkalinity-pH equation presented in G. Munhoven (2013) <doi:10.5194/gmd-6-1367-2013> and G. Munhoven (2021) <doi:10.5194/gmd-2020-447>. The total alkalinity-pH equation relates total alkalinity and pH for a given set of acid-base concentrations in a given water sample, including that of carbonic acid. This package is particularly useful in marine chemistry involving dissolved inorganic carbon. The original Fortran package can be found at <doi:10.5281/zenodo.4328965>.
Contour plots and 2D point plots for binary classification modeling under algorithms such as glm, rf, gbm, nnet and svm, presented over two dimensions generated by famd and mca methods. The package FactoMineR is used for multivariate reduction functions and the package MBA for interpolation functions. The package can be used to visualize the discriminant power of input variables and algorithmic modeling, explore outliers, compare algorithm behaviour, etc. It was initially created for teaching purposes, but it also has many practical uses under the XAI paradigm.
An interface to Azure Data Explorer, also known as Kusto, a fast, distributed data exploration service from Microsoft: <https://azure.microsoft.com/en-us/products/data-explorer/>. Includes DBI and dplyr interfaces, with the latter modelled after the dbplyr package, whereby queries are translated from R into the native KQL query language and executed lazily. On the admin side, the package extends the object framework provided by AzureRMR to support creation and deletion of databases, and management of database principals. Part of the AzureR family of packages.
This package provides a simple interface to the Microsoft Graph API <https://learn.microsoft.com/en-us/graph/overview>. Graph is a comprehensive framework for accessing data in various online Microsoft services. This package was originally intended to provide an R interface only to the Azure Active Directory part, with a view to supporting interoperability of R and Azure: users, groups, registered apps and service principals. However, it has since been expanded into a more general tool for interacting with Graph. Part of the AzureR family of packages.
This package provides methods for probabilistic reconciliation of hierarchical forecasts of time series. The available methods include analytical Gaussian reconciliation (Corani et al., 2021) <doi:10.1007/978-3-030-67664-3_13>, MCMC reconciliation of count time series (Corani et al., 2024) <doi:10.1016/j.ijforecast.2023.04.003>, Bottom-Up Importance Sampling (Zambon et al., 2024) <doi:10.1007/s11222-023-10343-y>, and methods for the reconciliation of mixed hierarchies (Mix-Cond and TD-cond) (Zambon et al., 2024) <https://proceedings.mlr.press/v244/zambon24a.html>.
Plotting choropleth maps for Bangladesh in R is usually difficult. The bangladesh package provides ready-to-use shapefiles for different administrative regions of Bangladesh (e.g., Division, District, Upazila, and Union). This package helps users to draw thematic maps of administrative regions of Bangladesh easily, as it comes with the sf objects for the boundaries. It also provides functions allowing users to efficiently get specific area maps and center coordinates for regions. Users can also search for a specific area and calculate the centroids of those areas.
Publication-ready regional gene locus plots similar to those produced by the web interface LocusZoom <https://my.locuszoom.org>, but running locally in R. Genetic or genomic data with gene annotation tracks are plotted via R base graphics, ggplot2 or plotly, allowing flexibility and easy customisation including laying out multiple locus plots on the same page. It uses the LDlink API <https://ldlink.nih.gov/?tab=apiaccess> to query linkage disequilibrium data from the 1000 Genomes Project and can overlay this on plots <doi:10.1093/bioadv/vbaf006>.
This package provides access to coded election programmes from the Manifesto Corpus and to the Manifesto Project's Main Dataset and routines to analyse this data. The Manifesto Project <https://manifesto-project.wzb.eu> collects and analyses election programmes across time and space to measure the political preferences of parties. The Manifesto Corpus contains the collected and annotated election programmes in the Corpus format of the package tm to enable easy use of text processing and text mining functionality. Specific functions for scaling of coded political texts are included.
Maximum likelihood estimates are obtained via an EM algorithm with either a first-order or a fully exponential Laplace approximation as documented by Broatch and Karl (2018) <doi:10.48550/arXiv.1710.05284>, Karl, Yang, and Lohr (2014) <doi:10.1016/j.csda.2013.11.019>, and by Karl (2012) <doi:10.1515/1559-0410.1471>. Karl and Zimmerman <doi:10.1016/j.jspi.2020.06.004> use this package to illustrate how the home field effect estimator from a mixed model can be biased under nonrandom scheduling.
Analyzes shooting data with respect to group shape, precision, and accuracy. This includes graphical methods, descriptive statistics, and inference tests using standard, but also non-parametric and robust statistical methods. Implements distributions for radial error in bivariate normal variables. Works with files exported by OnTarget PC/TDS, Silver Mountain e-target, ShotMarker e-target, SIUS e-target, or Taran, as well as with custom data files in text format. Supports inference from range statistics such as extreme spread. Includes a set of web-based graphical user interfaces.
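As a rough illustration of the range statistics mentioned above, the following Python sketch computes the extreme spread (the largest distance between any two shots) and the mean radius of a group. The function names and the sample coordinates are hypothetical and are not part of the package's interface; the package itself offers far more, including inference.

```python
import math
from itertools import combinations


def extreme_spread(shots):
    """Largest centre-to-centre distance between any two shots."""
    return max(math.dist(a, b) for a, b in combinations(shots, 2))


def mean_radius(shots):
    """Average distance of the shots from the centre of the group."""
    cx = sum(x for x, _ in shots) / len(shots)
    cy = sum(y for _, y in shots) / len(shots)
    return sum(math.dist((x, y), (cx, cy)) for x, y in shots) / len(shots)


# Hypothetical (x, y) impact coordinates in arbitrary length units.
group = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
spread = extreme_spread(group)
```

Extreme spread depends only on the two most distant shots, so it uses less of the available information than statistics that involve every shot, such as the mean radius.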
A critical first step in systematic literature reviews and mining of academic texts is to identify relevant texts from a range of sources, particularly databases such as Web of Science or Scopus. These databases often export in different formats or with different metadata tags. synthesisr expands on the tools outlined by Westgate (2019) <doi:10.1002/jrsm.1374> to import bibliographic data from a range of formats (such as bibtex, ris, or ciw) in a standard way, and allows merging and deduplication of the resulting dataset.
This package provides a collection of functions for left-censored missing data imputation. Left-censoring is a special case of the missing not at random (MNAR) mechanism that generates non-responses in proteomics experiments. The package also contains functions to artificially generate peptide/protein expression data (log-transformed) as random draws from a multivariate Gaussian distribution, as well as a function to generate missing data (both randomly and non-randomly). For comparison purposes, the package also contains several wrapper functions for the imputation of non-responses that are missing at random.
This is a package for converting natural language text into tokens. It includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, tweets, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the stringi and Rcpp packages for fast yet correct tokenization in UTF-8 encoding.
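To make the terms "shingled n-grams" and "skip n-grams" concrete, here is a minimal Python sketch. It assumes a simplified definition in which a skip n-gram leaves exactly k words between neighbouring words; the package's own definition and interface may differ, and the function names below are hypothetical. Splitting on whitespace is also a simplification that ignores punctuation.

```python
def words(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def shingled_ngrams(tokens, n, n_min=None):
    """All contiguous word sequences of length n_min..n (overlapping)."""
    n_min = n if n_min is None else n_min
    out = []
    for size in range(n_min, n + 1):
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out


def skip_ngrams(tokens, n, k):
    """n-word sequences with exactly k skipped words between neighbours."""
    step = k + 1
    span = (n - 1) * step + 1  # number of tokens covered by one skip-gram
    return [" ".join(tokens[i:i + span:step])
            for i in range(len(tokens) - span + 1)]


tokens = words("The quick brown fox jumps")
bigrams = shingled_ngrams(tokens, 2)
skipped = skip_ngrams(tokens, 2, 1)
```

With these definitions, `bigrams` contains the four overlapping word pairs of the sentence, and `skipped` pairs each word with the word two positions later.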
The function missForest in this package is used to impute missing values, particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data, including complex interactions and non-linear relations. It yields an out-of-bag (OOB) imputation error estimate without the need for a test set or elaborate cross-validation. It can be run in parallel to save computation time.
This package provides simple, flexible assertions on data.frame or data.table objects with verbose output for vetting. While other assertion packages apply towards more general use-cases, assertable is tailored towards tabular data. It includes functions to check variable names and values, whether the dataset contains all combinations of a given set of unique identifiers, and whether it is a certain length. In addition, assertable includes utility functions to check the existence of target files and to efficiently import multiple tabular data files into one data.table.
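The "all combinations of a given set of unique identifiers" check can be pictured with the following Python sketch, which lists the identifier combinations missing from a table. This only illustrates the idea and does not reproduce the package's R interface; the function name, column names, and data are hypothetical.

```python
from itertools import product


def missing_id_combinations(rows, id_values):
    """Return identifier combinations that are expected but absent.

    rows:      list of dicts, one per table row
    id_values: dict mapping each identifier column to its expected values
    """
    columns = list(id_values)
    expected = set(product(*(id_values[c] for c in columns)))
    observed = {tuple(row[c] for c in columns) for row in rows}
    return sorted(expected - observed)


# Hypothetical table that lacks the (2021, "m") row.
rows = [
    {"year": 2020, "sex": "f", "value": 1.5},
    {"year": 2020, "sex": "m", "value": 1.2},
    {"year": 2021, "sex": "f", "value": 1.7},
]
ids = {"year": [2020, 2021], "sex": ["f", "m"]}
missing = missing_id_combinations(rows, ids)
```

An assertion-style wrapper would raise an error whenever `missing` is non-empty and report the offending combinations.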
Guile-Reader is a simple framework for building readers for GNU Guile.
The idea is to make it easy to build procedures that extend Guile’s read procedure. Readers supporting various syntax variants can easily be written, possibly by re-using existing “token readers” of standard Scheme readers. For example, it is used to implement Skribilo’s R5RS-derived document syntax.
Guile-Reader’s approach is similar to Common Lisp’s “read table”, but hopefully more powerful and flexible (for instance, one may instantiate as many readers as needed).
Estimate the linear and nonlinear autoregressive distributed lag (ARDL & NARDL) models and the corresponding error correction models, and test for long-run and short-run asymmetry. The general-to-specific approach is also available in estimating the ARDL and NARDL models. The Pesaran, Shin & Smith (2001) <doi:10.1002/jae.616> bounds test for level relationships is also provided. The ardl.nardl package also applies the short-run and long-run symmetry restrictions described in Shin et al. (2014) <doi:10.1007/978-1-4899-8008-3_9> and performs their corresponding tests.
Download data from the time-series databases of the Bundesbank, the German central bank. See the overview at the Bundesbank website (<https://www.bundesbank.de/en/statistics/time-series-databases>) for available series. The package provides only a single function, getSeries(), which supports both traditional and real-time datasets; it will also download metadata if available. Downloaded data can automatically be arranged in various formats, such as data frames or zoo series. The data may optionally be cached, so as to avoid repeated downloads of the same series.
This package provides a collection of novel tools for generating species distribution and abundance models (SDM) that are dynamic through both space and time. These highly flexible functions incorporate spatial and temporal aspects across key SDM stages, including when cleaning and filtering species occurrence data, generating pseudo-absence records, assessing and correcting sampling biases and autocorrelation, extracting explanatory variables, and projecting distribution patterns. Throughout, functions utilise Google Earth Engine and Google Drive to minimise the computing power and storage demands associated with species distribution modelling at high spatio-temporal resolution.
Efficient implementation of Kernel SHAP (Lundberg and Lee, 2017, <doi:10.48550/arXiv.1705.07874>), permutation SHAP, and additive SHAP for model interpretability. For Kernel SHAP and permutation SHAP, if the number of features is too large for exact calculations, the algorithms iterate until the SHAP values are sufficiently precise in terms of their standard errors. The package integrates smoothly with meta-learning packages such as tidymodels, caret or mlr3. It supports multi-output models, case weights, and parallel computations. Visualizations can be done using the R package shapviz.
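The idea behind exact permutation SHAP can be sketched in a few lines of Python. The sketch below makes simplifying assumptions that the package does not: it uses a single background row instead of averaging over a background dataset, and it enumerates every feature ordering, which is only feasible for a handful of features. The function `permutation_shap` and the toy `model` are hypothetical and are not the package's implementation.

```python
from itertools import permutations


def permutation_shap(predict, x, background):
    """Exact Shapley values of predict at x relative to one background row.

    For every ordering of the features, switch them one by one from the
    background value to the value in x and credit each feature with the
    resulting change in the prediction; then average over all orderings.
    """
    p = len(x)
    phi = [0.0] * p
    orderings = list(permutations(range(p)))
    for order in orderings:
        z = list(background)
        prev = predict(z)
        for j in order:
            z[j] = x[j]
            cur = predict(z)
            phi[j] += cur - prev
            prev = cur
    return [v / len(orderings) for v in phi]


def model(z):
    """Toy model with one additive term and one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]
shap_values = permutation_shap(model, x, background)
```

The values sum to `model(x) - model(background)`, and the interaction term is shared equally between the two features involved. When enumerating all orderings is too expensive, orderings are sampled instead, which is where the standard errors mentioned above become relevant.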
This package provides functions to perform simulations of ANOVA designs of up to three factors. Calculates the observed power and average observed effect size for all main effects and interactions in the ANOVA, and all simple comparisons between conditions. Includes functions for analytic power calculations and additional helper functions that compute effect sizes for ANOVA designs, observed error rates in the simulations, and functions to plot power curves. Please see Lakens, D., & Caldwell, A. R. (2021). "Simulation-Based Power Analysis for Factorial Analysis of Variance Designs". <doi:10.1177/2515245920951503>.
This is a package for parsing Affymetrix files (CDF, CEL, CHP, BPMAP, BAR). It provides methods for fast and memory-efficient parsing of Affymetrix files using the Affymetrix Fusion SDK. Both ASCII- and binary-based files are supported. Currently, there are methods for reading chip definition files (CDF) and cell intensity files (CEL). These files can be read either in full or in part. For example, probe signals from a few probesets can be extracted very quickly from a set of CEL files into a convenient list structure.
This package provides a set of tools for the statistical analysis of data using:
normal linear models;
generalized linear models;
negative binomial regression models as alternative to the Poisson regression models under the presence of overdispersion;
beta-binomial and random-clumped binomial regression models as alternative to the binomial regression models under the presence of overdispersion;
zero-inflated and zero-altered regression models to deal with zero-excess in count data;
generalized nonlinear models;
generalized estimating equations for cluster correlated data.