This package provides a color palette generator inspired by Mexican politics, with colors ranging from red on the left to gray in the middle and green on the right. Palette options range from only a few colors to several colors, but with discrete and continuous options to offer greatest flexibility to the user. This package allows for a range of applications, from mapping brief discrete scales (e.g., four colors for Morena, PRI, and PAN) to continuous interpolated arrays including dozens of shades graded from red to green.
The Pearson-ICA algorithm is a mutual information-based method for blind separation of statistically independent source signals. It has been shown that the minimization of mutual information leads to iterative use of score functions, i.e. derivatives of log densities. The Pearson system allows adaptive modeling of score functions. The flexibility of the Pearson system makes it possible to model a wide range of source distributions including asymmetric distributions. The algorithm is designed especially for problems with asymmetric sources but it works for symmetric sources as well.
Universal and robust algorithm for solving the total alkalinity-pH equation presented in G. Munhoven (2013) <doi:10.5194/gmd-6-1367-2013> and G. Munhoven (2021) <doi:10.5194/gmd-2020-447>. The total alkalinity-pH equation relates total alkalinity and pH for a given set of acid-base concentrations in a given water sample, among which carbonic acid. This package is particularly useful in marine chemistry involving dissolved inorganic carbon. Original package in Fortran can be found at <doi:10.5281/zenodo.4328965>.
Visual contour and 2D point and contour plots for binary classification modeling under algorithms such as glm', rf', gbm', nnet and svm', presented over two dimensions generated by famd and mca methods. Package FactoMineR for multivariate reduction functions and package MBA for interpolation functions are used. The package can be used to visualize the discriminant power of input variables and algorithmic modeling, explore outliers, compare algorithm behaviour, etc. It has been created initially for teaching purposes, but it has also many practical uses under the XAI paradigm.
CAGE is a widely used high throughput assay for measuring transcription start site (TSS) activity. CAGEfightR is an R/Bioconductor package for performing a wide range of common data analysis tasks for CAGE and 5'-end data in general. Core functionality includes: import of CAGE TSSs (CTSSs), tag (or unidirectional) clustering for TSS identification, bidirectional clustering for enhancer identification, annotation with transcript and gene models, correlation of TSS and enhancer expression, calculation of TSS shapes, quantification of CAGE expression as expression matrices and genome brower visualization.
An interface to Azure Data Explorer', also known as Kusto', a fast, distributed data exploration service from Microsoft: <https://azure.microsoft.com/en-us/products/data-explorer/>. Includes DBI and dplyr interfaces, with the latter modelled after the dbplyr package, whereby queries are translated from R into the native KQL query language and executed lazily. On the admin side, the package extends the object framework provided by AzureRMR to support creation and deletion of databases, and management of database principals. Part of the AzureR family of packages.
This package provides a simple interface to the Microsoft Graph API <https://learn.microsoft.com/en-us/graph/overview>. Graph is a comprehensive framework for accessing data in various online Microsoft services. This package was originally intended to provide an R interface only to the Azure Active Directory part, with a view to supporting interoperability of R and Azure': users, groups, registered apps and service principals. However it has since been expanded into a more general tool for interacting with Graph. Part of the AzureR family of packages.
Usually, it is difficult to plot choropleth maps for Bangladesh in R'. The bangladesh package provides ready-to-use shapefiles for different administrative regions of Bangladesh (e.g., Division, District, Upazila, and Union). This package helps users to draw thematic maps of administrative regions of Bangladesh easily as it comes with the sf objects for the boundaries. It also provides functions allowing users to efficiently get specific area maps and center coordinates for regions. Users can also search for a specific area and calculate the centroids of those areas.
This package provides methods for probabilistic reconciliation of hierarchical forecasts of time series. The available methods include analytical Gaussian reconciliation (Corani et al., 2021) <doi:10.1007/978-3-030-67664-3_13>, MCMC reconciliation of count time series (Corani et al., 2024) <doi:10.1016/j.ijforecast.2023.04.003>, Bottom-Up Importance Sampling (Zambon et al., 2024) <doi:10.1007/s11222-023-10343-y>, methods for the reconciliation of mixed hierarchies (Mix-Cond and TD-cond) (Zambon et al., 2024) <https://proceedings.mlr.press/v244/zambon24a.html>.
Shed light on black box machine learning models by the help of model performance, variable importance, global surrogate models, ICE profiles, partial dependence (Friedman J. H. (2001) <doi:10.1214/aos/1013203451>), accumulated local effects (Apley D. W. (2016) <doi:10.48550/arXiv.1612.08468>), further effects plots, interaction strength, and variable contribution breakdown (Gosiewska and Biecek (2019) <doi:10.48550/arXiv.1903.11420>). All tools are implemented to work with case weights and allow for stratified analysis. Furthermore, multiple flashlights can be combined and analyzed together.
Publication-ready regional gene locus plots similar to those produced by the web interface LocusZoom <https://my.locuszoom.org>, but running locally in R. Genetic or genomic data with gene annotation tracks are plotted via R base graphics, ggplot2 or plotly', allowing flexibility and easy customisation including laying out multiple locus plots on the same page. It uses the LDlink API <https://ldlink.nih.gov/?tab=apiaccess> to query linkage disequilibrium data from the 1000 Genomes Project and can overlay this on plots <doi:10.1093/bioadv/vbaf006>.
This package provides access to coded election programmes from the Manifesto Corpus and to the Manifesto Project's Main Dataset and routines to analyse this data. The Manifesto Project <https://manifesto-project.wzb.eu> collects and analyses election programmes across time and space to measure the political preferences of parties. The Manifesto Corpus contains the collected and annotated election programmes in the Corpus format of the package tm to enable easy use of text processing and text mining functionality. Specific functions for scaling of coded political texts are included.
Maximum likelihood estimates are obtained via an EM algorithm with either a first-order or a fully exponential Laplace approximation as documented by Broatch and Karl (2018) <doi:10.48550/arXiv.1710.05284>, Karl, Yang, and Lohr (2014) <doi:10.1016/j.csda.2013.11.019>, and by Karl (2012) <doi:10.1515/1559-0410.1471>. Karl and Zimmerman <doi:10.1016/j.jspi.2020.06.004> use this package to illustrate how the home field effect estimator from a mixed model can be biased under nonrandom scheduling.
This package provides a critical first step in systematic literature reviews and mining of academic texts is to identify relevant texts from a range of sources, particularly databases such as Web of Science or Scopus'. These databases often export in different formats or with different metadata tags. synthesisr expands on the tools outlined by Westgate (2019) <doi:10.1002/jrsm.1374> to import bibliographic data from a range of formats (such as bibtex', ris', or ciw') in a standard way, and allows merging and deduplication of the resulting dataset.
Analyzes shooting data with respect to group shape, precision, and accuracy. This includes graphical methods, descriptive statistics, and inference tests using standard, but also non-parametric and robust statistical methods. Implements distributions for radial error in bivariate normal variables. Works with files exported by OnTarget PC/TDS', Silver Mountain e-target, ShotMarker e-target, SIUS e-target, or Taran', as well as with custom data files in text format. Supports inference from range statistics such as extreme spread. Includes a set of web-based graphical user interfaces.
This package provides a collection of functions for left-censored missing data imputation. Left-censoring is a special case of missing not at random (MNAR) mechanism that generates non-responses in proteomics experiments. The package also contains functions to artificially generate peptide/protein expression data (log-transformed) as random draws from a multivariate Gaussian distribution as well as a function to generate missing data (both randomly and non-randomly). For comparison reasons, the package also contains several wrapper functions for the imputation of non-responses that are missing at random.
This package provides simple, flexible assertions on data.frame or data.table objects with verbose output for vetting. While other assertion packages apply towards more general use-cases, assertable is tailored towards tabular data. It includes functions to check variable names and values, whether the dataset contains all combinations of a given set of unique identifiers, and whether it is a certain length. In addition, assertable includes utility functions to check the existence of target files and to efficiently import multiple tabular data files into one data.table.
This is a package for converting natural language text into tokens. It includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, tweets, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the stringi and Rcpp packages for fast yet correct tokenization in UTF-8 encoding.
The function missForest in this package is used to impute missing values, particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data, including complex interactions and non-linear relations. It yields an OOB imputation error estimate without the need of a test set or elaborate cross- validation. It can be run in parallel to save computation time.
Guile-Reader is a simple framework for building readers for GNU Guile.
The idea is to make it easy to build procedures that extend Guile’s read procedure. Readers supporting various syntax variants can easily be written, possibly by re-using existing “token readers” of a standard Scheme readers. For example, it is used to implement Skribilo’s R5RS-derived document syntax.
Guile-Reader’s approach is similar to Common Lisp’s “read table”, but hopefully more powerful and flexible (for instance, one may instantiate as many readers as needed).
Estimate the linear and nonlinear autoregressive distributed lag (ARDL & NARDL) models and the corresponding error correction models, and test for longrun and short-run asymmetric. The general-to-specific approach is also available in estimating the ARDL and NARDL models. The Pesaran, Shin & Smith (2001) (<doi:10.1002/jae.616>) bounds test for level relationships is also provided. The ardl.nardl package also performs short-run and longrun symmetric restrictions available at Shin et al. (2014) <doi:10.1007/978-1-4899-8008-3_9> and their corresponding tests.
This package provides functions to access data from public RESTful APIs including World Bank API and REST Countries API', retrieving real-time or historical information related to Algeria. The package enables users to query economic indicators and international demographic and geopolitical statistics in a reproducible way. It is designed for researchers, analysts, and developers who require reliable and programmatic access to Algerian data through established APIs. For more information on the APIs, see: World Bank API <https://datahelpdesk.worldbank.org/knowledgebase/articles/889392> and REST Countries API <https://restcountries.com/>.
Download data from the time-series databases of the Bundesbank, the German central bank. See the overview at the Bundesbank website (<https://www.bundesbank.de/en/statistics/time-series-databases>) for available series. The package provides only a single function, getSeries(), which supports both traditional and real-time datasets; it will also download meta data if available. Downloaded data can automatically be arranged in various formats, such as data frames or zoo series. The data may optionally be cached, so as to avoid repeated downloads of the same series.
This package provides a collection of novel tools for generating species distribution and abundance models (SDM) that are dynamic through both space and time. These highly flexible functions incorporate spatial and temporal aspects across key SDM stages; including when cleaning and filtering species occurrence data, generating pseudo-absence records, assessing and correcting sampling biases and autocorrelation, extracting explanatory variables and projecting distribution patterns. Throughout, functions utilise Google Earth Engine and Google Drive to minimise the computing power and storage demands associated with species distribution modelling at high spatio-temporal resolution.