Converting text to numerical features requires purpose-built procedures, which are implemented as steps that follow the conventions of the recipes package. These steps allow for tokenization, filtering, counting (tf and tf-idf) and feature hashing.
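As an illustration of the counting step named above, here is a minimal sketch of one common tf-idf variant (relative term frequency times the natural log of inverse document frequency). It is written in plain Python rather than as a recipes step, the function name `tfidf` and the whitespace tokenizer are assumptions made for this example, and the package itself may use a different weighting.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


weights = tfidf(["red apple", "red car"])
```

With this variant, a term that occurs in every document (here "red") gets weight zero, while terms unique to one document are weighted up.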
This package provides access to the text shaping functionality in the HarfBuzz library and the bidirectional algorithm in the Fribidi library. This is a low-level utility package mainly for graphic devices that expands upon the font tool-set provided by the systemfonts package.
Converts text into speech using various text-to-speech (TTS) engines and provides a unified interface for accessing their functionality. With this package, users can easily generate audio files of spoken words, phrases, or sentences from plain text data. The package supports multiple TTS engines, including Google's Cloud Text-to-Speech API, Amazon Polly, Microsoft's Cognitive Services Text to Speech REST API, and a free TTS engine called Coqui TTS.
It analyzes text to create a count of the top n-grams, including unigrams (one-word), bigrams (two-word), and trigrams (three-word), while removing all stopwords. It also plots the n-grams and their corresponding counts as a bar chart.
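The counting described above can be sketched as follows. This is a plain-Python illustration, not the package's code: the function name `top_ngrams`, the tiny stopword set, and the choice to drop stopwords before forming n-grams (so an n-gram can span a removed word) are all assumptions made for the example, and plotting is left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}


def top_ngrams(text, n=1, k=3, stopwords=STOPWORDS):
    """Return the k most frequent n-grams as (ngram, count) pairs."""
    # Lowercase, keep alphabetic tokens, and drop stopwords.
    tokens = [t for t in re.findall(r"[a-z']+", text.lower())
              if t not in stopwords]
    # Slide a window of width n over the remaining tokens.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams).most_common(k)


text = "the cat sat on the mat and the cat slept"
unigrams = top_ngrams(text, n=1, k=1)   # [("cat", 2)]
bigrams = top_ngrams(text, n=2, k=10)
```

The resulting pairs can be passed directly to any bar-chart routine, with the n-grams as labels and the counts as bar heights.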
Allows users to analyze text and classify emotions such as happiness, sadness, anger, fear, and neutrality. It combines text preprocessing, TF-IDF (Term Frequency-Inverse Document Frequency) feature extraction, and Random Forest classification to predict emotions and map them to corresponding emojis for enhanced sentiment visualization.
This is a companion package for the text2sdg package. It contains the trained ensemble models needed by the detect_sdg function from the text2sdg package. See Wulff, Meier and Mata (2023) <arXiv:2301.11353> and Meier, Wulff and Mata (2021) <arXiv:2110.05856> for reference.
This package provides functionality based on the paper "Time Varying Dictionary and the Predictive Power of FED Minutes" (Lima, 2018) <doi:10.2139/ssrn.3312483>. It selects the most predictive terms, referred to as a time-varying dictionary, using supervised machine learning techniques such as lasso and elastic net.
Graphical interface for text analysis that implements methods such as biplots, correspondence analysis, co-occurrence, clustering, topic models, correlations and sentiment analysis.
This package provides a lightweight and focused text annotation tool built with shiny. It offers an interactive graphical user interface for coding text documents, managing code hierarchies, creating memos, and analyzing coding patterns. Features include code co-occurrence analysis, visualization of coding patterns, comparison of multiple coding sets, and export capabilities. It supports collaborative qualitative research through standardized annotation formats and analysis tools.
Find similarities between texts using the Smith-Waterman algorithm. The algorithm performs local sequence alignment and determines similar regions between two strings. The Smith-Waterman algorithm is explained in the paper "Identification of common molecular subsequences" by T. F. Smith and M. S. Waterman (1981), available at <doi:10.1016/0022-2836(81)90087-5>. This package implements the same logic for sequences of words and letters instead of molecular sequences.
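To make the local-alignment idea concrete, here is a minimal Smith-Waterman sketch in plain Python that works on any sequence, so both word lists and strings of letters are accepted. The function name `smith_waterman` and the scoring values (match 2, mismatch -1, linear gap -1) are assumptions chosen for illustration and are not taken from the package.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned part of a, aligned part of b).

    Gaps in an alignment are represented by None.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Scoring matrix; the first row and column stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are never allowed to drop below zero.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # match or mismatch
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(None)
            i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]


score, left, right = smith_waterman(
    "the quick brown fox jumps".split(),
    "a quick brown dog".split(),
)
# score == 4, left == right == ["quick", "brown"]
```

Passing two plain strings instead of word lists aligns them letter by letter with the same logic.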
Compute a non-overlapping layout of text boxes to label multiple overlaid curves. For each curve, iteratively search for an adjacent x,y position for the text box that does not overlap with the other curves. If this process fails, offsets are computed and added to the y values of each curve, leaving sufficient space to add all of the text labels.