r-wordpiece-data 2.0.0
Channel: guix-cran
Licenses: FSDG-compatible
Synopsis: Data for Wordpiece-Style Tokenization
Description:
This package provides data for use by the wordpiece algorithm, which tokenizes text into subword chunks that tend to carry some meaning. The included vocabularies were retrieved from <https://huggingface.co/bert-base-cased/resolve/main/vocab.txt> (cased) and <https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt> (uncased) and parsed into an R-friendly format.
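To illustrate how a vocabulary of this kind is typically consumed, here is a minimal sketch of greedy longest-match-first wordpiece splitting. It is written in Python for illustration only and is not this package's API: the function name `wordpiece_tokenize`, the tiny `toy_vocab`, and the example words are all hypothetical. The `##` continuation prefix and the `[UNK]` token follow the convention used in the BERT vocabulary files.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into vocabulary pieces, longest match first.

    Hypothetical helper for illustration; not part of any package.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Pieces that do not begin the word carry the prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (an assumption); the real files hold tens of thousands of entries.
toy_vocab = {"token", "##ize", "##s", "un", "##afford", "##able", "[UNK]"}

print(wordpiece_tokenize("tokenizes", toy_vocab))     # ['token', '##ize', '##s']
print(wordpiece_tokenize("unaffordable", toy_vocab))  # ['un', '##afford', '##able']
print(wordpiece_tokenize("xyz", toy_vocab))           # ['[UNK]']
```

The sketch handles a single, already-separated word; lowercasing (for the uncased vocabulary), whitespace splitting, and punctuation handling would happen before this step.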