_            _    _        _         _
      /\ \         /\ \ /\ \     /\_\      / /\
      \_\ \       /  \ \\ \ \   / / /     / /  \
      /\__ \     / /\ \ \\ \ \_/ / /     / / /\ \__
     / /_ \ \   / / /\ \ \\ \___/ /     / / /\ \___\
    / / /\ \ \ / / /  \ \_\\ \ \_/      \ \ \ \/___/
   / / /  \/_// / /   / / / \ \ \        \ \ \
  / / /      / / /   / / /   \ \ \   _    \ \ \
 / / /      / / /___/ / /     \ \ \ /_/\__/ / /
/_/ /      / / /____\/ /       \ \_\\ \/___/ /
\_\/       \/_________/         \/_/ \_____\/
sentencepiece 0.1.97
Channel: guix
Location: gnu/packages/machine-learning.scm (gnu packages machine-learning)
Home page: https://github.com/google/sentencepiece
Licenses: ASL 2.0
Synopsis: Unsupervised tokenizer for Neural Network-based text generation
Description:

SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training. SentencePiece implements subword units---e.g., byte-pair-encoding (BPE) and unigram language model---with the extension of direct training from raw sentences. SentencePiece allows us to make a purely end-to-end system that does not depend on language-specific pre- or post-processing.

Total results: 3