Unigram is a subword tokenization algorithm based on the unigram language model, which assigns each token a probability independent of its context. It is used to preprocess text input for some deep learning language models.
Enter training text in the text area below and press the 'Train' button to train a Unigram model with the given target vocabulary size. Note that this implementation is very slow on anything but small texts.
In the Tokenization section below, you can interactively see how the trained model tokenizes input text.
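For readers curious how a trained Unigram model can turn text into tokens, here is a minimal sketch in Python. It is not the code behind this page; it assumes the model is simply a dictionary mapping each token to its probability, and it uses the standard Viterbi dynamic program to pick the segmentation whose token probabilities have the largest product. The function name `viterbi_tokenize` and the toy vocabulary are made up for illustration.

```python
import math


def viterbi_tokenize(text, log_probs):
    """Return the most probable segmentation of `text` under a unigram model.

    `log_probs` maps each vocabulary token to its log-probability
    (an assumed representation, chosen for this sketch).
    """
    n = len(text)
    max_len = max(map(len, log_probs))
    # best[i] is the best log-probability of any segmentation of text[:i];
    # back[i] is where the last token of that segmentation starts.
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in log_probs and best[start] + log_probs[piece] > best[end]:
                best[end] = best[start] + log_probs[piece]
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


# Hypothetical toy vocabulary with made-up probabilities.
vocab = {"h": 0.05, "u": 0.05, "g": 0.05, "s": 0.1, "hu": 0.15, "ug": 0.2, "hug": 0.4}
log_probs = {token: math.log(p) for token, p in vocab.items()}
print(viterbi_tokenize("hugs", log_probs))  # ['hug', 's']
```

Working in log-probabilities turns the product over tokens into a sum and avoids numeric underflow on longer inputs. A practical tokenizer would also need a fallback for characters missing from the vocabulary, which this sketch handles only by raising an error.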