```julia
using TinySegmenter

join(tokenize("私の名前は中野です"), " | ")  # "私 | の | 名前 | は | 中野 | です"
```
The return value of `tokenize` is an array of substrings of the input string,
giving the locations of the tokens in the text. (Substrings are represented
by Julia's `SubString` type.)
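Because each token is a `SubString`, it shares memory with the original string rather than copying it. A minimal sketch of inspecting the tokens (the index and comparison calls are standard Julia; the tokenization of this sample sentence is taken from the example above):

```julia
using TinySegmenter

tokens = tokenize("私の名前は中野です")

# Each element is a SubString view into the original string, not a copy.
tokens[3]                    # the third token, "名前"
tokens[3] isa SubString      # true
```

A `SubString` can be converted to an independent `String` with `String(tokens[3])` if a standalone copy is needed.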
The following are times in seconds for a benchmark (see benchmark/README.md) of TinySegmenter implementations in different languages tokenizing a large (243kB) Japanese text:
The benchmark was performed on the following machine: