System:
Reproduce:
from botok import WordTokenizer  # assumption: wt is a botok WordTokenizer
wt = WordTokenizer()
tokens = wt.tokenize("༄༅། །བློ་སྦྱོང་དོན་?")
print(tokens[0])
Output:
text: "༄༅། །"
char_types: |NORMAL_PUNCT|NORMAL_PUNCT|NORMAL_PUNCT|TRANSPARENT|NORMAL_PUNCT|
chunk_type: PUNCT
start: 0
len: 5
Expected output:
text: "༄༅། །"
char_types: |NORMAL_PUNCT|NORMAL_PUNCT|NORMAL_PUNCT|TRANSPARENT|NORMAL_PUNCT|
chunk_type: PUNCT
pos: PUNCT
start: 0
len: 5