dev-d-25/Tokenizer

Tokenizer

An interactive tokenizer playground for exploring how text breaks into tokens and how token IDs are assigned from an external vocabulary file.
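A minimal sketch of the basic idea. It assumes that text is split into runs of letters/digits plus single punctuation characters, and that the vocabulary is a `Map` from token string to ID. The function names `tokenize` and `encode` are hypothetical and are not taken from the repo's source.

```javascript
// Hypothetical sketch, not the repo's actual tokenizer: split text into
// runs of letters/digits and single non-space characters, then look up
// each token in a token -> ID Map built from the vocabulary file.
function tokenize(text) {
  // \p{L} = any Unicode letter, \p{N} = any Unicode number ("u" flag required).
  // Whitespace is never matched, so it is dropped between tokens.
  return text.match(/[\p{L}\p{N}]+|[^\s\p{L}\p{N}]/gu) ?? [];
}

function encode(text, vocab) {
  // Tokens missing from the vocab get a null ID in this sketch.
  return tokenize(text).map((token) => ({ token, id: vocab.get(token) ?? null }));
}

const vocab = new Map([["Hello", 0], ["world", 1], [",", 2]]);
console.log(encode("Hello, world!", vocab).map((t) => t.id)); // → [0, 2, 1, null]
```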


✨ Features

  • Type or paste text to see tokens in real time
  • Token → ID mapping with color-coded token types:
    • 🟩 Green → existing vocab word
    • 🟥 Red → newly learned word
    • 🟦 Blue → UTF-8 raw byte (symbols)
    • 🟧 Orange → punctuation/symbol
    • ⬜ Grey → unknown token ID
  • Decode by entering space- or comma-separated token IDs
  • Legend for quick type reference
  • Custom tokenizer logic — no external libs
  • External vocab support — load vocab.json for consistent tokenization
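The color categories and the decode box could be driven by rules like the ones below. This is a hypothetical sketch, not the project's actual logic: `encodeTyped`, `decodeTyped`, and `BYTE_BASE` are made-up names. It assumes that unseen words get the next free ID, unseen symbols fall back to their UTF-8 bytes (each mapped to `BYTE_BASE + byte`), and decoding accepts IDs separated by spaces and/or commas.

```javascript
// Hypothetical sketch of the legend's categories (all rules are assumptions):
//   "vocab"   green:  word already in the vocab
//   "learned" red:    unseen word, given the next free ID and added to the vocab
//   "byte"    blue:   unseen symbol, emitted as its UTF-8 bytes
//   "punct"   orange: punctuation/symbol already in the vocab
//   "unknown" grey:   ID with no vocab entry (decode only)
const BYTE_BASE = 1_000_000; // hypothetical: byte b -> ID BYTE_BASE + b (vocab IDs assumed smaller)
const WORD = /^[\p{L}\p{N}]+$/u; // token made only of letters/digits

function encodeTyped(text, vocab) {
  // Next free ID: one past the largest ID currently in the vocab.
  let nextId = 0;
  for (const id of vocab.values()) nextId = Math.max(nextId, id + 1);

  const out = [];
  for (const token of text.match(/[\p{L}\p{N}]+|[^\s\p{L}\p{N}]/gu) ?? []) {
    const isWord = WORD.test(token);
    if (vocab.has(token)) {
      out.push({ token, id: vocab.get(token), type: isWord ? "vocab" : "punct" });
    } else if (isWord) {
      vocab.set(token, nextId); // "learn" the word so it keeps this ID
      out.push({ token, id: nextId++, type: "learned" });
    } else {
      // Unknown symbol: one token per UTF-8 byte.
      for (const b of new TextEncoder().encode(token)) {
        const hex = b.toString(16).toUpperCase().padStart(2, "0");
        out.push({ token: `<0x${hex}>`, id: BYTE_BASE + b, type: "byte" });
      }
    }
  }
  return out;
}

function decodeTyped(input, vocab) {
  const idToToken = new Map([...vocab].map(([token, id]) => [id, token]));
  const parts = [];
  let bytes = []; // consecutive byte IDs, decoded together as UTF-8
  const flush = () => {
    if (bytes.length > 0) {
      parts.push({ text: new TextDecoder().decode(Uint8Array.from(bytes)), type: "byte" });
      bytes = [];
    }
  };
  for (const raw of input.split(/[\s,]+/).filter(Boolean)) {
    const id = Number(raw);
    if (Number.isInteger(id) && id >= BYTE_BASE && id < BYTE_BASE + 256) {
      bytes.push(id - BYTE_BASE);
      continue;
    }
    flush();
    const text = idToToken.get(id);
    if (text === undefined) {
      parts.push({ text: `[${raw}]`, type: "unknown" });
    } else {
      parts.push({ text, type: WORD.test(text) ? "vocab" : "punct" });
    }
  }
  flush();
  return parts;
}

const vocab = new Map([["hello", 0], ["world", 1], [",", 2]]);
const encoded = encodeTyped("hello, world foo ✓", vocab);
console.log(encoded.map((t) => `${t.token}:${t.type}`).join(" "));
// → hello:vocab ,:punct world:vocab foo:learned <0xE2>:byte <0x9C>:byte <0x93>:byte
console.log(decodeTyped(encoded.map((t) => t.id).join(", ") + ", 42", vocab));
```

In this sketch `encodeTyped` mutates the vocab `Map`, so a word learned once keeps the same ID on later inputs, and multi-byte symbols round-trip because consecutive byte IDs are decoded as one UTF-8 sequence.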

🛠 Tech Stack

  • HTML + CSS
  • JavaScript
  • External vocab.json for token mapping
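The README does not document the format of `vocab.json`. Assuming it is a flat JSON object mapping each token to an integer ID (for example `{"hello": 0, "world": 1}`), loading it and building both lookup directions could look like this; `buildVocab` and `loadVocab` are hypothetical names.

```javascript
// Hypothetical sketch: vocab.json is ASSUMED to be a flat object of
// token -> integer ID. The repo's actual file format may differ.
function buildVocab(jsonText) {
  const raw = JSON.parse(jsonText);
  const tokenToId = new Map(); // used for encoding
  const idToToken = new Map(); // used for decoding
  for (const [token, id] of Object.entries(raw)) {
    if (!Number.isInteger(id) || id < 0) {
      throw new Error(`Invalid ID for token ${JSON.stringify(token)}: ${id}`);
    }
    if (idToToken.has(id)) {
      // Two tokens sharing an ID would make decoding ambiguous.
      throw new Error(`Duplicate ID ${id} for ${JSON.stringify(token)}`);
    }
    tokenToId.set(token, id);
    idToToken.set(id, token);
  }
  return { tokenToId, idToToken };
}

// Browser usage: fetch the file relative to the page. fetch() generally
// fails for file:// URLs, so the page must be served over HTTP.
async function loadVocab(url = "vocab.json") {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Failed to load ${url}: HTTP ${res.status}`);
  return buildVocab(await res.text());
}
```

If the app loads its vocabulary with `fetch()` in this way, that is one reason to run it through a local server such as Live Server rather than opening the HTML file directly.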

🚀 Getting Started

Clone & Open Locally

git clone https://github.com/dev-d-25/Tokenizer.git
cd Tokenizer

Open the folder in VS Code and start it with the Live Server extension to view the web UI in your browser.

About

Simple Web-Based Tokenizer & Decoder
