tokenizer

A grammar describes the syntax of a programming language and may be defined in Backus-Naur form (BNF). A lexer (also called a tokenizer) performs lexical analysis, turning text into tokens. A parser takes those tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with context: does the sequence of tokens fit the grammar? A compiler front end combines a lexer and a parser built for a specific grammar.
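
The split between lexer and parser can be illustrated with a minimal sketch. It assumes a tiny, hypothetical arithmetic grammar (numbers, `+`, `*`, parentheses); the names `Token`, `TOKEN_SPEC`, `tokenize`, and `parse` are invented for illustration and do not come from any repository listed here.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for a tiny arithmetic language (an assumption).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[+*()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    """Lexer: turn raw text into tokens, dropping whitespace."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())


def parse(tokens):
    """Recursive-descent parser that returns a nested-tuple AST.

    Grammar (BNF-style notation):
        expr   ::= term ("+" term)*
        term   ::= factor ("*" factor)*
        factor ::= NUMBER | "(" expr ")"
    """
    tokens = list(tokens)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.kind == "NUMBER":
            advance()
            return ("num", int(tok.text))
        if tok.text == "(":
            advance()
            node = expr()
            if peek() is None or peek().text != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok.text!r}")

    def term():
        node = factor()
        while peek() is not None and peek().text == "*":
            advance()
            node = ("mul", node, factor())
        return node

    def expr():
        node = term()
        while peek() is not None and peek().text == "+":
            advance()
            node = ("add", node, term())
        return node

    tree = expr()
    if peek() is not None:
        # Leftover tokens mean the sequence does not fit the grammar.
        raise SyntaxError(f"unexpected token {peek().text!r}")
    return tree
```

Here `tokenize` only decides what the tokens are, while `parse` decides whether their order fits the grammar: an input such as `1 + + 2` lexes without error but is rejected by the parser.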


Here are 1,906 public repositories matching this topic...

rasbt / LLMs-from-scratch

theseer / tokenizer

Chevrotain / chevrotain

dqbd / tiktokenizer

roshan-research / hazm

natasha / natasha

uhop / stream-json

lovit / soynlp

ikawaha / kagome

no-context / moo

wangfenjin / simple

niieani / gpt-tokenizer

BLKSerene / Wordless

risesoft-y9 / Data-Labeling

mathewsanders / Mustard

cbaziotis / ekphrasis

open-korean-text / open-korean-text

lindera / lindera

therealoliver / Deepdive-llama3-from-scratch

jflex-de / jflex
