A rigorous JMH benchmark suite that runs the exact same SQL queries as
SQLGlot's benchmarks/parse.py,
enabling a direct, apples-to-apples comparison of parse times across
JSqlParser, SQLGlot, and polyglot-sql.
In addition to the synthetic queries defined by SQLGlot, this benchmark includes a large collection of real-world SQL extracted from JSqlParser's own test suite — covering DDL, DML, dialect-specific syntax, and edge cases that stress-test parser breadth as much as raw speed.
| Name | Description |
|---|---|
tpch |
Realistic TPC-H Q2 (CTEs, sub-queries, joins) |
short_query |
Tiny SELECT with CASE |
deep_arithmetic |
500-deep 1+0+1+2+… and 2*0*1*2*… |
large_in |
20 000-element IN lists (strings + numbers) |
values |
INSERT … VALUES with 2 000 × 20 cells |
many_joins |
199 JOINs |
many_unions |
500 UNION ALLs |
nested_subqueries |
20-deep nested (SELECT * FROM …) |
many_columns |
SELECT with 1 000 columns |
large_case |
CASE with 1 000 WHEN branches |
complex_where |
200 ANDed compound predicates |
many_ctes |
200 CTEs |
many_windows |
200 window functions |
nested_functions |
20-deep nested COALESCE |
large_strings |
500 × 100-char string literals |
many_numbers |
10 000 numeric literals in SELECT |
The queries/ directory contains SQL files drawn from JSqlParser's extensive
test suite — CREATE TABLE, ALTER, INSERT, SELECT, MERGE, DELETE,
UPDATE, Hive, PostgreSQL, SQLite, and more. Each file bundles dozens to
hundreds of statements that exercise real dialect features: MySQL key syntax,
Oracle tablespaces, PostgreSQL arrays, Hive-specific DDL, complex nested
brackets, window functions, and many others.
These files were pre-filtered against SQLGlot (see sqlglot_prefilter.py):
statements that SQLGlot cannot parse are separated into *_bad.sql files and
excluded from the benchmark. The remaining *_good.sql files contain only
statements that both parsers accept, ensuring a fair comparison.
Drop .sql files into the queries/ directory. Both sides pick them up
automatically — the filename (without .sql) becomes the benchmark name.
queries/
├── hive_retention.sql → benchmark name: "hive_retention"
├── my_complex_report.sql → benchmark name: "my_complex_report"
└── ...
Rules for .sql files:
- One or more SQL statements per file (semicolons separate statements)
- UTF-8 encoding
- The filename stem must not collide with a built-in query name
# run just your custom query
./gradlew jmhRun -Pquery=hive_retention
# or in the Python comparison
python compare_parsers.py --queries hive_retention --skip-java./gradlew jmhRunResults go to build/results/jmh/results.json.
./gradlew jmhRun -Pquery=tpch./gradlew quickBenchThe included compare_parsers.py benchmarks all available parsers and
prints a multi-column comparison table. Each Python variant runs in its own
subprocess to isolate mypyc .so loading from pure-Python measurement.
| Variant | Install | What it is |
|---|---|---|
sqlglot(pure) |
pip install sqlglot |
Interpreted Python — no C extensions |
sqlglot[c] |
pip install "sqlglot[c]" |
mypyc-compiled C extensions (includes Rust tokenizer) |
polyglot-sql |
pip install polyglot-sql |
Full Rust rewrite of SQLGlot (PyO3 bindings) |
If you install
sqlglot[c], the script automatically benchmarks both the C-extension and pure-Python variants. It temporarily hides the.sofiles to get a clean interpreted-Python measurement (the same technique SQLGlot's ownmake hidec/make showcuses).
Install whichever variants you want to compare. Unavailable parsers are silently skipped.
python -m venv .venv
source .venv/bin/activate
pip install "sqlglot" "sqlglot[c]" "polyglot-sql" --upgrade pip
# full rigorous run — both sides use matched settings
# Python: 2 forks x (3 warmup x 3s + 5 measure x 3s) = ~48s/query
# JMH: 2 forks x (3 warmup x 3s + 5 measure x 3s) = ~48s/query
python compare_parsers.py
# quick smoke test — ~2 min total
python compare_parsers.py --queries tpch,short_query,many_joins \
--py-forks 1 --py-warmup-iters 1 --py-warmup-time 2 \
--py-measure-iters 2 --py-measure-time 2 \
--jmh-warmup 1 --jmh-iterations 2 --jmh-forks 1
# only Python parsers (no Java/Gradle needed)
python compare_parsers.py --skip-java
# only JSqlParser (skip Python)
python compare_parsers.py --skip-pythonThe Python parameters mirror JMH's flags:
| Python flag | JMH equivalent | Default | Meaning |
|---|---|---|---|
--py-forks |
-f |
2 | Separate processes |
--py-warmup-iters |
-wi |
3 | Warmup iterations per fork |
--py-warmup-time |
-w |
3s | Duration of each warmup iter |
--py-measure-iters |
-i |
5 | Measurement iterations per fork |
--py-measure-time |
-r |
3s | Duration of each measure iter |
=======================================================================================================
Parser Comparison (all times in ms/op, lower is better)
=======================================================================================================
Query | sqlglot(pure) | sqlglot[c] | polyglot-sql | JSqlParser
------------------------------------+-----------------+----------------+----------------+--------------
tpch | 2.52 ms | 710.5 us | 308.5 us * | 494.1 us
short_query | 192.6 us | 58.9 us | 57.6 us * | 62.9 us
deep_arithmetic | 7.68 ms | 2.40 ms | 950.1 us | 715.3 us *
large_in | 403.88 ms | 112.55 ms | 64.47 ms * | 153.77 ms
values | 469.46 ms | 145.54 ms | 68.74 ms * | 155.75 ms
many_joins | 11.58 ms | 3.27 ms | 1.19 ms * | 2.17 ms
many_unions | 45.08 ms | 10.82 ms | 3.88 ms * | 5.87 ms
nested_subqueries | 981.1 us | 256.9 us | 105.3 us * | 162.1 us
many_columns | 12.02 ms | 3.08 ms | 2.20 ms * | 3.95 ms
large_case | 36.55 ms | 10.86 ms | 4.94 ms * | 5.27 ms
complex_where | 30.74 ms | 7.79 ms | 3.44 ms * | 4.24 ms
many_ctes | 18.22 ms | 4.05 ms | 1.21 ms * | 1.79 ms
many_windows | 21.66 ms | 5.64 ms | 2.26 ms * | 3.50 ms
nested_functions | 614.4 us | 210.6 us | 156.7 us * | 317.9 us
large_strings | 4.75 ms | 1.65 ms * | 1.73 ms | 2.87 ms
many_numbers | 98.69 ms | 33.35 ms | 16.74 ms * | 21.77 ms
AlterTest_good | 13.06 ms | 5.28 ms | --- | 1.12 ms *
CreateSequenceTest_good | 1.66 ms | 473.5 us | --- | 109.0 us *
CreateTableTest_good | 27.01 ms | 11.91 ms | --- | 2.47 ms *
DeleteTest_good | 2.41 ms | 757.5 us | --- | 329.0 us *
HiveTest_good | 537.2 us | 210.8 us | --- | 138.5 us *
InsertTest_good | 7.58 ms | 2.34 ms | --- | 1.30 ms *
MergeTest_good | 2.32 ms | 865.0 us | --- | 326.7 us *
NestedBracketsPerformanceTest_good | 9.96 ms | 2.85 ms | --- | 2.27 ms *
PostgresTest_good | 1.48 ms | 536.5 us | --- | 249.9 us *
SQLiteTest_good | 133.0 us | 48.9 us | 44.8 us * | 46.1 us
SelectTest_good | 206.88 ms | 84.85 ms | --- | 4.40 ms *
UpdateTest_good | 5.57 ms | 1.88 ms | --- | 906.0 us *
WindowFunctionTest_good | 244.8 us | 105.5 us | 57.6 us * | 100.2 us
deep_function_parameters_good | 7.44 ms | 2.95 ms | 1.02 ms * | 2.11 ms
hive_retention_good | 5.65 ms | 2.57 ms | 664.1 us * | 1.48 ms
issue2401_good | 27.52 ms | 10.27 ms | 3.02 ms * | 9.66 ms
------------------------------------+-----------------+----------------+----------------+--------------
* = fastest (within 2%) --- = not available / parse error
-
polyglot-sql is the fastest on synthetic queries — but it fails to parse the majority of real-world SQL. A parser that only supports a narrow slice of SQL syntax can afford to be very fast on what it does handle.
-
JSqlParser dominates on real-world SQL. On every real-world test-suite file —
SELECT,CREATE TABLE,ALTER,INSERT,UPDATE,MERGE,DELETE, Hive, PostgreSQL — JSqlParser is the fastest parser, often by a wide margin (e.g.SelectTest_goodat 4.40 ms vs sqlglot[c]'s 84.85 ms — a 19× difference). These are the queries that matter in production. -
Breadth and speed are not a trade-off. JSqlParser parses a substantially larger set of SQL dialects and syntax variants than either SQLGlot or polyglot-sql. The
---entries in the table — queries that a parser could not handle at all — appear exclusively in the SQLGlot/polyglot-sql columns, never in JSqlParser's. Broad dialect coverage and competitive performance go hand in hand. -
On the synthetic queries, polyglot-sql (Rust) leads in token-heavy workloads like
large_inandvalueswhere native-code tokenisation shines. JSqlParser holds its own as the second-fastest on most of these, and takes the top spot ondeep_arithmetic— all while running on the JVM with no native extensions.