Skip to content

manticore-projects/jsqlparser-bench

Repository files navigation

JSqlParser Parse Benchmark

Benchmark Score

A rigorous JMH benchmark suite that runs the exact same SQL queries as SQLGlot's benchmarks/parse.py, enabling a direct, apples-to-apples comparison of parse times across JSqlParser, SQLGlot, and polyglot-sql.

In addition to the synthetic queries defined by SQLGlot, this benchmark includes a large collection of real-world SQL extracted from JSqlParser's own test suite — covering DDL, DML, dialect-specific syntax, and edge cases that stress-test parser breadth as much as raw speed.

Queries

Synthetic queries (from SQLGlot)

Name Description
tpch Realistic TPC-H Q2 (CTEs, sub-queries, joins)
short_query Tiny SELECT with CASE
deep_arithmetic 500-deep 1+0+1+2+… and 2*0*1*2*…
large_in 20 000-element IN lists (strings + numbers)
values INSERT … VALUES with 2 000 × 20 cells
many_joins 199 JOINs
many_unions 500 UNION ALLs
nested_subqueries 20-deep nested (SELECT * FROM …)
many_columns SELECT with 1 000 columns
large_case CASE with 1 000 WHEN branches
complex_where 200 ANDed compound predicates
many_ctes 200 CTEs
many_windows 200 window functions
nested_functions 20-deep nested COALESCE
large_strings 500 × 100-char string literals
many_numbers 10 000 numeric literals in SELECT

Real-world queries (from JSqlParser's test suite)

The queries/ directory contains SQL files drawn from JSqlParser's extensive test suite — CREATE TABLE, ALTER, INSERT, SELECT, MERGE, DELETE, UPDATE, Hive, PostgreSQL, SQLite, and more. Each file bundles dozens to hundreds of statements that exercise real dialect features: MySQL key syntax, Oracle tablespaces, PostgreSQL arrays, Hive-specific DDL, complex nested brackets, window functions, and many others.

These files were pre-filtered against SQLGlot (see sqlglot_prefilter.py): statements that SQLGlot cannot parse are separated into *_bad.sql files and excluded from the benchmark. The remaining *_good.sql files contain only statements that both parsers accept, ensuring a fair comparison.

Custom queries

Drop .sql files into the queries/ directory. Both sides pick them up automatically — the filename (without .sql) becomes the benchmark name.

queries/
├── hive_retention.sql      → benchmark name: "hive_retention"
├── my_complex_report.sql   → benchmark name: "my_complex_report"
└── ...

Rules for .sql files:

  • One or more SQL statements per file (semicolons separate statements)
  • UTF-8 encoding
  • The filename stem must not collide with a built-in query name
# run just your custom query
./gradlew jmhRun -Pquery=hive_retention

# or in the Python comparison
python compare_parsers.py --queries hive_retention --skip-java

Running

Full JMH benchmark (rigorous)

./gradlew jmhRun

Results go to build/results/jmh/results.json.

Single query

./gradlew jmhRun -Pquery=tpch

Quick sanity check (no JMH, best-of-5)

./gradlew quickBench

Comparing parsers

The included compare_parsers.py benchmarks all available parsers and prints a multi-column comparison table. Each Python variant runs in its own subprocess to isolate mypyc .so loading from pure-Python measurement.

Python parser variants (auto-detected)

Variant Install What it is
sqlglot(pure) pip install sqlglot Interpreted Python — no C extensions
sqlglot[c] pip install "sqlglot[c]" mypyc-compiled C extensions (includes Rust tokenizer)
polyglot-sql pip install polyglot-sql Full Rust rewrite of SQLGlot (PyO3 bindings)

If you install sqlglot[c], the script automatically benchmarks both the C-extension and pure-Python variants. It temporarily hides the .so files to get a clean interpreted-Python measurement (the same technique SQLGlot's own make hidec / make showc uses).

Install whichever variants you want to compare. Unavailable parsers are silently skipped.

Usage

python -m venv .venv
source .venv/bin/activate
pip install "sqlglot" "sqlglot[c]" "polyglot-sql" --upgrade pip

# full rigorous run — both sides use matched settings
# Python: 2 forks x (3 warmup x 3s + 5 measure x 3s) = ~48s/query
# JMH:    2 forks x (3 warmup x 3s + 5 measure x 3s) = ~48s/query
python compare_parsers.py

# quick smoke test — ~2 min total
python compare_parsers.py --queries tpch,short_query,many_joins \
    --py-forks 1 --py-warmup-iters 1 --py-warmup-time 2 \
    --py-measure-iters 2 --py-measure-time 2 \
    --jmh-warmup 1 --jmh-iterations 2 --jmh-forks 1

# only Python parsers (no Java/Gradle needed)
python compare_parsers.py --skip-java

# only JSqlParser (skip Python)
python compare_parsers.py --skip-python

The Python parameters mirror JMH's flags:

Python flag JMH equivalent Default Meaning
--py-forks -f 2 Separate processes
--py-warmup-iters -wi 3 Warmup iterations per fork
--py-warmup-time -w 3s Duration of each warmup iter
--py-measure-iters -i 5 Measurement iterations per fork
--py-measure-time -r 3s Duration of each measure iter

Example output

  =======================================================================================================
    Parser Comparison  (all times in ms/op, lower is better)
  =======================================================================================================

                               Query  |  sqlglot(pure)  |    sqlglot[c]  |  polyglot-sql  |    JSqlParser
  ------------------------------------+-----------------+----------------+----------------+--------------
                                tpch  |        2.52 ms  |      710.5 us  |    308.5 us *  |      494.1 us
                         short_query  |       192.6 us  |       58.9 us  |     57.6 us *  |       62.9 us
                     deep_arithmetic  |        7.68 ms  |       2.40 ms  |      950.1 us  |    715.3 us *
                            large_in  |      403.88 ms  |     112.55 ms  |    64.47 ms *  |     153.77 ms
                              values  |      469.46 ms  |     145.54 ms  |    68.74 ms *  |     155.75 ms
                          many_joins  |       11.58 ms  |       3.27 ms  |     1.19 ms *  |       2.17 ms
                         many_unions  |       45.08 ms  |      10.82 ms  |     3.88 ms *  |       5.87 ms
                   nested_subqueries  |       981.1 us  |      256.9 us  |    105.3 us *  |      162.1 us
                        many_columns  |       12.02 ms  |       3.08 ms  |     2.20 ms *  |       3.95 ms
                          large_case  |       36.55 ms  |      10.86 ms  |     4.94 ms *  |       5.27 ms
                       complex_where  |       30.74 ms  |       7.79 ms  |     3.44 ms *  |       4.24 ms
                           many_ctes  |       18.22 ms  |       4.05 ms  |     1.21 ms *  |       1.79 ms
                        many_windows  |       21.66 ms  |       5.64 ms  |     2.26 ms *  |       3.50 ms
                    nested_functions  |       614.4 us  |      210.6 us  |    156.7 us *  |      317.9 us
                       large_strings  |        4.75 ms  |     1.65 ms *  |       1.73 ms  |       2.87 ms
                        many_numbers  |       98.69 ms  |      33.35 ms  |    16.74 ms *  |      21.77 ms
                      AlterTest_good  |       13.06 ms  |       5.28 ms  |           ---  |     1.12 ms *
             CreateSequenceTest_good  |        1.66 ms  |      473.5 us  |           ---  |    109.0 us *
                CreateTableTest_good  |       27.01 ms  |      11.91 ms  |           ---  |     2.47 ms *
                     DeleteTest_good  |        2.41 ms  |      757.5 us  |           ---  |    329.0 us *
                       HiveTest_good  |       537.2 us  |      210.8 us  |           ---  |    138.5 us *
                     InsertTest_good  |        7.58 ms  |       2.34 ms  |           ---  |     1.30 ms *
                      MergeTest_good  |        2.32 ms  |      865.0 us  |           ---  |    326.7 us *
  NestedBracketsPerformanceTest_good  |        9.96 ms  |       2.85 ms  |           ---  |     2.27 ms *
                   PostgresTest_good  |        1.48 ms  |      536.5 us  |           ---  |    249.9 us *
                     SQLiteTest_good  |       133.0 us  |       48.9 us  |     44.8 us *  |       46.1 us
                     SelectTest_good  |      206.88 ms  |      84.85 ms  |           ---  |     4.40 ms *
                     UpdateTest_good  |        5.57 ms  |       1.88 ms  |           ---  |    906.0 us *
             WindowFunctionTest_good  |       244.8 us  |      105.5 us  |     57.6 us *  |      100.2 us
       deep_function_parameters_good  |        7.44 ms  |       2.95 ms  |     1.02 ms *  |       2.11 ms
                 hive_retention_good  |        5.65 ms  |       2.57 ms  |    664.1 us *  |       1.48 ms
                      issue2401_good  |       27.52 ms  |      10.27 ms  |     3.02 ms *  |       9.66 ms
  ------------------------------------+-----------------+----------------+----------------+--------------

  * = fastest (within 2%)          --- = not available / parse error

Observations

  1. polyglot-sql is the fastest on synthetic queries — but it fails to parse the majority of real-world SQL. A parser that only supports a narrow slice of SQL syntax can afford to be very fast on what it does handle.

  2. JSqlParser dominates on real-world SQL. On every real-world test-suite file — SELECT, CREATE TABLE, ALTER, INSERT, UPDATE, MERGE, DELETE, Hive, PostgreSQL — JSqlParser is the fastest parser, often by a wide margin (e.g. SelectTest_good at 4.40 ms vs sqlglot[c]'s 84.85 ms — a 19× difference). These are the queries that matter in production.

  3. Breadth and speed are not a trade-off. JSqlParser parses a substantially larger set of SQL dialects and syntax variants than either SQLGlot or polyglot-sql. The --- entries in the table — queries that a parser could not handle at all — appear exclusively in the SQLGlot/polyglot-sql columns, never in JSqlParser's. Broad dialect coverage and competitive performance go hand in hand.

  4. On the synthetic queries, polyglot-sql (Rust) leads in token-heavy workloads like large_in and values where native-code tokenisation shines. JSqlParser holds its own as the second-fastest on most of these, and takes the top spot on deep_arithmetic — all while running on the JVM with no native extensions.

About

SQLGLot vs. JSQLParser speed comparison

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors