@corydolphin
Contributor

Refactor all of the GraphQL AST Nodes to use Python dataclasses to provide
better type safety, immutability guarantees, and cleaner code while maintaining
backwards compatibility with existing APIs.

Benchmark comparison (837f604 base vs dataclasses):

| Benchmark                       |   Base |  Dataclass | Change          |
|---------------------------------|--------|------------|-----------------|
| test_parse_large_query          | 33,108 |     18,689 | 44% faster      |
| test_parse_kitchen_sink         |    577 |        361 | 37% faster      |
| test_pickle_large_query_decode  | 18,520 |      5,549 | 70% faster (3x) |
| test_pickle_large_query_encode  |  9,038 |      4,117 | 54% faster (2x) |
| test_pickle_large_query_round   | 28,048 |     10,206 | 64% faster (3x) |
| test_many_repeated_fields       | 15,918 |     14,909 | 6% faster       |
| test_execute_basic_sync         |    310 |        292 | 6% faster       |
| test_execute_basic_async        |    354 |        338 | 5% faster       |

Introduce benchmarks using a large (~117KB) GraphQL query to measure
parse and pickle serialization performance. These provide a baseline
for comparing serialization approaches in subsequent commits.

Baseline performance (measured on a MacBook Pro M4 Max):
- Parse:         81ms
- Pickle encode: 24ms
- Pickle decode: 42ms
- Roundtrip:     71ms
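
A rough sketch of how pickle timings of this kind could be reproduced with only the standard library. The `Node` class and the `build_tree` and `time_ms` helpers are hypothetical stand-ins, not graphql-core types or the benchmark suite's code, and the numbers printed will not match the figures above.

```python
import pickle
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an AST node (not a graphql-core class)."""
    name: str
    children: tuple["Node", ...] = ()


def build_tree(depth: int, width: int) -> Node:
    """Build a synthetic tree to play the role of a large parsed query."""
    if depth == 0:
        return Node(name="leaf")
    kids = tuple(build_tree(depth - 1, width) for _ in range(width))
    return Node(name=f"level{depth}", children=kids)


def time_ms(func, repeat: int = 5) -> float:
    """Return the best wall-clock time of `func` over `repeat` runs, in ms."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


tree = build_tree(depth=6, width=4)  # 5461 nodes in total
payload = pickle.dumps(tree)

encode_ms = time_ms(lambda: pickle.dumps(tree))
decode_ms = time_ms(lambda: pickle.loads(payload))
roundtrip_ms = time_ms(lambda: pickle.loads(pickle.dumps(tree)))
print(f"encode {encode_ms:.2f} ms, decode {decode_ms:.2f} ms, "
      f"roundtrip {roundtrip_ms:.2f} ms")
```

Taking the best of several runs is one common way to reduce noise; a benchmark plugin would normally report a mean and a spread instead.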

Prepares the AST for immutability by using tuples instead of lists for
collection fields. This aligns with the JavaScript GraphQL library,
which uses readonly arrays, and enables future frozen data structures.
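
To illustrate why tuple-valued fields matter for frozen nodes, here is a minimal sketch. `FieldNode` below is a hypothetical, simplified class written for this example, not the library's actual definition.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FieldNode:
    """Hypothetical, simplified node; the real AST classes have more fields."""
    name: str
    arguments: tuple[str, ...] = ()


field = FieldNode(name="user", arguments=("id", "first"))

# Tuples have no in-place mutators, so the collection cannot change under us.
assert not hasattr(field.arguments, "append")

# Rebinding the attribute is rejected once the dataclass is frozen.
try:
    field.arguments = ()
    reassigned = True
except FrozenInstanceError:
    reassigned = False

# A frozen dataclass whose fields are all hashable is itself hashable,
# which a list-valued field would prevent.
lookup = {field: "cached"}
```

With a list in place of the tuple, `frozen=True` would still block reassignment, but `field.arguments.append(...)` would succeed and hashing the node would fail.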

Python 3.9 reaches end-of-life October 2025. Python 3.10 adoption is
now mainstream - major frameworks (Strawberry, Django 5.0, FastAPI) require it.

This enables modern Python features:
- Dataclasses with `kw_only`
- Union types with `|` syntax (PEP 604)
- isinstance() with union types directly
- match statements for pattern matching

Modifies the AST visitor to use copy-on-write semantics when applying
edits. Instead of mutating nodes in place, the visitor now creates new
node instances with the edited values. This prepares for frozen AST
nodes while maintaining backwards compatibility.

The visitor accumulates edits and applies them by constructing new
nodes, enabling the transition to immutable data structures.
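
A minimal sketch of the general copy-on-write idea, assuming a hypothetical frozen `Node` class and a `rename` helper invented for this example. It is not the visitor's actual implementation, only an illustration of building new nodes instead of mutating existing ones.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    """Hypothetical frozen node; not the library's actual class."""
    name: str
    children: tuple["Node", ...] = ()


def rename(node: Node, old: str, new: str) -> Node:
    """Return a tree with `old` renamed to `new`, copying only changed paths."""
    new_children = tuple(rename(child, old, new) for child in node.children)
    new_name = new if node.name == old else node.name
    unchanged = new_name == node.name and all(
        a is b for a, b in zip(new_children, node.children)
    )
    if unchanged:
        return node  # reuse the original object, nothing was edited
    return replace(node, name=new_name, children=new_children)


leaf_a = Node("a")
leaf_b = Node("b")
root = Node("root", (leaf_a, leaf_b))

edited = rename(root, "b", "c")
```

Here `edited` is a new root whose first child is the very same `leaf_a` object, while `root` itself is left untouched. Subtrees without edits are shared rather than copied.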

- Update test_visitor.py to properly type-annotate the visitor class attribute
  and add assertion before using selection_set
- Update test_schema_parser.py to use more precise types that match GraphQL spec:
  - NonNullTypeNode's inner type can only be NamedTypeNode or ListTypeNode
  - Schema definitions use ConstDirectiveNode, not DirectiveNode
  - Default values use ConstValueNode, not ValueNode
  - OperationTypeDefinition's type_ must be NamedTypeNode
- Handle token.value being str | None by using `or ""` fallback
- Fix parse_variable_definition to not use `and` for side effects
- Use properly typed variable in parse_nullability_assertion
- These fixes prepare for stricter type checking in frozen dataclasses
…r parsing)

Refactor all of the GraphQL AST Nodes to use Python dataclasses to provide
better type safety, immutability guarantees, and cleaner code while maintaining
backwards compatibility with existing APIs.

Benchmark comparison (837f604 base vs dataclasses):

| Benchmark                       |   Base |  Dataclass | Change          |
|---------------------------------|--------|------------|-----------------|
| test_parse_large_query          | 33,108 |     18,689 | 44% faster      |
| test_parse_kitchen_sink         |    577 |        361 | 37% faster      |
| test_pickle_large_query_decode  | 18,520 |      5,549 | 70% faster (3x) |
| test_pickle_large_query_encode  |  9,038 |      4,117 | 54% faster (2x) |
| test_pickle_large_query_round   | 28,048 |     10,206 | 64% faster (3x) |
| test_many_repeated_fields       | 15,918 |     14,909 | 6% faster       |
| test_execute_basic_sync         |    310 |        292 | 6% faster       |
| test_execute_basic_async        |    354 |        338 | 5% faster       |
@codspeed-hq

codspeed-hq bot commented Jan 8, 2026

CodSpeed Performance Report

Merging this PR will degrade performance by 23.43%

Comparing corydolphin:convert-ast-to-dataclasses (c04825e) with main (020ec1f)

Summary

⚡ 5 improved benchmarks
❌ 1 regressed benchmark
✅ 12 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark                         | BASE     | HEAD     | Efficiency |
|-----------------------------------|----------|----------|------------|
| test_validate_introspection_query | 15 ms    | 19.6 ms  | -23.43%    |
| test_pickle_large_query_roundtrip | 317.9 ms | 97.1 ms  | ×3.3       |
| test_pickle_large_query_encode    | 110.6 ms | 55.9 ms  | +97.97%    |
| test_parse_kitchen_sink           | 6.8 ms   | 4.3 ms   | +58.55%    |
| test_parse_large_query            | 419.7 ms | 238.1 ms | +76.29%    |
| test_pickle_large_query_decode    | 207.9 ms | 41.6 ms  | ×5         |
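
The "Efficiency" figures in this report and the "% faster" figures in the PR description measure the same change in different ways. The sketch below assumes the report's figure is `base / head - 1`, which is consistent with the displayed numbers up to rounding but is an inference, not something stated in the report. The function names are made up for this example.

```python
def time_reduction(base: float, head: float) -> float:
    """Fraction of the base time saved: the '44% faster' style of figure."""
    return (base - head) / base


def speedup(base: float, head: float) -> float:
    """Relative throughput change, base / head - 1 (assumed report style)."""
    return base / head - 1


# test_parse_large_query as reported above: 419.7 ms -> 238.1 ms
parse_reduction = time_reduction(419.7, 238.1)  # about 0.43
parse_speedup = speedup(419.7, 238.1)           # about 0.76

# test_validate_introspection_query: 15 ms -> 19.6 ms
validate_speedup = speedup(15.0, 19.6)          # about -0.23
```

Because the inputs shown in the report are rounded, the computed values agree with the displayed percentages only to about the first decimal place.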

@Cito
Member

Cito commented Jan 8, 2026

Baseline performance (before PR #250):

| Benchmark        | M4 | Ryzen9 | Codspeed |
|------------------|----|--------|----------|
| parse            | 81 |     46 |      418 |
| pickle encode    | 24 |     16 |      110 |
| pickle decode    | 42 |     39 |      209 |
| pickle roundtrip | 71 |     58 |      319 |

Performance (after PR #256):

| Benchmark        | M4 | Ryzen9 | Codspeed |
|------------------|----|--------|----------|
| parse            | 19 |     24 |      238 |
| pickle encode    |  4 |      5 |       56 |
| pickle decode    |  6 |      7 |       42 |
| pickle roundtrip | 10 |     12 |       97 |

(mean benchmark time in ms)

@Cito
Member

Cito commented Jan 8, 2026

Thanks @corydolphin. These are in fact impressive gains in parsing and deserialization of large queries.

Codspeed complains about a degradation in the "validate introspection query" benchmark. I think this is just because of the high variability in this test. Can you double-check that it's not caused by this PR?

Generally, I think we should split some of the benchmarks, and I have created issue #257 as a reminder.

I will merge this now and fix the linting error later.

@Cito Cito merged commit 629b763 into graphql-python:main Jan 8, 2026
9 of 11 checks passed