-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54854][SQL] Add a UUIDv7 queryId to SQLExecution Events #53625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| val queryId: UUID = UUIDv7Generator.generate() | ||
|
|
||
| // Tracks how many times this QueryExecution has been executed. | ||
| // Used by SQLExecution to determine whether to use qplQueryId or generate a new one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Used by SQLExecution to determine whether to use qplQueryId or generate a new one. | |
| // Used by SQLExecution to determine whether to use the existing queryId or generate a new one. |
| val outputMode: OutputMode, | ||
| val checkpointLocation: String, | ||
| val queryId: UUID, | ||
| override val queryId: UUID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this use UUIDv7?
| rootExecutionId: Option[Long], | ||
| // A unique identifier for the query execution. For the first execution it equals | ||
| // QueryExecution.queryId, for subsequent executions a new UUIDv7 is generated. | ||
| queryId: Option[UUID] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for better compatibility, can we add it at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we should put it in SparkListenerSQLExecutionEnd as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also check JsonProtocol. We should set this field to None when reading legacy event logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a UUIDv7-based queryId to SQL execution events to provide a globally unique, time-ordered identifier for tracking SQL queries across systems. The key changes include:
- Introduction of a UUIDv7 generator for creating time-ordered unique identifiers
- Addition of
queryIdfield toQueryExecutionand propagation through the SQL execution lifecycle - Modification of
SparkListenerSQLExecutionStartto include thequeryId
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| sql/core/src/main/scala/org/apache/spark/sql/util/UUIDv7Generator.scala | New UUIDv7 generator implementation following RFC draft specification |
| sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala | Adds queryId field and executionCount tracking to QueryExecution |
| sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala | Implements queryId propagation via SparkContext local properties and generation logic |
| sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala | Adds queryId parameter to SparkListenerSQLExecutionStart event |
| sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala | Updates event handler to extract queryId from events |
| sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/runtime/IncrementalExecution.scala | Declares queryId as override since it's now inherited from QueryExecution |
| sql/core/src/test/scala/org/apache/spark/sql/util/UUIDv7GeneratorSuite.scala | Comprehensive test suite for UUIDv7 generator covering format, uniqueness, monotonicity, and timestamp accuracy |
| sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala | Tests for queryId propagation in concurrent and sequential execution scenarios |
| sql/core/src/test/scala/org/apache/spark/sql/execution/ui/*.scala | Updates test event constructors to include None for queryId parameter |
| sql/core/src/test/scala/org/apache/spark/sql/execution/history/*.scala | Updates test event constructors to include None for queryId parameter |
| sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ui/SparkConnectServerListenerSuite.scala | Updates test event constructors to include None for queryId parameter |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#section-5.2 | ||
| */ | ||
|
|
||
| private val random = new Random() |
Copilot
AI
Dec 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shared Random instance in the object is not thread-safe. Since generateFrom can be called from multiple threads concurrently (via SQLExecution.withNewExecutionId0), the shared Random instance may produce non-unique UUIDs due to race conditions in Random.nextLong().
Consider using ThreadLocalRandom.current() instead of a shared Random instance to ensure thread-safety. ThreadLocalRandom is the standard approach for concurrent random number generation in Java/Scala and is used elsewhere in the Spark codebase.
| val timestampMs = epochMilli & 0xFFFFFFFFFFFFL | ||
|
|
||
| // 12 bits, avoid LSB as most HW clocks have resolution in range of 10-40 ns | ||
| val randA = (nano>> 4) & 0xFFF |
Copilot
AI
Dec 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space after the right shift operator. The code has (nano>> 4) but should be (nano >> 4) for consistency with Scala conventions and the rest of the codebase. You can see this convention in other files like HashedRelation.scala which uses (address >>> SIZE_BITS) with spaces.
| val randA = (nano>> 4) & 0xFFF | |
| val randA = (nano >> 4) & 0xFFF |
| val queryId: UUID = UUIDv7Generator.generate() | ||
|
|
||
| // Tracks how many times this QueryExecution has been executed. | ||
| // Used by SQLExecution to determine whether to use qplQueryId or generate a new one. |
Copilot
AI
Dec 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment refers to "qplQueryId" which appears to be a typo or outdated term. Based on the code and PR description, this should likely be "queryId" for clarity and consistency with the actual field name used throughout the codebase.
| // Used by SQLExecution to determine whether to use qplQueryId or generate a new one. | |
| // Used by SQLExecution to determine whether to use queryId or generate a new one. |
What changes were proposed in this pull request?
Add a new UUIDv7
queryIdobject to SparkListenerSQLExecutionStart and propagate it through the SQL execution lifecycle via SparkContext local properties.Currently, Spark uses
executionIdto connect jobs, stages, and tasks with SQL executions. However, this field is not globally unique, as multiple Spark applications can include the sameexecutionIds. UUIDv7 allows for a time-ordered, globally unique identifier for improved telemetry across systems.In a separate PR, plan to add
queryIdas a new field to SparkUI.Why are the changes needed?
Add a globally unique, time-ordered identifier for Spark SQL query execution events.
Does this PR introduce any user-facing change?
No, this PR simply adds the internal queryId which is not yet surfaced.
How was this patch tested?
Added tests for UUIDv7 generator and SQLExecution queryId propagation.
Was this patch authored or co-authored using generative AI tooling?
No.