-
Notifications
You must be signed in to change notification settings - Fork 56
DRIVER-153: negotiate and implement SCYLLA_USE_METADATA_ID extension #770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3062,9 +3062,7 @@ def _create_response_future(self, query, parameters, trace, custom_payload, | |
| message = ExecuteMessage( | ||
| prepared_statement.query_id, query.values, cl, | ||
| serial_cl, fetch_size, paging_state, timestamp, | ||
| skip_meta=bool(prepared_statement.result_metadata), | ||
| continuous_paging_options=continuous_paging_options, | ||
| result_metadata_id=prepared_statement.result_metadata_id) | ||
| continuous_paging_options=continuous_paging_options) | ||
| elif isinstance(query, BatchStatement): | ||
| if self._protocol_version < 2: | ||
| raise UnsupportedOperation( | ||
|
|
@@ -5008,6 +5006,17 @@ def _query(self, host, message=None, cb=None): | |
| self._connection = connection | ||
| result_meta = self.prepared_statement.result_metadata if self.prepared_statement else [] | ||
|
|
||
| if self.prepared_statement and isinstance(message, ExecuteMessage): | ||
| has_result_metadata_id = self.prepared_statement.result_metadata_id is not None | ||
| has_result_metadata = bool(self.prepared_statement.result_metadata) | ||
| can_skip_meta = has_result_metadata_id and has_result_metadata and ( | ||
| ProtocolVersion.uses_prepared_metadata(connection.protocol_version) | ||
| or connection.features.use_metadata_id | ||
| ) | ||
| message.skip_meta = can_skip_meta | ||
| message.result_metadata_id = self.prepared_statement.result_metadata_id if can_skip_meta else None | ||
| message.use_metadata_id = connection.features.use_metadata_id | ||
|
|
||
|
Comment on lines
+5009
to
+5019
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, so instead of setting those when creating |
||
| if cb is None: | ||
| cb = partial(self._set_result, host, connection, pool) | ||
|
|
||
|
|
@@ -5171,6 +5180,27 @@ def _set_result(self, host, connection, pool, response): | |
| self._paging_state = response.paging_state | ||
| self._col_names = response.column_names | ||
| self._col_types = response.column_types | ||
| new_result_metadata_id = getattr(response, 'result_metadata_id', None) | ||
| if self.prepared_statement and new_result_metadata_id is not None: | ||
| if response.column_metadata: | ||
| # Write result_metadata before result_metadata_id intentionally: | ||
| # a concurrent reader that still sees the old metadata_id will | ||
| # ask the server for full metadata and recover safely; a reader | ||
| # that sees the new metadata_id together with the new metadata | ||
| # is immediately correct. The opposite write order could expose | ||
| # a window where a reader uses a new metadata_id with stale metadata. | ||
| # Note: correctness of this ordering relies on CPython's GIL making | ||
| # individual attribute reads/writes effectively atomic. Other Python | ||
| # implementations (PyPy, Jython, etc.) may not provide this guarantee. | ||
| self.prepared_statement.result_metadata = response.column_metadata | ||
| else: | ||
| log.warning( | ||
| "Server sent a new result_metadata_id but no column metadata " | ||
| "for prepared statement %r. The cached column metadata will not " | ||
| "be updated; only result_metadata_id is refreshed.", | ||
| getattr(self.prepared_statement, 'query_id', None) | ||
| ) | ||
| self.prepared_statement.result_metadata_id = new_result_metadata_id | ||
|
Comment on lines
+5183
to
+5203
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the concurrency model here? I don't think this code works if its possible for it to execute concurrently.
Now we have |
||
| if getattr(self.message, 'continuous_paging_options', None): | ||
| self._handle_continuous_paging_first_response(connection, response) | ||
| else: | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -573,6 +573,9 @@ def _write_query_params(self, f, protocol_version): | |
| if self.timestamp is not None: | ||
| flags |= _PROTOCOL_TIMESTAMP_FLAG | ||
|
|
||
| if self.skip_meta: | ||
| flags |= _SKIP_METADATA_FLAG | ||
|
|
||
| if self.keyspace is not None: | ||
| if ProtocolVersion.uses_keyspace_flag(protocol_version): | ||
| flags |= _WITH_KEYSPACE_FLAG | ||
|
|
@@ -629,9 +632,11 @@ class ExecuteMessage(_QueryMessage): | |
| def __init__(self, query_id, query_params, consistency_level, | ||
| serial_consistency_level=None, fetch_size=None, | ||
| paging_state=None, timestamp=None, skip_meta=False, | ||
| continuous_paging_options=None, result_metadata_id=None): | ||
| continuous_paging_options=None, result_metadata_id=None, | ||
| use_metadata_id=False): | ||
| self.query_id = query_id | ||
| self.result_metadata_id = result_metadata_id | ||
| self.use_metadata_id = use_metadata_id | ||
| super(ExecuteMessage, self).__init__(query_params, consistency_level, serial_consistency_level, fetch_size, | ||
| paging_state, timestamp, skip_meta, continuous_paging_options) | ||
|
|
||
|
|
@@ -640,8 +645,8 @@ def _write_query_params(self, f, protocol_version): | |
|
|
||
| def send_body(self, f, protocol_version): | ||
| write_string(f, self.query_id) | ||
| if ProtocolVersion.uses_prepared_metadata(protocol_version): | ||
| write_string(f, self.result_metadata_id) | ||
| if ProtocolVersion.uses_prepared_metadata(protocol_version) or self.use_metadata_id: | ||
| write_string(f, self.result_metadata_id if self.result_metadata_id is not None else b'') | ||
| self._write_query_params(f, protocol_version) | ||
|
Comment on lines
646
to
650
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A more elegant approach, it seems to me, is to not have |
||
|
|
||
|
|
||
|
|
@@ -745,7 +750,7 @@ def decode_row(row): | |
|
|
||
| def recv_results_prepared(self, f, protocol_version, protocol_features, user_type_map): | ||
| self.query_id = read_binary_string(f) | ||
| if ProtocolVersion.uses_prepared_metadata(protocol_version): | ||
| if ProtocolVersion.uses_prepared_metadata(protocol_version) or protocol_features.use_metadata_id: | ||
| self.result_metadata_id = read_binary_string(f) | ||
| else: | ||
| self.result_metadata_id = None | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -156,3 +156,47 @@ https://github.com/scylladb/scylladb/blob/master/docs/dev/protocol-extensions.md | |
|
|
||
| Details on the sending tablet information to the drivers | ||
| https://github.com/scylladb/scylladb/blob/master/docs/dev/protocol-extensions.md#sending-tablet-info-to-the-drivers | ||
|
|
||
|
|
||
| Prepared Statement Metadata Caching (``SCYLLA_USE_METADATA_ID``) | ||
| ---------------------------------------------------------------- | ||
|
|
||
| When executing prepared SELECT statements, the driver normally requests the server | ||
| to skip sending full result metadata with each response (``skip_meta`` optimization), | ||
| relying on the metadata cached from the initial ``PREPARE`` call. However, if the | ||
| table schema changes after a statement is prepared (e.g., a column is added, removed, | ||
| or its type is altered), this cached metadata becomes stale — leading to decoding | ||
| errors or incorrect data. | ||
|
|
||
| ScyllaDB solves this by backporting the ``metadata_id`` mechanism from CQL native | ||
| protocol v5 as a v4 extension: ``SCYLLA_USE_METADATA_ID``. When this extension is | ||
| negotiated, the server includes a hash of the result metadata in the ``PREPARE`` | ||
| response. The driver sends this hash back with every ``EXECUTE`` request. If the | ||
| schema has changed, the server sets the ``METADATA_CHANGED`` flag and returns the | ||
| new metadata hash together with the updated column definitions. The driver | ||
| automatically updates its cache and uses the new metadata to decode the current | ||
| response — all transparently, with no application code change required. | ||
|
|
||
| **Behaviour summary:** | ||
|
|
||
| - Automatically negotiated at connection time when the ScyllaDB node supports it. | ||
| - ``skip_meta`` is enabled (metadata omitted from EXECUTE responses) only when it | ||
| is safe: the connection must have negotiated ``SCYLLA_USE_METADATA_ID`` (or use | ||
| CQL v5), *and* the prepared statement must carry a ``result_metadata_id`` obtained | ||
| from PREPARE. | ||
| - When a schema change is detected by the server, the driver refreshes both the | ||
| cached column metadata and the metadata hash for that prepared statement so that | ||
| all subsequent executions benefit immediately. | ||
| - Statements prepared before the extension was negotiated (e.g., during a rolling | ||
| upgrade) retain ``result_metadata_id=None`` and fall back to always requesting | ||
| full metadata, which is the safest option. | ||
|
|
||
|
Comment on lines
+190
to
+193
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, but at some point we'll receive the metadata and id, and then it will be skipped. I'm saying that because this point may suggest that clients need to be restarted after such rolling upgrade, but they don't |
||
| **Current scope:** the optimization applies to any prepared statement whose | ||
| ``PREPARE`` response includes non-empty result columns — in practice, SELECT | ||
| queries. UPDATE/INSERT/DELETE statements naturally return no result columns, so | ||
| their ``result_metadata`` is always empty and ``skip_meta`` is never set for | ||
| them. There is no code-level restriction to SELECT; the behaviour follows | ||
| directly from the data. | ||
|
|
||
| For full protocol details see the ScyllaDB CQL extensions documentation: | ||
| https://opensource.docs.scylladb.com/stable/cql/cql-extensions.html | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is this link? Opensource is LONG deprecated, and this link is about CQL language extension, not CQL protocol extensions - totally irrelevant here. |
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is
bool(self.prepared_statement.result_metadata)true?Is it true if server sent result metadata with 0 columns?