fix: handle missing schema_id in ObjectCache for Iceberg v1 tables #2280
vovacf201 wants to merge 1 commit into apache:main
ObjectCache::get_manifest_list panics on .unwrap() when building the cache key for snapshots without a schema_id (Iceberg v1 format tables don't require this field). Fall back to table_metadata.current_schema_id() when snapshot.schema_id() is None.
blackmwk left a comment:
Thanks @vovacf201 for this fix!
There is a table_metadata dir under testdata; we should move it there. We should also rename it to TableMetadataV1SnapshotNoSchemaId.
    /// the snapshot has no schema_id (Iceberg v1 format tables).
    #[tokio::test]
    async fn test_get_manifest_list_v1_snapshot_without_schema_id() {
        let tmp_dir = TempDir::new().unwrap();
There are examples above that use TestTableFixture to simplify the tests.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
Which issue does this PR close?
N/A (discovered in production via panic stack trace)
What changes are included in this PR?
ObjectCache::get_manifest_list panics when building the cache key for snapshots that don't have a schema_id set. This happens with Iceberg v1 format tables, where schema_id is optional on snapshots.
The fix falls back to table_metadata.current_schema_id() when snapshot.schema_id() returns None, consistent with how Snapshot::schema() resolves the schema elsewhere in the codebase.
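The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the Snapshot and TableMetadata structs here are simplified, hypothetical stand-ins for the real iceberg-rust types, and cache_key_schema_id is a made-up helper name. Only the Option handling is the point.

```rust
// Hypothetical, simplified stand-ins for the real types; the field layout
// and the helper name below are assumptions made for illustration only.
type SchemaId = i32;

struct Snapshot {
    // Optional: v1 format tables may omit the schema id on a snapshot.
    schema_id: Option<SchemaId>,
}

impl Snapshot {
    fn schema_id(&self) -> Option<SchemaId> {
        self.schema_id
    }
}

struct TableMetadata {
    current_schema_id: SchemaId,
}

impl TableMetadata {
    fn current_schema_id(&self) -> SchemaId {
        self.current_schema_id
    }
}

/// Pick the schema id to use in the cache key.
/// Prefers the snapshot's own schema id and falls back to the table's
/// current schema id instead of calling `.unwrap()` on a `None`.
fn cache_key_schema_id(snapshot: &Snapshot, table_metadata: &TableMetadata) -> SchemaId {
    snapshot
        .schema_id()
        .unwrap_or_else(|| table_metadata.current_schema_id())
}
```

With this shape, a snapshot that carries a schema_id keeps using it, and a v1-style snapshot with no schema_id resolves to the table's current schema id rather than panicking.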
Are these changes tested?
The existing test_get_manifest_list_and_manifest_from_default_cache covers the cache path but uses v2 metadata, where schema_id is always present.