refactor(amber): stop hardcoding S3 in REST catalog init#4988
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:
@@ Coverage Diff @@
## main #4988 +/- ##
============================================
+ Coverage 42.64% 42.66% +0.02%
- Complexity 2188 2189 +1
============================================
Files 1045 1045
Lines 39876 39870 -6
Branches 4205 4205
============================================
+ Hits 17004 17010 +6
+ Misses 21811 21800 -11
+ Partials 1061 1060 -1
This pull request uses carry forward flags. View full report in Codecov by Sentry.
Thanks @mengw15, can you make the description a bit more concise about why this change is needed? Also, please add tests to confirm this change works.
Thanks for the questions.

Why the change is needed: when a Lakekeeper warehouse is created, the S3 settings (endpoint, region, credentials, path-style, etc.) are already registered against that warehouse on the server side. At REST-catalog init the client only needs the warehouse identifier and uri; Lakekeeper resolves and serves the S3 config from the warehouse record. The previously hardcoded s3.* properties from StorageConfig were therefore redundant on the client; deleting them lets each warehouse own its own storage settings instead of all warehouses being forced onto the system bucket. I'll tighten the PR description to say just this.

About tests: end-to-end verification needs a running Lakekeeper, which CI doesn't have yet. #4276 (draft) adds Lakekeeper to CI; once that lands I'll layer an integration test on top of it that creates a warehouse with its own S3 settings, opens a REST catalog with only warehouse + uri, and round-trips a table.
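As a hedged illustration of the change described above (helper and property names here are assumptions for sketching purposes, not the exact Texera code), the client-side difference amounts to shrinking the property map passed at catalog init:

```python
# Sketch of the client-side change; names are illustrative, not the
# actual IcebergUtil.createRestCatalog / iceberg_utils.create_rest_catalog code.

def rest_catalog_properties_before(uri: str, warehouse: str, s3: dict) -> dict:
    # Old behavior: S3 settings hardcoded from StorageConfig at init.
    return {
        "uri": uri,
        "warehouse": warehouse,
        "s3.endpoint": s3["endpoint"],
        "s3.region": s3["region"],
        "s3.path-style-access": s3["path_style"],
        "s3.access-key-id": s3["access_key"],
        "s3.secret-access-key": s3["secret_key"],
    }

def rest_catalog_properties_after(uri: str, warehouse: str) -> dict:
    # New behavior: only warehouse + uri; Lakekeeper resolves the S3
    # config from the warehouse record and serves it back.
    return {"uri": uri, "warehouse": warehouse}
```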
Align test_iceberg_rest_catalog_integration.py with create_rest_catalog's new signature after S3 settings stopped being passed at catalog init. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that #4276 is merged, can we hook this up with new tests?
With #4276 merged, we now have the Lakekeeper CI job and the two integration tests it brought over. These two tests exercise createRestCatalog in Scala and Python, so this PR is covered end-to-end. With CI passing, I think we can confirm that this change works.
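A hedged sketch of the end-to-end flow those integration tests cover, with all server behavior stubbed in-memory (class and method names are illustrative, not the actual test code): create a warehouse with its own S3 settings, open a catalog with only warehouse + uri, then round-trip a table.

```python
# In-memory stand-in for the end-to-end flow; not the real Lakekeeper API.

class FakeLakekeeper:
    def __init__(self):
        self.warehouses = {}
        self.tables = {}

    def create_warehouse(self, name: str, s3_settings: dict) -> None:
        # S3 settings are registered server-side at warehouse creation.
        self.warehouses[name] = s3_settings

    def open_catalog(self, uri: str, warehouse: str) -> "_Catalog":
        # The client supplies only uri + warehouse; storage config is
        # resolved from the warehouse record, never sent by the client.
        assert warehouse in self.warehouses, "unknown warehouse"
        return _Catalog(self, warehouse)

class _Catalog:
    def __init__(self, server: FakeLakekeeper, warehouse: str):
        self.server, self.warehouse = server, warehouse

    def write(self, table: str, rows: list) -> None:
        self.server.tables[(self.warehouse, table)] = rows

    def read(self, table: str) -> list:
        return self.server.tables[(self.warehouse, table)]
```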
sg, thanks! Let's also make sure the coverage is filled in; that confirms your changes in this PR are actually being exercised in CI.
See more: #4988 (comment)
Amber integration job runs without jacoco, so IcebergRestCatalogIntegrationSpec does not register on codecov. Add a unit test that drives createRestCatalog far enough to construct the property Map; .initialize then throws because no Lakekeeper is up in unit-test scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
It seems the amber-integration job doesn't upload to Codecov. Added a mock unit test in IcebergUtilSpec to satisfy the patch-coverage number; it just drives createRestCatalog until .initialize throws (no Lakekeeper in unit-test scope). Real coverage still comes from the integration test.
Yes, the integration test is for now designed not to alter the coverage report: we rely on unit tests.
Tightens the previous coverage-only test: instead of intercepting any Exception, assert RESTException specifically. The property Map is built before .initialize, so a RESTException from either an unreachable Lakekeeper or a missing warehouse confirms the Map composition is sound and the failure is server-side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
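A minimal sketch of that test pattern (the real test lives in Scala's IcebergUtilSpec and intercepts Iceberg's RESTException; the names below are Python stand-ins): the property map is composed first, so a server-side exception from initialize confirms the composition step completed and the failure is purely the unreachable server.

```python
# Illustrative pattern only; not the actual Scala test code.

class RESTException(Exception):
    """Stand-in for org.apache.iceberg.rest's RESTException."""

def create_rest_catalog(uri: str, warehouse: str):
    # Step 1: property composition (the part the unit test wants covered).
    props = {"uri": uri, "warehouse": warehouse}
    # Step 2: initialize() contacts the server; with no Lakekeeper running
    # in unit-test scope this fails, which the test asserts specifically.
    raise RESTException(f"cannot reach {props['uri']}")

def property_map_built_before_initialize() -> bool:
    try:
        create_rest_catalog("http://localhost:8181", "demo-warehouse")
    except RESTException:
        # Expected: the failure is server-side, so Map composition is sound.
        return True
    return False
```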
What changes were proposed in this PR?
Stop hardcoding `s3.endpoint`, `s3.region`, `s3.path-style-access`, `s3.access-key-id` and `s3.secret-access-key` at REST-catalog init in both `IcebergUtil.createRestCatalog` (Scala) and `iceberg_utils.create_rest_catalog` (Python). Both helpers now pass only `warehouse` + catalog `uri` (and on the Scala side the `FileIO` impl hint).

Why: when a Lakekeeper warehouse is created, its S3 settings (endpoint, region, credentials, path-style) are registered against that warehouse on the server. At catalog init the client only needs `warehouse` + `uri`; Lakekeeper resolves the S3 config from the warehouse record and serves it back. The hardcoded `StorageConfig.s3*` values on the client were redundant, and forcing them everywhere also pinned every warehouse to the single system bucket. Removing them lets each warehouse own its own storage settings.

`StorageConfig.s3*` itself is kept: `pytexera/storage/large_binary_manager.py` still uses it for the non-Iceberg `texera-large-binaries` bucket (R UDF large-binary support), which is out of scope.

Any related issues, documentation, discussions?
Closes #4987
How was this PR tested?
- `sbt "WorkflowCore/compile"` passes; verifies no other Scala caller depends on the removed properties.
- The Python side parses cleanly with `ast.parse`; the only caller (`iceberg_catalog_instance.py`) is updated to match the new `create_rest_catalog` signature.
- End-to-end verification (warehouse with its own S3 settings → REST catalog opened with only `warehouse` + `uri` → table round-trip) requires a running Lakekeeper, which CI doesn't have today. #4276 (draft) wires Lakekeeper into CI; once that lands I'll add the integration test on top of it.

Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)