refactor(amber): stop hardcoding S3 in REST catalog init by mengw15 · Pull Request #4988 · apache/texera

mengw15 · 2026-05-08T20:49:40Z

What changes were proposed in this PR?

Stop hardcoding s3.endpoint, s3.region, s3.path-style-access, s3.access-key-id and s3.secret-access-key at REST-catalog init in both IcebergUtil.createRestCatalog (Scala) and iceberg_utils.create_rest_catalog (Python). Both helpers now pass only warehouse + catalog uri (and on the Scala side the FileIO impl hint).

Why: When a Lakekeeper warehouse is created, its S3 settings (endpoint, region, credentials, path-style) are registered against that warehouse on the server. At catalog init the client only needs warehouse + uri — Lakekeeper resolves the S3 config from the warehouse record and serves it back. The hardcoded StorageConfig.s3* values on the client were redundant, and forcing them everywhere also pinned every warehouse to the single system bucket. Removing them lets each warehouse own its own storage settings.
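As a rough illustration of the client-side shape described above, the sketch below builds the property map for a REST catalog from only a warehouse name and a catalog URI. The helper name `build_rest_catalog_properties`, the warehouse name, and the URI are placeholders for illustration and are not taken from the Texera code base; the sketch only assumes that the catalog server resolves the storage settings itself.

```python
from typing import Dict


def build_rest_catalog_properties(warehouse: str, uri: str) -> Dict[str, str]:
    """Return client-side properties for a REST catalog.

    Hypothetical helper for illustration; not the actual Texera function.
    """
    if not warehouse or not uri:
        raise ValueError("both warehouse and uri are required")
    # No s3.* keys here: the server is assumed to resolve storage settings
    # from the warehouse record and serve them back to the client.
    return {"type": "rest", "warehouse": warehouse, "uri": uri}


props = build_rest_catalog_properties(
    warehouse="example-warehouse",        # placeholder warehouse name
    uri="http://localhost:8181/catalog",  # placeholder catalog URI
)

# With pyiceberg installed, the map could be passed as keyword properties:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("example", **props)
```

Keeping the map free of storage keys is what allows two warehouses with different buckets to be opened by the same client code.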

StorageConfig.s3* itself is kept — pytexera/storage/large_binary_manager.py still uses it for the non-Iceberg texera-large-binaries bucket (R UDF large-binary support), which is out of scope.

Any related issues, documentation, discussions?

Closes #4987

How was this PR tested?

sbt "WorkflowCore/compile" — passes; verifies no other Scala caller depends on the removed properties.
Python edits parse cleanly via ast.parse; the only caller (iceberg_catalog_instance.py) is updated to match the new create_rest_catalog signature.

End-to-end verification (warehouse with its own S3 settings → REST catalog opened with only warehouse + uri → table round-trip) requires a running Lakekeeper, which CI doesn't have today. #4276 (draft) wires Lakekeeper into CI; once that lands I'll add the integration test on top of it.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

codecov-commenter · 2026-05-08T20:52:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.66%. Comparing base (62d4489) to head (2ebe567).

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #4988      +/-   ##
============================================
+ Coverage     42.64%   42.66%   +0.02%     
- Complexity     2188     2189       +1     
============================================
  Files          1045     1045              
  Lines         39876    39870       -6     
  Branches       4205     4205              
============================================
+ Hits          17004    17010       +6     
+ Misses        21811    21800      -11     
+ Partials       1061     1060       -1

| Flag | Coverage Δ | | *Carryforward flag |
|---|---|---|---|
| access-control-service | `39.53% <ø> (ø)` | | |
| agent-service | `33.72% <ø> (ø)` | | Carriedforward from 62d4489 |
| amber | `43.35% <100.00%> (+0.05%)` | ⬆️ | |
| computing-unit-managing-service | `0.00% <ø> (ø)` | | |
| config-service | `0.00% <ø> (ø)` | | |
| file-service | `32.18% <ø> (ø)` | | |
| frontend | `33.85% <ø> (ø)` | | Carriedforward from 62d4489 |
| python | `88.90% <ø> (ø)` | | |
| workflow-compiling-service | `47.72% <ø> (ø)` | | Carriedforward from 62d4489 |

*This pull request uses carry forward flags.

Yicong-Huang · 2026-05-08T23:48:03Z

Thanks @mengw15, can you make the explanation of why this change is needed a bit more concise? Also, please add tests to confirm this change works.

mengw15 · 2026-05-09T00:30:57Z

Thanks for the questions.

Why the change is needed. When a Lakekeeper warehouse is created, the S3 settings (endpoint, region, credentials, path-style, etc.) are already registered against that warehouse on the server side. At REST-catalog init the client only needs the warehouse identifier and uri — Lakekeeper resolves and serves the S3 config from the warehouse record. The previously hardcoded s3.* properties from StorageConfig were therefore redundant on the client; deleting them lets each warehouse own its own storage settings instead of all warehouses being forced onto the system bucket. I'll tighten the PR description to say just this.

About tests. End-to-end verification needs a running Lakekeeper, which CI doesn't have yet. #4276 (draft) adds Lakekeeper to CI; once that lands I'll layer an integration test on top of it that creates a warehouse with its own S3 settings, opens a REST catalog with only warehouse + uri, and round-trips a table.

Align test_iceberg_rest_catalog_integration.py with create_rest_catalog's new signature after S3 settings stopped being passed at catalog init. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Yicong-Huang · 2026-05-11T06:50:37Z

Now that #4276 is merged, can we hook this up with the new tests?

mengw15 · 2026-05-11T12:53:49Z

#4276 is now merged, and it brought over the Lakekeeper CI job and the two integration tests.

These two tests exercise createRestCatalog in Scala and create_rest_catalog in Python, so this PR is covered end-to-end. With CI passing, I think we can confirm that this change works.

Yicong-Huang · 2026-05-11T17:03:32Z

Sounds good, thanks! Let's also make sure the coverage is filled in; that ensures your changes in this PR are actually being tested in CI.

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.75%. Comparing base (14f8be4) to head (b7b3f84).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ala/org/apache/texera/amber/util/IcebergUtil.scala | 0.00% | 3 Missing ⚠️ |

See more #4988 (comment)

Amber integration job runs without jacoco, so IcebergRestCatalogIntegrationSpec does not register on codecov. Add a unit test that drives createRestCatalog far enough to construct the property Map; .initialize then throws because no Lakekeeper is up in unit-test scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mengw15 · 2026-05-11T19:05:04Z

It seems the amber-integration job doesn't upload to Codecov. I added a mock unit test in IcebergUtilSpec to satisfy the patch coverage check; it just drives createRestCatalog until .initialize throws (no Lakekeeper in unit-test scope). Real coverage still comes from the integration test.

Yicong-Huang · 2026-05-11T20:37:51Z

Yes, the integration tests are for now designed not to affect the coverage report; we rely on unit tests for that.

Tightens the previous coverage-only test: instead of intercepting any Exception, assert RESTException specifically. The property Map is built before .initialize, so a RESTException from either an unreachable Lakekeeper or a missing warehouse confirms the Map composition is sound and the failure is server-side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
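The testing pattern described in this commit message can be sketched in Python as an analogue of the Scala unit test. Everything below is hypothetical: `FakeRestCatalog`, `CatalogUnreachableError`, and this `create_rest_catalog` are stand-ins written for illustration, not the code in IcebergUtilSpec. The point is only that the property map is built before `initialize` runs, and that the test asserts one specific exception type rather than any exception.

```python
import unittest


class CatalogUnreachableError(Exception):
    """Stand-in for a REST-specific failure (hypothetical)."""


class FakeRestCatalog:
    """Records the properties it receives, then fails like an unreachable server."""

    def __init__(self):
        self.received = None

    def initialize(self, name, properties):
        self.received = dict(properties)
        raise CatalogUnreachableError("no catalog server in unit-test scope")


def create_rest_catalog(catalog, name, warehouse, uri):
    # The property map is fully built before initialize is called.
    properties = {"warehouse": warehouse, "uri": uri}
    catalog.initialize(name, properties)
    return catalog


class CreateRestCatalogTest(unittest.TestCase):
    def test_specific_error_after_properties_are_built(self):
        fake = FakeRestCatalog()
        # Assert the specific exception type, not a bare Exception.
        with self.assertRaises(CatalogUnreachableError):
            create_rest_catalog(
                fake, "example", "example-warehouse", "http://localhost:8181/catalog"
            )
        # The failure happened after the map was composed.
        self.assertEqual(set(fake.received), {"warehouse", "uri"})
```

Saved to a file, the test case can be run with `python -m unittest`. Asserting a narrow exception type keeps the test from passing on unrelated failures, such as a typo raised while the map is being built.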

mengw15 added 2 commits May 8, 2026 13:16

change

25f6020

change

de6d9b2

github-actions Bot added the python, refactor, and common labels May 8, 2026

github-actions Bot assigned mengw15 May 8, 2026

mengw15 requested a review from bobbai00 May 8, 2026 20:50

Merge branch 'main' into refactor/remove-hardcoded-s3-from-rest-catalog

754c29a

mengw15 and others added 3 commits May 10, 2026 09:29

Merge branch 'main' into refactor/remove-hardcoded-s3-from-rest-catalog

e497e3b

Merge branch 'main' into refactor/remove-hardcoded-s3-from-rest-catalog

d7d0461

test(amber): drop removed S3 kwargs from REST catalog integration test

4ef1076

Merge branch 'main' into refactor/remove-hardcoded-s3-from-rest-catalog

b7b3f84

mengw15 and others added 2 commits May 11, 2026 10:45

ci: re-trigger workflows

1fb0526

mengw15 and others added 2 commits May 11, 2026 18:01

Merge branch 'main' into refactor/remove-hardcoded-s3-from-rest-catalog

2ebe567

mengw15 wants to merge 11 commits into apache:main from mengw15:refactor/remove-hardcoded-s3-from-rest-catalog
