Antalya 26.3 Backport of #100645 - Parse record_count and size_bytes fields from iceberg manifest file #1776
mkmkme wants to merge 2 commits into
…s_and_rows_count_to_iceberg_data_object Parse record_count and size_bytes fields from iceberg manifest file
if (info.record_count.has_value())
    LOG_TEST(log, "Iceberg record_count for '{}': {}", object_info->getPath(), *info.record_count);
if (info.file_size_in_bytes.has_value())
    LOG_TEST(log, "Iceberg file_size_in_bytes for '{}': {}", object_info->getPath(), *info.file_size_in_bytes);
Am I right that logging at 'test' level is the only place where the new data is used?
Audit: PR #1776 — Antalya 26.3 Backport of #100645 — Parse
ianton-ru
left a comment
LGTM, but I don't understand the value of this PR. It does a lot of work only to write two lines to the log at 'test' level. Or did I miss something?
Verification: PR #1776
test_iceberg_file_stats_logging PR-added tests: all GREEN
4 parametrized cases × 3 integration jobs = 12 OK runs, 0 failures.
All four parametrizations pass on every job:
The new manifest-file stats path has clean positive coverage on both Iceberg spec versions and on plain-table / view-wrapped reads.
CI overview (head commit)
Test-level failures in DB: zero. Regression-workflow failures (chronic baseline):
| Suite | Fails |
|---|---|
| Swarms (Aarch64 + Release) | 227 |
| Parquet (Aarch64 + Release) | 34 |
| S3Export partition (Aarch64 + Release) | 20 |
| S3Export part (Aarch64 + Release) | 16 |
Same fingerprint as sibling antalya-26.3 PRs (1783, 1775, 1773, 1772, 1771, 1770, 1769, …). No new failure modes.
Caveat — partial frontport
PR lands on antalya-26.3 while companion features from antalya-26.1 are still being frontported in parallel. Final re-verify recommended once the rest of the bundle lands.
Verdict
Safe to merge.
- New integration test test_iceberg_file_stats_logging passes 100% (12/12 integration runs) across all 4 parametrizations and 3 integration jobs.
- New gtest for datalake_table_state serde compiles and runs green.
- Zero test-level FAIL rows on this head.
- All remaining red checks are the recurring antalya-26.3 chronic regression baseline (Swarms / Parquet / S3Export), shared with sibling PRs.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Object information used for parsing data files in Iceberg now contains the number of rows in each file and the file size in bytes, both parsed from the manifest file (ClickHouse#100645 by @divanik).
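To make the changelog entry concrete, here is a minimal sketch of per-file object information that carries the two optional manifest fields. It is not the ClickHouse implementation: the `DataFileInfo` struct and the `totalRecordCount` and `describe` helpers are hypothetical names invented for illustration. Only the field names `record_count` and `file_size_in_bytes` come from the diff under review. The sketch assumes both values may be absent, which is why they are modelled as `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for per-file object info; NOT the real ClickHouse type.
struct DataFileInfo
{
    std::string path;
    std::optional<std::int64_t> record_count;       // empty if the manifest entry carried no value
    std::optional<std::int64_t> file_size_in_bytes; // empty if the manifest entry carried no value
};

// Sums record counts across files. Returns std::nullopt as soon as one file
// has no record_count, because a partial sum would be misleading.
std::optional<std::int64_t> totalRecordCount(const std::vector<DataFileInfo> & files)
{
    std::int64_t total = 0;
    for (const auto & file : files)
    {
        if (!file.record_count.has_value())
            return std::nullopt;
        total += *file.record_count;
    }
    return total;
}

// Builds a one-line summary that mentions only the fields that are present,
// mirroring the has_value() guards in the reviewed diff.
std::string describe(const DataFileInfo & info)
{
    std::string out = info.path;
    if (info.record_count.has_value())
        out += " record_count=" + std::to_string(*info.record_count);
    if (info.file_size_in_bytes.has_value())
        out += " file_size_in_bytes=" + std::to_string(*info.file_size_in_bytes);
    return out;
}
```

Keeping the fields optional means a consumer has to decide explicitly what to do when a value is missing, rather than silently treating it as zero.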
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: