[phase-31 3/4] Writer + pipeline wiring#6244
Merged
Base automatically changed from gtt/phase-31-compaction-metadata to gtt/phase-31-sort-schema on March 31, 2026 21:31.
…, window, TableConfig Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… model, field lookup Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire TableConfig-driven sort order into ParquetWriter and add self-describing Parquet file metadata for compaction:
- ParquetWriter::new() takes &TableConfig, resolves sort fields at construction via parse_sort_fields() + ParquetField::from_name()
- sort_batch() uses resolved fields with per-column direction (ASC/DESC)
- SS-1 debug_assert verification: re-sort and check identity permutation
- build_compaction_key_value_metadata(): embeds sort_fields, window_start, window_duration, num_merge_ops, row_keys (base64) in Parquet kv_metadata
- SS-5 verify_ss5_kv_consistency(): kv_metadata matches source struct
- write_to_file_with_metadata() replaces write_to_file()
- prepare_write() shared method for bytes and file paths
- ParquetWriterConfig gains to_writer_properties_with_metadata()
- ParquetSplitWriter passes TableConfig through
- All callers in quickwit-indexing updated with TableConfig::default()
- 23 storage tests pass including META-07 self-describing roundtrip

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
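The sort-and-verify behavior described in this commit can be sketched with a minimal, std-only example (the real code sorts Arrow RecordBatches; the `Direction`, `sort_indices`, and `sort_batch` names here are hypothetical stand-ins). It sorts row indices by multiple resolved columns with per-column ASC/DESC direction, then applies the SS-1 style check: re-sorting the already-sorted output must yield the identity permutation.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for a per-column sort direction (ASC/DESC).
#[derive(Clone, Copy)]
enum Direction {
    Asc,
    Desc,
}

// Sort row indices by the given (column index, direction) pairs.
// `sort_by` is stable, so ties preserve the original row order.
fn sort_indices(columns: &[Vec<i64>], fields: &[(usize, Direction)]) -> Vec<usize> {
    let n = columns.first().map_or(0, |c| c.len());
    let mut idx: Vec<usize> = (0..n).collect();
    idx.sort_by(|&a, &b| {
        for &(col, dir) in fields {
            let ord = match dir {
                Direction::Asc => columns[col][a].cmp(&columns[col][b]),
                Direction::Desc => columns[col][b].cmp(&columns[col][a]),
            };
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    });
    idx
}

// Apply the permutation, then (SS-1 style) debug-check that re-sorting the
// sorted columns yields the identity permutation.
fn sort_batch(columns: &[Vec<i64>], fields: &[(usize, Direction)]) -> Vec<Vec<i64>> {
    let idx = sort_indices(columns, fields);
    let sorted: Vec<Vec<i64>> = columns
        .iter()
        .map(|c| idx.iter().map(|&i| c[i]).collect())
        .collect();
    debug_assert!(sort_indices(&sorted, fields)
        .iter()
        .enumerate()
        .all(|(i, &p)| p == i));
    sorted
}
```

Because the sort is stable, the identity-permutation check holds even with duplicate keys, which is what makes the debug assertion safe as a cheap sanity check.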
b9566a6 to
b6eb595
Compare
df6e699 to
76b703a
Compare
Contributor commented: "@codex review"

💡 Codex Review: automated review suggestions posted for commit 76b703ad24.
mattmkim approved these changes on Apr 6, 2026.
Co-authored-by: Matthew Kim <matthew.kim@datadoghq.com>
Resolve merge conflicts by taking main's versions of otel_metrics.rs and arrow_metrics.rs (the PR didn't modify these files — conflicts came from the base branch divergence). Kept PR's table_config module export in quickwit-parquet-engine/src/lib.rs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-existing splits were serialized before the parquet_file field was added, so their JSON doesn't contain it. Adding #[serde(default)] makes deserialization fall back to empty string for old splits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
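The fallback behavior that `#[serde(default)]` provides can be illustrated with a std-only sketch (the real fix is a serde attribute on the split metadata struct; `parquet_file_field` and the `HashMap` stand-in for the deserialized JSON are hypothetical):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for #[serde(default)] on the new `parquet_file`
// field: splits serialized before the field existed carry no such key, so
// deserialization falls back to an empty string instead of failing.
fn parquet_file_field(split_json: &HashMap<String, String>) -> String {
    split_json.get("parquet_file").cloned().unwrap_or_default()
}
```

With serde itself, the equivalent is annotating the struct field so missing JSON keys take `String::default()` rather than causing a deserialization error.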
When the commit timeout fires and the accumulator contains only zero-column batches, union_fields is empty and concat_batches fails with "must either specify a row count or at least one column". Now flush_internal treats empty union_fields the same as empty pending_batches — resets state and returns None. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
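The guard described here can be sketched with hypothetical stand-in types (the real accumulator holds Arrow RecordBatches and schema fields; `Accumulator` and its field types below are illustrative only):

```rust
// Hypothetical sketch of the flush_internal guard: when the accumulator
// contains only zero-column batches, union_fields is empty and concatenation
// would fail, so we reset state and return None instead of flushing.
struct Accumulator {
    union_fields: Vec<String>,     // stand-in for the unified schema fields
    pending_batches: Vec<Vec<u8>>, // stand-in for pending RecordBatches
}

impl Accumulator {
    fn flush_internal(&mut self) -> Option<Vec<u8>> {
        // Treat empty union_fields the same as empty pending_batches.
        if self.pending_batches.is_empty() || self.union_fields.is_empty() {
            self.pending_batches.clear();
            self.union_fields.clear();
            return None;
        }
        let out = self.pending_batches.concat(); // stand-in for concat_batches
        self.pending_batches.clear();
        Some(out)
    }
}
```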
- Resolve Cargo.lock/Cargo.toml merge conflicts
- P1 (sort column lookup): already addressed by sort fields tag_ prefix fix — sort field names now match Parquet column names
- P2 (window_start at epoch 0): remove time_range.start_secs > 0 guard so window_start is computed for all batches when window_duration > 0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
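The P2 fix can be sketched as follows (assuming window_start is floor-aligned bucketing of the batch start time, which the commit message implies but does not spell out; the function name is hypothetical). The point is that the only guard left is `window_duration > 0`, so a timestamp at epoch 0 still gets a window:

```rust
// Hypothetical sketch: compute window_start for every batch whenever
// window_duration > 0, including start timestamps at or before epoch 0
// (the removed `start_secs > 0` guard previously skipped those).
fn window_start(start_secs: i64, window_duration_secs: i64) -> Option<i64> {
    if window_duration_secs <= 0 {
        return None;
    }
    // div_euclid floors toward negative infinity, so pre-epoch timestamps
    // also land in a well-defined window.
    Some(start_secs.div_euclid(window_duration_secs) * window_duration_secs)
}
```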
Summary
Wire TableConfig into ParquetWriter sort path and add self-describing Parquet file metadata for compaction (Phase 31 Metadata Foundation, PR 3 of 4).
Stacks on gtt/phase-31-compaction-metadata (PR #6243).

What's included

storage/writer.rs (rewritten):
- ParquetWriter::new() takes &TableConfig, resolves sort field names to physical columns
- sort_batch() uses resolved fields with per-column ASC/DESC direction
- debug_assert verification: re-sort output and check identity permutation
- build_compaction_key_value_metadata(): embeds sort_fields, window_start, window_duration, num_merge_ops, row_keys (base64+JSON) in Parquet kv_metadata
- verify_ss5_kv_consistency(): kv entries must match source struct
- write_to_file_with_metadata() replaces write_to_file()
- prepare_write(): shared prep for both bytes and file write paths
- resolve_sort_fields(): parse sort schema, map to ParquetField, skip missing columns

storage/config.rs:
- to_writer_properties_with_metadata(sorting_cols, kv_metadata) accepts dynamic sort columns and optional KV metadata
- to_writer_properties() delegates with empty defaults
- sorting_columns() method (now in writer)

storage/split_writer.rs:
- ParquetSplitWriter::new() takes &TableConfig parameter

quickwit-indexing (5 files):
- ParquetSplitWriter::new() callers updated with &TableConfig::default()

Verification
- cargo build -p quickwit-parquet-engine -p quickwit-indexing ✅
- cargo test -p quickwit-parquet-engine -- storage:: ✅ (23 tests)
- cargo clippy -p quickwit-parquet-engine --all-features --tests ✅

Test plan
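The build-then-verify relationship between the metadata builder and the SS-5 consistency check can be sketched std-only (the real code embeds entries in Parquet kv_metadata via the parquet crate; `CompactionMetadata`, `build_kv`, and `verify_kv_consistency` below are hypothetical stand-ins, and row_keys/base64 handling is omitted):

```rust
use std::collections::BTreeMap;

// Hypothetical subset of the compaction metadata the writer embeds.
struct CompactionMetadata {
    sort_fields: Vec<String>,
    window_start: i64,
    num_merge_ops: u32,
}

// Serialize the struct into key/value entries (stand-in for Parquet kv_metadata).
fn build_kv(meta: &CompactionMetadata) -> BTreeMap<String, String> {
    let mut kv = BTreeMap::new();
    kv.insert("sort_fields".to_string(), meta.sort_fields.join(","));
    kv.insert("window_start".to_string(), meta.window_start.to_string());
    kv.insert("num_merge_ops".to_string(), meta.num_merge_ops.to_string());
    kv
}

// SS-5 style check: every embedded entry must match what the source struct
// would serialize to, so a drifted or corrupted entry is caught before write.
fn verify_kv_consistency(meta: &CompactionMetadata, kv: &BTreeMap<String, String>) -> bool {
    kv == &build_kv(meta)
}
```

Verifying against a re-serialization of the source struct (rather than parsing the kv entries back) keeps the check cheap and makes the builder the single source of truth for the encoding.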
🤖 Generated with Claude Code