feat(parquet): add metrics for parquet reader observability #258
duanyyyyyyy wants to merge 4 commits into alibaba:main
Add row groups, rows, batch count, and latency metrics to ParquetFileBatchReader, matching the observability level of the ORC reader. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move parquet read metric constants into the existing ParquetMetrics class in parquet_format_defs.h, rather than defining a duplicate class in a new header. Fixes build error introduced in d5a8bfa. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    static inline const char READ_ROWS[] = "parquet.read.rows";
    static inline const char READ_BATCH_COUNT[] = "parquet.read.batch.count";
    static inline const char READ_NEXT_BATCH_LATENCY_MS[] = "parquet.read.next_batch.latency.ms";
};
Suggestion: for high-level categories, consider using . as the separator to represent hierarchy (e.g., parquet.read), while using - to separate words within the same level (e.g., batch-count).
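One possible reading of this naming suggestion is sketched below. The struct name `ParquetMetricsSuggested`, the helper `HierarchyDepth`, and the exact renamed strings are hypothetical illustrations, not names from the PR; the PR itself keeps `parquet.read.batch.count` and `parquet.read.next_batch.latency.ms`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical names, assuming "." separates hierarchy levels and "-"
// separates words inside one level. These are NOT the constants in the PR.
struct ParquetMetricsSuggested {
    static constexpr std::string_view READ_ROWS = "parquet.read.rows";
    static constexpr std::string_view READ_BATCH_COUNT = "parquet.read.batch-count";
    static constexpr std::string_view READ_NEXT_BATCH_LATENCY_MS =
        "parquet.read.next-batch-latency-ms";
};

// Counts hierarchy levels in a metric name: number of '.' separators plus one.
constexpr std::size_t HierarchyDepth(std::string_view name) {
    std::size_t depth = 1;
    for (char c : name) {
        if (c == '.') {
            ++depth;
        }
    }
    return depth;
}
```

Under this convention every read metric sits at the same depth (category, operation, metric), whereas the dotted form makes `batch.count` look like an extra hierarchy level.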
read_next_batch_latency_ms_ += timer.Get();
metrics_->SetCounter(ParquetMetrics::READ_ROWS, read_rows_);
metrics_->SetCounter(ParquetMetrics::READ_BATCH_COUNT, read_batch_count_);
metrics_->SetCounter(ParquetMetrics::READ_NEXT_BATCH_LATENCY_MS, read_next_batch_latency_ms_);
Using a counter for READ_NEXT_BATCH_LATENCY_MS seems inappropriate. Since the prefetch reader aggregates metrics across all sub-readers, the resulting value is not very meaningful for a latency metric. If the goal is to capture end-to-end latency, I think it would be more appropriate to record it as a histogram at the framework level, rather than aggregating sub-reader metrics.
This bug also seems to exist in the ORC format; we will fix it in the future.
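The latency concern raised in this thread can be illustrated with a minimal sketch. `LatencyHistogram` and `SumSubReaderLatencies` are hypothetical helpers written for this example and are not APIs of this project; the bucket bounds are arbitrary and assumed to be sorted in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical fixed-bucket histogram (not an API of this project).
// upper_bounds_ms must be sorted ascending; one extra slot holds overflow.
class LatencyHistogram {
 public:
    explicit LatencyHistogram(std::vector<uint64_t> upper_bounds_ms)
        : bounds_(std::move(upper_bounds_ms)), counts_(bounds_.size() + 1, 0) {}

    // Records one observation in the first bucket whose bound is >= latency_ms.
    void Record(uint64_t latency_ms) {
        auto it = std::lower_bound(bounds_.begin(), bounds_.end(), latency_ms);
        ++counts_[static_cast<std::size_t>(it - bounds_.begin())];
    }

    const std::vector<uint64_t>& counts() const { return counts_; }

 private:
    std::vector<uint64_t> bounds_;
    std::vector<uint64_t> counts_;
};

// Mimics what happens when per-sub-reader latency counters are simply added up.
inline uint64_t SumSubReaderLatencies(const std::vector<uint64_t>& per_reader_ms) {
    return std::accumulate(per_reader_ms.begin(), per_reader_ms.end(), uint64_t{0});
}
```

For example, if four sub-readers each spend 40 ms and run fully in parallel, the summed counter reports 160 ms even though the caller waited roughly 40 ms. Recording each end-to-end call in a histogram at the framework level keeps the distribution (bucket counts) instead of a single inflated total.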
Add row groups, rows, batch count, and latency metrics to ParquetFileBatchReader, matching the observability level of the ORC reader.
Purpose
Linked issue: close #xxx
Tests
API and Format
Documentation
Generative AI tooling
Opus 4.6