Adds intermediate dataType to schema and use it for ingestion aggregation #16868
noob-se7en wants to merge 13 commits
Codecov Report
@@             Coverage Diff              @@
##             master   #16868      +/-   ##
============================================
- Coverage     63.25%   63.23%   -0.02%
  Complexity     1499     1499
============================================
  Files          3174     3176       +2
  Lines        190323   190430     +107
  Branches      29080    29096      +16
============================================
+ Hits         120381   120422      +41
- Misses        60606    60654      +48
- Partials       9336     9354      +18
@Jackie-Jiang, I added an intermediate field spec to the schema.
I guess that for transformations, ingestion itself will throw exceptions at the row level, and we won't wait until the segment build?
I don't fully understand the question. The code changes are only in MutableSegmentImpl. This PR is only meant to support realtime ingestion aggregation (which happens during indexing of mutable segments).
Jackie-Jiang left a comment
Well done.
Given that the field type name cannot be changed in the future, do you see "intermediate" used as a common field type name in other DBs?
@@ -49,11 +49,28 @@ public interface ValueAggregator<R, A> {
  A getInitialAggregatedValue(@Nullable R rawValue);
Seems we can deprecate this method as long as A applyRawValue(A value, R rawValue);
   * Returns the initial aggregated value with the optional source data type provided for correct raw value handling.
   * Default implementation delegates to {@link #getInitialAggregatedValue(Object)} for backward compatibility.
   */
  default A getInitialAggregatedValue(@Nullable R rawValue, @Nullable DataType sourceDataType) {
Star-tree builder can also be switched to use the new set of methods
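The delegation pattern quoted in the diff above can be sketched in isolation. This is a simplified illustration, not the code from this PR: the interface is cut down to the two getInitialAggregatedValue overloads, the DataType enum is a local stand-in, the @Nullable annotations are omitted to keep it self-contained, and LegacySum and TypeAwareSum are hypothetical implementations.

```java
/** Simplified stand-ins for illustration only; none of this is Pinot's actual code. */
final class DefaultOverloadSketch {
  /** Hypothetical stand-in for the source data type passed by the caller. */
  enum DataType { INT, DOUBLE, STRING }

  interface ValueAggregator<R, A> {
    A getInitialAggregatedValue(R rawValue);

    // New overload: the default body delegates to the old method, so existing
    // implementations keep compiling and behave exactly as before.
    default A getInitialAggregatedValue(R rawValue, DataType sourceDataType) {
      return getInitialAggregatedValue(rawValue);
    }
  }

  /** Hypothetical legacy aggregator: implements only the one-argument method. */
  static final class LegacySum implements ValueAggregator<Object, Double> {
    @Override
    public Double getInitialAggregatedValue(Object rawValue) {
      return rawValue == null ? 0.0 : ((Number) rawValue).doubleValue();
    }
  }

  /** Hypothetical type-aware aggregator: overrides the new overload. */
  static final class TypeAwareSum implements ValueAggregator<Object, Double> {
    @Override
    public Double getInitialAggregatedValue(Object rawValue) {
      return getInitialAggregatedValue(rawValue, null);
    }

    @Override
    public Double getInitialAggregatedValue(Object rawValue, DataType sourceDataType) {
      if (rawValue == null) {
        return 0.0;
      }
      if (sourceDataType == DataType.STRING) {
        // The declared source type says the value arrives as text, so parse it.
        return Double.parseDouble((String) rawValue);
      }
      return ((Number) rawValue).doubleValue();
    }
  }

  public static void main(String[] args) {
    ValueAggregator<Object, Double> legacy = new LegacySum();
    ValueAggregator<Object, Double> typeAware = new TypeAwareSum();
    // The legacy aggregator ignores the type hint via the default method.
    System.out.println(legacy.getInitialAggregatedValue(3, DataType.INT));
    // The type-aware aggregator uses the hint to parse a String source value.
    System.out.println(typeAware.getInitialAggregatedValue("4.5", DataType.STRING));
  }
}
```

A caller such as a star-tree builder could move to the two-argument overload without every aggregator being updated first, because aggregators that do not override it fall back to the one-argument behavior.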
Taking a different approach in #18816, where users can add an optional data type conversion for any source field.
Problem
Related to #16317. TL;DR: When ingestion aggregation/transformation happens on a source column that is not present in the schema, data type conversions can throw exceptions, because there is no type information for source columns that are absent from the schema.
Example: Ingestion aggregation:
sum(price). Here, if the price column is not part of the schema, Pinot assumes it to be a Number, but it can be a String in the source.
PR
Add a new intermediate field type to the schema and use this information in ingestion aggregation.
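The failure mode in the Problem section, and the way a declared source type avoids it, can be sketched as follows. This is a minimal illustration under stated assumptions, not Pinot code: SourceType, toDoubleAssumingNumber, toDouble, and sum are hypothetical names, and the schema syntax for declaring the intermediate field is not shown here.

```java
/** Hypothetical sketch of the sum(price) problem; none of these names come from Pinot. */
final class SourceTypeSketch {
  /** Stand-in for the source data type that a schema entry could declare. */
  enum SourceType { INT, LONG, DOUBLE, STRING }

  /** Mimics the behavior without type info: the raw value is assumed to be a Number. */
  static double toDoubleAssumingNumber(Object rawValue) {
    // Throws ClassCastException when the source actually delivers a String.
    return ((Number) rawValue).doubleValue();
  }

  /** Converts using the declared source type before the value is aggregated. */
  static double toDouble(Object rawValue, SourceType sourceType) {
    switch (sourceType) {
      case STRING:
        return Double.parseDouble((String) rawValue);
      default:
        return ((Number) rawValue).doubleValue();
    }
  }

  /** Sums raw source values after converting each one with the declared type. */
  static double sum(Object[] rawValues, SourceType sourceType) {
    double total = 0;
    for (Object rawValue : rawValues) {
      total += toDouble(rawValue, sourceType);
    }
    return total;
  }

  public static void main(String[] args) {
    Object[] prices = {"12.5", "7.5"}; // price arrives as String in the source
    try {
      toDoubleAssumingNumber(prices[0]);
    } catch (ClassCastException e) {
      System.out.println("conversion failed without source type info");
    }
    System.out.println(sum(prices, SourceType.STRING));
  }
}
```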
Pending
Adding more tests. Opening this PR to get early reviews.