Skip to content

feat: support rangebitmap read and write#185

Open
fafacao86 wants to merge 6 commits intoalibaba:mainfrom
fafacao86:rangebitmap-inte
Open

feat: support rangebitmap read and write#185
fafacao86 wants to merge 6 commits intoalibaba:mainfrom
fafacao86:rangebitmap-inte

Conversation

@fafacao86
Copy link
Contributor

Purpose

Linked issue: close #146

Tests

UT in rangebtimap_file_index_test.cpp

  1. functional tests: single-chunk, multi-chunk, different types rangebitmap read and write
  2. different query patterns: EQ GT LT GTE LTE ISNULL, query existing data and non-existing data.
  3. different data patterns: normal numbers mixed with NULLs, floats with -0.0, +0.0, NaN etc.
  4. edge cases: empty rangebitmap, rangebitmap with ONLY NULLs.

IT in paimon::test::ReadInteWithIndexTest::CheckResultForRangeBitmap
data is generated using paimon-java v1.3.1.
Same data, same queries, with single-chunk and multi-chunk, result should be the same.

tests are mainly written by AI, reviewed by human.
test coverage:
range_bitmap_file_index.cpp is a little low(82.7%) is because No write integration test to cover CreateWriter Method.

[range_bitmap.cpp](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap.cpp.gcov.html)	
96.2%96.2%
96.2 %	203 / 211	100.0 %	19 / 19
[range_bitmap.h](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap.h.gcov.html)	
100.0%
100.0 %	5 / 5	100.0 %	2 / 2
[range_bitmap_file_index.cpp](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap_file_index.cpp.gcov.html)	
82.7%82.7%
82.7 %	110 / 133	96.6 %	28 / 29
[range_bitmap_file_index.h](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap_file_index.h.gcov.html)	
100.0%
100.0 %	1 / 1	100.0 %	2 / 2
[range_bitmap_file_index_factory.cpp](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap_file_index_factory.cpp.gcov.html)	
100.0%
100.0 %	4 / 4	100.0 %	4 / 4
[range_bitmap_file_index_factory.h](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap_file_index_factory.h.gcov.html)	
100.0%
100.0 %	2 / 2	100.0 %	1 / 1
[range_bitmap_file_index_test.cpp](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap_file_index_test.cpp.gcov.html)	
99.7%99.7%
99.7 %	317 / 318	100.0 %	61 / 61

API and Format

Documentation

Generative AI tooling

Generated-by: Kimi K2.5

const Literal& literal) {
return std::make_shared<BitmapIndexResult>(
[self = shared_from_this(), literal]() -> Result<RoaringBitmap32> {
if (!self->range_bitmap_) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit curious — are there any cases where self->range_bitmap_ could be null?
From the current logic, it seems like it should always be initialized as long as no exception occurs during setup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the constructor of paimon::RangeBitmapFileIndexReader::RangeBitmapFileIndexReader was public before, I changed it to private method later. So the defensive check is unneccessary.

PAIMON_ASSIGN_OR_RAISE(int32_t min_compare, key.CompareTo(min_));
PAIMON_ASSIGN_OR_RAISE(int32_t max_compare, key.CompareTo(max_));
PAIMON_ASSIGN_OR_RAISE(BitSliceIndexBitmap * bit_slice_ptr, this->GetBitSliceIndex());
if (min_compare == 0 && max_compare == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe BitSliceIndexBitmap* bit_slice_ptr?

return RoaringBitmap32();
}
PAIMON_ASSIGN_OR_RAISE(Dictionary * dictionary, this->GetDictionary());
PAIMON_ASSIGN_OR_RAISE(int32_t code, dictionary->Find(key));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor style note: for consistency, could we place the pointer * next to the type instead of the variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pre-commit seems to be a little problem. I wrote Dictionary* dictionary and run pre-commit run -a, the pre-commit changes it to Dictionary * dictionary. I'll modify it and see if it pass the CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clang-format failed. I guess it might be related to the PAIMON_ASSIGN_OR_RAISE macro, clang-format cannot derive if Dictionary* dictionary is wether a varialble declaration or expressoin.
I checked other places in paimon-cpp, src/paimon/core/io/row_to_arrow_array_converter.h:136, auto seems to be the solution to this.

PAIMON_ASSIGN_OR_RAISE(auto* string_builder,
                       CastToTypedBuilder<arrow::StringBuilder>(array_builder));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* might be derived as a mathmatical multiplication, so clang-format put it in the middle?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Support RangeBitmap File Index

2 participants