feat(io): add bulk delete API to FileIO. #659
@wgtmac Could you please help review this PR? This is my first code contribution to the iceberg-cpp module, and I would really appreciate your guidance and help. Thank you very much!
///
/// \param file_location The location of the file to delete.
/// \return void if the delete succeeded, an error code if the delete failed.
virtual Status DeleteFile(const std::string& file_location) {
I changed this only to avoid the unused parameter warning in the default implementation, since file_location is not used when returning NotImplemented.
I thought it was a small cleanup while touching this area, but I agree it is not required for this PR. I can revert it if you prefer.
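For illustration, two common ways to silence an unused parameter warning in a default implementation are sketched below. The exact change made in this PR is not shown in the thread, so this is only an assumption about what it might look like. `Status` here is a simplified stand-in rather than the real iceberg type, and `DeleteFileAlt` is a hypothetical name used only to show the second option side by side.

```cpp
#include <string>

// Hypothetical stand-in for the real Status type; used only in this sketch.
struct Status {
  bool ok = true;
  std::string message;
};

// Simplified sketch of the interface; not the actual FileIO class.
class FileIO {
 public:
  virtual ~FileIO() = default;

  // Option A: leave the parameter unnamed and keep the name as a comment.
  virtual Status DeleteFile(const std::string& /*file_location*/) {
    return Status{false, "DeleteFile not implemented"};
  }

  // Option B: keep the name and mark it with the standard attribute.
  // DeleteFileAlt is a hypothetical method that exists only for comparison.
  virtual Status DeleteFileAlt(
      [[maybe_unused]] const std::string& file_location) {
    return Status{false, "DeleteFile not implemented"};
  }
};
```

Both forms keep the signature unchanged for overriding classes, so either can be reverted without affecting callers.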
///
/// \param file_locations The locations of the files to delete.
/// \return void if all deletes succeeded, an error code if any delete failed.
virtual Status DeleteFiles(std::span<const std::string> file_locations) {
It would be better if you could make some changes that utilize this new API. I saw you mentioned ExpireSnapshots.
Thanks, that makes sense. I originally planned to split this into two PRs: first add the FileIO::DeleteFiles API, then update ExpireSnapshots to use it.
I agree that this PR would be more useful if it also adopted the new API. I’ll take a look at updating ExpireSnapshots to use DeleteFiles for grouped file cleanup.
#include "iceberg/test/matchers.h"

namespace iceberg {
namespace {
We don't need this or its counterpart. If you want to wrap RecordingFileIO in an anonymous namespace, you should shrink the scope instead.
Makes sense. I’ll shrink the anonymous namespace scope to cover only the helper RecordingFileIO and keep the tests outside of the iceberg namespace.
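A rough sketch of the narrowed layout, assuming the anonymous namespace wraps only the helper and closes before any test code. The `RecordingFileIO` body and the `RecordTwoDeletes` function are hypothetical stand-ins, since the real helper and tests are not shown in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace iceberg {
namespace {

// Hypothetical stand-in for the RecordingFileIO test helper. The anonymous
// namespace gives it internal linkage and covers nothing else.
class RecordingFileIO {
 public:
  void DeleteFile(const std::string& file_location) {
    deleted_.push_back(file_location);
  }
  std::size_t delete_count() const { return deleted_.size(); }

 private:
  std::vector<std::string> deleted_;
};

}  // namespace
}  // namespace iceberg

// Test code sits outside the anonymous namespace and reaches the helper
// through the enclosing namespace within the same translation unit.
std::size_t RecordTwoDeletes() {
  iceberg::RecordingFileIO io;
  io.DeleteFile("a.parquet");
  io.DeleteFile("b.parquet");
  return io.delete_count();
}
```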
Summary
Add a new FileIO::DeleteFiles(...) API as a bulk deletion entry point.
The default implementation deletes files sequentially by calling the existing DeleteFile(...) method and returns the first deletion error encountered.
This PR only adds the API and backward-compatible fallback behavior. It does not yet update ExpireSnapshots to use DeleteFiles(...), and it does not introduce parallel deletion.
Fixed: #658
Motivation
ExpireSnapshots and other cleanup flows may need to delete many files. A bulk deletion API gives FileIO implementations a common extension point for
future optimized deletion strategies, such as storage-native batch deletion or
parallel fallback deletion.