
[feature](file-cache) Disable cache writes after remote scan threshold#65058

Draft
bobhan1 wants to merge 9 commits into apache:master from bobhan1:doris-disable-remote-scan-cache-write

bobhan1 commented Jul 1, 2026

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary:

Remote scans can read large amounts of data from object storage. Before this change, a query that had already exceeded its useful cache-write budget could keep writing later remote cache misses into file cache, increasing cache churn and IO overhead.

This PR adds a query-wide remote-scan file-cache write limiter. After the configured per-query threshold (`file_cache_query_limit_bytes`) is reached, subsequent cache misses still read from remote storage but skip further file-cache writes.
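A minimal sketch of the budget check described above, assuming a single atomic byte counter shared by all scanners of one query. `RemoteScanCacheWriteLimiterSketch` and `try_acquire` are hypothetical names, not the actual `RemoteScanCacheWriteLimiter` API, and the semantics for a zero threshold (admit nothing) and a negative threshold (no limit) are assumptions:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-query RemoteScanCacheWriteLimiter.
// Assumptions: threshold < 0 disables the limit, threshold == 0 admits no bytes.
class RemoteScanCacheWriteLimiterSketch {
public:
    explicit RemoteScanCacheWriteLimiterSketch(int64_t threshold_bytes)
            : _threshold(threshold_bytes) {}

    // Reserves `bytes` of the query-wide write budget. Returns true if the caller
    // may also write these bytes into file cache, false if it should only return
    // the data it already read from remote storage.
    bool try_acquire(int64_t bytes) {
        if (_threshold < 0) {
            return true;  // assumed: no limit configured
        }
        int64_t used = _used.load(std::memory_order_relaxed);
        while (true) {
            if (used + bytes > _threshold) {
                return false;  // strict: never let the admitted total exceed the threshold
            }
            // On failure, compare_exchange_weak reloads `used`, so the loop retries
            // against the latest total admitted by concurrent scanners.
            if (_used.compare_exchange_weak(used, used + bytes, std::memory_order_relaxed)) {
                return true;
            }
        }
    }

    int64_t used_bytes() const { return _used.load(std::memory_order_relaxed); }

private:
    const int64_t _threshold;
    std::atomic<int64_t> _used{0};
};
```

In this sketch a reader would call `try_acquire(block_bytes)` after fetching a missed block from remote storage and write it into file cache only when the call returns true; the data is returned to the scan either way. The compare-and-swap loop keeps concurrent scanners of the same query from collectively overshooting the threshold.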

Segment footer and metadata reads can also write file-cache blocks. This PR adds optional accounting for those metadata reads (BE config `file_cache_query_limit_segment_meta`) and propagates the query `IOContext` through segment preload, segment loading, lazy segment iterator initialization, column-reader metadata, and index metadata paths, so the same query-wide limiter is applied consistently.

### Release note

Add query-level remote-scan file-cache write limiting and optional segment metadata accounting.

### Check List (For Author)

  • Test:
    • git diff --check
    • ./build.sh --be --fe --cloud -j100
    • ./run-be-ut.sh --run --filter=BlockFileCacheTest.get_or_set_remote_scan_cache_write_limiter_segment_meta_config:BlockFileCacheTest.cached_remote_file_reader_specialized_write_cache_stats:SegmentFooterCacheTest.GetSegmentFooterPropagatesIoContext:SegmentFooterCacheTest.OpenPropagatesIoContextToFooter:DorisFSDirectoryTest.FSIndexInputSetIoContextPropagatesQueryLimiter -j100
    • env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u ALL_PROXY -u all_proxy ./run-regression-test.sh --run -d cloud_p0/cache/remote_scan_no_write_file_cache -s test_file_cache_query_limit_segment_meta_profile -g docker -runMode=cloud -dockerSuiteParallel 1
  • Behavior changed: Yes. Remote scans can stop writing cache after the per-query threshold, and segment metadata cache writes can be included in that threshold.
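  • Does this need documentation: Yes. The new user-visible knobs (the `file_cache_query_limit_bytes` session variable and the `file_cache_query_limit_segment_meta` BE config) should be documented in a follow-up docs PR.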

bobhan1 added 5 commits June 30, 2026 11:10
…file cache write index only (apache#9450)

pick selectdb/selectdb-core#9096
- Cherry-pick `a37184e9f097e789c4f4e0b40725a98fc49f2851` from
`tag-selectdb-cloud-26.0.3-minimax`.
- Support file cache write index-only behavior and related segment
index/footer preload plumbing.
- Resolve conflicts in this branch's local writer, packed-file, and
vertical compaction interfaces.

- Preserved this branch's existing index writer interface instead of
importing source-branch-only `IndexFileWriterPtr` /
`create_index_file_writer` plumbing.
- Preserved packed-file behavior and added branch-local
`PackedAppendContext::write_file_cache` support that stays compatible
with the legacy default behavior.
- Adapted the new vertical compaction test to this branch's 4-argument
`RowsetWriter::add_columns` API.
- Did not introduce `enable_file_cache_write_cumu_compaction_index_only`
or `enable_file_cache_write_base_compaction_index_only`, because this
branch did not originally have those configs.

- `git diff HEAD^ HEAD --check`
- `rg -n
"\\bIndexFileWriterPtr\\b|has_ann_index|create_index_file_writer|enable_file_cache_write_(base|cumu)_compaction_index_only|compaction_output_write_index_only|should_enable_compaction_cache_index_only"
be/src be/test regression-test docker/runtime/doris-compose/command.py
-S`
- `./run-be-ut.sh --run
--filter=CloudFileCacheWriteIndexOnlyConfigTest.* -j120`
- `./run-be-ut.sh --run --filter=CloudFileCacheWriteIndexOnly* -j120`
- `./build.sh --be --fe --cloud -j120`
- Rebuilt `foundationdb/foundationdb:7.1.26-single-layer` from remote
registry layers to avoid the local containerd overlay mount failure.
Verified the imported image has one rootfs layer.
- Rebuilt `bh-cluster-2` with cloud FE enterprise guard jar present in
`output/fe/lib/fe-enterprise.jar`, so FE can load
`org.apache.doris.cluster.ClusterGuard` in cloud docker mode.
- `env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u
ALL_PROXY -u all_proxy
DORIS_FDB_IMAGE=foundationdb/foundationdb:7.1.26-single-layer
./run-regression-test.sh --run -d
regression-test/suites/cloud_p0/cache/write_index_only -runMode=cloud`
- Result: `Test 3 suites, failed 0 suites, fatal 0 scripts, skipped 0
scripts`
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: The empty-range file cache loader unit test expected `load_segment_index_to_file_cache` to skip opening segment data, but it did not enable cloud mode, so the function returned at the cloud-mode config gate before reaching the empty-range guard, and the test never exercised that guard. Enable cloud mode in the fixture, restore it in teardown, and make the S3 open path fail if an empty range ever tries to open the segment.
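A minimal sketch of why the fixture change matters, using hypothetical names (`g_cloud_mode`, `load_index_sketch`, `ScopedOverride`) rather than the real Doris config and loader: with cloud mode off, the config gate returns before the empty-range guard, so a "segment must not be opened" assertion passes without testing the guard. The sketch uses an RAII guard in place of the fixture's SetUp/TearDown pair:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical global standing in for the BE cloud-mode config gate.
bool g_cloud_mode = false;

// Overrides a value for the lifetime of the guard and restores the old value on
// destruction (the RAII equivalent of "enable in SetUp, restore in TearDown").
template <typename T>
class ScopedOverride {
public:
    ScopedOverride(T& target, T value) : _target(target), _saved(target) {
        _target = std::move(value);
    }
    ~ScopedOverride() { _target = std::move(_saved); }
    ScopedOverride(const ScopedOverride&) = delete;
    ScopedOverride& operator=(const ScopedOverride&) = delete;

private:
    T& _target;
    T _saved;
};

enum class LoadResult { kSkippedNotCloud, kSkippedEmptyRange, kOpened };

// Hypothetical loader with the ordering described above: the config gate runs
// first, then the empty-range guard, and only then is the segment opened.
template <typename OpenFn>
LoadResult load_index_sketch(int64_t range_len, OpenFn&& open_segment) {
    if (!g_cloud_mode) {
        return LoadResult::kSkippedNotCloud;
    }
    if (range_len == 0) {
        return LoadResult::kSkippedEmptyRange;
    }
    open_segment();
    return LoadResult::kOpened;
}
```

Counting (or failing) inside the `open_segment` callback plays the role of the failing S3 open path: it proves the empty range stops at the guard rather than at the gate.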

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - `./run-be-ut.sh --run --filter=CloudFileCacheWriteIndexOnlyConfigTest.* -j100`
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: TopN lazy materialization phase 2 may populate file cache while fetching deferred columns. This can pollute cache when the requested ranges are cache misses. Add a cloud-only session switch for the new PMultiGetRequestV2 path so phase-2 reads use cached blocks only when the full range is already downloaded, and otherwise read remote data directly without writing file cache. The change also exposes phase-2 file-cache counters in the MaterializeNode profile and covers row-store and column-store fetch paths.
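A sketch of the phase-2 read decision, assuming downloaded blocks can be described as `[offset, offset + size)` intervals. `DownloadedBlock`, `fully_covered`, and `choose_phase2_path` are hypothetical helpers, not the BE file-cache API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one downloaded block: bytes [offset, offset + size).
struct DownloadedBlock {
    size_t offset;
    size_t size;
};

enum class Phase2ReadPath { kCacheOnly, kRemoteNoCacheWrite };

// True if the union of `blocks` covers [offset, offset + len) without gaps.
bool fully_covered(std::vector<DownloadedBlock> blocks, size_t offset, size_t len) {
    if (len == 0) {
        return true;
    }
    std::sort(blocks.begin(), blocks.end(),
              [](const DownloadedBlock& a, const DownloadedBlock& b) { return a.offset < b.offset; });
    const size_t end = offset + len;
    size_t covered_to = offset;  // first byte not yet known to be covered
    for (const DownloadedBlock& b : blocks) {
        if (b.offset > covered_to) {
            return false;  // gap: in sorted order no later block can fill it
        }
        covered_to = std::max(covered_to, b.offset + b.size);
        if (covered_to >= end) {
            return true;
        }
    }
    return false;
}

// Neither outcome populates file cache: a fully covered range is served from
// downloaded blocks, anything else is read from remote storage directly.
Phase2ReadPath choose_phase2_path(const std::vector<DownloadedBlock>& blocks, size_t offset,
                                  size_t len) {
    return fully_covered(blocks, offset, len) ? Phase2ReadPath::kCacheOnly
                                              : Phase2ReadPath::kRemoteNoCacheWrite;
}
```

Treating a partially cached range as a miss keeps the check read-only: the sketch never has to fill the gaps, which is exactly the write the session switch is meant to avoid.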

### Release note

Added session variable `enable_topn_lazy_mat_phase2_no_write_file_cache` to avoid file-cache writes on TopN lazy materialization phase-2 cache misses.

### Check List (For Author)

- Test:
    - Unit Test: ./run-be-ut.sh --run --filter=BlockFileCacheTest.get_downloaded_blocks_if_fully_covered_is_read_only:BlockFileCacheTest.cached_remote_file_reader_remote_only_on_miss -j20
    - Build: ./build.sh --be --fe --cloud -j100
    - Format: build-support/check-format.sh
    - Regression test: env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u ALL_PROXY -u all_proxy ./run-regression-test.sh --run -d cloud_p0/cache/topn_lazy_file_cache -s test_topn_lazy_mat_phase2_no_write_file_cache -g docker -runMode=cloud -dockerSuiteParallel 1
- Behavior changed: Yes. When the new session variable is enabled in cloud mode, TopN lazy materialization phase-2 cache misses read remote data without writing file cache.
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: TopN lazy materialization phase 2 exposed only aggregated profile counters, which makes backend-level skew and IO differences hard to identify when phase-2 fetch fans out to multiple backends. Add aggregate rows/segments counters and per-backend rows, segments, and file-cache statistics for the new TopN lazy materialization V2 path. The per-backend values are accumulated in MaterializationSharedState so multiple phase-2 fetch calls in one query are reflected in the final profile.
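A sketch of accumulating per-backend counters across several phase-2 fetch calls in one query, assuming a mutex-protected map keyed by backend id. `Phase2StatsAccumulator` and the `BackendFetchStats` fields are hypothetical and only approximate what `MaterializationSharedState` holds:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical per-backend counters reported by one phase-2 fetch call.
struct BackendFetchStats {
    int64_t rows = 0;
    int64_t segments = 0;
    int64_t cache_bytes_read = 0;  // stands in for the file-cache statistics
};

// Hypothetical query-level accumulator: every fetch call merges into the same
// map, so the final profile reflects all calls rather than only the last one.
class Phase2StatsAccumulator {
public:
    void merge(int64_t backend_id, const BackendFetchStats& s) {
        std::lock_guard<std::mutex> lock(_mu);
        BackendFetchStats& acc = _per_backend[backend_id];
        acc.rows += s.rows;
        acc.segments += s.segments;
        acc.cache_bytes_read += s.cache_bytes_read;
    }

    // Copy of the per-backend detail, e.g. for one profile entry per backend.
    std::map<int64_t, BackendFetchStats> per_backend() const {
        std::lock_guard<std::mutex> lock(_mu);
        return _per_backend;
    }

    // Aggregate across backends, e.g. for the top-level profile counters.
    BackendFetchStats total() const {
        std::lock_guard<std::mutex> lock(_mu);
        BackendFetchStats sum;
        for (const auto& entry : _per_backend) {
            sum.rows += entry.second.rows;
            sum.segments += entry.second.segments;
            sum.cache_bytes_read += entry.second.cache_bytes_read;
        }
        return sum;
    }

private:
    mutable std::mutex _mu;
    std::map<int64_t, BackendFetchStats> _per_backend;
};
```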

### Release note

Added per-backend TopN lazy materialization phase-2 profile counters.

### Check List (For Author)

- Test:
    - Build: ./build.sh --be --fe --cloud -j100
    - Format: build-support/clang-format.sh; build-support/check-format.sh; git diff --check
    - Regression test: env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u ALL_PROXY -u all_proxy ./run-regression-test.sh --run -d cloud_p0/cache/topn_lazy_file_cache -s test_topn_lazy_mat_phase2_no_write_file_cache -g docker -runMode=cloud -dockerSuiteParallel 1
- Behavior changed: Yes. TopN lazy materialization phase-2 profiles now include aggregate row/segment counts and per-backend detail counters.
- Does this need documentation: No

…able cache writes after remote scan threshold (apache#9151) (apache#9531)

pick selectdb/selectdb-core#9151 without the dependency on the FE session
variable `enable_file_cache`, and fix the control for inverted index
files

- Backport remote scan cache write limiting for
`remote_scan_no_write_file_cache_threshold_bytes`.
- Remove the dependency on FE session variable `enable_file_cache` so
the threshold still disables cache writes when session file-cache reads
are disabled.
- Add and update BE UT / cloud docker regression coverage for the
threshold behavior.

- `git diff --check`
- `./run-be-ut.sh --run
--filter=BlockFileCacheTest.file_cache_profile_remote_only_on_miss_state_counters:BlockFileCacheTest.remote_scan_cache_write_limiter_strict_budget:BlockFileCacheTest.remote_scan_cache_write_limiter_threshold_zero_and_negative:BlockFileCacheTest.remote_scan_cache_write_limiter_concurrent_budget:BlockFileCacheTest.get_or_set_remote_scan_cache_write_limiter_admission:BlockFileCacheTest.cached_remote_file_reader_policy_remote_only_with_scan_limiter:BlockFileCacheTest.cached_remote_file_reader_remote_scan_cache_write_limiter
-j100`
- `./build.sh --be --cloud -j100`
- `./build.sh --be -j100`
- Verified FE license jar in docker image `bh-cluster-2`.
- `env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u
ALL_PROXY -u all_proxy
DORIS_FDB_IMAGE=foundationdb/foundationdb:7.1.26-single-layer
./run-regression-test.sh --run -d
regression-test/suites/cloud_p0/cache/remote_scan_no_write_file_cache -s
test_remote_scan_no_write_file_cache_threshold -runMode=cloud
-dockerSuiteParallel 1`

(cherry picked from commit c45f7c05039841bc1c97be497b325cc8b015e116)

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1 bobhan1 changed the title [feature](file-cache) Add file cache write controls for index and remote scans [feature](file-cache) Disable cache writes after remote scan threshold Jul 1, 2026
bobhan1 added 4 commits July 1, 2026 10:58
…remote scan cache limit session var (apache#9578)

- Rename `remote_scan_no_write_file_cache_threshold_bytes` to
`file_cache_query_limit_bytes` across FE session variables, thrift, and
BE query options.
- Keep the remote scan cache write limiter behavior unchanged while
aligning the byte-based option with `file_cache_query_limit_percent`
naming.
- Update the remote scan cache regression suite and make the
segment-footer cache-write assertion conditional on actual footer remote
IO.

- `./run-fe-ut.sh --run org.apache.doris.qe.SessionVariablesTest`
- `./build.sh --be --fe --cloud -j100`
- `mvn package -pl fe-enterprise/fe-license-cloud -am -DskipTests`
- `docker build -f docker/runtime/doris-compose/Dockerfile -t
bh-cluster-2 .`
- `env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u
ALL_PROXY -u all_proxy
DORIS_FDB_IMAGE=foundationdb/foundationdb:7.1.26-single-layer
./run-regression-test.sh --run -d
regression-test/suites/cloud_p0/cache/remote_scan_no_write_file_cache -s
test_remote_scan_no_write_file_cache_threshold -runMode=cloud
-dockerSuiteParallel 1 -image bh-cluster-2`

Add specialized file cache write metrics for inverted index and segment
footer index reads.

The query profile already exposed detailed file cache read counters for
`InvertedIndex` and `SegmentFooterIndex`, but write-back into file cache
was only visible through the aggregate `BytesWriteIntoCache` and
`WriteCacheIOUseTimer` counters. This change adds the corresponding
specialized write counters:

- `InvertedIndexWriteCacheIOUseTimer`
- `InvertedIndexBytesWriteIntoCache`
- `SegmentFooterIndexWriteCacheIOUseTimer`
- `SegmentFooterIndexBytesWriteIntoCache`

- Extend `FileCacheStatistics` with per-category write-cache bytes and
timer fields.
- Register and update the new `RuntimeProfile` counters under
`FileCache`.
- Route `CachedRemoteFileReader` write-back bytes and local write timer
through the existing `FileCacheReadType` classification.
- Add BE UT coverage for both profile counter reporting and real cached
remote reader miss-fill stats.
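A sketch of routing one write-back into both the aggregate and the specialized counters, assuming the aggregate counters keep counting every write and the specialized counters are subsets selected by read type. The enum and struct are simplified hypothetical stand-ins for `FileCacheReadType` and `FileCacheStatistics`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of read categories; the real FileCacheReadType may differ.
enum class ReadTypeSketch { kData, kInvertedIndex, kSegmentFooterIndex };

// Hypothetical subset of FileCacheStatistics write fields (timers in nanoseconds).
struct WriteCacheStatsSketch {
    int64_t bytes_write_into_cache = 0;  // BytesWriteIntoCache
    int64_t write_cache_io_ns = 0;       // WriteCacheIOUseTimer
    int64_t inverted_index_bytes_write_into_cache = 0;
    int64_t inverted_index_write_cache_io_ns = 0;
    int64_t segment_footer_index_bytes_write_into_cache = 0;
    int64_t segment_footer_index_write_cache_io_ns = 0;
};

// Records one write-back: the aggregate counters always advance, and the same
// bytes and time are also routed to the specialized counters for the read type.
void record_cache_write(WriteCacheStatsSketch& s, ReadTypeSketch type, int64_t bytes,
                        int64_t elapsed_ns) {
    s.bytes_write_into_cache += bytes;
    s.write_cache_io_ns += elapsed_ns;
    switch (type) {
    case ReadTypeSketch::kInvertedIndex:
        s.inverted_index_bytes_write_into_cache += bytes;
        s.inverted_index_write_cache_io_ns += elapsed_ns;
        break;
    case ReadTypeSketch::kSegmentFooterIndex:
        s.segment_footer_index_bytes_write_into_cache += bytes;
        s.segment_footer_index_write_cache_io_ns += elapsed_ns;
        break;
    case ReadTypeSketch::kData:
        break;  // only the aggregate counters apply
    }
}
```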

- `git diff --check`
- `./run-be-ut.sh --run
--filter=BlockFileCacheTest.file_cache_profile_specialized_write_cache_counters:BlockFileCacheTest.cached_remote_file_reader_specialized_write_cache_stats:BlockFileCacheTest.cached_remote_file_reader_remote_scan_cache_write_limiter:BlockFileCacheTest.cached_remote_file_reader_policy_remote_only_with_scan_limiter
-j100`

… segment meta query limit accounting (apache#9591)

Add a BE config `file_cache_query_limit_segment_meta` to control whether
segment footer and other segment metadata cache writes are counted by
`file_cache_query_limit_bytes`.

The default remains unchanged: segment metadata cache writes do not
consume the per-query write limit unless the new config is enabled.
Inverted index cache writes keep their existing behavior.

`file_cache_query_limit_bytes` is enforced through the per-query
`RemoteScanCacheWriteLimiter`, but several segment metadata read paths
created fresh `IOContext` instances or loaded metadata without receiving
the query `IOContext`.

As a result, segment footer and related segment metadata cache writes
were not consistently visible to the query-level limiter.

- Add mutable BE config `file_cache_query_limit_segment_meta`.
- Update `CacheContext` admission so segment metadata participates in
query write-limit accounting only when the config is enabled.
- Propagate query `IOContext` through segment footer, primary-key index,
column reader cache, and variant subcolumn reader paths.
- Keep the metadata classification as segment metadata
(`is_index_data=true`, `is_inverted_index=false`) so the behavior is not
confused with inverted indexes.
- Add BE UT coverage for CacheContext admission and segment footer
`IOContext` propagation.
- Add a cloud docker profile case that validates:
  - segment metadata not counted: profile `BytesWriteIntoCache` can exceed
    the query threshold by metadata writes;
  - segment metadata counted: admitted writes respect the threshold;
  - tiny threshold: config off gives `BytesWriteIntoCache > 0`, config on
    gives `BytesWriteIntoCache = 0`.
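A sketch of the admission rule, using the `is_index_data` / `is_inverted_index` classification this commit describes and a hypothetical remaining-budget counter instead of the real `CacheContext` and limiter types. It assumes ordinary data pages always count, and leaves inverted index writes to their pre-existing rule, which the sketch does not model:

```cpp
#include <cassert>
#include <cstdint>

enum class QueryLimitAccounting { kCounted, kNotCounted, kExistingRule };

// Hypothetical classification using the flags named in the commit message.
QueryLimitAccounting classify_write(bool is_index_data, bool is_inverted_index,
                                    bool file_cache_query_limit_segment_meta) {
    if (!is_index_data) {
        return QueryLimitAccounting::kCounted;  // assumed: data pages always count
    }
    if (is_inverted_index) {
        return QueryLimitAccounting::kExistingRule;  // unchanged; not modeled here
    }
    // Segment metadata (is_index_data=true, is_inverted_index=false).
    return file_cache_query_limit_segment_meta ? QueryLimitAccounting::kCounted
                                               : QueryLimitAccounting::kNotCounted;
}

// Returns how many bytes of one block are written into file cache.
// Uncounted writes always go through and leave the budget untouched; counted
// writes are skipped when they do not fit in the remaining budget.
int64_t admit_write(int64_t& remaining_budget, int64_t bytes, bool counted) {
    if (!counted) {
        return bytes;
    }
    if (bytes > remaining_budget) {
        return 0;
    }
    remaining_budget -= bytes;
    return bytes;
}
```

With a 1-byte budget, a 4096-byte footer write is admitted when the config is off (so `BytesWriteIntoCache > 0`) and skipped when it is on, matching the tiny-threshold case listed above.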

- `./run-be-ut.sh --run
--filter=BlockFileCacheTest.get_or_set_remote_scan_cache_write_limiter_segment_meta_config:SegmentFooterCacheTest.GetSegmentFooterPropagatesIoContext:DorisFSDirectoryTest.FSIndexInputSetIoContextPropagatesQueryLimiter
-j100`
- `./build.sh --be -j100`
- `cd fe && mvn -pl fe-enterprise/fe-license-cloud -am -DskipTests
package`
- Rebuilt docker image `bh-cluster-2` with rebuilt `doris_be` and
`fe-license-cloud-1.2-SNAPSHOT.jar` copied into
`output/fe/lib/fe-enterprise.jar`.
- Docker validation used
`DORIS_FDB_IMAGE=foundationdb/foundationdb:7.1.26-single-layer` and no
license URL.
- Existing added file-cache docker cases passed:
  - `remote_scan_no_write_file_cache/test_remote_scan_no_write_file_cache_threshold.groovy`
  - `topn_lazy_file_cache/test_topn_lazy_mat_phase2_no_write_file_cache.groovy`
  - all cases under `write_index_only`
- New docker case passed:
  - `remote_scan_no_write_file_cache/test_file_cache_query_limit_segment_meta_profile.groovy`

…ery IO context to preload segment meta (apache#9601)

fix selectdb/selectdb-core#9591

Fix query file-cache limit accounting for segment footer/meta reads done
by the parallel scanner preload path.

`file_cache_query_limit_segment_meta` only worked on paths that already
carried the query `IOContext`. `ParallelScannerBuilder::_load()`
preloads segment row counts by opening segment footers before scanner
construction, but that path did not pass the query IO context, so
footer/meta reads did not see the query id, file-cache profile stats, or
the query-wide `RemoteScanCacheWriteLimiter`.

This change threads an optional `IOContext` through `SegmentLoader`,
`BetaRowset::get_segment_num_rows`, and `Segment::open`, and builds a
query IO context for the parallel preload step. It also extends coverage
for `Segment::open` IOContext propagation and adds docker regression
coverage for the parallel preload behavior without `dry_run_query`.
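A sketch of the optional-context pattern, with hypothetical simplified functions standing in for `SegmentLoader`, `BetaRowset::get_segment_num_rows`, and `Segment::open`: each layer forwards the caller's `IOContext` pointer when present and falls back to a fresh context only when it is absent. The real Doris types and signatures differ:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical limiter: only records how many footer bytes it has seen.
struct LimiterSketch {
    int64_t accounted_bytes = 0;
};

// Hypothetical, heavily simplified IOContext.
struct IOContextSketch {
    std::string query_id;              // empty: not attached to a query
    LimiterSketch* limiter = nullptr;  // query-wide write limiter, if any
};

// Lowest layer: the footer read accounts its cache write against whatever
// context it receives.
void read_footer(const IOContextSketch& ctx, int64_t footer_bytes) {
    if (ctx.limiter != nullptr) {
        ctx.limiter->accounted_bytes += footer_bytes;
    }
}

// Stand-in for Segment::open: uses the caller's context if given, otherwise a
// fresh context with no query id and no limiter (the pre-fix behavior).
void open_segment(int64_t footer_bytes, const IOContextSketch* io_ctx = nullptr) {
    const IOContextSketch fallback{};
    read_footer(io_ctx != nullptr ? *io_ctx : fallback, footer_bytes);
}

// Stand-in for SegmentLoader / get_segment_num_rows: forwards the caller's
// context pointer unchanged instead of building a new one.
void load_segments(const std::vector<int64_t>& footer_sizes,
                   const IOContextSketch* io_ctx = nullptr) {
    for (int64_t bytes : footer_sizes) {
        open_segment(bytes, io_ctx);
    }
}
```

Defaulting the pointer to `nullptr` keeps existing callers compiling unchanged, while the preload step opts in by passing the query context it builds.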

- `git diff --check`
- `./run-be-ut.sh --run --filter=SegmentFooterCacheTest.* -j100`
- `mvn package -pl fe-enterprise/fe-license-cloud -am -DskipTests`
- `cp
fe/fe-enterprise/fe-license-cloud/target/fe-license-cloud-1.2-SNAPSHOT.jar
output/fe/lib/fe-enterprise.jar`
- `./build.sh --be -j100`
- `docker build -f docker/runtime/doris-compose/Dockerfile -t
bh-cluster-2 .`
- `env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u
ALL_PROXY -u all_proxy
DORIS_FDB_IMAGE=foundationdb/foundationdb:7.1.26-single-layer
./run-regression-test.sh --run -d
regression-test/suites/cloud_p0/cache/remote_scan_no_write_file_cache -s
test_remote_scan_no_write_file_cache_threshold -runMode=cloud
-dockerSuiteParallel 1 -image bh-cluster-2`
- `env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u
ALL_PROXY -u all_proxy
DORIS_FDB_IMAGE=foundationdb/foundationdb:7.1.26-single-layer
./run-regression-test.sh --run -d
regression-test/suites/cloud_p0/cache/remote_scan_no_write_file_cache -s
test_file_cache_query_limit_segment_meta_profile -runMode=cloud
-dockerSuiteParallel 1 -image bh-cluster-2`