
*: support new columnar data source#10842

Open
yongman wants to merge 12 commits into
pingcap:master from
yongman:pick-columnar-to-master

Conversation


@yongman yongman commented May 13, 2026

What problem does this PR solve?

Issue Number: close #10844

Problem Summary:

What is changed and how it works?


Add the new columnar storage as a data source for TiDB X.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Manual Test

  1. Build TiFlash with option -DENABLE_NEXT_GEN_COLUMNAR=ON to enable building with tiflash-proxy-columnar:
    cmake -GNinja -DCMAKE_BUILD_TYPE=DEBUG -DENABLE_NEXT_GEN=ON -DENABLE_NEXT_GEN_COLUMNAR=ON ..
  2. Run the TiFlash compute node with use_columnar=true in its config.
  3. Run TiFlash workloads.
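For step 2, a minimal compute-node config fragment might look like the following sketch. The key path flash.use_columnar comes from the walkthrough; the file name and surrounding layout are assumptions.

```toml
# tiflash.toml (compute node) — illustrative sketch, not a complete config
[flash]
# Opt into the columnar disaggregated read path; requires a binary built
# with -DENABLE_NEXT_GEN_COLUMNAR=ON.
use_columnar = true
```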
MySQL root@localhost:test> show create table t1;
+-------+-------------------------------------------------------------+
| Table | Create Table                                                |
+-------+-------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (                                         |
|       |   `a` int DEFAULT NULL                                      |
|       | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-------------------------------------------------------------+

1 row in set
Time: 0.019s
MySQL root@localhost:test> SET @@session.tidb_isolation_read_engines='tiflash';
Query OK, 0 rows affected
Time: 0.001s
MySQL root@localhost:test> explain analyze select * from t1;
+---------------------+---------+---------+--------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-----------+------+
| id                  | estRows | actRows | task         | access object | execution info                                                                                                                                                          | operator info                        | memory    | disk |
+---------------------+---------+---------+--------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-----------+------+
| TableReader_10      | 5.00    | 5       | root         |               | time:61.7ms, open:10.4ms, close:12µs, loops:2, RU:0.00, cop_task: {num: 2, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00}, fetch_resp_duration: 51.2ms | MppVersion: 3, data:ExchangeSender_9 | 686 Bytes | N/A  |
| └─ExchangeSender_9  | 5.00    | 5       | mpp[tiflash] |               | tiflash_task:{time:40.8ms, loops:1, threads:1}, tiflash_network: {inner_zone_send_bytes: 56}                                                                            | ExchangeType: PassThrough            | N/A       | N/A  |
|   └─TableFullScan_8 | 5.00    | 5       | mpp[tiflash] | table:t1      | tiflash_task:{time:40.8ms, loops:1, threads:1}                                                                                                                          | keep order:false, stats:pseudo       | N/A       | N/A  |
+---------------------+---------+---------+--------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-----------+------+

3 rows in set
Time: 0.073s
MySQL root@localhost:test>

Summary by CodeRabbit

  • New Features

    • Columnar disaggregated reads via a TiFlash proxy path.
    • New runtime/config option to opt into columnar disaggregated behavior.
  • Chores

    • Added dedicated columnar proxy subproject and build-time flag to control columnar proxy support.
    • Build now conditionally includes/excludes columnar disaggregated code and reports the selected proxy mode in logs.
  • Bug Fixes

    • Improved proxy read error handling with clearer retry and snapshot semantics.

Review Change Stack

Signed-off-by: yongman <yming0221@gmail.com>

ti-chi-bot Bot commented May 13, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels May 13, 2026

ti-chi-bot Bot commented May 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hehechen for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 13, 2026

coderabbitai Bot commented May 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds an opt-in columnar disaggregated read path: new CMake option and submodule, parses and propagates use_columnar through config and startup, adjusts proxy config/start logic, routes StorageDisaggregated reads to proxy-backed columnar code, declares RNProxy types, implements proxy read tasks/streams/operators with error mapping and retry, and provides disabled-build fallbacks.

Changes

Columnar Disaggregated Storage Feature

  • Build and Submodule Setup (.gitmodules, CMakeLists.txt, cmake/find_tiflash_proxy.cmake, contrib/tiflash-proxy-cmake/CMakeLists.txt, contrib/tiflash-proxy-columnar, dbms/CMakeLists.txt, dbms/src/Common/config.h.in): Add the ENABLE_NEXT_GEN_COLUMNAR option, a new contrib/tiflash-proxy-columnar submodule and commit pin, proxy source selection and serverless-proxy checks, a jemalloc feature selection update, and conditional inclusion of columnar sources.
  • Configuration Propagation (dbms/src/Core/TiFlashDisaggregatedMode.h, dbms/src/Core/TiFlashDisaggregatedMode.cpp, dbms/src/Interpreters/SharedContexts/Disagg.h, dbms/src/Interpreters/Context.cpp, dbms/src/Server/Server.cpp): Add use_columnar to DisaggOptions, parse flash.use_columnar, and propagate the flag into the shared/global context and server startup wiring.
  • Proxy Configuration and Startup Wiring (dbms/src/Storages/KVStore/ProxyStateMachine.h, dbms/src/Common/ErrorCodes.cpp, dbms/src/Common/TiFlashBuildInfo.cpp): TiFlashProxyConfig accepts use_columnar; try-parse logic supports init-only proxy startup when columnar is enabled; getArgs() omits empty arg values; adds the COLUMNAR_SNAPSHOT_ERROR error code and the build-time "columnar" feature string.
  • Storage Infrastructure & Helpers (dbms/src/Storages/KVStore/KVStore.h, dbms/src/Storages/KVStore/KVStore.cpp, dbms/src/Storages/KVStore/TMTContext.cpp, dbms/src/DataStreams/AddExtraTableIDColumnTransformAction.h, dbms/src/DataStreams/AddExtraTableIDColumnTransformAction.cpp, dbms/src/DataTypes/DataTypeDecimal.cpp, dbms/src/Storages/DeltaMerge/DMContext.cpp, dbms/src/Storages/KVStore/tests/*): KVStore adds restoreProxyHelper(). TMTContext preserves kvstore when columnar is enabled and restores the proxy helper on restore. The extra-table-id transform gains fill(); Decimal256 deserialization adds a proxy-compatible decode fast path; removes unused config.h includes plus minor formatting.
  • Storage Read Path Routing (dbms/src/Storages/StorageDisaggregated.h, dbms/src/Storages/StorageDisaggregated.cpp): StorageDisaggregated short-circuits/returns early in columnar mode; both read() overloads delegate to readThroughProxy() when isReadColumnar() is true. getExtraCastExpr() and extraCast() are extended to consider pushed-down filter columns; adds includes for DAGUtils and Disagg.
  • Columnar Read Component Type Declarations (dbms/src/Storages/StorageDisaggregatedColumnar.h): Declares the RNProxyReader, RNProxyReadTask, RNProxyInputStream, and RNProxySourceOp types with options, factories, and lifecycle interfaces.
  • Columnar Proxy Read Implementation (dbms/src/Storages/StorageDisaggregatedColumnar.cpp): Implements isReadColumnar, both readThroughProxy overloads, merging of pushed-down filters and generated-column placeholder execution, RNProxyReader creation with error mapping (region/snapshot/lock), RNProxyReadTask building with region validation and concurrency sizing, RNProxyInputStream deserialization and accounting, RNProxySourceOp operator execution, and backoff/retry variants.
  • Fallback Implementations for Disabled Build (dbms/src/Storages/StorageDisaggregatedRemote.cpp): When ENABLE_NEXT_GEN_COLUMNAR == 0: isReadColumnar() returns false; readThroughProxy() overloads abort at runtime; filterConditionsWithPushedDownFilters() forwards to the existing filterConditions().

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

approved, lgtm

Suggested reviewers

  • JinheLin
  • CalvinNeo
  • yudongusa

🐰 I hopped a branch and added a gate,
A proxy path for columnar fate,
Submodule pinned and flags aligned,
Readers, tasks, and streams combined,
Now rows leap home through cloud and crate!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 11.43%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check — ✅ Passed: the title 'support new columnar data source' clearly and concisely summarizes the main change, adding a new columnar storage as a data source for the system.
  • Linked Issues check — ✅ Passed: the PR implements the core objective from issue #10844 by adding columnar storage support with the build-time flag ENABLE_NEXT_GEN_COLUMNAR and the runtime option use_columnar, enabling TiDB X to read from the new columnar format.
  • Out of Scope Changes check — ✅ Passed: all changes are scoped to the columnar storage feature; modifications span build configuration, proxy integration, storage classes, and data streaming, all directly supporting the feature objective.
  • Description check — ✅ Passed: the PR description includes the issue number, problem summary, changes overview, checklist items, and a comprehensive manual test section with the build command, configuration, and example MySQL output.



Signed-off-by: yongman <yming0221@gmail.com>
@yongman yongman marked this pull request as ready for review May 14, 2026 08:17
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 14, 2026

pantheon-ai Bot commented May 14, 2026

@yongman I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
dbms/src/Storages/KVStore/ProxyStateMachine.h (1)

132-132: ⚖️ Poor tradeoff

Consider wrapping init_only logic in conditional compilation.

The init_only variable (line 135) and its assignment logic (lines 139-151) are only used within the #if SERVERLESS_PROXY == 1 block (lines 205-206). When SERVERLESS_PROXY is not defined, this variable and the associated logic are unused. Consider wrapping the entire init_only variable declaration and assignment in #if SERVERLESS_PROXY == 1 for cleaner conditional compilation.

♻️ Suggested refactor
     bool tryParseFromConfig(
         const Poco::Util::LayeredConfiguration & config,
         const DisaggregatedMode disaggregated_mode,
         const bool use_autoscaler,
         const bool use_columnar,
         const LoggerPtr & log)
     {
-        bool init_only = false;
         // tiflash_compute doesn't need proxy except when using columnar.
         if (disaggregated_mode == DisaggregatedMode::Compute && use_autoscaler)
         {
             if (use_columnar)
             {
+#if SERVERLESS_PROXY == 1
+                bool init_only = true;
                 LOG_INFO(
                     log,
                     "TiFlash Proxy will start because columnar is enabled with AutoScale Disaggregated Compute Mode "
                     "specified.");
-                init_only = true;
+#else
+                LOG_INFO(log, "TiFlash Proxy will not start because SERVERLESS_PROXY is not enabled.");
+                return false;
+#endif
             }

Also applies to: 135-151

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/KVStore/ProxyStateMachine.h` at line 132, The variable
init_only and its assignment logic are only used under the SERVERLESS_PROXY
feature flag; to avoid unused-code when SERVERLESS_PROXY != 1, move the
declaration and assignment of init_only inside a `#if` SERVERLESS_PROXY == 1 /
`#endif` block. Locate the init_only declaration and the block that assigns it
(inside the ProxyStateMachine constructor or where init_only is defined) and
wrap both the declaration and the subsequent assignment logic in the conditional
compilation so the symbol is only compiled when SERVERLESS_PROXY is enabled.
dbms/src/Storages/StorageDisaggregatedColumnar.h (1)

234-234: 💤 Low value

Remove or document the commented-out field.

The commented line //double duration_wait_ready_task_sec = 0; should either be removed if not needed or uncommented with documentation if planned for future use.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.h` at line 234, Remove or
properly document the commented-out field `duration_wait_ready_task_sec` in
StorageDisaggregatedColumnar: either delete the line `//double
duration_wait_ready_task_sec = 0;` if it's unused, or uncomment it and add a
brief comment explaining its purpose and units (e.g., "time spent waiting for
ready tasks in seconds") and why it's kept for future use so its intent is clear
to readers and static analysis.
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (2)

221-224: 💤 Low value

Consider improving readability of the boolean parameter.

The false parameter in convertTimeZoneByOffset lacks context. If the API doesn't support named parameters, consider adding a brief comment explaining what the boolean controls.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 221 - 224,
The anonymous boolean literal `false` passed to convertTimeZoneByOffset reduces
readability; update the call site where timezone_info.timezone_offset is handled
to either replace `false` with a clearly named constant/enum (e.g., use a
constant like IGNORE_DST or DONT_APPLY_DST) or add an inline comment describing
the flag's meaning (for example: /* apply_dst = false */) so future readers
immediately understand what that parameter controls; locate the call to
convertTimeZoneByOffset near convertTimeZone and timezone_info.timezone_offset
in StorageDisaggregatedColumnar.cpp to make the change.

566-566: ⚡ Quick win

Use fmt-style Exception constructor.

These exceptions use the old-style constructor. As per coding guidelines, prefer the fmt-style constructor with error code first: throw Exception(ErrorCodes::SOME_CODE, "Message").

♻️ Suggested fix
-        throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error");
-        throw Exception("pd client error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error");
-        throw Exception("unknown error type", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "unknown error type");

As per coding guidelines: "Prefer the fmt-style constructor for DB::Exception with error code first: throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);"

Also applies to: 571-571, 576-576

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` at line 566, Replace
old-style DB::Exception construction with the fmt-style (error code first) for
the thrown exceptions in StorageDisaggregatedColumnar.cpp: change the calls that
currently use throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR)
(and the other two similar occurrences) to use throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error") so the error code
is the first argument; ensure you update each instance that uses the old-ordered
arguments to the new fmt-style Exception(ErrorCodes::..., "message") form.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmake/find_tiflash_proxy.cmake`:
- Around line 63-64: Replace the invalid CMake message mode "FATAL" with the
correct "FATAL_ERROR" in the conditional that checks for the next-gen
tiflash-proxy; specifically update the message(...) call inside the if
(ENABLE_NEXT_GEN AND NOT ENABLE_COLUMNAR_DISAGG AND NOT EXISTS
".../contrib/tiflash-proxy-next-gen/.../cloud_helper.rs") branch so it uses
FATAL_ERROR (matching other occurrences like the one near line 58 and
find_tipb.cmake) to ensure configuration stops when the file is missing.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0082d8d8-0032-40bb-9aac-04d5ea134b44

📥 Commits

Reviewing files that changed from the base of the PR and between ed4e382 and ed0a0a0.

📒 Files selected for processing (25)
  • .gitmodules
  • CMakeLists.txt
  • cmake/find_tiflash_proxy.cmake
  • contrib/tiflash-proxy-cmake/CMakeLists.txt
  • contrib/tiflash-proxy-columnar
  • dbms/CMakeLists.txt
  • dbms/src/Common/ErrorCodes.cpp
  • dbms/src/Common/config.h.in
  • dbms/src/Core/TiFlashDisaggregatedMode.cpp
  • dbms/src/Core/TiFlashDisaggregatedMode.h
  • dbms/src/DataStreams/AddExtraTableIDColumnTransformAction.cpp
  • dbms/src/DataStreams/AddExtraTableIDColumnTransformAction.h
  • dbms/src/DataTypes/DataTypeDecimal.cpp
  • dbms/src/Interpreters/Context.cpp
  • dbms/src/Interpreters/SharedContexts/Disagg.h
  • dbms/src/Server/Server.cpp
  • dbms/src/Storages/KVStore/KVStore.cpp
  • dbms/src/Storages/KVStore/KVStore.h
  • dbms/src/Storages/KVStore/ProxyStateMachine.h
  • dbms/src/Storages/KVStore/TMTContext.cpp
  • dbms/src/Storages/StorageDisaggregated.cpp
  • dbms/src/Storages/StorageDisaggregated.h
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp
  • dbms/src/Storages/StorageDisaggregatedColumnar.h
  • dbms/src/Storages/StorageDisaggregatedRemote.cpp

Comment thread cmake/find_tiflash_proxy.cmake Outdated
Signed-off-by: yongman <yming0221@gmail.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (2)

821-824: ⚡ Quick win

Log the swallowed destructor exception.

Keeping the destructor non-throwing is right, but silently discarding the failure makes accounting/logging issues invisible. tryLogCurrentException keeps the behavior and preserves the signal.

🪵 Suggested change
     catch (...)
     {
-        // Destructors must not throw.
+        tryLogCurrentException(log, "Failed to finalize proxy read accounting");
     }
As per coding guidelines, "In broad `catch (...)` paths, prefer `tryLogCurrentException(log, "context")` to avoid duplicated exception-formatting code"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 821 - 824,
The destructor currently swallows all exceptions with a bare catch(...) in
StorageDisaggregatedColumnar (or the related destructor shown) — preserve the
non-throwing behavior but log the exception by calling
tryLogCurrentException(log,
"StorageDisaggregatedColumnar::~StorageDisaggregatedColumnar") (or an
appropriate context string) inside the catch block; ensure you reference the
existing logger variable used in this translation unit and do not rethrow so
destructors remain noexcept.

555-577: ⚡ Quick win

Use the error-code-first DB::Exception constructor here.

These new throws use the legacy (message, code) overload. Please switch them to the repo-standard (code, fmt, ...) form for consistency.

🔧 Suggested cleanup
-        throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error");
@@
-        throw Exception("pd client error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error");
@@
-        throw Exception("unknown error type", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(
+            ErrorCodes::COLUMNAR_SNAPSHOT_ERROR,
+            "unknown error type");
@@
-        throw Exception("read_block failed in tiflash-proxy", ErrorCodes::LOGICAL_ERROR);
+        throw Exception(ErrorCodes::LOGICAL_ERROR, "read_block failed in tiflash-proxy");
As per coding guidelines, "Prefer the fmt-style constructor for `DB::Exception` with error code first: `throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);`"

Also applies to: 849-851

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 555 - 577,
Replace the legacy DB::Exception constructor calls that pass (message, code)
with the fmt-style, error-code-first form; specifically, in the branches
handling ColumnarReaderErrorType::LockedError, ::PdClientError and the default
non-OK branch (where current throws are throw Exception("lock error",
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR), throw Exception("pd client error",
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) and throw Exception("unknown error type",
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR)), change them to throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error") / throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error") / throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "unknown error type, error_type
{}", uint8_t(columnar_reader.error_type)) respectively; apply the same pattern
to the analogous throws near the other occurrence that uses the old overload.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 241-277: In StorageDisaggregated::readThroughProxy, guard against
empty remote regions / no proxy tasks and empty pipeline before doing any sizing
or header work: after calling buildRemoteTableRanges(), if remote_table_ranges
is empty (or region_num==0) return an empty BlockInputStreams immediately; after
building read_proxy_tasks and populating pipeline.streams, if
pipeline.streams.empty() return early as well before accessing
pipeline.firstStream() or creating analyzer; ensure these early returns happen
before any divisions (regions_per_reader) or calls that assume a non-empty
header (analyzer, extraCast, filterConditionsWithPushedDownFilters) so
downstream code won’t dereference empty streams or divide by zero.
- Around line 936-1003: In RNProxySourceOp::readImpl, RNProxySourceOp::awaitImpl
and RNProxySourceOp::executeIOImpl replace all occurrences of the function-like
macros written without parentheses (e.g. "if likely (cond)" or "if unlikely
(cond)") with the canonical form that calls the macro around the condition (e.g.
"if (likely(cond))" / "if (unlikely(cond))"); ensure every check such as done,
t_block.has_value(), current_reader_idx < 0, and block && block.rows() uses the
corrected syntax so the macros expand to __builtin_expect properly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: db0c5d76-3eaf-4ef6-8e24-116bf2b71c3a

📥 Commits

Reviewing files that changed from the base of the PR and between ed0a0a0 and 77c5a0d.

📒 Files selected for processing (2)
  • dbms/src/Storages/KVStore/ProxyStateMachine.h
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • dbms/src/Storages/KVStore/ProxyStateMachine.h

Comment on lines +241 to +277
BlockInputStreams StorageDisaggregated::readThroughProxy(const Context & context, unsigned num_streams)
{
    DAGPipeline pipeline;
    const UInt64 start_ts = sender_target_mpp_task_id.gather_id.query_id.start_ts;
    auto [remote_table_ranges, region_num] = buildRemoteTableRanges();
    const auto generated_column_infos = genGeneratedColumnInfosForDisaggregatedRead(table_scan);
    auto read_proxy_tasks = RNProxyReadTask::buildProxyReadTaskWithBackoff(
        log,
        context,
        start_ts,
        table_scan,
        filter_conditions,
        remote_table_ranges,
        num_streams);
    for (auto & task : read_proxy_tasks)
    {
        auto streams = task->getInputStreams();
        pipeline.streams.insert(pipeline.streams.end(), streams.begin(), streams.end());
    }
    // Avoid reading generated columns from proxy, generate placeholders locally.
    executeGeneratedColumnPlaceholder(generated_column_infos, log, pipeline);
    NamesAndTypes source_columns;
    source_columns.reserve(table_scan.getColumnSize());
    const auto & stream_header = pipeline.firstStream()->getHeader();
    for (const auto & col : stream_header)
    {
        source_columns.emplace_back(col.name, col.type);
    }
    analyzer = std::make_unique<DAGExpressionAnalyzer>(std::move(source_columns), context);

    // Handle duration/timestamp cast for proxy path.
    // We still execute pushed-down filters on RN side, so timestamp columns in those filters
    // must also be converted from UTC to session timezone.
    extraCast(*analyzer, pipeline, /*include_pushed_down_filter_columns=*/true);
    // Handle filter
    filterConditionsWithPushedDownFilters(*analyzer, pipeline);
    return pipeline.streams;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle empty scans before sizing readers and building headers.

If region splitting returns no remote regions, Line 705 makes real_num_streams == 0 and Line 708 divides by zero. Even after guarding that, Line 264 and Line 314 still assume a non-empty task/header. Please return early for the empty-scan case before computing regions_per_reader and before creating the analyzer.

💡 Suggested guard rails
 BlockInputStreams StorageDisaggregated::readThroughProxy(const Context & context, unsigned num_streams)
 {
     DAGPipeline pipeline;
@@
     auto read_proxy_tasks = RNProxyReadTask::buildProxyReadTaskWithBackoff(
         log,
         context,
         start_ts,
         table_scan,
         filter_conditions,
         remote_table_ranges,
         num_streams);
+    if (read_proxy_tasks.empty())
+        return {};
+
     for (auto & task : read_proxy_tasks)
     {
         auto streams = task->getInputStreams();
         pipeline.streams.insert(pipeline.streams.end(), streams.begin(), streams.end());
     }
@@
 void StorageDisaggregated::readThroughProxy(
     PipelineExecutorContext & exec_context,
     PipelineExecGroupBuilder & group_builder,
     const Context & context,
     unsigned num_streams)
 {
@@
     auto read_proxy_tasks = RNProxyReadTask::buildProxyReadTaskWithBackoff(
         log,
         context,
         start_ts,
         table_scan,
         filter_conditions,
         remote_table_ranges,
         num_streams);
+    if (read_proxy_tasks.empty())
+        return;
+
     const auto generated_column_infos = genGeneratedColumnInfosForDisaggregatedRead(table_scan);
@@
     unsigned region_num = all_remote_regions_by_region.size();
     unsigned physical_table_num = physical_table_ids.size();
+    if (region_num == 0 || num_streams == 0)
+        return tasks;
+
     unsigned real_num_streams = std::min(num_streams, region_num);

Also applies to: 281-324, 703-708

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 241 - 277,
In StorageDisaggregated::readThroughProxy, guard against empty remote regions /
no proxy tasks and empty pipeline before doing any sizing or header work: after
calling buildRemoteTableRanges(), if remote_table_ranges is empty (or
region_num==0) return an empty BlockInputStreams immediately; after building
read_proxy_tasks and populating pipeline.streams, if pipeline.streams.empty()
return early as well before accessing pipeline.firstStream() or creating
analyzer; ensure these early returns happen before any divisions
(regions_per_reader) or calls that assume a non-empty header (analyzer,
extraCast, filterConditionsWithPushedDownFilters) so downstream code won’t
dereference empty streams or divide by zero.

Comment thread dbms/src/Storages/StorageDisaggregatedColumnar.cpp
Signed-off-by: yongman <yming0221@gmail.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
dbms/src/Storages/KVStore/ProxyStateMachine.h (2)

100-110: 💤 Low value

args.reserve is now under-sized.

With the new key-only emission for empty values, each entry can push up to two elements (key + value). The reservation should account for that to avoid reallocations:

-        args.reserve(val_map.size() + 1);
+        args.reserve(val_map.size() * 2 + 1);

Also worth noting: this changes the semantics of any existing flash.proxy.* entry whose value happens to be empty — previously such entries emitted an empty-string argument, now the value is dropped. If that's intended (likely yes), consider a short comment to that effect.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/KVStore/ProxyStateMachine.h` around lines 100 - 110, The
args.reserve in the function building the argv vector (in ProxyStateMachine.h)
is too small: because each map entry may push both key and value (or only key
when value is empty), reserve should account for up to two elements per entry
plus the initial program name—e.g. reserve(val_map.size() * 2 + 1). Update the
reserve call accordingly and add a short comment near this loop explaining that
empty values intentionally emit only the key (dropping empty-string values) to
document the semantics change.

135-136: 💤 Low value

Prefer [[maybe_unused]] over UNUSED(...).

A [[maybe_unused]] attribute on the declaration is more idiomatic than a separate UNUSED(...) macro call, and it goes away naturally if the variable is also conditionally declared (see the related comment on guarding the use_columnar branch).

-        bool init_only = false;
-        UNUSED(init_only);
+        [[maybe_unused]] bool init_only = false;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/KVStore/ProxyStateMachine.h` around lines 135 - 136,
Replace the UNUSED(init_only) macro usage by marking the local boolean
declaration itself as possibly unused: change the variable declaration of
init_only in ProxyStateMachine (the init_only local used in the surrounding init
logic) to use the C++ attribute [[maybe_unused]] so the compiler will not warn
when the variable is unused (remove the separate UNUSED(init_only) line and
annotate the declaration instead).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Storages/KVStore/ProxyStateMachine.h`:
- Around line 135-155: The use_columnar branch must be guarded by the build
macro to avoid starting the proxy without --init-only on non-columnar builds:
wrap the inner if (use_columnar) branch in `#if` ENABLE_NEXT_GEN_COLUMNAR ...
`#else` emit a LOG_ERROR/LOG_INFO explaining that columnar was requested but the
binary was built without ENABLE_NEXT_GEN_COLUMNAR and return false (so
DisaggregatedMode::Compute + use_autoscaler + use_columnar cannot proceed), and
keep the `#endif` so that init_only and the later tryParseFromConfig behavior
remain consistent with columnar-enabled builds; reference symbols: use_columnar,
init_only, DisaggregatedMode::Compute, use_autoscaler, tryParseFromConfig,
ENABLE_NEXT_GEN_COLUMNAR.

---

Nitpick comments:
In `@dbms/src/Storages/KVStore/ProxyStateMachine.h`:
- Around line 100-110: The args.reserve in the function building the argv vector
(in ProxyStateMachine.h) is too small: because each map entry may push both key
and value (or only key when value is empty), reserve should account for up to
two elements per entry plus the initial program name—e.g. reserve(val_map.size()
* 2 + 1). Update the reserve call accordingly and add a short comment near this
loop explaining that empty values intentionally emit only the key (dropping
empty-string values) to document the semantics change.
- Around line 135-136: Replace the UNUSED(init_only) macro usage by marking the
local boolean declaration itself as possibly unused: change the variable
declaration of init_only in ProxyStateMachine (the init_only local used in the
surrounding init logic) to use the C++ attribute [[maybe_unused]] so the
compiler will not warn when the variable is unused (remove the separate
UNUSED(init_only) line and annotate the declaration instead).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d4657d45-5eb5-4763-8001-724e350b8ed7

📥 Commits

Reviewing files that changed from the base of the PR and between 013f968 and 19b79c0.

📒 Files selected for processing (7)
  • CMakeLists.txt
  • cmake/find_tiflash_proxy.cmake
  • contrib/tiflash-proxy-cmake/CMakeLists.txt
  • dbms/CMakeLists.txt
  • dbms/src/Common/config.h.in
  • dbms/src/Storages/KVStore/ProxyStateMachine.h
  • dbms/src/Storages/StorageDisaggregatedRemote.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • dbms/src/Storages/StorageDisaggregatedRemote.cpp

Comment thread dbms/src/Storages/KVStore/ProxyStateMachine.h
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.gitmodules:
- Line 41: Replace the SSH-style submodule URLs with HTTPS so CI and
contributors without SSH keys can clone them; locate the submodule entries that
currently use "git@github.com:tidbcloud/cloud-storage-engine.git" (and the
similar SSH URL referenced a second time) and update them to the HTTPS form
"https://github.com/tidbcloud/cloud-storage-engine.git" inside .gitmodules,
ensuring both occurrences are changed and then run git submodule sync/update to
apply the change.
- Around line 39-41: The .gitmodules entry for submodule
"contrib/tiflash-proxy-next-gen" lacks a branch specification, causing
inconsistent update behavior with the other submodule; add a line "branch =
tiflash-proxy-7.0" to the "contrib/tiflash-proxy-next-gen" submodule block
(matching the branch used by "tiflash-proxy-columnar") so both submodules point
to the same branch when running git submodule update --remote.
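With both fixes applied, the submodule entry would look roughly like this (a sketch; the path and branch names are taken from the comments above):

```ini
[submodule "contrib/tiflash-proxy-next-gen"]
	path = contrib/tiflash-proxy-next-gen
	url = https://github.com/tidbcloud/cloud-storage-engine.git
	branch = tiflash-proxy-7.0
```

After editing .gitmodules, run `git submodule sync` followed by `git submodule update --init --remote` so the new URL and branch take effect.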

In `@dbms/src/Common/TiFlashBuildInfo.cpp`:
- Around line 150-152: In the vector initializer in TiFlashBuildInfo (within the
block guarded by ENABLE_NEXT_GEN_COLUMNAR), add a trailing comma after the
"columnar" string literal so that when ENABLE_NEXT_GEN_COLUMNAR and ENABLE_CLARA
are both defined the preprocessor output remains valid; update the entry guarded
by ENABLE_NEXT_GEN_COLUMNAR (the "columnar" literal) to include a trailing comma
to match the other feature entries.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: de829508-355d-4471-be4e-f87a177dfdc3

📥 Commits

Reviewing files that changed from the base of the PR and between 19b79c0 and d519b15.

📒 Files selected for processing (5)
  • .gitmodules
  • contrib/tiflash-proxy-columnar
  • dbms/src/Common/TiFlashBuildInfo.cpp
  • dbms/src/Storages/DeltaMerge/DMContext.cpp
  • dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp
💤 Files with no reviewable changes (2)
  • dbms/src/Storages/DeltaMerge/DMContext.cpp
  • dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • contrib/tiflash-proxy-columnar

Comment thread .gitmodules
Comment thread .gitmodules
Comment thread dbms/src/Common/TiFlashBuildInfo.cpp
Signed-off-by: JaySon-Huang <tshent@qq.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (2)

241-278: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Empty-scan / zero-region path still crashes or divides by zero — past comment not yet addressed.

Several callsites assume the proxy read produced at least one stream/concurrency and at least one region:

  • Line 264: pipeline.firstStream()->getHeader() will deref a null IBlockInputStreamPtr if read_proxy_tasks came back empty (no remote regions, or all filtered out).
  • Line 314: group_builder.getCurrentHeader() has the same issue when no RNProxySourceOp was added by the loop above.
  • Lines 705 / 708: if region_num == 0 (or num_streams == 0), real_num_streams == 0 and (region_num + real_num_streams - 1) / real_num_streams divides by zero.

Please add early-returns / guards as proposed previously (return empty BlockInputStreams / return; before sizing and before building the analyzer; return tasks early before computing regions_per_reader).

Also applies to: 281-324, 703-708

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 241 - 278,
The readThroughProxy path doesn't guard against empty proxy reads or zero
regions/streams and dereferences pipeline.firstStream() and
group_builder.getCurrentHeader() and computes regions_per_reader with possible
division by zero; update StorageDisaggregated::readThroughProxy to early-return
an empty BlockInputStreams when read_proxy_tasks is empty or pipeline.streams is
empty (before accessing pipeline.firstStream() and before building
DAGExpressionAnalyzer), and add guards to return early when region_num == 0 or
num_streams == 0 (or compute real_num_streams == 0) before any division or
sizing logic that uses (region_num + real_num_streams - 1) / real_num_streams;
ensure the same defensive checks are applied around the places that call
group_builder.getCurrentHeader() so no null stream is dereferenced.

985-985: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Line 985 still has the broken if likely (...) syntax — the past sweep missed this site.

likely/unlikely are function-like macros expanding to __builtin_expect(!!(x), N) with no enclosing parentheses of their own, so the invocation must sit inside the if's parentheses. As written, the statement expands to if __builtin_expect(!!(block && block.rows() > 0), 1), which is not a valid if statement. Cppcheck still reports a syntax error at this line, and this file would fail to compile under ENABLE_NEXT_GEN_COLUMNAR. The earlier fix cleaned up lines 938/956/961/971/976 but left this one.

🛠️ Proposed fix
-    if likely (block && block.rows() > 0)
+    if (likely(block && block.rows() > 0))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` at line 985, The if
statement "if likely (block && block.rows() > 0)" expands to an invalid if
statement because the likely(...) macro body supplies no enclosing
parentheses; change it to if (likely(block && block.rows() > 0)) so the macro
invocation sits inside the if's own parentheses; update the occurrence that
checks block && block.rows() > 0 in StorageDisaggregatedColumnar (the
statement referencing variable block) to this corrected form.
🧹 Nitpick comments (2)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (2)

856-906: ⚡ Quick win

Deduplicate the per-column read path and hoist fn_physical_table_id out of the loop.

The extra_handle and regular-column branches (lines 871–885 and 890–905) differ only in which Rust FFI fetches the column buffer. The 9-line deserialize+SCOPE_EXIT block is copy-pasted, and fn_physical_table_id(reader) is invoked once per column even though it only depends on the reader. Pull the physical-table-id call out of the loop and factor the deserialize into a small lambda/helper to keep the read path readable.

♻️ Sketch
     TableID physical_table_id = -1;
     Block header = getHeader();
     const ColumnsWithTypeAndName col_type_and_name = header.getColumnsWithTypeAndName();
     MutableColumns columns = header.cloneEmptyColumns();
+    physical_table_id = proxy_helper->cloud_storage_engine_interfaces.fn_physical_table_id(reader);
+    auto deserialize_from = [&](size_t i, const RustStrWithView & col_data) {
+        ReadBufferFromMemory buf(col_data.buff.data, static_cast<size_t>(col_data.buff.len));
+        col_type_and_name[i].type->deserializeBinaryBulkWithMultipleStreams(
+            *columns[i],
+            [&](const IDataType::SubstreamPath &) { return &buf; },
+            rows,
+            /*avg_value_size_hint=*/-1.0, // -1 => Decimal format from proxy
+            true,
+            {});
+    };
     for (UInt32 i = 0; i < col_type_and_name.size(); ++i)
     {
         Int64 col_id = col_type_and_name[i].column_id;
         if (col_id == MutSup::extra_table_id_col_id)
             continue;
-        if (col_id == MutSup::extra_handle_id) { ... } else { ... }
+        RustStrWithView col_data = (col_id == MutSup::extra_handle_id)
+            ? proxy_helper->cloud_storage_engine_interfaces.fn_read_handle(reader)
+            : proxy_helper->cloud_storage_engine_interfaces.fn_read_column(reader, col_id);
+        SCOPE_EXIT({ RustGcHelper::instance().gcRustPtr(col_data.inner.ptr, col_data.inner.type); });
+        deserialize_from(i, col_data);
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 856 - 906,
Call proxy_helper->cloud_storage_engine_interfaces.fn_physical_table_id(reader)
once before the loop and store it in physical_table_id, and factor the repeated
deserialize+SCOPE_EXIT logic into a small helper/lambda (e.g., a local lambda
that accepts RustStrWithView or ReadBufferFromMemory& and the column reference)
so both the fn_read_handle and fn_read_column branches only differ by which
fetch function they call; ensure the lambda runs the
RustGcHelper::instance().gcRustPtr cleanup and then invokes
col_type_and_name[i].type->deserializeBinaryBulkWithMultipleStreams(...) on the
target column (columns[i]) with the same parameters used now, leaving the
special-case skip for MutSup::extra_table_id_col_id unchanged.

567-577: ⚡ Quick win

Use the fmt-style Exception constructor with the error code first.

The coding guidelines require throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);. Lines 567, 572, 577, and 851 still use the legacy (message, code) form.

🛠️ Proposed fix
-        throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error");
@@
-        throw Exception("pd client error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error");
@@
-        throw Exception("unknown error type", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(
+            ErrorCodes::COLUMNAR_SNAPSHOT_ERROR,
+            "unknown error type, error_type={}",
+            static_cast<uint8_t>(columnar_reader.error_type));
@@
-        throw Exception("read_block failed in tiflash-proxy", ErrorCodes::LOGICAL_ERROR);
+        throw Exception(ErrorCodes::LOGICAL_ERROR, "read_block failed in tiflash-proxy");

As per coding guidelines: "Prefer the fmt-style constructor for DB::Exception with error code first: throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);".

Also applies to: 851-851

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 567 - 577,
The throws in StorageDisaggregatedColumnar (where columnar_reader.error_type is
checked) use the legacy Exception(message, code) signature; update each to the
fmt-style Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "message"[, args...])
form—i.e. for the lock, pd client, and unknown error branches (references:
columnar_reader.error_type, ColumnarReaderErrorType, LOG_WARNING) replace throw
Exception("...", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) with throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "create columnar reader failed,
pd client error") or the corresponding descriptive message and include the
uint8_t(columnar_reader.error_type) as a fmt argument for the unknown case;
apply the same pattern to the other occurrence noted (error code usage at the
later throw).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 494-553: Check and handle the return value of
region_error.ParseFromString(error_msg) (and similarly
lock_info.ParseFromString) and if parsing fails, log a parse error and treat it
as an internal/different error path instead of proceeding with a
default-constructed errorpb::Error; when iterating
region_error.epoch_not_match().current_regions() collect all region ids into
unavailable_regions and build a joined region_id_ver string (use FmtBuffer for
efficient concatenation) rather than overwriting region_id_ver each loop, and
either remove the unused retry_regions variables (region_error branch and the
other branch) or actually use them where intended; keep existing
dropRegion(region_ver_id), lock_guard(output_lock), and throw RegionException
with the aggregated region_id_ver (or a deterministic representative) and
correct RegionReadStatus after parsing succeeds.

---

Duplicate comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 241-278: The readThroughProxy path doesn't guard against empty
proxy reads or zero regions/streams and dereferences pipeline.firstStream() and
group_builder.getCurrentHeader() and computes regions_per_reader with possible
division by zero; update StorageDisaggregated::readThroughProxy to early-return
an empty BlockInputStreams when read_proxy_tasks is empty or pipeline.streams is
empty (before accessing pipeline.firstStream() and before building
DAGExpressionAnalyzer), and add guards to return early when region_num == 0 or
num_streams == 0 (or compute real_num_streams == 0) before any division or
sizing logic that uses (region_num + real_num_streams - 1) / real_num_streams;
ensure the same defensive checks are applied around the places that call
group_builder.getCurrentHeader() so no null stream is dereferenced.
- Line 985: The if statement "if likely (block && block.rows() > 0)" expands
to an invalid if statement because the likely(...) macro body supplies no
enclosing parentheses; change it to if (likely(block && block.rows() > 0)) so
the macro invocation sits inside the if's own parentheses; update the
occurrence that checks block && block.rows() > 0 in
StorageDisaggregatedColumnar (the statement referencing variable block) to
this corrected form.

---

Nitpick comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 856-906: Call
proxy_helper->cloud_storage_engine_interfaces.fn_physical_table_id(reader) once
before the loop and store it in physical_table_id, and factor the repeated
deserialize+SCOPE_EXIT logic into a small helper/lambda (e.g., a local lambda
that accepts RustStrWithView or ReadBufferFromMemory& and the column reference)
so both the fn_read_handle and fn_read_column branches only differ by which
fetch function they call; ensure the lambda runs the
RustGcHelper::instance().gcRustPtr cleanup and then invokes
col_type_and_name[i].type->deserializeBinaryBulkWithMultipleStreams(...) on the
target column (columns[i]) with the same parameters used now, leaving the
special-case skip for MutSup::extra_table_id_col_id unchanged.
- Around line 567-577: The throws in StorageDisaggregatedColumnar (where
columnar_reader.error_type is checked) use the legacy Exception(message, code)
signature; update each to the fmt-style
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "message"[, args...]) form—i.e.
for the lock, pd client, and unknown error branches (references:
columnar_reader.error_type, ColumnarReaderErrorType, LOG_WARNING) replace throw
Exception("...", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) with throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "create columnar reader failed,
pd client error") or the corresponding descriptive message and include the
uint8_t(columnar_reader.error_type) as a fmt argument for the unknown case;
apply the same pattern to the other occurrence noted (error code usage at the
later throw).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d1f9ec7e-99ee-41cb-a6a3-bb1d4ef932cb

📥 Commits

Reviewing files that changed from the base of the PR and between 19b79c0 and 0136247.

📒 Files selected for processing (8)
  • .gitmodules
  • contrib/tiflash-proxy-columnar
  • dbms/src/Common/TiFlashBuildInfo.cpp
  • dbms/src/Server/Server.cpp
  • dbms/src/Storages/DeltaMerge/DMContext.cpp
  • dbms/src/Storages/KVStore/ProxyStateMachine.h
  • dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp
💤 Files with no reviewable changes (2)
  • dbms/src/Storages/DeltaMerge/DMContext.cpp
  • dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp
✅ Files skipped from review due to trivial changes (1)
  • .gitmodules
🚧 Files skipped from review as they are similar to previous changes (4)
  • dbms/src/Common/TiFlashBuildInfo.cpp
  • contrib/tiflash-proxy-columnar
  • dbms/src/Server/Server.cpp
  • dbms/src/Storages/KVStore/ProxyStateMachine.h

Comment on lines +494 to +553
auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
errorpb::Error region_error;
region_error.ParseFromString(error_msg);
auto region_ver_id = pingcap::kv::RegionVerID(region_id, region_conf_ver, region_ver);
// Refresh region cache and throw an exception for retrying.
if (region_error.has_epoch_not_match())
{
RegionException::UnavailableRegions unavailable_regions;
String region_id_ver; // region_id:region_ver:conf_ver
std::unordered_set<RegionID> retry_regions;
for (const auto & region : region_error.epoch_not_match().current_regions())
{
unavailable_regions.insert(region.id());
retry_regions.insert(region.id());
region_id_ver = std::to_string(region.id()) + ":" + std::to_string(region_ver) + ":"
+ std::to_string(region.region_epoch().conf_ver());
}
auto _guard = std::lock_guard(output_lock);
cluster->region_cache->dropRegion(region_ver_id);
LOG_WARNING(
log,
"create columnar reader failed region_id={}, epoch not match {}",
std::to_string(region_id),
region_ver_id.toString());
throw RegionException(
std::move(unavailable_regions),
RegionException::RegionReadStatus::EPOCH_NOT_MATCH,
region_id_ver.c_str());
}
else
{
RegionException::UnavailableRegions unavailable_regions;
std::unordered_set<RegionID> retry_regions;
auto err_region_id = 0;
if (region_error.has_region_not_found())
{
err_region_id = region_error.region_not_found().region_id();
unavailable_regions.insert(err_region_id);
retry_regions.insert(err_region_id);
LOG_WARNING(
log,
"create columnar reader failed region_id={}, region not found {}",
std::to_string(region_id),
std::to_string(err_region_id));
}
else
{
LOG_WARNING(
log,
"create columnar reader failed region_id={}, {}",
std::to_string(region_id),
region_error.ShortDebugString());
}
auto _guard = std::lock_guard(output_lock);
cluster->region_cache->dropRegion(region_ver_id);
throw RegionException(
std::move(unavailable_regions),
RegionException::RegionReadStatus::NOT_FOUND,
std::to_string(region_id).c_str());
}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unchecked ParseFromString, lossy region_id_ver, and dead retry_regions in the RegionError branch.

  • Line 496: region_error.ParseFromString(error_msg) return value is ignored. If the proxy ever returns a malformed error payload, you proceed with a default-constructed errorpb::Error, mis-classify as the else branch, drop the region from cache, and throw NOT_FOUND with no real diagnostic. Same concern at line 559 for lock_info.ParseFromString(error_msg).
  • Lines 502–510: region_id_ver is overwritten on every iteration of current_regions(), so the RegionException message reflects only the last region even though unavailable_regions may contain many. Either build a joined string or move the throw inside the loop / pick a deterministic representative.
  • Lines 503, 526: retry_regions is populated but never read — dead variable in both sub-branches.
🧹 Sketch
-        errorpb::Error region_error;
-        region_error.ParseFromString(error_msg);
+        errorpb::Error region_error;
+        if (!region_error.ParseFromString(error_msg))
+            throw Exception(
+                ErrorCodes::COLUMNAR_SNAPSHOT_ERROR,
+                "failed to parse proxy RegionError payload, region_id={}",
+                region_id);
@@
-            RegionException::UnavailableRegions unavailable_regions;
-            String region_id_ver; // region_id:region_ver:conf_ver
-            std::unordered_set<RegionID> retry_regions;
-            for (const auto & region : region_error.epoch_not_match().current_regions())
-            {
-                unavailable_regions.insert(region.id());
-                retry_regions.insert(region.id());
-                region_id_ver = std::to_string(region.id()) + ":" + std::to_string(region_ver) + ":"
-                    + std::to_string(region.region_epoch().conf_ver());
-            }
+            RegionException::UnavailableRegions unavailable_regions;
+            FmtBuffer regions_desc;
+            for (const auto & region : region_error.epoch_not_match().current_regions())
+            {
+                unavailable_regions.insert(region.id());
+                regions_desc.fmtAppend(
+                    "{}{}:{}:{}",
+                    regions_desc.size() == 0 ? "" : ",",
+                    region.id(),
+                    region_ver,
+                    region.region_epoch().conf_ver());
+            }

As per coding guidelines: "Use FmtBuffer for complex string building in performance-critical paths in C++".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 494–553:
check the return value of region_error.ParseFromString(error_msg) (and similarly
lock_info.ParseFromString); if parsing fails, log a parse error and take a
distinct internal-error path instead of proceeding with a default-constructed
errorpb::Error. When iterating region_error.epoch_not_match().current_regions(),
collect all region ids into unavailable_regions and build a joined
region_id_ver string (use FmtBuffer for efficient concatenation) rather than
overwriting region_id_ver on each loop iteration. Either remove the unused
retry_regions variables (in the region_error branch and the other branch) or
actually use them where intended. Keep the existing dropRegion(region_ver_id)
and lock_guard(output_lock), and throw the RegionException with the aggregated
region_id_ver (or a deterministic representative) and the correct
RegionReadStatus after parsing succeeds.

Signed-off-by: JaySon-Huang <tshent@qq.com>

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

Support new columnar storage as data source

2 participants