pingcap · lhy1024 · Jun 24, 2026 · Jun 24, 2026 · Jul 2, 2026 · Jul 2, 2026
diff --git a/grafana-pd-dashboard.md b/grafana-pd-dashboard.md
@@ -89,6 +89,7 @@ The following is the description of PD Dashboard metrics items:
 - Total read bytes on hot peer Regions: The total read bytes of peers that have become read hotspots on each TiKV instance
 - Store read rate bytes: The total read bytes of each TiKV instance
 - Store read rate keys: The total read keys of each TiKV instance
+- Store read cpu: The read CPU usage of each TiKV instance. PD uses this metric for CPU-aware read hotspot scheduling starting from v8.5.7 and v9.0.0.
 - Hot cache read entry number: The number of peers that are in the read hotspot statistics module on each TiKV instance
 
 ![PD Dashboard - Hot read metrics](/media/pd-dashboard-hotread-v4.png)

diff --git a/pd-control.md b/pd-control.md
@@ -515,6 +515,7 @@ Usage:
       "hot_region_type": "read",
       "hot_degree": 152,
       "flow_bytes": 0,
+      "flow_cpu": 32,
       "key_rate": 0,
       "query_rate": 305,
       "start_key": "7480000000000000FF5300000000000000F8",
@@ -536,6 +537,7 @@ Usage:
       "hot_region_type": "read",
       "hot_degree": 152,
       "flow_bytes": 0,
+      "flow_cpu": 32,
       "key_rate": 0,
       "query_rate": 305,
       "start_key": "7480000000000000FF5300000000000000F8",
@@ -546,6 +548,8 @@ Usage:
 }
 ```
 
+Starting from v8.5.7 and v9.0.0, the `hot read` and `hot history` commands include `flow_cpu`, and the `hot store` command includes `cpu-read-rate`. These fields show the read CPU usage for CPU-aware read hotspot scheduling.
+
 ### `label [store <name> <value>]`
 
 Use this command to view the label information of the cluster.
@@ -1025,22 +1029,24 @@ Usage:
   "min-hot-byte-rate": 100,
   "min-hot-key-rate": 10,
   "min-hot-query-rate": 10,
+  "min-hot-cpu-rate": 10,
   "max-zombie-rounds": 3,
   "max-peer-number": 1000,
   "byte-rate-rank-step-ratio": 0.05,
   "key-rate-rank-step-ratio": 0.05,
   "query-rate-rank-step-ratio": 0.05,
+  "cpu-rate-rank-step-ratio": 0.05,
   "count-rank-step-ratio": 0.01,
   "great-dec-ratio": 0.95,
   "minor-dec-ratio": 0.99,
   "src-tolerance-ratio": 1.05,
   "dst-tolerance-ratio": 1.05,
   "read-priorities": [
-    "query",
+    "cpu",
     "byte"
   ],
   "write-leader-priorities": [
-    "key",
+    "query",
     "byte"
   ],
   "write-peer-priorities": [
@@ -1071,6 +1077,12 @@ Usage:
     scheduler config balance-hot-region-scheduler set min-hot-query-rate 10
     ```
 
+- `min-hot-cpu-rate` specifies the minimum CPU usage of read requests to count. The value is measured as a percentage of one CPU core, and the default value is usually `10`, which means 10% of one CPU core.
+
+    ```bash
+    scheduler config balance-hot-region-scheduler set min-hot-cpu-rate 10
+    ```
+
 - `max-zombie-rounds` means the maximum number of heartbeats with which an operator can be considered as the pending influence. If you set it to a larger value, more operators might be included in the pending influence. Usually, you do not need to adjust its value. Pending influence refers to the operator influence that is generated during scheduling but still has an effect.
 
     ```bash
@@ -1083,7 +1095,7 @@ Usage:
     scheduler config balance-hot-region-scheduler set max-peer-number 1000
     ```
 
-- `byte-rate-rank-step-ratio`, `key-rate-rank-step-ratio`, `query-rate-rank-step-ratio`, and `count-rank-step-ratio` respectively mean the step ranks of byte, key, query, and count. The rank-step-ratio decides the step when the rank is calculated. `great-dec-ratio` and `minor-dec-ratio` are used to determine the `dec` rank. Usually, you do not need to modify these items.
+- `byte-rate-rank-step-ratio`, `key-rate-rank-step-ratio`, `query-rate-rank-step-ratio`, `cpu-rate-rank-step-ratio`, and `count-rank-step-ratio` represent the step ranks of byte, key, query, CPU, and count, respectively. The rank-step-ratio decides the step when calculating the rank. PD uses `great-dec-ratio` and `minor-dec-ratio` to determine the `dec` rank. Usually, you do not need to modify these items.
 
     ```bash
     scheduler config balance-hot-region-scheduler set byte-rate-rank-step-ratio 0.05
@@ -1097,15 +1109,18 @@ Usage:
 
 - `read-priorities`, `write-leader-priorities`, and `write-peer-priorities` control which dimension the scheduler prioritizes for hot Region scheduling. Two dimensions are supported for configuration.
 
-    - `read-priorities` and `write-leader-priorities` control which dimensions the scheduler prioritizes for scheduling hot Regions of the read and write-leader types. The dimension options are `query`, `byte`, and `key`.
+    - `read-priorities` controls which dimensions the scheduler prioritizes for scheduling hot Regions of the read type. The dimension options are `cpu`, `query`, `byte`, and `key`.
+    - `write-leader-priorities` controls which dimensions the scheduler prioritizes for scheduling hot Regions of the write-leader type. The dimension options are `query`, `byte`, and `key`.
     - `write-peer-priorities` controls which dimensions the scheduler prioritizes for scheduling hot Regions of the write-peer type. The dimension options are `byte` and `key`.
 
     > **Note:**
     >
-    > If a cluster component is earlier than v5.2, the configuration of `query` dimension does not take effect. If some components are upgraded to v5.2 or later, the `byte` and `key` dimensions still by default have the priority for hot Region scheduling. After all components of the cluster are upgraded to v5.2 or later, such a configuration still takes effect for compatibility. You can view the real-time configuration using the `pd-ctl` command. Usually, you do not need to modify these configurations.
+    > If a cluster component is earlier than v5.2, the configuration of the `query` dimension does not take effect. If some components are upgraded to v5.2 or later, the `byte` and `key` dimensions still by default have the priority for hot Region scheduling. After all components of the cluster are upgraded to v5.2 or later, such a configuration still takes effect for compatibility.
+    >
+    > Starting from v8.5.7, TiKV reports read CPU usage for hot Region scheduling. For clusters that support read CPU reporting, the default `read-priorities` value is `cpu,byte`. For clusters that do not support read CPU reporting, PD automatically falls back to `query,byte`, or to `byte,key` if the cluster does not support the `query` dimension either. You can view the real-time configuration using the `pd-ctl` command. Usually, you do not need to modify these configurations.
 
     ```bash
-    scheduler config balance-hot-region-scheduler set read-priorities query,byte
+    scheduler config balance-hot-region-scheduler set read-priorities cpu,byte
     ```
 
 - `strict-picking-store` controls the search space of hot Region scheduling. Usually, it is enabled. This configuration item only affects the behavior when `rank-formula-version` is `v1`. When it is enabled, hot Region scheduling ensures hot Region balance on the two configured dimensions. When it is disabled, hot Region scheduling only ensures the balance on the dimension with the first priority, which might reduce balance on other dimensions. Usually, you do not need to modify this configuration.

diff --git a/troubleshoot-hot-spot-issues.md b/troubleshoot-hot-spot-issues.md
@@ -185,6 +185,13 @@ For more details, see [Coprocessor Cache](/coprocessor-cache.md).
 
 In a read hotspot scenario, the hotspot TiKV node cannot process read requests in time, resulting in the read requests queuing. However, not all TiKV resources are exhausted at this time. To reduce latency, TiDB v7.1.0 introduces the load-based replica read feature, which allows TiDB to read data from other TiKV nodes without queuing on the hotspot TiKV node. You can control the queue length of read requests using the [`tidb_load_based_replica_read_threshold`](/system-variables.md#tidb_load_based_replica_read_threshold-new-in-v700) system variable. When the estimated queue time of the leader node exceeds this threshold, TiDB prioritizes reading data from follower nodes. This feature can improve read throughput by 70% to 200% in a read hotspot scenario compared to not scattering read hotspots.
 
+Starting from v8.5.7 and v9.0.0, PD supports CPU-aware hot Region scheduling for read hotspots. TiKV reports per-Region read CPU usage in store heartbeats, and PD can use CPU usage as a scheduling dimension. This mechanism helps PD identify read hotspots whose QPS or byte throughput appears balanced but whose TiKV CPU usage remains uneven, such as workloads with queries that have different CPU costs or clusters where TiKV nodes handle different workload patterns.
+
+- For clusters that support read CPU reporting, the default `read-priorities` value of `balance-hot-region-scheduler` is `cpu,byte`. 
+- For clusters that do not support read CPU reporting, PD automatically falls back to `query,byte`, or to `byte,key` if the cluster does not support the `query` dimension either. 
+
+To view or adjust the scheduling dimensions, use [`pd-ctl scheduler config balance-hot-region-scheduler`](/pd-control.md#scheduler-config-balance-hot-region-scheduler).
+
 ## Use TiKV MVCC in-memory engine to mitigate read hotspots caused by high MVCC read amplification
 
 When the retention time of historical MVCC data for GC is too long, or when the records are frequently updated or deleted, read hotspots might occur due to scanning a large number of MVCC versions. To alleviate this type of hotspot, you can enable the [TiKV MVCC In-Memory Engine](/tikv-in-memory-engine.md) feature.