From d052276bd7b8b3a295f1b989b274b59e8b0501cb Mon Sep 17 00:00:00 2001 From: qiancai Date: Sat, 27 Jun 2026 11:11:38 +0800 Subject: [PATCH 1/2] tidb-cloud: refresh htap quickstart and add connection steps --- tidb-cloud/tidb-cloud-htap-quickstart.md | 137 +++++++++++++++-------- tidb-cloud/tidb-cloud-quickstart.md | 13 +++ 2 files changed, 106 insertions(+), 44 deletions(-) diff --git a/tidb-cloud/tidb-cloud-htap-quickstart.md b/tidb-cloud/tidb-cloud-htap-quickstart.md index a02787565ab91..23bf13ef88c03 100644 --- a/tidb-cloud/tidb-cloud-htap-quickstart.md +++ b/tidb-cloud/tidb-cloud-htap-quickstart.md @@ -6,24 +6,39 @@ aliases: ['/tidbcloud/use-htap-cluster'] # TiDB Cloud HTAP Quick Start -[HTAP](https://en.wikipedia.org/wiki/Hybrid_transactional/analytical_processing) means Hybrid Transactional and Analytical Processing. The HTAP architecture in TiDB Cloud is composed of [TiKV](https://tikv.org), a row-based storage engine designed for transactional processing, and [TiFlash](https://docs.pingcap.com/tidb/stable/tiflash-overview), a columnar storage designed for analytical processing. Your application data is first stored in TiKV and then replicated to TiFlash via the Raft consensus algorithm. So it is a real-time replication from the row-based storage to the columnar storage. +[HTAP](https://en.wikipedia.org/wiki/Hybrid_transactional/analytical_processing) stands for Hybrid Transactional and Analytical Processing. The HTAP architecture in TiDB Cloud is composed of [TiKV](https://tikv.org), a row-based storage engine designed for transactional processing, and [TiFlash](https://docs.pingcap.com/tidb/stable/tiflash-overview), a columnar storage engine designed for analytical processing. Your application data is first stored in TiKV and then replicated to TiFlash via the Raft consensus algorithm. So it is a real-time replication from the row-based storage to the columnar storage. This tutorial guides you through an easy way to experience the Hybrid Transactional and Analytical Processing (HTAP) feature of TiDB Cloud. The content includes how to replicate tables to TiFlash, how to run queries with TiFlash, and how to experience the performance boost. ## Before you begin -Before experiencing the HTAP feature, follow [TiDB Cloud Quick Start](/tidb-cloud/tidb-cloud-quickstart.md) to create a {{{ .starter }}} instance and import the **Steam Game Stats** sample dataset to the instance. +Before trying the HTAP feature, follow [TiDB Cloud Quick Start](/tidb-cloud/tidb-cloud-quickstart.md) to create a {{{ .starter }}} instance, and then import your data (this document takes the **Steam Games Dataset 2021-2025** as an example) into the instance as follows: + +1. Download the [Steam Games Dataset 2021-2025](https://www.kaggle.com/datasets/jypenpen54534/steam-games-dataset-2021-2025-65k) from Kaggle. +2. Open the **Import** page for your {{{ .starter }}} instance. + + 1. In the [TiDB Cloud console](https://tidbcloud.com/), navigate to the [**My TiDB**](https://tidbcloud.com/tidbs) page, and then click the name of your target {{{ .starter }}} instance to go to its overview page. + 2. On the overview page, click **Data** > **Import** in the left navigation pane. + +3. Import the downloaded CSV file into your {{{ .starter }}} instance. + + 1. On the **Import** page, click **Upload a local file**, and then select and upload the downloaded CSV file. + 2. In the **Destination** section, enter `steam` in the **Database** field and `games` in the **Table** field. + 3. Click **Define Table**, change the data type of the column `categories` to `TEXT`, and then change the data type of the column `developer` to `TEXT`. + 4. Click **Start Import**. + + You can view the import progress on the **Import Task Detail** page. ## Steps ### Step 1. Replicate the sample data to the columnar storage engine -After a {{{ .starter }}} instance is created, TiKV does not replicate data to TiFlash by default. You need to execute DDL statements in a MySQL client of TiDB to specify the tables to be replicated. After that, TiDB will create the specified table replicas in TiFlash accordingly. +After a {{{ .starter }}} instance is created, TiDB does not replicate data from TiKV to TiFlash by default. To replicate a target table to TiFlash, use a MySQL client to connect to your {{{ .starter }}} instance and execute a DDL statement. TiDB then creates the specified table replicas in TiFlash. -For example, to replicate the `games` table (in the **Steam Game Stats** sample dataset) to TiFlash, execute the following statements: +For example, to replicate the `games` table (from the **Steam Games Dataset 2021-2025** sample dataset) to TiFlash, execute the following statements: ```sql -USE game; +USE steam; ``` ```sql @@ -33,16 +48,16 @@ ALTER TABLE games SET TIFLASH REPLICA 2; To check the replication progress, execute the following statement: ```sql -SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_ID, REPLICA_COUNT, LOCATION_LABELS, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'game' and TABLE_NAME = 'games'; +SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_ID, REPLICA_COUNT, LOCATION_LABELS, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'steam' AND TABLE_NAME = 'games'; ``` ```sql +--------------+------------+----------+---------------+-----------------+-----------+----------+ | TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS | +--------------+------------+----------+---------------+-----------------+-----------+----------+ -| game | games | 88 | 2 | | 1 | 1 | +| steam | games | 227 | 2 | | 1 | 1 | +--------------+------------+----------+---------------+-----------------+-----------+----------+ -1 row in set (0.20 sec) +1 row in set (0.24 sec) ``` In the result of the preceding statement: @@ -54,16 +69,18 @@ In the result of the preceding statement: When the process of replication is completed, you can start to run some queries. -For example, you can check the number of games released every year, as well as the average price and average playtime: +For example, you can check the number of games released every year, as well as the average price and average number of recommendations: ```sql SELECT - YEAR(`release_date`) AS `release_year`, + `release_year`, COUNT(*) AS `games_released`, AVG(`price`) AS `average_price`, - AVG(`average_playtime_forever`) AS `average_playtime` + AVG(`recommendations`) AS `average_recommendations` FROM `games` +WHERE + `release_year` IS NOT NULL GROUP BY `release_year` ORDER BY @@ -78,16 +95,30 @@ In this step, you can compare the execution statistics between TiKV (row-based s ```sql EXPLAIN ANALYZE SELECT /*+ READ_FROM_STORAGE(TIKV[games]) */ - YEAR(`release_date`) AS `release_year`, + `release_year`, + `genres`, + `developer`, COUNT(*) AS `games_released`, + SUM(`recommendations`) AS `total_recommendations`, + AVG(`recommendations`) AS `average_recommendations`, AVG(`price`) AS `average_price`, - AVG(`average_playtime_forever`) AS `average_playtime` + MAX(`recommendations`) AS `max_recommendations` FROM `games` + WHERE + `release_year` IS NOT NULL + AND `genres` IS NOT NULL + AND `developer` IS NOT NULL GROUP BY - `release_year` + `release_year`, + `genres`, + `developer` + HAVING + COUNT(*) >= 2 ORDER BY - `release_year` DESC; + `total_recommendations` DESC, + `games_released` DESC + LIMIT 20; ``` For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use either TiKV or TiFlash replicas based on the cost estimation. In the preceding `EXPLAIN ANALYZE` statement, the `/*+ READ_FROM_STORAGE(TIKV[games]) */` hint is used to force the optimizer to choose TiKV so you can check the execution statistics of TiKV. @@ -99,55 +130,73 @@ In this step, you can compare the execution statistics between TiKV (row-based s In the output, you can get the execution time from the `execution info` column. ```sql - id | estRows | actRows | task | access object | execution info | operator info | memory | disk - ---------------------------+----------+---------+-----------+---------------+--------------------------------------------+-----------------------------------------------+---------+--------- - Sort_5 | 4019.00 | 28 | root | | time:672.7ms, loops:2, RU:1159.679690 | Column#36:desc | 18.0 KB | 0 Bytes - └─Projection_7 | 4019.00 | 28 | root | | time:672.7ms, loops:6, Concurrency:5 | year(game.games.release_date)->Column#36, ... | 35.5 KB | N/A - └─HashAgg_15 | 4019.00 | 28 | root | | time:672.6ms, loops:6, partial_worker:... | group by:Column#38, funcs:count(Column#39)... | 56.7 KB | N/A - └─TableReader_16 | 4019.00 | 28 | root | | time:672.4ms, loops:2, cop_task: {num:... | data:HashAgg_9 | 3.60 KB | N/A - └─HashAgg_9 | 4019.00 | 28 | cop[tikv] | | tikv_task:{proc max:300ms, min:0s, avg... | group by:year(game.games.release_date), ... | N/A | N/A - └─TableFullScan_14 | 68223.00 | 68223 | cop[tikv] | table:games | tikv_task:{proc max:290ms, min:0s, avg... | keep order:false | N/A | N/A - (6 rows) + id | estRows | actRows | task | access object | execution info | operator info | memory | disk + ----------------------------------+----------+---------+-----------+---------------+---------------------------------------------+--------------------------------------------+----------+--------- + Projection_10 | 20.00 | 20 | root | | time:234.4ms, loops:2, RU:241.66, ... | steam.games.release_year, ... | 6.03 KB | N/A + └─TopN_13 | 20.00 | 20 | root | | time:234.4ms, loops:2 | Column#13:desc, Column#12:desc, ... | 12.6 KB | 0 Bytes + └─Selection_18 | 36774.40 | 2458 | root | | time:233.9ms, loops:5 | ge(Column#12, ?) | 187.0 KB | N/A + └─HashAgg_22 | 45968.00 | 59883 | root | | time:228.4ms, loops:62, partial_worker:... | group by:Column#39, Column#40, ... | 31.7 MB | 0 Bytes + └─Projection_38 | 65521.00 | 65521 | root | | time:142.9ms, loops:66, Concurrency:5 | cast(steam.games.recommendations, ... | 1.16 MB | N/A + └─TableReader_32 | 65521.00 | 65521 | root | | time:49.5ms, loops:66, cop_task:{num:9... | data:Selection_31 | 3.26 MB | N/A + └─Selection_31 | 65521.00 | 65521 | cop[tikv] | | tikv_task:{proc max:20ms, min:0s, ... | not(isnull(steam.games.developer)), ... | N/A | N/A + └─TableFullScan_30 | 65521.00 | 65521 | cop[tikv] | table:games | tikv_task:{proc max:10ms, min:0s, ... | keep order:false | N/A | N/A + (8 rows) ``` -- To get the execution statistics of this query using TiFlash, execute the following statement: +- To get the execution statistics of this query using TiFlash, execute the same statement without the `/*+ READ_FROM_STORAGE(TIKV[games]) */` hint: ```sql EXPLAIN ANALYZE SELECT - YEAR(`release_date`) AS `release_year`, + `release_year`, + `genres`, + `developer`, COUNT(*) AS `games_released`, + SUM(`recommendations`) AS `total_recommendations`, + AVG(`recommendations`) AS `average_recommendations`, AVG(`price`) AS `average_price`, - AVG(`average_playtime_forever`) AS `average_playtime` + MAX(`recommendations`) AS `max_recommendations` FROM `games` + WHERE + `release_year` IS NOT NULL + AND `genres` IS NOT NULL + AND `developer` IS NOT NULL GROUP BY - `release_year` + `release_year`, + `genres`, + `developer` + HAVING + COUNT(*) >= 2 ORDER BY - `release_year` DESC; + `total_recommendations` DESC, + `games_released` DESC + LIMIT 20; ``` In the output, you can get the execution time from the `execution info` column. ```sql - id | estRows | actRows | task | access object | execution info | operator info | memory | disk - -------------------------------------+----------+---------+--------------+---------------+-------------------------------------------------------+--------------------------------------------+---------+--------- - Sort_5 | 4019.00 | 28 | root | | time:222.2ms, loops:2, RU:25.609675 | Column#36:desc | 3.77 KB | 0 Bytes - └─TableReader_39 | 4019.00 | 28 | root | | time:222.1ms, loops:2, cop_task: {num: 2, max: 0s,... | MppVersion: 1, data:ExchangeSender_38 | 4.64 KB | N/A - └─ExchangeSender_38 | 4019.00 | 28 | mpp[tiflash] | | tiflash_task:{time:214.8ms, loops:1, threads:1} | ExchangeType: PassThrough | N/A | N/A - └─Projection_8 | 4019.00 | 28 | mpp[tiflash] | | tiflash_task:{time:214.8ms, loops:1, threads:1} | year(game.games.release_date)->Column#3... | N/A | N/A - └─Projection_34 | 4019.00 | 28 | mpp[tiflash] | | tiflash_task:{time:214.8ms, loops:1, threads:1} | Column#33, div(Column#34, cast(case(eq(... | N/A | N/A - └─HashAgg_35 | 4019.00 | 28 | mpp[tiflash] | | tiflash_task:{time:214.8ms, loops:1, threads:1} | group by:Column#63, funcs:sum(Column#64... | N/A | N/A - └─ExchangeReceiver_37 | 4019.00 | 28 | mpp[tiflash] | | tiflash_task:{time:214.8ms, loops:1, threads:8} | | N/A | N/A - └─ExchangeSender_36 | 4019.00 | 28 | mpp[tiflash] | | tiflash_task:{time:210.6ms, loops:1, threads:1} | ExchangeType: HashPartition, Compressio... | N/A | N/A - └─HashAgg_33 | 4019.00 | 28 | mpp[tiflash] | | tiflash_task:{time:210.6ms, loops:1, threads:1} | group by:Column#75, funcs:count(1)->Col... | N/A | N/A - └─Projection_40 | 68223.00 | 68223 | mpp[tiflash] | | tiflash_task:{time:210.6ms, loops:2, threads:8} | game.games.price, game.games.price, gam... | N/A | N/A - └─TableFullScan_23 | 68223.00 | 68223 | mpp[tiflash] | table:games | tiflash_task:{time:210.6ms, loops:2, threads:8}, ... | keep order:false | N/A | N/A - (11 rows) + id | estRows | actRows | task | access object | execution info | operator info | memory | disk + ----------------------------------------+----------+---------+--------------+---------------+---------------------------------------------+--------------------------------------------+---------+--------- + Projection_10 | 20.00 | 20 | root | | time:92.5ms, loops:2, RU:120.42, ... | steam.games.release_year, ... | 6.03 KB | N/A + └─TopN_14 | 20.00 | 20 | root | | time:92.4ms, loops:2 | Column#13:desc, Column#12:desc, ... | 4.32 KB | 0 Bytes + └─TableReader_68 | 20.00 | 20 | root | | time:92.4ms, loops:2, cop_task:{num:2... | MppVersion: 2, data:ExchangeSender_67 | 7.99 KB | N/A + └─ExchangeSender_67 | 20.00 | 20 | mpp[tiflash] | | tiflash_task:{time:91ms, loops:1, ... | ExchangeType: PassThrough | N/A | N/A + └─TopN_66 | 20.00 | 20 | mpp[tiflash] | | tiflash_task:{time:91ms, loops:1, ... | Column#13:desc, Column#12:desc, ... | N/A | N/A + └─Selection_65 | 36774.40 | 2458 | mpp[tiflash] | | tiflash_task:{time:91ms, loops:1, ... | ge(Column#12, ?) | N/A | N/A + └─Projection_58 | 45968.00 | 59883 | mpp[tiflash] | | tiflash_task:{time:91ms, loops:1, ... | Column#12, Column#13, div(Column#14, ... | N/A | N/A + └─HashAgg_56 | 45968.00 | 59883 | mpp[tiflash] | | tiflash_task:{time:71ms, loops:1, ... | group by:Column#79, Column#80, ... | N/A | N/A + └─Projection_71 | 65521.00 | 65521 | mpp[tiflash] | | tiflash_task:{time:31ms, loops:7, ... | cast(steam.games.recommendations, ... | N/A | N/A + └─ExchangeReceiver_41 | 65521.00 | 65521 | mpp[tiflash] | | tiflash_task:{time:31ms, loops:7, ... | | N/A | N/A + └─ExchangeSender_40 | 65521.00 | 65521 | mpp[tiflash] | | tiflash_task:{time:31.2ms, loops:7, ... | ExchangeType: HashPartition, ... | N/A | N/A + └─Selection_39 | 65521.00 | 65521 | mpp[tiflash] | | tiflash_task:{time:21.2ms, loops:7, ... | not(isnull(steam.games.developer)), ... | N/A | N/A + └─TableFullScan_38| 65521.00 | 65521 | mpp[tiflash] | table:games | tiflash_task:{time:21.2ms, loops:7, ... | pushed down filter:empty, keep order:false | N/A | N/A + (13 rows) ``` > **Note:** > -> Because the size of sample data is small and the query in this document is very simple, if you have already forced the optimizer to choose TiKV for this query and run the same query again, TiKV will reuse its cache, so the query might be much faster. If the data is updated frequently, the cache will be missed. +> Because the sample dataset is small and the query in this document is relatively simple, if you have already forced the optimizer to choose TiKV for this query and then run the same query again, TiKV will reuse its cache, so the query might be much faster. If the data is updated frequently, the cache will be missed. ## Learn more diff --git a/tidb-cloud/tidb-cloud-quickstart.md b/tidb-cloud/tidb-cloud-quickstart.md index 86dfe3cda8d2b..320ee8cc70043 100644 --- a/tidb-cloud/tidb-cloud-quickstart.md +++ b/tidb-cloud/tidb-cloud-quickstart.md @@ -131,6 +131,19 @@ TiDB Cloud offers an interactive tutorial with carefully crafted sample datasets 2. Choose a {{{ .starter }}} instance that you want to use for the tour, and click **Import Dataset**. The import process might take approximately one minute. 3. Once the sample data is imported, follow the on-screen instructions to complete the tour. +## Step 4: Connect to your {{{ .starter }}} instance + +TiDB is highly compatible with the MySQL protocol, so you can connect to it using most MySQL tools, drivers, and ORMs. The following steps use MySQL CLI client `mysql` as an example. + +1. Navigate to the [**My TiDB**](https://tidbcloud.com/tidbs) page, and then click the name of your target {{{ .starter }}} instance to go to its overview page. +2. Click **Connect** in the upper-right corner. A connection dialog is displayed with connection parameters. +3. Ensure that the configurations in the connection dialog match your operating environment. +4. Click **Generate Password** to create a random password. +5. In the **Connect With** drop-down list, select **MySQL CLI**. +6. Copy the connection command, paste it into your terminal, and then press **Enter** to run it. + +The MySQL CLI client opens and connects to your {{{ .starter }}} instance. + ## What's next - To learn how to connect to your {{{ .starter }}} instance using different methods, see [Connect to a {{{ .starter }}} or Essential instance](/tidb-cloud/connect-to-tidb-cluster-serverless.md). From e60a5e539c0a461ac24037e9bf4ca4db0efb94f4 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Sat, 27 Jun 2026 11:25:58 +0800 Subject: [PATCH 2/2] Apply suggestions from code review Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb-cloud/tidb-cloud-htap-quickstart.md | 10 +++++----- tidb-cloud/tidb-cloud-quickstart.md | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/tidb-cloud/tidb-cloud-htap-quickstart.md b/tidb-cloud/tidb-cloud-htap-quickstart.md index 23bf13ef88c03..6167674e7e063 100644 --- a/tidb-cloud/tidb-cloud-htap-quickstart.md +++ b/tidb-cloud/tidb-cloud-htap-quickstart.md @@ -12,7 +12,7 @@ This tutorial guides you through an easy way to experience the Hybrid Transactio ## Before you begin -Before trying the HTAP feature, follow [TiDB Cloud Quick Start](/tidb-cloud/tidb-cloud-quickstart.md) to create a {{{ .starter }}} instance, and then import your data (this document takes the **Steam Games Dataset 2021-2025** as an example) into the instance as follows: +Before trying the HTAP feature, follow [TiDB Cloud Quick Start](/tidb-cloud/tidb-cloud-quickstart.md) to create a {{{ .starter }}} instance, and then import your data (this document uses the **Steam Games Dataset 2021-2025** as an example) into the instance as follows: 1. Download the [Steam Games Dataset 2021-2025](https://www.kaggle.com/datasets/jypenpen54534/steam-games-dataset-2021-2025-65k) from Kaggle. 2. Open the **Import** page for your {{{ .starter }}} instance. @@ -33,9 +33,9 @@ Before trying the HTAP feature, follow [TiDB Cloud Quick Start](/tidb-cloud/tidb ### Step 1. Replicate the sample data to the columnar storage engine -After a {{{ .starter }}} instance is created, TiDB does not replicate data from TiKV to TiFlash by default. To replicate a target table to TiFlash, use a MySQL client to connect to your {{{ .starter }}} instance and execute a DDL statement. TiDB then creates the specified table replicas in TiFlash. +After you create a {{{ .starter }}} instance, TiDB does not replicate data from TiKV to TiFlash by default. To replicate a target table to TiFlash, use a MySQL client to connect to your {{{ .starter }}} instance and execute a DDL statement. TiDB then creates the specified table replicas in TiFlash. -For example, to replicate the `games` table (from the **Steam Games Dataset 2021-2025** sample dataset) to TiFlash, execute the following statements: +For example, to replicate the `games` table (from the **Steam Games Dataset 2021-2025**) to TiFlash, execute the following statements: ```sql USE steam; @@ -67,7 +67,7 @@ In the result of the preceding statement: ### Step 2. Query data using HTAP -When the process of replication is completed, you can start to run some queries. +After the replication completes, you can run queries. For example, you can check the number of games released every year, as well as the average price and average number of recommendations: @@ -196,7 +196,7 @@ In this step, you can compare the execution statistics between TiKV (row-based s > **Note:** > -> Because the sample dataset is small and the query in this document is relatively simple, if you have already forced the optimizer to choose TiKV for this query and then run the same query again, TiKV will reuse its cache, so the query might be much faster. If the data is updated frequently, the cache will be missed. +> Because the sample dataset is small and the query in this document is relatively simple, if you force the optimizer to choose TiKV for this query and then run the same query again, TiKV reuses its cache, which makes the query much faster. If the data is updated frequently, the cache is invalidated. ## Learn more diff --git a/tidb-cloud/tidb-cloud-quickstart.md b/tidb-cloud/tidb-cloud-quickstart.md index 320ee8cc70043..8fc535d293e00 100644 --- a/tidb-cloud/tidb-cloud-quickstart.md +++ b/tidb-cloud/tidb-cloud-quickstart.md @@ -133,7 +133,7 @@ TiDB Cloud offers an interactive tutorial with carefully crafted sample datasets ## Step 4: Connect to your {{{ .starter }}} instance -TiDB is highly compatible with the MySQL protocol, so you can connect to it using most MySQL tools, drivers, and ORMs. The following steps use MySQL CLI client `mysql` as an example. +TiDB is highly compatible with the MySQL protocol, so you can connect to it using most MySQL tools, drivers, and ORMs. The following steps use the MySQL CLI client `mysql` as an example. 1. Navigate to the [**My TiDB**](https://tidbcloud.com/tidbs) page, and then click the name of your target {{{ .starter }}} instance to go to its overview page. 2. Click **Connect** in the upper-right corner. A connection dialog is displayed with connection parameters.