From c89ac172e423e27fb873caafbf56ab251bcacc0e Mon Sep 17 00:00:00 2001
From: Yongqiang YANG
Date: Thu, 18 Jun 2026 04:23:23 -0700
Subject: [PATCH 01/10] docs(table-design): add default-first Partitioning & Bucketing landing

The Partitioning & Bucketing section opened on a 37KB concept page
(basic-concepts) and listed the most manual option first, so a reader
had to absorb the full model before seeing a recommended configuration.

Add a default-first landing page that leads with a recommended
CREATE TABLE (auto partition by time + BUCKETS AUTO), a decision table
for when to customize, and links to the how-tos. Make it the category
landing, reorder the how-tos default-first (auto, then dynamic, then
manual), and repurpose basic-concepts as the 'How It Works' explanation
page it already is.

Content of basic-concepts is unchanged apart from its title and intro;
deeper de-duplication (advanced modes vs the auto/dynamic/manual pages)
is left for a follow-up. Applies to both current docs and version-4.x.
--- .../data-partitioning/basic-concepts.mdx | 7 ++- .../data-partitioning/overview.md | 63 +++++++++++++++++++ sidebars.ts | 7 ++- .../data-partitioning/basic-concepts.mdx | 7 ++- .../data-partitioning/overview.md | 63 +++++++++++++++++++ versioned_sidebars/version-4.x-sidebars.json | 7 ++- 6 files changed, 142 insertions(+), 12 deletions(-) create mode 100644 docs/table-design/data-partitioning/overview.md create mode 100644 versioned_docs/version-4.x/table-design/data-partitioning/overview.md diff --git a/docs/table-design/data-partitioning/basic-concepts.mdx b/docs/table-design/data-partitioning/basic-concepts.mdx index 1c11a1f64ee46..c6306267e5088 100644 --- a/docs/table-design/data-partitioning/basic-concepts.mdx +++ b/docs/table-design/data-partitioning/basic-concepts.mdx @@ -1,8 +1,9 @@ --- { - "title": "Basic Concepts", + "title": "How Partitioning and Bucketing Work", + "sidebar_label": "How It Works", "language": "en", - "description": "A progressive introduction to Doris partitioning and bucketing: from core concepts and the first CREATE TABLE example to auto/dynamic partitioning, auto-bucketing, Colocate, and other advanced capabilities, along with design recommendations and operational guidance for partitions and buckets." + "description": "The data-distribution model behind Doris partitioning and bucketing: partitions, buckets, tablets, and nodes, plus advanced partition and bucket modes, design recommendations, and operational guidance." } --- @@ -12,7 +13,7 @@ import TabItem from '@theme/TabItem'; {/* Knowledge type: Concept introduction / Procedure */} {/* Applicable scenarios: Table design / Data organization and management */} -This document introduces the partitioning (Partition) and bucketing (Bucket) mechanisms of Doris, helping you design table structures reasonably to improve query performance and data management efficiency. 
New users are recommended to read the sections in order: Sections 1-3 cover core concepts and the first CREATE TABLE example, Sections 4-6 cover advanced features and design recommendations, and Section 7 covers the methods for viewing and modifying partitions needed for daily operations. +This page explains how Doris distributes data across partitions, buckets, and tablets, and covers the advanced partition and bucket modes, design recommendations, and operational commands. For the recommended starting configuration and a decision guide, start with [Partitioning and Bucketing](./overview); read this page when you want to understand the underlying model. ## 1. Overview diff --git a/docs/table-design/data-partitioning/overview.md b/docs/table-design/data-partitioning/overview.md new file mode 100644 index 0000000000000..cdfad6279b4a3 --- /dev/null +++ b/docs/table-design/data-partitioning/overview.md @@ -0,0 +1,63 @@ +--- +{ + "title": "Partitioning and Bucketing", + "language": "en", + "description": "The recommended partitioning and bucketing for a Doris table, and when to customize: auto, dynamic, and manual partitioning, bucketing method, and bucket count." +} +--- + +Doris organizes a table in two tiers: partitions split rows by column value, and buckets shard each partition for parallelism. This page gives the recommended starting point and shows when to customize. + +## Recommended Starting Point + +For most tables, partition by time and let Doris manage partition creation and bucket sizing automatically: + +```sql +CREATE TABLE sales ( + sale_time DATETIME NOT NULL, + order_id BIGINT NOT NULL, + amount DECIMAL(10, 2) +) +DUPLICATE KEY(sale_time, order_id) +AUTO PARTITION BY RANGE (date_trunc(sale_time, 'day')) () +DISTRIBUTED BY HASH(order_id) BUCKETS AUTO; +``` + +- **Auto partitioning** creates a partition as data arrives, so you never pre-define or backfill partition ranges. +- **`BUCKETS AUTO`** lets Doris size the number of shards from the data. 
+- Partition pruning on `sale_time` and parallel scans across buckets keep queries fast. + +If the table has no time column or stays small (under about 1 GB), use a single partition with a fixed bucket count: + +```sql +DISTRIBUTED BY HASH(order_id) BUCKETS 10 +``` + +## Choose Your Design + +Customize only when the default does not fit: + +| Decision | Recommended default | Change it when | +| --- | --- | --- | +| How to partition | [Auto partitioning](./auto-partitioning) by a time column | You need fixed or irregular ranges, use [manual partitioning](./manual-partitioning); you want a rolling time window with retention, use [dynamic partitioning](./dynamic-partitioning) | +| Bucketing method | Hash on a high-cardinality column | Data skews or you filter on arbitrary dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) | +| Number of buckets | `BUCKETS AUTO` | You know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) | + +## How It Works + +Doris maps data in two tiers: + +```text +Table ──► Partition (by column value) ──► Bucket (hash or random) ──► Tablet (shard on a BE node) +``` + +Partitions prune data and enable lifecycle management, such as archiving or dropping by time. Buckets spread each partition across tablets for parallel reads and writes. For the full data-distribution model, including tablets, replicas, and how they map to nodes, see [How Partitioning and Bucketing Work](./basic-concepts). + +## Next Steps + +- [Auto Partitioning](./auto-partitioning): the default, with no manual range maintenance. +- [Dynamic Partitioning](./dynamic-partitioning): rolling time windows with retention. +- [Manual Partitioning](./manual-partitioning): explicit ranges and list partitions. +- [Data Bucketing](./data-bucketing): choose the method, key, and bucket count. +- [How Partitioning and Bucketing Work](./basic-concepts): the underlying data-distribution model. 
+- [Common Issues](./common-issues): troubleshooting partition and bucket design. diff --git a/sidebars.ts b/sidebars.ts index 34b9d609ff1b7..74e0cd6d659e4 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -208,12 +208,13 @@ const sidebars: SidebarsConfig = { { type: 'category', label: 'Partitioning & Bucketing', - link: {type: 'doc', id: 'table-design/data-partitioning/basic-concepts'}, + link: {type: 'doc', id: 'table-design/data-partitioning/overview'}, items: [ - 'table-design/data-partitioning/manual-partitioning', - 'table-design/data-partitioning/dynamic-partitioning', 'table-design/data-partitioning/auto-partitioning', + 'table-design/data-partitioning/dynamic-partitioning', + 'table-design/data-partitioning/manual-partitioning', 'table-design/data-partitioning/data-bucketing', + 'table-design/data-partitioning/basic-concepts', 'table-design/data-partitioning/common-issues', ], }, diff --git a/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx b/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx index 1c11a1f64ee46..c6306267e5088 100644 --- a/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx +++ b/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx @@ -1,8 +1,9 @@ --- { - "title": "Basic Concepts", + "title": "How Partitioning and Bucketing Work", + "sidebar_label": "How It Works", "language": "en", - "description": "A progressive introduction to Doris partitioning and bucketing: from core concepts and the first CREATE TABLE example to auto/dynamic partitioning, auto-bucketing, Colocate, and other advanced capabilities, along with design recommendations and operational guidance for partitions and buckets." + "description": "The data-distribution model behind Doris partitioning and bucketing: partitions, buckets, tablets, and nodes, plus advanced partition and bucket modes, design recommendations, and operational guidance." 
} --- @@ -12,7 +13,7 @@ import TabItem from '@theme/TabItem'; {/* Knowledge type: Concept introduction / Procedure */} {/* Applicable scenarios: Table design / Data organization and management */} -This document introduces the partitioning (Partition) and bucketing (Bucket) mechanisms of Doris, helping you design table structures reasonably to improve query performance and data management efficiency. New users are recommended to read the sections in order: Sections 1-3 cover core concepts and the first CREATE TABLE example, Sections 4-6 cover advanced features and design recommendations, and Section 7 covers the methods for viewing and modifying partitions needed for daily operations. +This page explains how Doris distributes data across partitions, buckets, and tablets, and covers the advanced partition and bucket modes, design recommendations, and operational commands. For the recommended starting configuration and a decision guide, start with [Partitioning and Bucketing](./overview); read this page when you want to understand the underlying model. ## 1. Overview diff --git a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md new file mode 100644 index 0000000000000..cdfad6279b4a3 --- /dev/null +++ b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md @@ -0,0 +1,63 @@ +--- +{ + "title": "Partitioning and Bucketing", + "language": "en", + "description": "The recommended partitioning and bucketing for a Doris table, and when to customize: auto, dynamic, and manual partitioning, bucketing method, and bucket count." +} +--- + +Doris organizes a table in two tiers: partitions split rows by column value, and buckets shard each partition for parallelism. This page gives the recommended starting point and shows when to customize. 
+ +## Recommended Starting Point + +For most tables, partition by time and let Doris manage partition creation and bucket sizing automatically: + +```sql +CREATE TABLE sales ( + sale_time DATETIME NOT NULL, + order_id BIGINT NOT NULL, + amount DECIMAL(10, 2) +) +DUPLICATE KEY(sale_time, order_id) +AUTO PARTITION BY RANGE (date_trunc(sale_time, 'day')) () +DISTRIBUTED BY HASH(order_id) BUCKETS AUTO; +``` + +- **Auto partitioning** creates a partition as data arrives, so you never pre-define or backfill partition ranges. +- **`BUCKETS AUTO`** lets Doris size the number of shards from the data. +- Partition pruning on `sale_time` and parallel scans across buckets keep queries fast. + +If the table has no time column or stays small (under about 1 GB), use a single partition with a fixed bucket count: + +```sql +DISTRIBUTED BY HASH(order_id) BUCKETS 10 +``` + +## Choose Your Design + +Customize only when the default does not fit: + +| Decision | Recommended default | Change it when | +| --- | --- | --- | +| How to partition | [Auto partitioning](./auto-partitioning) by a time column | You need fixed or irregular ranges, use [manual partitioning](./manual-partitioning); you want a rolling time window with retention, use [dynamic partitioning](./dynamic-partitioning) | +| Bucketing method | Hash on a high-cardinality column | Data skews or you filter on arbitrary dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) | +| Number of buckets | `BUCKETS AUTO` | You know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) | + +## How It Works + +Doris maps data in two tiers: + +```text +Table ──► Partition (by column value) ──► Bucket (hash or random) ──► Tablet (shard on a BE node) +``` + +Partitions prune data and enable lifecycle management, such as archiving or dropping by time. Buckets spread each partition across tablets for parallel reads and writes. 
For the full data-distribution model, including tablets, replicas, and how they map to nodes, see [How Partitioning and Bucketing Work](./basic-concepts). + +## Next Steps + +- [Auto Partitioning](./auto-partitioning): the default, with no manual range maintenance. +- [Dynamic Partitioning](./dynamic-partitioning): rolling time windows with retention. +- [Manual Partitioning](./manual-partitioning): explicit ranges and list partitions. +- [Data Bucketing](./data-bucketing): choose the method, key, and bucket count. +- [How Partitioning and Bucketing Work](./basic-concepts): the underlying data-distribution model. +- [Common Issues](./common-issues): troubleshooting partition and bucket design. diff --git a/versioned_sidebars/version-4.x-sidebars.json b/versioned_sidebars/version-4.x-sidebars.json index 1963c0873aace..4e2cda3550935 100644 --- a/versioned_sidebars/version-4.x-sidebars.json +++ b/versioned_sidebars/version-4.x-sidebars.json @@ -246,13 +246,14 @@ "label": "Partitioning & Bucketing", "link": { "type": "doc", - "id": "table-design/data-partitioning/basic-concepts" + "id": "table-design/data-partitioning/overview" }, "items": [ - "table-design/data-partitioning/manual-partitioning", - "table-design/data-partitioning/dynamic-partitioning", "table-design/data-partitioning/auto-partitioning", + "table-design/data-partitioning/dynamic-partitioning", + "table-design/data-partitioning/manual-partitioning", "table-design/data-partitioning/data-bucketing", + "table-design/data-partitioning/basic-concepts", "table-design/data-partitioning/common-issues" ] }, From 502c74b4f2cbfeb5195c4585838c716071e00d48 Mon Sep 17 00:00:00 2001 From: Yongqiang YANG Date: Thu, 18 Jun 2026 04:35:56 -0700 Subject: [PATCH 02/10] docs(table-design): trim duplicated examples from partitioning concepts page Remove the three CREATE TABLE Tab examples in 'Advanced: Partition Modes' and the Auto Bucketing example, which duplicate the dedicated auto/dynamic/manual partitioning and 
data-bucketing pages; replace with links. Keep the partition- mode comparison table, Colocate, and all design recommendations (FE/BE tablet limits, bucket-count guidance), which are not duplicated elsewhere. Removes the now-unused Tabs imports. ~140 lines lighter. --- .../data-partitioning/basic-concepts.mdx | 145 +----------------- .../data-partitioning/basic-concepts.mdx | 145 +----------------- 2 files changed, 4 insertions(+), 286 deletions(-) diff --git a/docs/table-design/data-partitioning/basic-concepts.mdx b/docs/table-design/data-partitioning/basic-concepts.mdx index c6306267e5088..de659b4338ea2 100644 --- a/docs/table-design/data-partitioning/basic-concepts.mdx +++ b/docs/table-design/data-partitioning/basic-concepts.mdx @@ -7,8 +7,6 @@ } --- -import Tabs from '@theme/Tabs'; -import TabItem from '@theme/TabItem'; {/* Knowledge type: Concept introduction / Procedure */} {/* Applicable scenarios: Table design / Data organization and management */} @@ -149,152 +147,13 @@ Besides manually declaring partitions at table creation time, Doris also support | Dynamic partition | Automatically created/recycled by the system based on time scheduling rules | Time-series data, where you want to automatically maintain rolling partitions for the past N days/weeks/months | | Auto partition | Created on demand when data is written | Partition values are unpredictable (such as multi-tenant or sparse time), where pre-creation should be avoided | -The following shows CREATE TABLE examples for common combinations: - - - - -[Auto Partition](./auto-partitioning) supports automatically creating corresponding partitions according to user-defined rules during data ingestion, making it more convenient to use. 
The basic example rewritten as Auto Range partition: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "User ID", - `date` DATE NOT NULL COMMENT "Data ingestion date", - `timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp", - `city` VARCHAR(20) COMMENT "User's city", - `age` SMALLINT COMMENT "User age", - `sex` TINYINT COMMENT "User gender", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time", - `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" -) -AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Use month as the partition granularity -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1" -); -``` - -With this CREATE TABLE statement, when data is loaded, Doris automatically creates corresponding partitions for the `date` column at the month level. For example, `2018-12-01` and `2018-12-31` fall into the same partition, while `2018-11-12` falls into another partition. Auto Partition also supports List partitioning. For more usage, see the Auto Partition documentation. - - - - - -[Dynamic Partition](./dynamic-partitioning) is a management approach that automatically creates and recycles partitions based on real time. 
The basic example rewritten as Dynamic Partition: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "User ID", - `date` DATE NOT NULL COMMENT "Data ingestion date", - `timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp", - `city` VARCHAR(20) COMMENT "User's city", - `age` SMALLINT COMMENT "User age", - `sex` TINYINT COMMENT "User gender", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time", - `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" -) -PARTITION BY RANGE(`date`) -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1", - "dynamic_partition.enable" = "true", - "dynamic_partition.time_unit" = "WEEK", --- Partition granularity is week - "dynamic_partition.start" = "-2", --- Retain the past two weeks - "dynamic_partition.end" = "2", --- Pre-create the next two weeks - "dynamic_partition.prefix" = "p", - "dynamic_partition.buckets" = "8" -); -``` - -Dynamic Partition supports tiered storage, custom replica counts, and more. See the Dynamic Partition documentation for details. - - - - - -Auto Partition and Dynamic Partition each have their own advantages. 
Combining the two enables flexible on-demand creation and automatic recycling of partitions: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "User ID", - `date` DATE NOT NULL COMMENT "Data ingestion date", - `timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp", - `city` VARCHAR(20) COMMENT "User's city", - `age` SMALLINT COMMENT "User age", - `sex` TINYINT COMMENT "User gender", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time", - `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" -) -AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Use month as the partition granularity -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1", - "dynamic_partition.enable" = "true", - "dynamic_partition.time_unit" = "month", --- The two granularities must be the same - "dynamic_partition.start" = "-2", --- Dynamic Partition automatically cleans up historical partitions older than two weeks - "dynamic_partition.end" = "0", --- Dynamic Partition does not create future partitions; this is fully delegated to Auto Partition - "dynamic_partition.prefix" = "p", - "dynamic_partition.buckets" = "8" -); -``` - -For details about this feature, see [Using Auto Partition with Dynamic Partition](./auto-partitioning#lifecycle-management). - - - - +For ready-to-use CREATE TABLE examples of each mode, including combining auto with dynamic partitioning, see [Auto Partitioning](./auto-partitioning), [Dynamic Partitioning](./dynamic-partitioning), and [Manual Partitioning](./manual-partitioning). ## 5. Advanced: Bucketing ### 5.1 Auto Bucketing -When you are not sure about a reasonable number of buckets, you can use Auto Bucketing to let Doris perform the estimation. 
You only need to provide the estimated table data size: - -```sql -CREATE TABLE IF NOT EXISTS example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "User ID", - `date` DATE NOT NULL COMMENT "Data ingestion date", - `timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp", - `city` VARCHAR(20) COMMENT "User's city", - `age` SMALLINT COMMENT "User age", - `sex` TINYINT COMMENT "User gender", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time", - `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" -) -PARTITION BY RANGE(`date`) -( - PARTITION `p201701` VALUES LESS THAN ("2017-02-01"), - PARTITION `p201702` VALUES LESS THAN ("2017-03-01"), - PARTITION `p201703` VALUES LESS THAN ("2017-04-01"), - PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01")) -) -DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO -PROPERTIES -( - "replication_num" = "1", - "estimate_partition_size" = "2G" --- Estimated data volume for one partition; defaults to 10G if not provided -); -``` - -Note that this approach is not suitable for scenarios with extremely large table data volumes. +When you are unsure how many buckets to use, set `BUCKETS AUTO` and let Doris size them from an estimated data volume (`estimate_partition_size`). This is not suitable for extremely large tables. For details, see [Data Bucketing](./data-bucketing). 
### 5.2 Colocate diff --git a/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx b/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx index c6306267e5088..de659b4338ea2 100644 --- a/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx +++ b/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx @@ -7,8 +7,6 @@ } --- -import Tabs from '@theme/Tabs'; -import TabItem from '@theme/TabItem'; {/* Knowledge type: Concept introduction / Procedure */} {/* Applicable scenarios: Table design / Data organization and management */} @@ -149,152 +147,13 @@ Besides manually declaring partitions at table creation time, Doris also support | Dynamic partition | Automatically created/recycled by the system based on time scheduling rules | Time-series data, where you want to automatically maintain rolling partitions for the past N days/weeks/months | | Auto partition | Created on demand when data is written | Partition values are unpredictable (such as multi-tenant or sparse time), where pre-creation should be avoided | -The following shows CREATE TABLE examples for common combinations: - - - - -[Auto Partition](./auto-partitioning) supports automatically creating corresponding partitions according to user-defined rules during data ingestion, making it more convenient to use. 
The basic example rewritten as Auto Range partition: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "User ID", - `date` DATE NOT NULL COMMENT "Data ingestion date", - `timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp", - `city` VARCHAR(20) COMMENT "User's city", - `age` SMALLINT COMMENT "User age", - `sex` TINYINT COMMENT "User gender", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time", - `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" -) -AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Use month as the partition granularity -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1" -); -``` - -With this CREATE TABLE statement, when data is loaded, Doris automatically creates corresponding partitions for the `date` column at the month level. For example, `2018-12-01` and `2018-12-31` fall into the same partition, while `2018-11-12` falls into another partition. Auto Partition also supports List partitioning. For more usage, see the Auto Partition documentation. - - - - - -[Dynamic Partition](./dynamic-partitioning) is a management approach that automatically creates and recycles partitions based on real time. 
The basic example rewritten as Dynamic Partition: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "User ID", - `date` DATE NOT NULL COMMENT "Data ingestion date", - `timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp", - `city` VARCHAR(20) COMMENT "User's city", - `age` SMALLINT COMMENT "User age", - `sex` TINYINT COMMENT "User gender", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time", - `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" -) -PARTITION BY RANGE(`date`) -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1", - "dynamic_partition.enable" = "true", - "dynamic_partition.time_unit" = "WEEK", --- Partition granularity is week - "dynamic_partition.start" = "-2", --- Retain the past two weeks - "dynamic_partition.end" = "2", --- Pre-create the next two weeks - "dynamic_partition.prefix" = "p", - "dynamic_partition.buckets" = "8" -); -``` - -Dynamic Partition supports tiered storage, custom replica counts, and more. See the Dynamic Partition documentation for details. - - - - - -Auto Partition and Dynamic Partition each have their own advantages. 
Combining the two enables flexible on-demand creation and automatic recycling of partitions: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "User ID", - `date` DATE NOT NULL COMMENT "Data ingestion date", - `timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp", - `city` VARCHAR(20) COMMENT "User's city", - `age` SMALLINT COMMENT "User age", - `sex` TINYINT COMMENT "User gender", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time", - `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" -) -AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Use month as the partition granularity -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1", - "dynamic_partition.enable" = "true", - "dynamic_partition.time_unit" = "month", --- The two granularities must be the same - "dynamic_partition.start" = "-2", --- Dynamic Partition automatically cleans up historical partitions older than two weeks - "dynamic_partition.end" = "0", --- Dynamic Partition does not create future partitions; this is fully delegated to Auto Partition - "dynamic_partition.prefix" = "p", - "dynamic_partition.buckets" = "8" -); -``` - -For details about this feature, see [Using Auto Partition with Dynamic Partition](./auto-partitioning#lifecycle-management). - - - - +For ready-to-use CREATE TABLE examples of each mode, including combining auto with dynamic partitioning, see [Auto Partitioning](./auto-partitioning), [Dynamic Partitioning](./dynamic-partitioning), and [Manual Partitioning](./manual-partitioning). ## 5. Advanced: Bucketing ### 5.1 Auto Bucketing -When you are not sure about a reasonable number of buckets, you can use Auto Bucketing to let Doris perform the estimation. 
You only need to provide the estimated table data size: - -```sql -CREATE TABLE IF NOT EXISTS example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "User ID", - `date` DATE NOT NULL COMMENT "Data ingestion date", - `timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp", - `city` VARCHAR(20) COMMENT "User's city", - `age` SMALLINT COMMENT "User age", - `sex` TINYINT COMMENT "User gender", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time", - `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" -) -PARTITION BY RANGE(`date`) -( - PARTITION `p201701` VALUES LESS THAN ("2017-02-01"), - PARTITION `p201702` VALUES LESS THAN ("2017-03-01"), - PARTITION `p201703` VALUES LESS THAN ("2017-04-01"), - PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01")) -) -DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO -PROPERTIES -( - "replication_num" = "1", - "estimate_partition_size" = "2G" --- Estimated data volume for one partition; defaults to 10G if not provided -); -``` - -Note that this approach is not suitable for scenarios with extremely large table data volumes. +When you are unsure how many buckets to use, set `BUCKETS AUTO` and let Doris size them from an estimated data volume (`estimate_partition_size`). This is not suitable for extremely large tables. For details, see [Data Bucketing](./data-bucketing). 
### 5.2 Colocate From cad083311ff79657370fb2fa5e8195b8d236957a Mon Sep 17 00:00:00 2001 From: Yongqiang YANG Date: Thu, 18 Jun 2026 04:38:15 -0700 Subject: [PATCH 03/10] docs(table-design): zh-CN for default-first partitioning landing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add the 中文 partition/bucket landing page and apply the same default-first restructure and example trim to the 中文 basic-concepts page, matching the English changes. Current docs and version-4.x. --- .../data-partitioning/basic-concepts.mdx | 150 +----------------- .../data-partitioning/overview.md | 63 ++++++++ .../data-partitioning/basic-concepts.mdx | 150 +----------------- .../data-partitioning/overview.md | 63 ++++++++ 4 files changed, 136 insertions(+), 290 deletions(-) create mode 100644 i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md create mode 100644 i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/basic-concepts.mdx b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/basic-concepts.mdx index a280f7cfc1efc..7d51e12be4f21 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/basic-concepts.mdx +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/basic-concepts.mdx @@ -1,18 +1,17 @@ --- { - "title": "基本概念", + "title": "分区与分桶原理", + "sidebar_label": "工作原理", "language": "zh-CN", "description": "由浅入深介绍 Doris 的分区与分桶机制:从核心概念、第一个建表示例到自动/动态分区、自动分桶、Colocate 等进阶能力,并给出分区分桶的设计建议与运维方法。" } --- -import Tabs from '@theme/Tabs'; -import TabItem from '@theme/TabItem'; {/* 知识类型: 概念介绍 / 操作步骤 */} {/* 适用场景: 建表设计 / 数据组织与管理 */} -本文介绍 Doris 的分区(Partition)与分桶(Bucket)机制,帮助用户合理设计表结构以提升查询性能与数据管理效率。建议新手按章节顺序阅读:第 1–3 节涵盖核心概念与第一个建表示例,第 4–6 节为进阶特性与设计建议,第 7 节为日常运维所需的查看与修改方法。 +本文介绍 Doris 
如何将数据分布到分区、分桶与 Tablet,并涵盖进阶的分区与分桶模式、设计建议与运维命令。若需要推荐的起步配置与选型指南,请先阅读 [分区与分桶](./overview);当你希望理解底层模型时再阅读本文。 ## 1. 概述 @@ -148,152 +147,13 @@ PROPERTIES | 动态分区 | 系统按时间调度规则自动创建/回收 | 时间序列数据,希望自动滚动维护近 N 天/周/月分区 | | 自动分区 | 数据写入时按需创建 | 分区取值不可预知(如多租户、稀疏时间),希望避免预创建 | -下面给出常见组合的建表示例: - - - - -[自动分区](./auto-partitioning) 支持在数据导入时根据用户定义的规则自动创建对应分区,使用更为便捷。将基础示例改写为自动 Range 分区: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "用户 id", - `date` DATE NOT NULL COMMENT "数据灌入日期时间", - `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", - `city` VARCHAR(20) COMMENT "用户所在城市", - `age` SMALLINT COMMENT "用户年龄", - `sex` TINYINT COMMENT "用户性别", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", - `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" -) -AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- 使用月作为分区粒度 -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1" -); -``` - -如上建表,当数据导入时,Doris 将自动按月级别为 `date` 列创建对应分区。例如 `2018-12-01` 与 `2018-12-31` 会落入同一个分区,而 `2018-11-12` 会落入另一个分区。自动分区还支持 List 分区,更多用法请查看自动分区文档。 - - - - - -[动态分区](./dynamic-partitioning) 是根据现实时间进行自动分区创建与回收的管理方式。将基础示例改写为动态分区: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "用户 id", - `date` DATE NOT NULL COMMENT "数据灌入日期时间", - `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", - `city` VARCHAR(20) COMMENT "用户所在城市", - `age` SMALLINT COMMENT "用户年龄", - `sex` TINYINT COMMENT "用户性别", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", - `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" -) -PARTITION BY RANGE(`date`) -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1", - "dynamic_partition.enable" 
= "true", - "dynamic_partition.time_unit" = "WEEK", --- 分区粒度为周 - "dynamic_partition.start" = "-2", --- 向前保留两周 - "dynamic_partition.end" = "2", --- 提前创建后两周 - "dynamic_partition.prefix" = "p", - "dynamic_partition.buckets" = "8" -); -``` - -动态分区支持分层存储、自定义副本数等功能,详见动态分区文档。 - - - - - -自动分区与动态分区各有优势,二者结合可实现分区的灵活按需创建与自动回收: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "用户 id", - `date` DATE NOT NULL COMMENT "数据灌入日期时间", - `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", - `city` VARCHAR(20) COMMENT "用户所在城市", - `age` SMALLINT COMMENT "用户年龄", - `sex` TINYINT COMMENT "用户性别", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", - `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" -) -AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- 使用月作为分区粒度 -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1", - "dynamic_partition.enable" = "true", - "dynamic_partition.time_unit" = "month", --- 二者粒度必须相同 - "dynamic_partition.start" = "-2", --- 动态分区自动清理超过两周的历史分区 - "dynamic_partition.end" = "0", --- 动态分区不创建未来分区,完全交给自动分区 - "dynamic_partition.prefix" = "p", - "dynamic_partition.buckets" = "8" -); -``` - -关于该功能的细节说明,详见 [自动分区与动态分区联用](./auto-partitioning#与动态分区联用)。 - - - - +各模式开箱即用的建表示例,以及自动分区与动态分区的组合用法,见[自动分区](./auto-partitioning)、[动态分区](./dynamic-partitioning)与[手动分区](./manual-partitioning)。 ## 5. 
进阶:分桶进阶 ### 5.1 自动分桶 -当用户不确定合理的分桶数时,可以使用自动分桶让 Doris 完成估计,用户仅需提供预估的表数据量: - -```sql -CREATE TABLE IF NOT EXISTS example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "用户 id", - `date` DATE NOT NULL COMMENT "数据灌入日期时间", - `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", - `city` VARCHAR(20) COMMENT "用户所在城市", - `age` SMALLINT COMMENT "用户年龄", - `sex` TINYINT COMMENT "用户性别", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", - `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" -) -PARTITION BY RANGE(`date`) -( - PARTITION `p201701` VALUES LESS THAN ("2017-02-01"), - PARTITION `p201702` VALUES LESS THAN ("2017-03-01"), - PARTITION `p201703` VALUES LESS THAN ("2017-04-01"), - PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01")) -) -DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO -PROPERTIES -( - "replication_num" = "1", - "estimate_partition_size" = "2G" --- 用户估计一个分区将有的数据量,不提供则默认为 10G -); -``` - -需要注意的是,该方式不适用于表数据量特别大的场景。 +当不确定合理的分桶数时,可设置 `BUCKETS AUTO`,由 Doris 根据预估数据量(`estimate_partition_size`)自动确定分桶数。该方式不适用于数据量极大的表。详见[数据分桶](./data-bucketing)。 ### 5.2 Colocate(同分布) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md new file mode 100644 index 0000000000000..55f94742d6d0e --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md @@ -0,0 +1,63 @@ +--- +{ + "title": "分区与分桶", + "language": "zh-CN", + "description": "为 Doris 表选择推荐的分区与分桶方式,以及何时自定义:自动、动态、手动分区,分桶方式与分桶数。" +} +--- + +Doris 将一张表分为两层组织:分区按列值拆分数据行,分桶将每个分区切分为多个分片以实现并行。本文给出推荐的起步配置,并说明何时需要自定义。 + +## 推荐起步配置 + +大多数表建议按时间分区,并让 Doris 自动创建分区、自动确定分桶数: + +```sql +CREATE TABLE sales ( + sale_time DATETIME NOT NULL, + order_id BIGINT NOT NULL, + amount DECIMAL(10, 2) +) 
+DUPLICATE KEY(sale_time, order_id) +AUTO PARTITION BY RANGE (date_trunc(sale_time, 'day')) () +DISTRIBUTED BY HASH(order_id) BUCKETS AUTO; +``` + +- **自动分区(Auto Partition)**:数据写入时按需创建分区,无需预先定义或回填分区范围。 +- **`BUCKETS AUTO`**:由 Doris 根据数据量自动确定分片数量。 +- 基于 `sale_time` 的分区裁剪与跨分桶的并行扫描可保证查询性能。 + +如果表没有时间列,或数据量较小(约 1 GB 以内),使用单分区加固定分桶数即可: + +```sql +DISTRIBUTED BY HASH(order_id) BUCKETS 10 +``` + +## 选择你的设计 + +仅在默认方式不适用时才自定义: + +| 决策项 | 推荐默认 | 何时调整 | +| --- | --- | --- | +| 如何分区 | 按时间列[自动分区](./auto-partitioning) | 需要固定或不规则范围时,用[手动分区](./manual-partitioning);需要按时间滚动并保留窗口时,用[动态分区](./dynamic-partitioning) | +| 分桶方式 | 按高基数列做 Hash 分桶 | 数据倾斜或需按任意维度过滤时,用 Random 分桶([数据分桶](./data-bucketing)) | +| 分桶数量 | `BUCKETS AUTO` | 已知数据量并希望固定控制时,手动设置分桶数([数据分桶](./data-bucketing)) | + +## 工作原理 + +Doris 将数据按两层映射: + +```text +表 ──► 分区(按列值)──► 分桶(Hash 或 Random)──► Tablet(BE 节点上的分片) +``` + +分区用于数据裁剪与生命周期管理(如按时间归档或删除),分桶将每个分区分散到多个 Tablet 以实现读写并行。完整的数据分布模型(包括 Tablet、副本及其与节点的映射),见[分区与分桶原理](./basic-concepts)。 + +## 后续步骤 + +- [自动分区](./auto-partitioning):默认方式,无需手动维护分区范围。 +- [动态分区](./dynamic-partitioning):按时间滚动并保留窗口。 +- [手动分区](./manual-partitioning):显式声明 Range 与 List 分区。 +- [数据分桶](./data-bucketing):选择分桶方式、分桶键与分桶数。 +- [分区与分桶原理](./basic-concepts):底层数据分布模型。 +- [常见问题](./common-issues):分区与分桶设计的排查方法。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx index a280f7cfc1efc..7d51e12be4f21 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx @@ -1,18 +1,17 @@ --- { - "title": "基本概念", + "title": "分区与分桶原理", + "sidebar_label": "工作原理", "language": "zh-CN", "description": "由浅入深介绍 Doris 的分区与分桶机制:从核心概念、第一个建表示例到自动/动态分区、自动分桶、Colocate 等进阶能力,并给出分区分桶的设计建议与运维方法。" } --- -import Tabs from 
'@theme/Tabs'; -import TabItem from '@theme/TabItem'; {/* 知识类型: 概念介绍 / 操作步骤 */} {/* 适用场景: 建表设计 / 数据组织与管理 */} -本文介绍 Doris 的分区(Partition)与分桶(Bucket)机制,帮助用户合理设计表结构以提升查询性能与数据管理效率。建议新手按章节顺序阅读:第 1–3 节涵盖核心概念与第一个建表示例,第 4–6 节为进阶特性与设计建议,第 7 节为日常运维所需的查看与修改方法。 +本文介绍 Doris 如何将数据分布到分区、分桶与 Tablet,并涵盖进阶的分区与分桶模式、设计建议与运维命令。若需要推荐的起步配置与选型指南,请先阅读 [分区与分桶](./overview);当你希望理解底层模型时再阅读本文。 ## 1. 概述 @@ -148,152 +147,13 @@ PROPERTIES | 动态分区 | 系统按时间调度规则自动创建/回收 | 时间序列数据,希望自动滚动维护近 N 天/周/月分区 | | 自动分区 | 数据写入时按需创建 | 分区取值不可预知(如多租户、稀疏时间),希望避免预创建 | -下面给出常见组合的建表示例: - - - - -[自动分区](./auto-partitioning) 支持在数据导入时根据用户定义的规则自动创建对应分区,使用更为便捷。将基础示例改写为自动 Range 分区: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "用户 id", - `date` DATE NOT NULL COMMENT "数据灌入日期时间", - `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", - `city` VARCHAR(20) COMMENT "用户所在城市", - `age` SMALLINT COMMENT "用户年龄", - `sex` TINYINT COMMENT "用户性别", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", - `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" -) -AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- 使用月作为分区粒度 -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1" -); -``` - -如上建表,当数据导入时,Doris 将自动按月级别为 `date` 列创建对应分区。例如 `2018-12-01` 与 `2018-12-31` 会落入同一个分区,而 `2018-11-12` 会落入另一个分区。自动分区还支持 List 分区,更多用法请查看自动分区文档。 - - - - - -[动态分区](./dynamic-partitioning) 是根据现实时间进行自动分区创建与回收的管理方式。将基础示例改写为动态分区: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "用户 id", - `date` DATE NOT NULL COMMENT "数据灌入日期时间", - `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", - `city` VARCHAR(20) COMMENT "用户所在城市", - `age` SMALLINT COMMENT "用户年龄", - `sex` TINYINT COMMENT "用户性别", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", - `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", - 
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" -) -PARTITION BY RANGE(`date`) -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1", - "dynamic_partition.enable" = "true", - "dynamic_partition.time_unit" = "WEEK", --- 分区粒度为周 - "dynamic_partition.start" = "-2", --- 向前保留两周 - "dynamic_partition.end" = "2", --- 提前创建后两周 - "dynamic_partition.prefix" = "p", - "dynamic_partition.buckets" = "8" -); -``` - -动态分区支持分层存储、自定义副本数等功能,详见动态分区文档。 - - - - - -自动分区与动态分区各有优势,二者结合可实现分区的灵活按需创建与自动回收: - -```sql -CREATE TABLE example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "用户 id", - `date` DATE NOT NULL COMMENT "数据灌入日期时间", - `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", - `city` VARCHAR(20) COMMENT "用户所在城市", - `age` SMALLINT COMMENT "用户年龄", - `sex` TINYINT COMMENT "用户性别", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", - `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" -) -AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- 使用月作为分区粒度 -() -DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 -PROPERTIES -( - "replication_num" = "1", - "dynamic_partition.enable" = "true", - "dynamic_partition.time_unit" = "month", --- 二者粒度必须相同 - "dynamic_partition.start" = "-2", --- 动态分区自动清理超过两周的历史分区 - "dynamic_partition.end" = "0", --- 动态分区不创建未来分区,完全交给自动分区 - "dynamic_partition.prefix" = "p", - "dynamic_partition.buckets" = "8" -); -``` - -关于该功能的细节说明,详见 [自动分区与动态分区联用](./auto-partitioning#与动态分区联用)。 - - - - +各模式开箱即用的建表示例,以及自动分区与动态分区的组合用法,见[自动分区](./auto-partitioning)、[动态分区](./dynamic-partitioning)与[手动分区](./manual-partitioning)。 ## 5. 
进阶:分桶进阶 ### 5.1 自动分桶 -当用户不确定合理的分桶数时,可以使用自动分桶让 Doris 完成估计,用户仅需提供预估的表数据量: - -```sql -CREATE TABLE IF NOT EXISTS example_range_tbl -( - `user_id` LARGEINT NOT NULL COMMENT "用户 id", - `date` DATE NOT NULL COMMENT "数据灌入日期时间", - `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", - `city` VARCHAR(20) COMMENT "用户所在城市", - `age` SMALLINT COMMENT "用户年龄", - `sex` TINYINT COMMENT "用户性别", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", - `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" -) -PARTITION BY RANGE(`date`) -( - PARTITION `p201701` VALUES LESS THAN ("2017-02-01"), - PARTITION `p201702` VALUES LESS THAN ("2017-03-01"), - PARTITION `p201703` VALUES LESS THAN ("2017-04-01"), - PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01")) -) -DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO -PROPERTIES -( - "replication_num" = "1", - "estimate_partition_size" = "2G" --- 用户估计一个分区将有的数据量,不提供则默认为 10G -); -``` - -需要注意的是,该方式不适用于表数据量特别大的场景。 +当不确定合理的分桶数时,可设置 `BUCKETS AUTO`,由 Doris 根据预估数据量(`estimate_partition_size`)自动确定分桶数。该方式不适用于数据量极大的表。详见[数据分桶](./data-bucketing)。 ### 5.2 Colocate(同分布) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md new file mode 100644 index 0000000000000..55f94742d6d0e --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md @@ -0,0 +1,63 @@ +--- +{ + "title": "分区与分桶", + "language": "zh-CN", + "description": "为 Doris 表选择推荐的分区与分桶方式,以及何时自定义:自动、动态、手动分区,分桶方式与分桶数。" +} +--- + +Doris 将一张表分为两层组织:分区按列值拆分数据行,分桶将每个分区切分为多个分片以实现并行。本文给出推荐的起步配置,并说明何时需要自定义。 + +## 推荐起步配置 + +大多数表建议按时间分区,并让 Doris 自动创建分区、自动确定分桶数: + +```sql +CREATE TABLE sales ( + sale_time DATETIME NOT NULL, + order_id BIGINT NOT NULL, + amount 
DECIMAL(10, 2) +) +DUPLICATE KEY(sale_time, order_id) +AUTO PARTITION BY RANGE (date_trunc(sale_time, 'day')) () +DISTRIBUTED BY HASH(order_id) BUCKETS AUTO; +``` + +- **自动分区(Auto Partition)**:数据写入时按需创建分区,无需预先定义或回填分区范围。 +- **`BUCKETS AUTO`**:由 Doris 根据数据量自动确定分片数量。 +- 基于 `sale_time` 的分区裁剪与跨分桶的并行扫描可保证查询性能。 + +如果表没有时间列,或数据量较小(约 1 GB 以内),使用单分区加固定分桶数即可: + +```sql +DISTRIBUTED BY HASH(order_id) BUCKETS 10 +``` + +## 选择你的设计 + +仅在默认方式不适用时才自定义: + +| 决策项 | 推荐默认 | 何时调整 | +| --- | --- | --- | +| 如何分区 | 按时间列[自动分区](./auto-partitioning) | 需要固定或不规则范围时,用[手动分区](./manual-partitioning);需要按时间滚动并保留窗口时,用[动态分区](./dynamic-partitioning) | +| 分桶方式 | 按高基数列做 Hash 分桶 | 数据倾斜或需按任意维度过滤时,用 Random 分桶([数据分桶](./data-bucketing)) | +| 分桶数量 | `BUCKETS AUTO` | 已知数据量并希望固定控制时,手动设置分桶数([数据分桶](./data-bucketing)) | + +## 工作原理 + +Doris 将数据按两层映射: + +```text +表 ──► 分区(按列值)──► 分桶(Hash 或 Random)──► Tablet(BE 节点上的分片) +``` + +分区用于数据裁剪与生命周期管理(如按时间归档或删除),分桶将每个分区分散到多个 Tablet 以实现读写并行。完整的数据分布模型(包括 Tablet、副本及其与节点的映射),见[分区与分桶原理](./basic-concepts)。 + +## 后续步骤 + +- [自动分区](./auto-partitioning):默认方式,无需手动维护分区范围。 +- [动态分区](./dynamic-partitioning):按时间滚动并保留窗口。 +- [手动分区](./manual-partitioning):显式声明 Range 与 List 分区。 +- [数据分桶](./data-bucketing):选择分桶方式、分桶键与分桶数。 +- [分区与分桶原理](./basic-concepts):底层数据分布模型。 +- [常见问题](./common-issues):分区与分桶设计的排查方法。 From a67b8f4e7e40bf5169e2d9a487e4607cdd1e6aa1 Mon Sep 17 00:00:00 2001 From: Yongqiang YANG Date: Thu, 18 Jun 2026 04:47:55 -0700 Subject: [PATCH 04/10] docs(table-design): tighten English on partitioning landing for clarity Plainer wording and shorter sentences on the partitioning overview and the intro of the concepts page. Meaning unchanged. 
--- docs/table-design/data-partitioning/basic-concepts.mdx | 2 +- docs/table-design/data-partitioning/overview.md | 4 ++-- .../table-design/data-partitioning/basic-concepts.mdx | 2 +- .../version-4.x/table-design/data-partitioning/overview.md | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/table-design/data-partitioning/basic-concepts.mdx b/docs/table-design/data-partitioning/basic-concepts.mdx index de659b4338ea2..5923953a0ec6c 100644 --- a/docs/table-design/data-partitioning/basic-concepts.mdx +++ b/docs/table-design/data-partitioning/basic-concepts.mdx @@ -11,7 +11,7 @@ {/* Knowledge type: Concept introduction / Procedure */} {/* Applicable scenarios: Table design / Data organization and management */} -This page explains how Doris distributes data across partitions, buckets, and tablets, and covers the advanced partition and bucket modes, design recommendations, and operational commands. For the recommended starting configuration and a decision guide, start with [Partitioning and Bucketing](./overview); read this page when you want to understand the underlying model. +This page explains how Doris distributes data across partitions, buckets, and tablets. It also covers the advanced partition and bucket modes, design recommendations, and operational commands. For a recommended starting configuration and a decision guide, start with [Partitioning and Bucketing](./overview), then read this page when you want to understand the underlying model. ## 1. Overview diff --git a/docs/table-design/data-partitioning/overview.md b/docs/table-design/data-partitioning/overview.md index cdfad6279b4a3..c1524341d0e40 100644 --- a/docs/table-design/data-partitioning/overview.md +++ b/docs/table-design/data-partitioning/overview.md @@ -6,7 +6,7 @@ } --- -Doris organizes a table in two tiers: partitions split rows by column value, and buckets shard each partition for parallelism. This page gives the recommended starting point and shows when to customize. 
+Doris organizes a table in two tiers: partitions split rows by column value, and buckets split each partition into shards for parallel processing. This page gives the recommended starting point and shows when to customize. ## Recommended Starting Point @@ -51,7 +51,7 @@ Doris maps data in two tiers: Table ──► Partition (by column value) ──► Bucket (hash or random) ──► Tablet (shard on a BE node) ``` -Partitions prune data and enable lifecycle management, such as archiving or dropping by time. Buckets spread each partition across tablets for parallel reads and writes. For the full data-distribution model, including tablets, replicas, and how they map to nodes, see [How Partitioning and Bucketing Work](./basic-concepts). +Partitions let Doris skip data that can't match a query, and make it easy to archive or drop data by time. Buckets spread each partition across tablets for parallel reads and writes. For the full data-distribution model, including tablets, replicas, and how they map to nodes, see [How Partitioning and Bucketing Work](./basic-concepts). ## Next Steps diff --git a/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx b/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx index de659b4338ea2..5923953a0ec6c 100644 --- a/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx +++ b/versioned_docs/version-4.x/table-design/data-partitioning/basic-concepts.mdx @@ -11,7 +11,7 @@ {/* Knowledge type: Concept introduction / Procedure */} {/* Applicable scenarios: Table design / Data organization and management */} -This page explains how Doris distributes data across partitions, buckets, and tablets, and covers the advanced partition and bucket modes, design recommendations, and operational commands. For the recommended starting configuration and a decision guide, start with [Partitioning and Bucketing](./overview); read this page when you want to understand the underlying model. 
+This page explains how Doris distributes data across partitions, buckets, and tablets. It also covers the advanced partition and bucket modes, design recommendations, and operational commands. For a recommended starting configuration and a decision guide, start with [Partitioning and Bucketing](./overview), then read this page when you want to understand the underlying model. ## 1. Overview diff --git a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md index cdfad6279b4a3..c1524341d0e40 100644 --- a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md +++ b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md @@ -6,7 +6,7 @@ } --- -Doris organizes a table in two tiers: partitions split rows by column value, and buckets shard each partition for parallelism. This page gives the recommended starting point and shows when to customize. +Doris organizes a table in two tiers: partitions split rows by column value, and buckets split each partition into shards for parallel processing. This page gives the recommended starting point and shows when to customize. ## Recommended Starting Point @@ -51,7 +51,7 @@ Doris maps data in two tiers: Table ──► Partition (by column value) ──► Bucket (hash or random) ──► Tablet (shard on a BE node) ``` -Partitions prune data and enable lifecycle management, such as archiving or dropping by time. Buckets spread each partition across tablets for parallel reads and writes. For the full data-distribution model, including tablets, replicas, and how they map to nodes, see [How Partitioning and Bucketing Work](./basic-concepts). +Partitions let Doris skip data that can't match a query, and make it easy to archive or drop data by time. Buckets spread each partition across tablets for parallel reads and writes. 
For the full data-distribution model, including tablets, replicas, and how they map to nodes, see [How Partitioning and Bucketing Work](./basic-concepts). ## Next Steps From 776fdd7d99924227af249c9a2ab300416d6054b5 Mon Sep 17 00:00:00 2001 From: Yongqiang YANG Date: Thu, 18 Jun 2026 05:05:53 -0700 Subject: [PATCH 05/10] docs(table-design): clearer 'Change it when' cells on partitioning landing Use conditional 'If .../for ...' phrasing instead of comma-splice clauses. --- docs/table-design/data-partitioning/overview.md | 6 +++--- .../version-4.x/table-design/data-partitioning/overview.md | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/table-design/data-partitioning/overview.md b/docs/table-design/data-partitioning/overview.md index c1524341d0e40..a3129477b65d8 100644 --- a/docs/table-design/data-partitioning/overview.md +++ b/docs/table-design/data-partitioning/overview.md @@ -39,9 +39,9 @@ Customize only when the default does not fit: | Decision | Recommended default | Change it when | | --- | --- | --- | -| How to partition | [Auto partitioning](./auto-partitioning) by a time column | You need fixed or irregular ranges, use [manual partitioning](./manual-partitioning); you want a rolling time window with retention, use [dynamic partitioning](./dynamic-partitioning) | -| Bucketing method | Hash on a high-cardinality column | Data skews or you filter on arbitrary dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) | -| Number of buckets | `BUCKETS AUTO` | You know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) | +| How to partition | [Auto partitioning](./auto-partitioning) by a time column | If you need fixed or irregular ranges, use [manual partitioning](./manual-partitioning); for a rolling time window with retention, use [dynamic partitioning](./dynamic-partitioning) | +| Bucketing method | Hash on a high-cardinality column | If data skews, or you filter on arbitrary 
dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) | +| Number of buckets | `BUCKETS AUTO` | If you know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) | ## How It Works diff --git a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md index c1524341d0e40..a3129477b65d8 100644 --- a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md +++ b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md @@ -39,9 +39,9 @@ Customize only when the default does not fit: | Decision | Recommended default | Change it when | | --- | --- | --- | -| How to partition | [Auto partitioning](./auto-partitioning) by a time column | You need fixed or irregular ranges, use [manual partitioning](./manual-partitioning); you want a rolling time window with retention, use [dynamic partitioning](./dynamic-partitioning) | -| Bucketing method | Hash on a high-cardinality column | Data skews or you filter on arbitrary dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) | -| Number of buckets | `BUCKETS AUTO` | You know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) | +| How to partition | [Auto partitioning](./auto-partitioning) by a time column | If you need fixed or irregular ranges, use [manual partitioning](./manual-partitioning); for a rolling time window with retention, use [dynamic partitioning](./dynamic-partitioning) | +| Bucketing method | Hash on a high-cardinality column | If data skews, or you filter on arbitrary dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) | +| Number of buckets | `BUCKETS AUTO` | If you know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) | ## How It Works From 3b0843ac3204a2ecbfc76a7c2cdf91942065594c Mon Sep 17 00:00:00 2001 From: Yongqiang YANG Date: Thu, 18 Jun 
2026 05:33:55 -0700 Subject: [PATCH 06/10] docs(table-design): add partition retention (TTL) section to landing Surface the cross-cutting 'expire old data' need on the partitioning landing with a comparison: dynamic partitioning uses dynamic_partition.start (time-based rolling window) and auto RANGE partitioning uses partition.retention_count (count-based). Note the deprecated auto+dynamic combo, and that retention drops data whereas tiered storage migrates it. Links to each mode's page for detail; no standalone page (the two mechanisms differ and are coupled to their modes). EN + zh-CN. --- docs/table-design/data-partitioning/overview.md | 13 +++++++++++++ .../table-design/data-partitioning/overview.md | 13 +++++++++++++ .../table-design/data-partitioning/overview.md | 13 +++++++++++++ .../table-design/data-partitioning/overview.md | 13 +++++++++++++ 4 files changed, 52 insertions(+) diff --git a/docs/table-design/data-partitioning/overview.md b/docs/table-design/data-partitioning/overview.md index a3129477b65d8..0fd58f852b885 100644 --- a/docs/table-design/data-partitioning/overview.md +++ b/docs/table-design/data-partitioning/overview.md @@ -43,6 +43,19 @@ Customize only when the default does not fit: | Bucketing method | Hash on a high-cardinality column | If data skews, or you filter on arbitrary dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) | | Number of buckets | `BUCKETS AUTO` | If you know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) | +## Expire Old Partitions + +To drop old data automatically, set a retention policy. 
The mechanism depends on the partition mode: + +| Partition mode | Property | What it keeps | +| --- | --- | --- | +| [Dynamic partitioning](./dynamic-partitioning) | `dynamic_partition.start` (for example, `-7`) | Partitions within a time window relative to now; older ones are dropped on a schedule (time-based) | +| [Auto partitioning](./auto-partitioning) (RANGE) | `partition.retention_count` (for example, `3`) | The N newest historical partitions; older ones are dropped (count-based) | + +Combining auto and dynamic partitioning for retention is no longer recommended; use `partition.retention_count` for auto-range tables. + +Retention **drops** data. To move cold data to cheaper storage instead of dropping it, use [tiered storage](../tiered-storage/overview) instead. + ## How It Works Doris maps data in two tiers: diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md index 55f94742d6d0e..db52abbdc5694 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md @@ -43,6 +43,19 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 10 | 分桶方式 | 按高基数列做 Hash 分桶 | 数据倾斜或需按任意维度过滤时,用 Random 分桶([数据分桶](./data-bucketing)) | | 分桶数量 | `BUCKETS AUTO` | 已知数据量并希望固定控制时,手动设置分桶数([数据分桶](./data-bucketing)) | +## 让旧分区过期 + +如需自动删除旧数据,可设置保留策略。具体机制取决于分区模式: + +| 分区模式 | 属性 | 保留内容 | +| --- | --- | --- | +| [动态分区](./dynamic-partitioning) | `dynamic_partition.start`(例如 `-7`) | 保留相对当前时间某个时间窗口内的分区,更早的分区按调度自动删除(按时间) | +| [自动分区](./auto-partitioning)(RANGE) | `partition.retention_count`(例如 `3`) | 保留最新的 N 个历史分区,更早的分区被删除(按数量) | + +不再推荐将自动分区与动态分区组合用于数据保留;自动 RANGE 分区表请使用 `partition.retention_count`。 + +数据保留是**删除**数据。如果希望将冷数据迁移到更廉价的存储而非删除,请改用[分层存储](../tiered-storage/overview)。 + ## 工作原理 Doris 将数据按两层映射: diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md index 55f94742d6d0e..db52abbdc5694 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md @@ -43,6 +43,19 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 10 | 分桶方式 | 按高基数列做 Hash 分桶 | 数据倾斜或需按任意维度过滤时,用 Random 分桶([数据分桶](./data-bucketing)) | | 分桶数量 | `BUCKETS AUTO` | 已知数据量并希望固定控制时,手动设置分桶数([数据分桶](./data-bucketing)) | +## 让旧分区过期 + +如需自动删除旧数据,可设置保留策略。具体机制取决于分区模式: + +| 分区模式 | 属性 | 保留内容 | +| --- | --- | --- | +| [动态分区](./dynamic-partitioning) | `dynamic_partition.start`(例如 `-7`) | 保留相对当前时间某个时间窗口内的分区,更早的分区按调度自动删除(按时间) | +| [自动分区](./auto-partitioning)(RANGE) | `partition.retention_count`(例如 `3`) | 保留最新的 N 个历史分区,更早的分区被删除(按数量) | + +不再推荐将自动分区与动态分区组合用于数据保留;自动 RANGE 分区表请使用 `partition.retention_count`。 + +数据保留是**删除**数据。如果希望将冷数据迁移到更廉价的存储而非删除,请改用[分层存储](../tiered-storage/overview)。 + ## 工作原理 Doris 将数据按两层映射: diff --git a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md index a3129477b65d8..0fd58f852b885 100644 --- a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md +++ b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md @@ -43,6 +43,19 @@ Customize only when the default does not fit: | Bucketing method | Hash on a high-cardinality column | If data skews, or you filter on arbitrary dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) | | Number of buckets | `BUCKETS AUTO` | If you know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) | +## Expire Old Partitions + +To drop old data automatically, set a retention policy. 
The mechanism depends on the partition mode: + +| Partition mode | Property | What it keeps | +| --- | --- | --- | +| [Dynamic partitioning](./dynamic-partitioning) | `dynamic_partition.start` (for example, `-7`) | Partitions within a time window relative to now; older ones are dropped on a schedule (time-based) | +| [Auto partitioning](./auto-partitioning) (RANGE) | `partition.retention_count` (for example, `3`) | The N newest historical partitions; older ones are dropped (count-based) | + +Combining auto and dynamic partitioning for retention is no longer recommended; use `partition.retention_count` for auto-range tables. + +Retention **drops** data. To move cold data to cheaper storage instead of dropping it, use [tiered storage](../tiered-storage/overview) instead. + ## How It Works Doris maps data in two tiers: From 8f35894d9db56244aea5fcd6205e193c721edf09 Mon Sep 17 00:00:00 2001 From: Yongqiang YANG Date: Thu, 18 Jun 2026 05:47:29 -0700 Subject: [PATCH 07/10] docs(table-design): reframe partition retention as window-vs-count Both dynamic and auto retention keep the most recent time-ordered partitions; they differ only in how the limit is expressed: dynamic_partition.start is a time window, partition.retention_count is a partition count. Note they are effectively equivalent for regular time partitions and diverge for irregular granularity or stale data. Corrects the earlier 'time-based vs count-based' overstatement. EN + zh-CN. 
--- docs/table-design/data-partitioning/overview.md | 10 ++++++---- .../current/table-design/data-partitioning/overview.md | 10 ++++++---- .../table-design/data-partitioning/overview.md | 10 ++++++---- .../table-design/data-partitioning/overview.md | 10 ++++++---- 4 files changed, 24 insertions(+), 16 deletions(-) diff --git a/docs/table-design/data-partitioning/overview.md b/docs/table-design/data-partitioning/overview.md index 0fd58f852b885..e4a724cb1a39f 100644 --- a/docs/table-design/data-partitioning/overview.md +++ b/docs/table-design/data-partitioning/overview.md @@ -45,12 +45,14 @@ Customize only when the default does not fit: ## Expire Old Partitions -To drop old data automatically, set a retention policy. The mechanism depends on the partition mode: +To drop old data automatically, set a retention policy. Both modes keep the most recent partitions and drop older ones; they differ in how you express the limit: -| Partition mode | Property | What it keeps | +| Partition mode | Property | Retention limit | | --- | --- | --- | -| [Dynamic partitioning](./dynamic-partitioning) | `dynamic_partition.start` (for example, `-7`) | Partitions within a time window relative to now; older ones are dropped on a schedule (time-based) | -| [Auto partitioning](./auto-partitioning) (RANGE) | `partition.retention_count` (for example, `3`) | The N newest historical partitions; older ones are dropped (count-based) | +| [Dynamic partitioning](./dynamic-partitioning) | `dynamic_partition.start` (for example, `-7`) | A time window: keep partitions within the last N time units of now | +| [Auto partitioning](./auto-partitioning) (RANGE) | `partition.retention_count` (for example, `3`) | A partition count: keep the newest N historical partitions | + +With regular time partitions (such as one per day), the two are effectively equivalent: "last 7 days" matches "newest 7 daily partitions." 
They diverge when partitions are irregular or data is stale: a time window can drop every partition once the data is older than the window, whereas a count always keeps the newest N. Combining auto and dynamic partitioning for retention is no longer recommended; use `partition.retention_count` for auto-range tables. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md index db52abbdc5694..ec5351fe21357 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md @@ -45,12 +45,14 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 10 ## 让旧分区过期 -如需自动删除旧数据,可设置保留策略。具体机制取决于分区模式: +如需自动删除旧数据,可设置保留策略。两种模式都保留最近的分区、删除更早的分区,区别在于保留上限的表达方式: -| 分区模式 | 属性 | 保留内容 | +| 分区模式 | 属性 | 保留上限 | | --- | --- | --- | -| [动态分区](./dynamic-partitioning) | `dynamic_partition.start`(例如 `-7`) | 保留相对当前时间某个时间窗口内的分区,更早的分区按调度自动删除(按时间) | -| [自动分区](./auto-partitioning)(RANGE) | `partition.retention_count`(例如 `3`) | 保留最新的 N 个历史分区,更早的分区被删除(按数量) | +| [动态分区](./dynamic-partitioning) | `dynamic_partition.start`(例如 `-7`) | 时间窗口:保留相对当前时间最近 N 个时间单位内的分区 | +| [自动分区](./auto-partitioning)(RANGE) | `partition.retention_count`(例如 `3`) | 分区数量:保留最新的 N 个历史分区 | + +对于规则的时间分区(如每天一个),两者基本等价:“最近 7 天”等于“最新的 7 个按天分区”。当分区不规则或数据陈旧时二者会出现差异:一旦数据比时间窗口更旧,按时间窗口可能删除全部分区,而按数量始终保留最新的 N 个。 不再推荐将自动分区与动态分区组合用于数据保留;自动 RANGE 分区表请使用 `partition.retention_count`。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md index db52abbdc5694..ec5351fe21357 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md +++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md @@ -45,12 +45,14 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 10 ## 让旧分区过期 -如需自动删除旧数据,可设置保留策略。具体机制取决于分区模式: +如需自动删除旧数据,可设置保留策略。两种模式都保留最近的分区、删除更早的分区,区别在于保留上限的表达方式: -| 分区模式 | 属性 | 保留内容 | +| 分区模式 | 属性 | 保留上限 | | --- | --- | --- | -| [动态分区](./dynamic-partitioning) | `dynamic_partition.start`(例如 `-7`) | 保留相对当前时间某个时间窗口内的分区,更早的分区按调度自动删除(按时间) | -| [自动分区](./auto-partitioning)(RANGE) | `partition.retention_count`(例如 `3`) | 保留最新的 N 个历史分区,更早的分区被删除(按数量) | +| [动态分区](./dynamic-partitioning) | `dynamic_partition.start`(例如 `-7`) | 时间窗口:保留相对当前时间最近 N 个时间单位内的分区 | +| [自动分区](./auto-partitioning)(RANGE) | `partition.retention_count`(例如 `3`) | 分区数量:保留最新的 N 个历史分区 | + +对于规则的时间分区(如每天一个),两者基本等价:“最近 7 天”等于“最新的 7 个按天分区”。当分区不规则或数据陈旧时二者会出现差异:一旦数据比时间窗口更旧,按时间窗口可能删除全部分区,而按数量始终保留最新的 N 个。 不再推荐将自动分区与动态分区组合用于数据保留;自动 RANGE 分区表请使用 `partition.retention_count`。 diff --git a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md index 0fd58f852b885..e4a724cb1a39f 100644 --- a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md +++ b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md @@ -45,12 +45,14 @@ Customize only when the default does not fit: ## Expire Old Partitions -To drop old data automatically, set a retention policy. The mechanism depends on the partition mode: +To drop old data automatically, set a retention policy. 
Both modes keep the most recent partitions and drop older ones; they differ in how you express the limit: -| Partition mode | Property | What it keeps | +| Partition mode | Property | Retention limit | | --- | --- | --- | -| [Dynamic partitioning](./dynamic-partitioning) | `dynamic_partition.start` (for example, `-7`) | Partitions within a time window relative to now; older ones are dropped on a schedule (time-based) | -| [Auto partitioning](./auto-partitioning) (RANGE) | `partition.retention_count` (for example, `3`) | The N newest historical partitions; older ones are dropped (count-based) | +| [Dynamic partitioning](./dynamic-partitioning) | `dynamic_partition.start` (for example, `-7`) | A time window: keep partitions within the last N time units of now | +| [Auto partitioning](./auto-partitioning) (RANGE) | `partition.retention_count` (for example, `3`) | A partition count: keep the newest N historical partitions | + +With regular time partitions (such as one per day), the two are effectively equivalent: "last 7 days" matches "newest 7 daily partitions." They diverge when partitions are irregular or data is stale: a time window can drop every partition once the data is older than the window, whereas a count always keeps the newest N. Combining auto and dynamic partitioning for retention is no longer recommended; use `partition.retention_count` for auto-range tables. From 123d88d6ae044674b72da3ab124b025b09de2d38 Mon Sep 17 00:00:00 2001 From: Yongqiang YANG Date: Thu, 18 Jun 2026 05:58:06 -0700 Subject: [PATCH 08/10] docs(table-design): group manual and dynamic under 'Other Partition Modes' Promote Auto Partitioning as the top-level recommended mode; group the two alternatives (manual for explicit control; dynamic, which auto supersedes) under a collapsed 'Other Partition Modes' category, so the partitioning section shows fewer entries by default. The category is not labeled 'Legacy' because manual partitioning is not deprecated (it is the explicit-control / LIST option). 
--- sidebars.ts | 10 ++++++++-- versioned_sidebars/version-4.x-sidebars.json | 10 ++++++++-- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/sidebars.ts b/sidebars.ts index 74e0cd6d659e4..1f6fee572c499 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -211,8 +211,14 @@ const sidebars: SidebarsConfig = { link: {type: 'doc', id: 'table-design/data-partitioning/overview'}, items: [ 'table-design/data-partitioning/auto-partitioning', - 'table-design/data-partitioning/dynamic-partitioning', - 'table-design/data-partitioning/manual-partitioning', + { + type: 'category', + label: 'Other Partition Modes', + items: [ + 'table-design/data-partitioning/manual-partitioning', + 'table-design/data-partitioning/dynamic-partitioning', + ], + }, 'table-design/data-partitioning/data-bucketing', 'table-design/data-partitioning/basic-concepts', 'table-design/data-partitioning/common-issues', diff --git a/versioned_sidebars/version-4.x-sidebars.json b/versioned_sidebars/version-4.x-sidebars.json index 4e2cda3550935..4efb0e3ede770 100644 --- a/versioned_sidebars/version-4.x-sidebars.json +++ b/versioned_sidebars/version-4.x-sidebars.json @@ -250,8 +250,14 @@ }, "items": [ "table-design/data-partitioning/auto-partitioning", - "table-design/data-partitioning/dynamic-partitioning", - "table-design/data-partitioning/manual-partitioning", + { + "type": "category", + "label": "Other Partition Modes", + "items": [ + "table-design/data-partitioning/manual-partitioning", + "table-design/data-partitioning/dynamic-partitioning" + ] + }, "table-design/data-partitioning/data-bucketing", "table-design/data-partitioning/basic-concepts", "table-design/data-partitioning/common-issues" From b511ec289c88140020186498e2a51bd71191bcd3 Mon Sep 17 00:00:00 2001 From: Yongqiang YANG Date: Thu, 18 Jun 2026 06:00:56 -0700 Subject: [PATCH 09/10] docs(table-design): steer partition choice to auto; mark dynamic superseded In the overview decision table, recommend auto partitioning and route the 
rolling-window case to it (auto + partition.retention_count), noting dynamic is superseded. Keep manual for the schemes auto cannot express (custom/irregular ranges, numeric-column ranges, grouped LIST values). EN + zh-CN. --- docs/table-design/data-partitioning/overview.md | 2 +- .../current/table-design/data-partitioning/overview.md | 2 +- .../version-4.x/table-design/data-partitioning/overview.md | 2 +- .../version-4.x/table-design/data-partitioning/overview.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/table-design/data-partitioning/overview.md b/docs/table-design/data-partitioning/overview.md index e4a724cb1a39f..5e8d6b8c854bf 100644 --- a/docs/table-design/data-partitioning/overview.md +++ b/docs/table-design/data-partitioning/overview.md @@ -39,7 +39,7 @@ Customize only when the default does not fit: | Decision | Recommended default | Change it when | | --- | --- | --- | -| How to partition | [Auto partitioning](./auto-partitioning) by a time column | If you need fixed or irregular ranges, use [manual partitioning](./manual-partitioning); for a rolling time window with retention, use [dynamic partitioning](./dynamic-partitioning) | +| How to partition | [Auto partitioning](./auto-partitioning) | Use [manual partitioning](./manual-partitioning) for schemes auto cannot express: custom or irregular ranges, ranges on a numeric column, or grouped LIST values. [Dynamic partitioning](./dynamic-partitioning) is superseded by auto. 
| | Bucketing method | Hash on a high-cardinality column | If data skews, or you filter on arbitrary dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) | | Number of buckets | `BUCKETS AUTO` | If you know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) | diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md index ec5351fe21357..ec54326a1eaff 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/overview.md @@ -39,7 +39,7 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 10 | 决策项 | 推荐默认 | 何时调整 | | --- | --- | --- | -| 如何分区 | 按时间列[自动分区](./auto-partitioning) | 需要固定或不规则范围时,用[手动分区](./manual-partitioning);需要按时间滚动并保留窗口时,用[动态分区](./dynamic-partitioning) | +| 如何分区 | [自动分区](./auto-partitioning) | 对于自动分区无法表达的方案,使用[手动分区](./manual-partitioning):自定义或不规则范围、数值列范围,或将多个值归入同一分区的 LIST。[动态分区](./dynamic-partitioning)已被自动分区取代。 | | 分桶方式 | 按高基数列做 Hash 分桶 | 数据倾斜或需按任意维度过滤时,用 Random 分桶([数据分桶](./data-bucketing)) | | 分桶数量 | `BUCKETS AUTO` | 已知数据量并希望固定控制时,手动设置分桶数([数据分桶](./data-bucketing)) | diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md index ec5351fe21357..ec54326a1eaff 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/overview.md @@ -39,7 +39,7 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 10 | 决策项 | 推荐默认 | 何时调整 | | --- | --- | --- | -| 如何分区 | 按时间列[自动分区](./auto-partitioning) | 
需要固定或不规则范围时,用[手动分区](./manual-partitioning);需要按时间滚动并保留窗口时,用[动态分区](./dynamic-partitioning) | +| 如何分区 | [自动分区](./auto-partitioning) | 对于自动分区无法表达的方案,使用[手动分区](./manual-partitioning):自定义或不规则范围、数值列范围,或将多个值归入同一分区的 LIST。[动态分区](./dynamic-partitioning)已被自动分区取代。 | | 分桶方式 | 按高基数列做 Hash 分桶 | 数据倾斜或需按任意维度过滤时,用 Random 分桶([数据分桶](./data-bucketing)) | | 分桶数量 | `BUCKETS AUTO` | 已知数据量并希望固定控制时,手动设置分桶数([数据分桶](./data-bucketing)) | diff --git a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md index e4a724cb1a39f..5e8d6b8c854bf 100644 --- a/versioned_docs/version-4.x/table-design/data-partitioning/overview.md +++ b/versioned_docs/version-4.x/table-design/data-partitioning/overview.md @@ -39,7 +39,7 @@ Customize only when the default does not fit: | Decision | Recommended default | Change it when | | --- | --- | --- | -| How to partition | [Auto partitioning](./auto-partitioning) by a time column | If you need fixed or irregular ranges, use [manual partitioning](./manual-partitioning); for a rolling time window with retention, use [dynamic partitioning](./dynamic-partitioning) | +| How to partition | [Auto partitioning](./auto-partitioning) | Use [manual partitioning](./manual-partitioning) for schemes auto cannot express: custom or irregular ranges, ranges on a numeric column, or grouped LIST values. [Dynamic partitioning](./dynamic-partitioning) is superseded by auto. 
| | Bucketing method | Hash on a high-cardinality column | If data skews, or you filter on arbitrary dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) | | Number of buckets | `BUCKETS AUTO` | If you know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) | From 37a7a15920c3b7ed5146bacb3d23ee106c2cca73 Mon Sep 17 00:00:00 2001 From: Yongqiang YANG Date: Thu, 18 Jun 2026 06:04:49 -0700 Subject: [PATCH 10/10] docs(table-design): flatten partition sidebar; mark dynamic as legacy Drop the 'Other Partition Modes' grouping (manual is not legacy, so it should not be bucketed away). Keep a flat list ordered auto -> manual -> dynamic, and mark dynamic partitioning as legacy via a sidebar label '(Legacy)' and a strengthened callout pointing to auto as its successor. EN + zh-CN. --- .../data-partitioning/dynamic-partitioning.md | 5 +++-- .../data-partitioning/dynamic-partitioning.md | 5 +++-- .../data-partitioning/dynamic-partitioning.md | 5 +++-- sidebars.ts | 10 ++-------- .../data-partitioning/dynamic-partitioning.md | 5 +++-- versioned_sidebars/version-4.x-sidebars.json | 10 ++-------- 6 files changed, 16 insertions(+), 24 deletions(-) diff --git a/docs/table-design/data-partitioning/dynamic-partitioning.md b/docs/table-design/data-partitioning/dynamic-partitioning.md index daff66ad6d763..440dae1687b85 100644 --- a/docs/table-design/data-partitioning/dynamic-partitioning.md +++ b/docs/table-design/data-partitioning/dynamic-partitioning.md @@ -1,13 +1,14 @@ --- { "title": "Dynamic Partitioning", + "sidebar_label": "Dynamic Partitioning (Legacy)", "language": "en", "description": "Dynamic partitioning rolls partitions forward by creating and dropping them on a schedule, providing partition lifecycle management (TTL) for tables. It applies to scenarios such as logs and time-series data that need automatic cleanup of expired data." 
} --- -:::info Tip -[Auto Partitioning](./auto-partitioning) is the recommended approach for automatic partition management. It is the successor to dynamic partitioning. +:::info Legacy +Dynamic partitioning is superseded by [auto partitioning](./auto-partitioning), its successor for automatic partition management. Use auto partitioning for new tables; this page is kept for existing dynamic-partition tables. ::: diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/dynamic-partitioning.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/dynamic-partitioning.md index 093088b0ed49f..101da7c7a3bba 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/dynamic-partitioning.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/dynamic-partitioning.md @@ -1,13 +1,14 @@ --- { "title": "动态分区", + "sidebar_label": "动态分区(旧版)", "language": "zh-CN", "description": "动态分区按规则滚动创建和删除分区,实现表分区生命周期管理(TTL)。适用于日志、时序数据等需要自动清理过期数据的场景。" } --- -:::info 提示 -更推荐使用[自动分区](./auto-partitioning)实现分区自动管理,它是动态分区的上位替代。 +:::info 旧版 +动态分区已被[自动分区](./auto-partitioning)取代,后者是其在分区自动管理上的上位替代。新表请使用自动分区;本文用于维护已有的动态分区表。 ::: diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/dynamic-partitioning.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/dynamic-partitioning.md index 093088b0ed49f..101da7c7a3bba 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/dynamic-partitioning.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-partitioning/dynamic-partitioning.md @@ -1,13 +1,14 @@ --- { "title": "动态分区", + "sidebar_label": "动态分区(旧版)", "language": "zh-CN", "description": "动态分区按规则滚动创建和删除分区,实现表分区生命周期管理(TTL)。适用于日志、时序数据等需要自动清理过期数据的场景。" } --- -:::info 提示 -更推荐使用[自动分区](./auto-partitioning)实现分区自动管理,它是动态分区的上位替代。 
+:::info 旧版 +动态分区已被[自动分区](./auto-partitioning)取代,后者是其在分区自动管理上的上位替代。新表请使用自动分区;本文用于维护已有的动态分区表。 ::: diff --git a/sidebars.ts b/sidebars.ts index 1f6fee572c499..864acc50b437c 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -211,14 +211,8 @@ const sidebars: SidebarsConfig = { link: {type: 'doc', id: 'table-design/data-partitioning/overview'}, items: [ 'table-design/data-partitioning/auto-partitioning', - { - type: 'category', - label: 'Other Partition Modes', - items: [ - 'table-design/data-partitioning/manual-partitioning', - 'table-design/data-partitioning/dynamic-partitioning', - ], - }, + 'table-design/data-partitioning/manual-partitioning', + 'table-design/data-partitioning/dynamic-partitioning', 'table-design/data-partitioning/data-bucketing', 'table-design/data-partitioning/basic-concepts', 'table-design/data-partitioning/common-issues', diff --git a/versioned_docs/version-4.x/table-design/data-partitioning/dynamic-partitioning.md b/versioned_docs/version-4.x/table-design/data-partitioning/dynamic-partitioning.md index daff66ad6d763..440dae1687b85 100644 --- a/versioned_docs/version-4.x/table-design/data-partitioning/dynamic-partitioning.md +++ b/versioned_docs/version-4.x/table-design/data-partitioning/dynamic-partitioning.md @@ -1,13 +1,14 @@ --- { "title": "Dynamic Partitioning", + "sidebar_label": "Dynamic Partitioning (Legacy)", "language": "en", "description": "Dynamic partitioning rolls partitions forward by creating and dropping them on a schedule, providing partition lifecycle management (TTL) for tables. It applies to scenarios such as logs and time-series data that need automatic cleanup of expired data." } --- -:::info Tip -[Auto Partitioning](./auto-partitioning) is the recommended approach for automatic partition management. It is the successor to dynamic partitioning. +:::info Legacy +Dynamic partitioning is superseded by [auto partitioning](./auto-partitioning), its successor for automatic partition management. 
Use auto partitioning for new tables; this page is kept for existing dynamic-partition tables. ::: diff --git a/versioned_sidebars/version-4.x-sidebars.json b/versioned_sidebars/version-4.x-sidebars.json index 4efb0e3ede770..fe81aa4dbb87c 100644 --- a/versioned_sidebars/version-4.x-sidebars.json +++ b/versioned_sidebars/version-4.x-sidebars.json @@ -250,14 +250,8 @@ }, "items": [ "table-design/data-partitioning/auto-partitioning", - { - "type": "category", - "label": "Other Partition Modes", - "items": [ - "table-design/data-partitioning/manual-partitioning", - "table-design/data-partitioning/dynamic-partitioning" - ] - }, + "table-design/data-partitioning/manual-partitioning", + "table-design/data-partitioning/dynamic-partitioning", "table-design/data-partitioning/data-bucketing", "table-design/data-partitioning/basic-concepts", "table-design/data-partitioning/common-issues"