Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
152 changes: 6 additions & 146 deletions docs/table-design/data-partitioning/basic-concepts.mdx
Original file line number Diff line number Diff line change
@@ -1,18 +1,17 @@
---
{
"title": "Basic Concepts",
"title": "How Partitioning and Bucketing Work",
"sidebar_label": "How It Works",
"language": "en",
"description": "A progressive introduction to Doris partitioning and bucketing: from core concepts and the first CREATE TABLE example to auto/dynamic partitioning, auto-bucketing, Colocate, and other advanced capabilities, along with design recommendations and operational guidance for partitions and buckets."
"description": "The data-distribution model behind Doris partitioning and bucketing: partitions, buckets, tablets, and nodes, plus advanced partition and bucket modes, design recommendations, and operational guidance."
}
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

{/* Knowledge type: Concept introduction / Procedure */}
{/* Applicable scenarios: Table design / Data organization and management */}

This document introduces the partitioning (Partition) and bucketing (Bucket) mechanisms of Doris, helping you design table structures reasonably to improve query performance and data management efficiency. New users are recommended to read the sections in order: Sections 1-3 cover core concepts and the first CREATE TABLE example, Sections 4-6 cover advanced features and design recommendations, and Section 7 covers the methods for viewing and modifying partitions needed for daily operations.
This page explains how Doris distributes data across partitions, buckets, and tablets. It also covers the advanced partition and bucket modes, design recommendations, and operational commands. For a recommended starting configuration and a decision guide, start with [Partitioning and Bucketing](./overview), then read this page when you want to understand the underlying model.

## 1. Overview

Expand Down Expand Up @@ -148,152 +147,13 @@ Besides manually declaring partitions at table creation time, Doris also support
| Dynamic partition | Automatically created/recycled by the system based on time scheduling rules | Time-series data, where you want to automatically maintain rolling partitions for the past N days/weeks/months |
| Auto partition | Created on demand when data is written | Partition values are unpredictable (such as multi-tenant or sparse time), where pre-creation should be avoided |

The following shows CREATE TABLE examples for common combinations:

<Tabs>
<TabItem value="auto-partition" label="Auto Partition" default>

[Auto Partition](./auto-partitioning) supports automatically creating corresponding partitions according to user-defined rules during data ingestion, making it more convenient to use. The basic example rewritten as Auto Range partition:

```sql
CREATE TABLE example_range_tbl
(
`user_id` LARGEINT NOT NULL COMMENT "User ID",
`date` DATE NOT NULL COMMENT "Data ingestion date",
`timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp",
`city` VARCHAR(20) COMMENT "User's city",
`age` SMALLINT COMMENT "User age",
`sex` TINYINT COMMENT "User gender",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time",
`cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
)
AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Use month as the partition granularity
()
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "1"
);
```

With this CREATE TABLE statement, when data is loaded, Doris automatically creates corresponding partitions for the `date` column at the month level. For example, `2018-12-01` and `2018-12-31` fall into the same partition, while `2018-11-12` falls into another partition. Auto Partition also supports List partitioning. For more usage, see the Auto Partition documentation.

</TabItem>

<TabItem value="dynamic-partition" label="Dynamic Partition">

[Dynamic Partition](./dynamic-partitioning) is a management approach that automatically creates and recycles partitions based on real time. The basic example rewritten as Dynamic Partition:

```sql
CREATE TABLE example_range_tbl
(
`user_id` LARGEINT NOT NULL COMMENT "User ID",
`date` DATE NOT NULL COMMENT "Data ingestion date",
`timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp",
`city` VARCHAR(20) COMMENT "User's city",
`age` SMALLINT COMMENT "User age",
`sex` TINYINT COMMENT "User gender",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time",
`cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
)
PARTITION BY RANGE(`date`)
()
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "1",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "WEEK", --- Partition granularity is week
"dynamic_partition.start" = "-2", --- Retain the past two weeks
"dynamic_partition.end" = "2", --- Pre-create the next two weeks
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "8"
);
```

Dynamic Partition supports tiered storage, custom replica counts, and more. See the Dynamic Partition documentation for details.

</TabItem>

<TabItem value="auto-and-dynamic-partition" label="Auto Partition + Dynamic Partition">

Auto Partition and Dynamic Partition each have their own advantages. Combining the two enables flexible on-demand creation and automatic recycling of partitions:

```sql
CREATE TABLE example_range_tbl
(
`user_id` LARGEINT NOT NULL COMMENT "User ID",
`date` DATE NOT NULL COMMENT "Data ingestion date",
`timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp",
`city` VARCHAR(20) COMMENT "User's city",
`age` SMALLINT COMMENT "User age",
`sex` TINYINT COMMENT "User gender",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time",
`cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
)
AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Use month as the partition granularity
()
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "1",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "month", --- The two granularities must be the same
"dynamic_partition.start" = "-2", --- Dynamic Partition automatically cleans up historical partitions older than two weeks
"dynamic_partition.end" = "0", --- Dynamic Partition does not create future partitions; this is fully delegated to Auto Partition
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "8"
);
```

For details about this feature, see [Using Auto Partition with Dynamic Partition](./auto-partitioning#lifecycle-management).

</TabItem>

</Tabs>
For ready-to-use CREATE TABLE examples of each mode, including combining auto with dynamic partitioning, see [Auto Partitioning](./auto-partitioning), [Dynamic Partitioning](./dynamic-partitioning), and [Manual Partitioning](./manual-partitioning).

## 5. Advanced: Bucketing

### 5.1 Auto Bucketing

When you are not sure about a reasonable number of buckets, you can use Auto Bucketing to let Doris perform the estimation. You only need to provide the estimated table data size:

```sql
CREATE TABLE IF NOT EXISTS example_range_tbl
(
`user_id` LARGEINT NOT NULL COMMENT "User ID",
`date` DATE NOT NULL COMMENT "Data ingestion date",
`timestamp` DATETIME NOT NULL COMMENT "Data ingestion timestamp",
`city` VARCHAR(20) COMMENT "User's city",
`age` SMALLINT COMMENT "User age",
`sex` TINYINT COMMENT "User gender",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User's last visit time",
`cost` BIGINT SUM DEFAULT "0" COMMENT "Total user spending",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
)
PARTITION BY RANGE(`date`)
(
PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO
PROPERTIES
(
"replication_num" = "1",
"estimate_partition_size" = "2G" --- Estimated data volume for one partition; defaults to 10G if not provided
);
```

Note that this approach is not suitable for scenarios with extremely large table data volumes.
When you are unsure how many buckets to use, set `BUCKETS AUTO` and let Doris size them from an estimated data volume (`estimate_partition_size`). This is not suitable for extremely large tables. For details, see [Data Bucketing](./data-bucketing).

### 5.2 Colocate

Expand Down
5 changes: 3 additions & 2 deletions docs/table-design/data-partitioning/dynamic-partitioning.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,14 @@
---
{
"title": "Dynamic Partitioning",
"sidebar_label": "Dynamic Partitioning (Legacy)",
"language": "en",
"description": "Dynamic partitioning rolls partitions forward by creating and dropping them on a schedule, providing partition lifecycle management (TTL) for tables. It applies to scenarios such as logs and time-series data that need automatic cleanup of expired data."
}
---

:::info Tip
[Auto Partitioning](./auto-partitioning) is the recommended approach for automatic partition management. It is the successor to dynamic partitioning.
:::info Legacy
Dynamic partitioning is superseded by [auto partitioning](./auto-partitioning), its successor for automatic partition management. Use auto partitioning for new tables; this page is kept for existing dynamic-partition tables.
:::

<!-- Knowledge type: Feature introduction + Operating procedure + Configuration parameters -->
Expand Down
78 changes: 78 additions & 0 deletions docs/table-design/data-partitioning/overview.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
---
{
"title": "Partitioning and Bucketing",
"language": "en",
"description": "The recommended partitioning and bucketing for a Doris table, and when to customize: auto, dynamic, and manual partitioning, bucketing method, and bucket count."
}
---

Doris organizes a table in two tiers: partitions split rows by column value, and buckets split each partition into shards for parallel processing. This page gives the recommended starting point and shows when to customize.

## Recommended Starting Point

For most tables, partition by time and let Doris manage partition creation and bucket sizing automatically:

```sql
CREATE TABLE sales (
sale_time DATETIME NOT NULL,
order_id BIGINT NOT NULL,
amount DECIMAL(10, 2)
)
DUPLICATE KEY(sale_time, order_id)
AUTO PARTITION BY RANGE (date_trunc(sale_time, 'day')) ()
DISTRIBUTED BY HASH(order_id) BUCKETS AUTO;
```

- **Auto partitioning** creates a partition as data arrives, so you never pre-define or backfill partition ranges.
- **`BUCKETS AUTO`** lets Doris size the number of shards from the data.
- Partition pruning on `sale_time` and parallel scans across buckets keep queries fast.

If the table has no time column or stays small (under about 1 GB), use a single partition with a fixed bucket count:

```sql
DISTRIBUTED BY HASH(order_id) BUCKETS 10
```

## Choose Your Design

Customize only when the default does not fit:

| Decision | Recommended default | Change it when |
| --- | --- | --- |
| How to partition | [Auto partitioning](./auto-partitioning) | Use [manual partitioning](./manual-partitioning) for schemes auto cannot express: custom or irregular ranges, ranges on a numeric column, or grouped LIST values. [Dynamic partitioning](./dynamic-partitioning) is superseded by auto. |
| Bucketing method | Hash on a high-cardinality column | If data skews, or you filter on arbitrary dimensions, use random bucketing ([Data Bucketing](./data-bucketing)) |
| Number of buckets | `BUCKETS AUTO` | If you know your data size and want fixed control, set a count ([Data Bucketing](./data-bucketing)) |

## Expire Old Partitions

To drop old data automatically, set a retention policy. Both modes keep the most recent partitions and drop older ones; they differ in how you express the limit:

| Partition mode | Property | Retention limit |
| --- | --- | --- |
| [Dynamic partitioning](./dynamic-partitioning) | `dynamic_partition.start` (for example, `-7`) | A time window: keep partitions within the last N time units of now |
| [Auto partitioning](./auto-partitioning) (RANGE) | `partition.retention_count` (for example, `3`) | A partition count: keep the newest N historical partitions |

With regular time partitions (such as one per day), the two are effectively equivalent: "last 7 days" matches "newest 7 daily partitions." They diverge when partitions are irregular or data is stale: a time window can drop every partition once the data is older than the window, whereas a count always keeps the newest N.

Combining auto and dynamic partitioning for retention is no longer recommended; use `partition.retention_count` for auto-range tables.

Retention **drops** data. To move cold data to cheaper storage instead of dropping it, use [tiered storage](../tiered-storage/overview) instead.

## How It Works

Doris maps data in two tiers:

```text
Table ──► Partition (by column value) ──► Bucket (hash or random) ──► Tablet (shard on a BE node)
```

Partitions let Doris skip data that can't match a query, and make it easy to archive or drop data by time. Buckets spread each partition across tablets for parallel reads and writes. For the full data-distribution model, including tablets, replicas, and how they map to nodes, see [How Partitioning and Bucketing Work](./basic-concepts).

## Next Steps

- [Auto Partitioning](./auto-partitioning): the default, with no manual range maintenance.
- [Dynamic Partitioning](./dynamic-partitioning): rolling time windows with retention.
- [Manual Partitioning](./manual-partitioning): explicit ranges and list partitions.
- [Data Bucketing](./data-bucketing): choose the method, key, and bucket count.
- [How Partitioning and Bucketing Work](./basic-concepts): the underlying data-distribution model.
- [Common Issues](./common-issues): troubleshooting partition and bucket design.
Loading
Loading