content/en/cloudprem/configure/cluster_sizing.md
## Overview

Proper cluster sizing ensures optimal performance, cost efficiency, and reliability for your CloudPrem deployment. Your sizing requirements depend on several factors, including log ingestion volume, query patterns, and the complexity of your log data.

This guide provides baseline recommendations for dimensioning your CloudPrem cluster components: indexers, searchers, supporting services, and the PostgreSQL database.

<div class="alert alert-tip">
Use your expected daily log volume and peak ingestion rates as starting points, then monitor your cluster's performance and adjust sizing as needed.
</div>

## Indexers

Indexers receive logs from Datadog Agents, then process, index, and store them as index files (called _splits_) in object storage. Proper sizing is critical for maintaining ingestion throughput and ensuring your cluster can handle your log volume.

| Specification | Recommendation | Notes |
|---------------|----------------|-------|
| **Performance** | 5 MB/s per vCPU | Baseline throughput to determine initial sizing. Actual performance depends on log characteristics (size, number of attributes, nesting level). |
| **Memory** | 4 GB RAM per vCPU | |
| **Minimum Pod Size** | 2 vCPUs, 8 GB RAM | Recommended minimum for indexer pods |
| **Storage Capacity** | At least 200 GB | Required for temporary data while creating and merging index files |
| **Storage Type** | Local SSDs (preferred) | Local HDDs or network-attached block storage (Amazon EBS, Azure Managed Disks) can also be used |
| **Disk I/O** | ~20 MB/s per vCPU | Equivalent to 320 IOPS per vCPU on Amazon EBS (assuming a 64 KB I/O size) |

{{% collapse-content title="Example: Sizing for 1 TB of logs per day" level="h4" expanded=false %}}
To index 1 TB of logs per day (~11.6 MB/s), follow these steps:

1. **Calculate the required throughput:** 1 TB per day spread over 86,400 seconds is approximately 11.6 MB/s.
2. **Apply the per-vCPU ratios:** At 5 MB/s per vCPU, this rate requires about 2.3 vCPUs, which corresponds to roughly 9.3 GB of RAM at 4 GB per vCPU.
3. **Add headroom:** Start with one indexer pod configured with **3 vCPUs, 12 GB RAM, and a 200 GB disk**. Adjust these values based on observed performance and redundancy needs.
{{% /collapse-content %}}
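
The same arithmetic can be scripted as a quick first pass for any daily volume. The following is a minimal Python sketch based only on the ratios in the table above (5 MB/s and 4 GB of RAM per vCPU, a 2 vCPU minimum pod size, and a 200 GB disk); the function name and rounding choices are illustrative and not part of CloudPrem itself.

```python
import math

# Starting-point ratios from the indexer sizing table above.
MBPS_PER_VCPU = 5        # indexing throughput per vCPU
RAM_GB_PER_VCPU = 4      # memory per vCPU
MIN_VCPUS = 2            # recommended minimum pod size
DISK_GB = 200            # storage capacity per indexer pod

def indexer_pod_size(daily_volume_gb: float) -> dict:
    """Rough single-pod indexer sizing for a given daily log volume (GB/day)."""
    throughput_mbps = daily_volume_gb * 1000 / 86_400  # GB/day -> MB/s
    vcpus = max(MIN_VCPUS, math.ceil(throughput_mbps / MBPS_PER_VCPU))
    return {
        "throughput_mb_per_s": round(throughput_mbps, 1),
        "vcpus": vcpus,
        "ram_gb": vcpus * RAM_GB_PER_VCPU,
        "disk_gb": DISK_GB,
    }

# 1 TB/day example from above: ~11.6 MB/s -> 3 vCPUs, 12 GB RAM, 200 GB disk.
print(indexer_pod_size(1000))
```
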
## Searchers

Searchers handle search queries from the Datadog UI, reading metadata from the Metastore and fetching data from object storage.

A general starting point is to provision roughly double the total number of vCPUs allocated to Indexers.

- **Performance:** Search performance depends heavily on the workload (query complexity, concurrency, amount of data scanned). For instance, term queries (`status:error AND message:exception`) are usually computationally less expensive than aggregations.
- **Memory:** 4 GB of RAM per searcher vCPU. Provision more RAM if you expect many concurrent aggregation requests.
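
As a rough numeric translation of the rule of thumb above, the sketch below doubles the total indexer vCPU count and applies the 4 GB per vCPU memory ratio. The helper name and return shape are illustrative, in the same style as the indexer example earlier.

```python
RAM_GB_PER_SEARCHER_VCPU = 4  # memory ratio from this section

def searcher_capacity(total_indexer_vcpus: int) -> dict:
    """Starting-point searcher capacity: roughly double the indexer vCPUs."""
    vcpus = 2 * total_indexer_vcpus
    return {"vcpus": vcpus, "ram_gb": vcpus * RAM_GB_PER_SEARCHER_VCPU}

# With the 3-vCPU indexer pod from the previous example:
print(searcher_capacity(3))  # {'vcpus': 6, 'ram_gb': 24}
```

Treat the result as a floor rather than a target: heavy aggregation workloads or high query concurrency may need more vCPUs and RAM, as noted above.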
## Other services

Allocate the following resources for these lightweight components:

| Service | vCPUs | RAM | Replicas |
|---------|-------|-----|----------|
| **Control Plane** | 2 | 4 GB | 1 |
| **Metastore** | 2 | 4 GB | 2 |
| **Janitor** | 2 | 4 GB | 1 |

## PostgreSQL database

- **Instance Size:** For most use cases, a PostgreSQL instance with 1 vCPU and 4 GB of RAM is sufficient.
- **AWS RDS Recommendation:** If using AWS RDS, the `t4g.medium` instance type is a suitable starting point.