Commit 5db8dc8

dhtclk and Blargian authored
Apply suggestions from code review
Co-authored-by: Shaun Struwig <41984034+Blargian@users.noreply.github.com>
1 parent b4723d2 commit 5db8dc8

1 file changed

+20
-20
lines changed

docs/cloud/guides/production-readiness.md

Lines changed: 20 additions & 20 deletions
@@ -35,47 +35,47 @@ Your responsibilities for enterprise production readiness:
3535
- Establish backup validation and disaster recovery procedures
3636
- Configure cost management and billing integration
3737

38-
This guide walks through each area, helping you transition from a working ClickHouse Cloud deployment to an enterprise-ready system.
38+
This guide walks you through each area, helping you transition from a working ClickHouse Cloud deployment to an enterprise-ready system.
3939

40-
## Environment Strategy {#environment-strategy}
40+
## Environment strategy {#environment-strategy}
4141

4242
Establish separate environments to safely test changes before impacting production workloads. Most production incidents trace back to untested queries or configuration changes deployed directly to production systems.
4343

44-
**Environment Structure**: Maintain production (live workloads), staging (production-equivalent validation), and development (individual/team experimentation) environments.
44+
**Environment structure**: Maintain production (live workloads), staging (production-equivalent validation), and development (individual/team experimentation) environments.
4545

4646
**Testing**: Test queries in staging before production deployment. Queries that work on small datasets often cause memory exhaustion, excessive CPU usage, or slow execution at production scale. Validate configuration changes including user permissions, quotas, and service settings in staging—configuration errors discovered in production create immediate operational incidents.
4747

4848
**Sizing**: Size your staging service to approximate production load characteristics. Testing on significantly smaller infrastructure may not reveal resource contention or scaling issues. Use production-representative datasets through periodic data refreshes or synthetic data generation.
4949

50-
## Enterprise Authentication and User Management {#enterprise-authentication}
50+
## Enterprise authentication and user management {#enterprise-authentication}
5151

5252
Moving from console-based user management to enterprise authentication integration is essential for production readiness.
5353

54-
### SSO/SAML Setup {#sso-saml-setup}
54+
### SSO/SAML setup {#sso-saml-setup}
5555

5656
Enterprise tier ClickHouse Cloud supports SAML integration with identity providers including Okta, Azure Active Directory, and Google Workspace. SAML configuration requires coordination with ClickHouse support and involves providing your IdP metadata and configuring attribute mappings.
5757

5858
:::note Important limitation
5959
Users authenticated through SAML are assigned the "Member" role by default and must be manually granted additional roles by an admin after their first login. Group-to-role mapping and automatic role assignment are not currently supported.
6060
:::
6161

62-
### Access Control Design {#access-control-design}
62+
### Access control design {#access-control-design}
6363

6464
ClickHouse Cloud uses organization-level roles (Admin, Developer, Billing, Member) and service/database-level roles (Service Admin, Read Only, SQL console roles). Design roles around job functions applying the principle of least privilege:
6565

66-
- **Application Users**: Service accounts with specific database and table access
67-
- **Analyst Users**: Read-only access to curated datasets and reporting views
68-
- **Admin Users**: Full administrative capabilities
66+
- **Application users**: Service accounts with specific database and table access
67+
- **Analyst users**: Read-only access to curated datasets and reporting views
68+
- **Admin users**: Full administrative capabilities
6969

7070
Configure quotas, limits, and settings profiles to manage resource usage for different users and roles. Set memory and execution time limits to prevent individual queries from impacting system performance. Monitor resource usage through audit, session, and query logs to identify users or applications that frequently hit limits. Conduct regular access reviews using ClickHouse Cloud's audit capabilities.
7171

72-
### User Lifecycle Management Limitations {#user-lifecycle-management}
72+
### User lifecycle management limitations {#user-lifecycle-management}
7373

7474
ClickHouse Cloud does not currently support SCIM or automated provisioning/deprovisioning via identity providers. Users must be manually removed from the ClickHouse Cloud console after being removed from your IdP. Plan for manual user management processes until these features become available.
7575

7676
Learn more about [Cloud Access Management](/cloud/security/cloud_access_management) and [SAML SSO setup](/cloud/security/saml-setup).
7777

78-
## Infrastructure as Code and Automation {#infrastructure-as-code}
78+
## Infrastructure as code and automation {#infrastructure-as-code}
7979

8080
Managing ClickHouse Cloud through infrastructure-as-code practices and API automation provides consistency, version control, and repeatability for your deployment configuration.
8181

@@ -103,9 +103,9 @@ provider "clickhouse" {
103103

104104
The Terraform provider supports service provisioning, IP access lists, and user management. Note that the provider does not currently support importing existing services or explicit backup configuration. For features not covered by the provider, manage them through the console or contact ClickHouse support.
105105

106-
For comprehensive examples including service configuration and network access controls, see [Terraform example on how to use Cloud API](https://clickhouse.com/docs/knowledgebase/terraform_example).
106+
For comprehensive examples including service configuration and network access controls, see [Terraform example on how to use Cloud API](/knowledgebase/terraform_example).
107107

108-
### Cloud API Integration {#cloud-api-integration}
108+
### Cloud API integration {#cloud-api-integration}
109109

110110
Organizations with existing automation frameworks can integrate ClickHouse Cloud management directly through the Cloud API. The API provides programmatic access to service lifecycle management, user administration, backup operations, and monitoring data retrieval.
111111

@@ -117,19 +117,19 @@ Common API integration patterns:
117117

118118
API authentication uses the same token-based approach as Terraform. For complete API reference and integration examples, see [ClickHouse Cloud API](/cloud/manage/api/api-overview) documentation.
119119

120-
## Monitoring and Operational Integration {#monitoring-integration}
120+
## Monitoring and operational integration {#monitoring-integration}
121121

122122
Connecting ClickHouse Cloud to your existing monitoring infrastructure ensures visibility and proactive issue detection.
123123

124-
### Built-in Monitoring {#built-in-monitoring}
124+
### Built-in monitoring {#built-in-monitoring}
125125

126126
ClickHouse Cloud provides an advanced dashboard with real-time metrics including queries per second, memory usage, CPU usage, and storage rates. Access via Cloud console under Monitoring → Advanced dashboard. Create custom dashboards tailored to specific workload patterns or team resource consumption.
127127

128128
:::note Common production gaps
129129
Lack of proactive alerting integration with enterprise incident management systems and automated cost monitoring. Built-in dashboards provide visibility but automated alerting requires external integration.
130130
:::
131131

132-
### Production Alerting Setup {#production-alerting}
132+
### Production alerting setup {#production-alerting}
133133

134134
**Built-in Capabilities**: ClickHouse Cloud provides notifications for billing events, scaling events, and service health via email, UI, and Slack. Configure delivery channels and notification severities through the console notification settings.
135135

@@ -147,21 +147,21 @@ scrape_configs:
147147
148148
For comprehensive setup including detailed Prometheus/Grafana configuration and advanced alerting, see the [ClickHouse Cloud Observability Guide](/use-cases/observability/cloud-monitoring#prometheus).
149149
150-
## Business Continuity and Support Integration {#business-continuity}
150+
## Business continuity and support integration {#business-continuity}
151151
152152
Establishing backup validation procedures and support integration ensures your ClickHouse Cloud deployment can recover from incidents and access help when needed.
153153
154-
### Backup Strategy Assessment {#backup-strategy}
154+
### Backup strategy assessment {#backup-strategy}
155155
156156
ClickHouse Cloud provides automatic backups with configurable retention periods. Assess your current backup configuration against compliance and recovery requirements. Enterprise customers with specific compliance requirements around backup location or encryption can configure ClickHouse Cloud to store backups in their own cloud storage buckets (BYOB). Contact ClickHouse support for BYOB configuration.
157157
158-
### Validate and Test Recovery Procedures {#validate-test-recovery}
158+
### Validate and test recovery procedures {#validate-test-recovery}
159159
160160
Most organizations discover backup gaps during actual recovery scenarios. Establish regular validation cycles to verify backup integrity and test recovery procedures before incidents occur. Schedule periodic test restorations to non-production environments, document step-by-step recovery procedures including time estimates, verify restored data completeness and application functionality, and test recovery procedures with different failure scenarios (service deletion, data corruption, regional outages). Maintain updated recovery runbooks accessible to on-call teams.
161161
162162
Test backup restoration at least quarterly for critical production services. Organizations with strict compliance requirements may need monthly or even weekly validation cycles.
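
The cadence guidance above could be tracked with a small check like the following. This is only a sketch: the cadence names, the day counts, and the `restore_test_overdue` helper are illustrative assumptions, not part of ClickHouse Cloud or its API.

```python
from datetime import date, timedelta

# Hypothetical cadences in days, mirroring the quarterly/monthly/weekly guidance.
CADENCE_DAYS = {"quarterly": 91, "monthly": 30, "weekly": 7}


def restore_test_overdue(last_test: date, cadence: str, today: date) -> bool:
    """Return True when the last restore test is older than the cadence allows."""
    return today - last_test > timedelta(days=CADENCE_DAYS[cadence])


if __name__ == "__main__":
    # Example: a critical service last test-restored on 1 January, checked on 1 June.
    if restore_test_overdue(date(2024, 1, 1), "quarterly", date(2024, 6, 1)):
        print("Restore test overdue: schedule a test restoration")
```

A check like this could run from an existing scheduler and feed the on-call runbook review, with the last test date recorded wherever your team already logs restore drills.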
163163
164-
### Disaster Recovery Planning {#disaster-recovery-planning}
164+
### Disaster recovery planning {#disaster-recovery-planning}
165165
166166
Document your recovery time objectives (RTO) and recovery point objectives (RPO) to validate that your current backup configuration meets business requirements. Establish regular testing schedules for backup restoration and maintain updated recovery documentation.
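
One way to make the RTO/RPO comparison concrete is sketched below. It assumes that worst-case data loss is roughly the interval between backups and that restore time comes from your own test restorations. The `RecoveryObjectives` class and `find_gaps` function are hypothetical names for illustration, not a ClickHouse Cloud interface.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of lost data


def find_gaps(
    objectives: RecoveryObjectives,
    backup_interval: timedelta,
    measured_restore_time: timedelta,
) -> List[str]:
    """List the objectives that the current backup setup does not meet."""
    gaps = []
    # Assumption: data written since the last backup is lost in a restore.
    if backup_interval > objectives.rpo:
        gaps.append(f"RPO: backups every {backup_interval} exceed {objectives.rpo}")
    # Assumption: restore time was measured during a test restoration.
    if measured_restore_time > objectives.rto:
        gaps.append(f"RTO: restore took {measured_restore_time}, limit {objectives.rto}")
    return gaps


if __name__ == "__main__":
    objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=24))
    for gap in find_gaps(objectives, timedelta(hours=24), timedelta(hours=6)):
        print(gap)
```

An empty result means the measured values fit within the documented objectives. Any entry points to either the backup schedule or the recovery procedure as the part to revisit.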
167167
