Commit b651fb2

Docs: Auto generate configuration docs

1 parent: 681e532

File tree: 23 files changed (+215, -187 lines)


clickhouse-core/src/main/scala/xenon/clickhouse/Utils.scala

Lines changed: 4 additions & 0 deletions
@@ -47,6 +47,10 @@ object Utils extends Logging {
 
   def classpathResourceAsStream(name: String): InputStream = defaultClassLoader.getResourceAsStream(name)
 
+  def getCodeSourceLocation(clazz: Class[_]): String = {
+    new File(clazz.getProtectionDomain.getCodeSource.getLocation.toURI).getPath
+  }
+
   @transient lazy val tmpDirPath: Path = Files.createTempDirectory("classpath_res_")
 
   def copyFileFromClasspath(name: String): File = {
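
For context, a minimal usage sketch of the new helper (not part of this commit; it assumes the connector jar providing `xenon.clickhouse.Utils` is on the classpath, and the demo object name is illustrative):

```scala
import xenon.clickhouse.Utils

object CodeSourceLocationDemo {
  def main(args: Array[String]): Unit = {
    // Resolves the jar (or classes directory) that Utils itself was loaded from.
    // Note: classes loaded by the bootstrap class loader have no code source,
    // so getProtectionDomain.getCodeSource can be null for e.g. java.lang.String.
    println(Utils.getCodeSourceLocation(Utils.getClass))
  }
}
```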

docs/best_practices/01_deployment.md

Lines changed: 2 additions & 2 deletions
@@ -3,12 +3,12 @@ license: |
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
+
 https://www.apache.org/licenses/LICENSE-2.0
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
-
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-
 See the License for the specific language governing permissions and
 limitations under the License.
 ---

docs/best_practices/index.md

Lines changed: 2 additions & 2 deletions
@@ -3,12 +3,12 @@ license: |
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
+
 https://www.apache.org/licenses/LICENSE-2.0
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
-
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-
 See the License for the specific language governing permissions and
 limitations under the License.
 ---

docs/configurations/01_catalog_configurations.md

Lines changed: 6 additions & 7 deletions
@@ -3,20 +3,18 @@ license: |
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
+
 https://www.apache.org/licenses/LICENSE-2.0
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
-
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-
 See the License for the specific language governing permissions and
 limitations under the License.
 ---
 
-Catalog Configurations
-===
-
-## Single Instance
+<!--begin-include-->
+### Single Instance
 
 Suppose you have one ClickHouse instance which installed on `10.0.0.1` and exposes HTTP on `8123`.
 
@@ -34,7 +32,7 @@ spark.sql.catalog.clickhouse.database default
 
 Then you can access ClickHouse table `<ck_db>.<ck_table>` from Spark SQL by using `clickhouse.<ck_db>.<ck_table>`.
 
-## Cluster
+### Cluster
 
 For ClickHouse cluster, give an unique catalog name for each instances.
 
@@ -63,3 +61,4 @@ spark.sql.catalog.clickhouse2.database default
 
 Then you can access clickhouse1 table `<ck_db>.<ck_table>` from Spark SQL by `clickhouse1.<ck_db>.<ck_table>`,
 and access clickhouse2 table `<ck_db>.<ck_table>` by `clickhouse2.<ck_db>.<ck_table>`.
+<!--end-include-->
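
As a side note on the catalog settings this page documents, here is a hedged sketch of wiring up a single-instance catalog programmatically. Only the `database` key appears in the diff context above; the other key names and the catalog implementation class follow the doc's Single Instance example and should be treated as assumptions, not as confirmed by this diff.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: key names other than `database`, and the catalog class name,
// are assumptions based on the Single Instance section, not shown in this diff.
val spark = SparkSession.builder()
  .appName("clickhouse-catalog-demo")
  .master("local[*]") // local demo only
  .config("spark.sql.catalog.clickhouse", "xenon.clickhouse.ClickHouseCatalog")
  .config("spark.sql.catalog.clickhouse.host", "10.0.0.1")
  .config("spark.sql.catalog.clickhouse.http_port", "8123")
  .config("spark.sql.catalog.clickhouse.user", "default")
  .config("spark.sql.catalog.clickhouse.password", "")
  .config("spark.sql.catalog.clickhouse.database", "default")
  .getOrCreate()

// Tables then become addressable as clickhouse.<ck_db>.<ck_table> from Spark SQL.
spark.sql("SELECT * FROM clickhouse.default.some_table").show()
```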

docs/configurations/02_sql_configurations.md

Lines changed: 24 additions & 132 deletions
@@ -3,143 +3,35 @@ license: |
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
+
 https://www.apache.org/licenses/LICENSE-2.0
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
-
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-
 See the License for the specific language governing permissions and
 limitations under the License.
 ---
 
-SQL Configurations
-===
-
-!!! tip "Since 0.1.0 - spark.clickhouse.write.batchSize"
-
-    Default Value: 10000
-
-    Description: The number of records per batch on writing to ClickHouse.
-
-!!! tip "Since 0.1.0 - spark.clickhouse.write.maxRetry"
-
-    Default Value: 3
-
-    Description: The maximum number of write we will retry for a single batch write failed with retryable codes.
-
-!!! tip "Since 0.1.0 - spark.clickhouse.write.retryInterval"
-
-    Default Value: 10
-
-    Description: The interval in seconds between write retry.
-
-!!! tip "Since 0.1.0 - spark.clickhouse.write.retryableErrorCodes"
-
-    Default Value: 241
-
-    Description: The retryable error codes returned by ClickHouse server when write failing.
-
-!!! tip "Since 0.1.0 - spark.clickhouse.write.repartitionNum"
-
-    Default Value: 0
-
-    Description: Repartition data to meet the distributions of ClickHouse table is required before writing, use this
-    conf to specific the repartition number, value less than 1 mean no requirement.
-
-!!! tip "Since 0.3.0 - spark.clickhouse.write.repartitionByPartition"
-
-    Default Value: true
-
-    Description: Whether to repartition data by ClickHouse partition keys to meet the distributions of ClickHouse table
-    before writing.
-
-!!! tip "Since 0.3.0 - spark.clickhouse.write.repartitionStrictly"
-
-    Default Value: false
-
-    Description: If true, Spark will strictly distribute incoming records across partitions to satisfy
-    the required distribution before passing the records to the data source table on write.
-    Otherwise, Spark may apply certain optimizations to speed up the query but break the
-    distribution requirement. Note, this configuration requires SPARK-37523, w/o this patch,
-    it always act as `true`.
-
-!!! tip "Since 0.1.0 - spark.clickhouse.write.distributed.useClusterNodes"
-
-    Default Value: true
-
-    Description: Write to all nodes of cluster when writing Distributed table.
-
-!!! tip "Since 0.1.0 - spark.clickhouse.read.distributed.useClusterNodes"
-
-    Default Value: false
-
-    Description: Read from all nodes of cluster when reading Distributed table.
-
-!!! tip "Since 0.1.0 - spark.clickhouse.write.distributed.convertLocal"
-
-    Default Value: false
-
-    Description: When writing Distributed table, write local table instead of itself. If `true`, ignore
-    `write.distributed.useClusterNodes`.
-
-!!! tip "Since 0.1.0 - spark.clickhouse.read.distributed.convertLocal"
-
-    Default Value: true
-
-    Description: When reading Distributed table, read local table instead of itself. If `true`, ignore
-    `read.distributed.useClusterNodes`.
-
-!!! tip "Since 0.4.0 - spark.clickhouse.read.splitByPartitionId"
-
-    Default Value: true
-
-    Description: If `true`, construct input partition filter by virtual column `_partition_id`,
-    instead of partition value. There are known bugs to assemble SQL predication by
-    partition value. This feature requires ClickHouse Server v21.6+.
-
-!!! tip "Since 0.3.0 - spark.clickhouse.write.localSortByPartition"
-
-    Default Value: `spark.clickhouse.write.repartitionByPartition`
-
-    Description: If `true`, do local sort by partition before writing.
-
-!!! tip "Since 0.3.0 - spark.clickhouse.write.localSortByKey"
-
-    Default Value: true
-
-    Description: If `true`, do local sort by sort keys before writing.
-
-!!! tip "Since 0.4.0 - spark.clickhouse.ignoreUnsupportedTransform"
-
-    Default Value: false
-
-    Description: ClickHouse supports using complex expressions as sharding keys or partition values,
-    e.g. `cityHash64(col_1, col_2)`, and those can not be supported by Spark now. If `true`,
-    ignore the unsupported expressions, otherwise fail fast w/ an exception. Note: when
-    `spark.clickhouse.write.distributed.convertLocal` is enabled, ignore unsupported sharding keys
-    may corrupt the data.
-
-!!! tip "Since 0.5.0 - spark.clickhouse.read.compression.codec"
-
-    Default Value: lz4
-
-    Description: The codec used to decompress data for reading. Supported codecs: none, lz4.
-
-!!! tip "Since 0.3.0 - spark.clickhouse.write.compression.codec"
-
-    Default Value: lz4
-
-    Description: The codec used to compress data for writing. Supported codecs: none, lz4.
-
-!!! tip "Since 0.6.0 - spark.clickhouse.read.format"
-
-    Default Value: json
-
-    Description: Serialize format for reading. Supported formats: json, binary.
-
-!!! tip "Since 0.4.0 - spark.clickhouse.write.format"
-
-    Default Value: arrow
-
-    Description: Serialize format for writing. Supported formats: json, arrow.
+<!--begin-include-->
+|Key | Default | Description | Since
+|--- | ------- | ----------- | -----
+spark.clickhouse.ignoreUnsupportedTransform|false|ClickHouse supports using complex expressions as sharding keys or partition values, e.g. `cityHash64(col_1, col_2)`, and those can not be supported by Spark now. If `true`, ignore the unsupported expressions, otherwise fail fast w/ an exception. Note: when `spark.clickhouse.write.distributed.convertLocal` is enabled, ignore unsupported sharding keys may corrupt the data.|0.4.0
+spark.clickhouse.read.compression.codec|lz4|The codec used to decompress data for reading. Supported codecs: none, lz4.|0.5.0
+spark.clickhouse.read.distributed.convertLocal|true|When reading Distributed table, read local table instead of itself. If `true`, ignore `spark.clickhouse.read.distributed.useClusterNodes`.|0.1.0
+spark.clickhouse.read.format|json|Serialize format for reading. Supported formats: json, binary|0.6.0
+spark.clickhouse.read.splitByPartitionId|true|If `true`, construct input partition filter by virtual column `_partition_id`, instead of partition value. There are known bugs to assemble SQL predication by partition value. This feature requires ClickHouse Server v21.6+|0.4.0
+spark.clickhouse.write.batchSize|10000|The number of records per batch on writing to ClickHouse.|0.1.0
+spark.clickhouse.write.compression.codec|lz4|The codec used to compress data for writing. Supported codecs: none, lz4.|0.3.0
+spark.clickhouse.write.distributed.convertLocal|false|When writing Distributed table, write local table instead of itself. If `true`, ignore `spark.clickhouse.write.distributed.useClusterNodes`.|0.1.0
+spark.clickhouse.write.distributed.useClusterNodes|true|Write to all nodes of cluster when writing Distributed table.|0.1.0
+spark.clickhouse.write.format|arrow|Serialize format for writing. Supported formats: json, arrow|0.4.0
+spark.clickhouse.write.localSortByKey|true|If `true`, do local sort by sort keys before writing.|0.3.0
+spark.clickhouse.write.localSortByPartition|<value of spark.clickhouse.write.repartitionByPartition>|If `true`, do local sort by partition before writing. If not set, it equals to `spark.clickhouse.write.repartitionByPartition`.|0.3.0
+spark.clickhouse.write.maxRetry|3|The maximum number of write we will retry for a single batch write failed with retryable codes.|0.1.0
+spark.clickhouse.write.repartitionByPartition|true|Whether to repartition data by ClickHouse partition keys to meet the distributions of ClickHouse table before writing.|0.3.0
+spark.clickhouse.write.repartitionNum|0|Repartition data to meet the distributions of ClickHouse table is required before writing, use this conf to specific the repartition number, value less than 1 mean no requirement.|0.1.0
+spark.clickhouse.write.repartitionStrictly|false|If `true`, Spark will strictly distribute incoming records across partitions to satisfy the required distribution before passing the records to the data source table on write. Otherwise, Spark may apply certain optimizations to speed up the query but break the distribution requirement. Note, this configuration requires SPARK-37523, w/o this patch, it always act as `true`.|0.3.0
+spark.clickhouse.write.retryInterval|10s|The interval in seconds between write retry.|0.1.0
+spark.clickhouse.write.retryableErrorCodes|241|The retryable error codes returned by ClickHouse server when write failing.|0.1.0
+<!--end-include-->
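
The generated table above lists the session-level keys this connector recognizes. As an illustration of how a few of them are typically applied at session build time (a sketch, assuming the connector is on the classpath; the chosen values are arbitrary examples, only the key names and defaults come from the table):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: apply a few of the documented SQL configurations when building the session.
val spark = SparkSession.builder()
  .appName("clickhouse-sql-conf-demo")
  .master("local[*]") // local demo only
  .config("spark.clickhouse.write.batchSize", "20000")       // default 10000
  .config("spark.clickhouse.write.compression.codec", "lz4") // supported: none, lz4
  .config("spark.clickhouse.write.repartitionByPartition", "true")
  .getOrCreate()
```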

docs/configurations/index.md

Lines changed: 18 additions & 10 deletions
@@ -1,29 +1,37 @@
 ---
+hide:
+  - navigation
 license: |
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
+
 https://www.apache.org/licenses/LICENSE-2.0
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
-
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-
 See the License for the specific language governing permissions and
 limitations under the License.
 ---
 
 Configurations
 ===
 
-## TODO
+## Catalog Configurations
+
+{!
+include-markdown "./01_catalog_configurations.md"
+start="<!--begin-include-->"
+end="<!--end-include-->"
+!}
 
-## Overwrite SQL Configurations
+## SQL Configurations
 
-Your can overwrite [ClickHouse SQL Configurations](./02_sql_configurations.md) by editing
-`$SPARK_HOME/conf/spark-defaults.conf`, e.g.
+SQL Configurations could be overwritten by `SET <key>=<value>` in runtime.
 
-```
-spark.clickhouse.write.batchSize 10000
-spark.clickhouse.write.maxRetry 2
-```
+{!
+include-markdown "./02_sql_configurations.md"
+start="<!--begin-include-->"
+end="<!--end-include-->"
+!}
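
The rewritten page states that SQL configurations can be overridden with `SET <key>=<value>` at runtime. A minimal sketch of that override path (assuming an active `SparkSession` named `spark` with the connector on the classpath):

```scala
// Runtime override, equivalent to `SET <key>=<value>` in a Spark SQL session.
spark.sql("SET spark.clickhouse.write.batchSize=20000")
// The same keys can also be changed through the runtime conf API.
spark.conf.set("spark.clickhouse.write.maxRetry", "5")
```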

docs/developers/01_build_and_test.md

Lines changed: 2 additions & 2 deletions
@@ -3,12 +3,12 @@ license: |
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
+
 https://www.apache.org/licenses/LICENSE-2.0
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
-
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-
 See the License for the specific language governing permissions and
 limitations under the License.
 ---

docs/developers/02_docs_and_website.md

Lines changed: 2 additions & 2 deletions
@@ -3,12 +3,12 @@ license: |
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
+
 https://www.apache.org/licenses/LICENSE-2.0
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
-
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-
 See the License for the specific language governing permissions and
 limitations under the License.
 ---

docs/developers/03_private_release.md

Lines changed: 2 additions & 2 deletions
@@ -3,12 +3,12 @@ license: |
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
+
 https://www.apache.org/licenses/LICENSE-2.0
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
-
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-
 See the License for the specific language governing permissions and
 limitations under the License.
 ---

docs/developers/04_public_release.md

Lines changed: 2 additions & 2 deletions
@@ -3,12 +3,12 @@ license: |
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
+
 https://www.apache.org/licenses/LICENSE-2.0
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
-
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-
 See the License for the specific language governing permissions and
 limitations under the License.
 ---
