-     Description: If `true`, do local sort by partition before writing.
-
- !!! tip "Since 0.3.0 - spark.clickhouse.write.localSortByKey"
-
-     Default Value: true
-
-     Description: If `true`, do local sort by sort keys before writing.
-
- !!! tip "Since 0.4.0 - spark.clickhouse.ignoreUnsupportedTransform"
-
-     Default Value: false
-
-     Description: ClickHouse supports using complex expressions as sharding keys or partition values,
-     e.g. `cityHash64(col_1, col_2)`, and those can not be supported by Spark now. If `true`,
-     ignore the unsupported expressions, otherwise fail fast w/ an exception. Note: when
-     `spark.clickhouse.write.distributed.convertLocal` is enabled, ignore unsupported sharding keys
-     may corrupt the data.
-
- !!! tip "Since 0.5.0 - spark.clickhouse.read.compression.codec"
-
-     Default Value: lz4
-
-     Description: The codec used to decompress data for reading. Supported codecs: none, lz4.
-
- !!! tip "Since 0.3.0 - spark.clickhouse.write.compression.codec"
-
-     Default Value: lz4
-
-     Description: The codec used to compress data for writing. Supported codecs: none, lz4.
-
- !!! tip "Since 0.6.0 - spark.clickhouse.read.format"
-
-     Default Value: json
-
-     Description: Serialize format for reading. Supported formats: json, binary.
-
- !!! tip "Since 0.4.0 - spark.clickhouse.write.format"
-
-     Default Value: arrow
-
-     Description: Serialize format for writing. Supported formats: json, arrow.
+ <!--begin-include-->
+ |Key | Default | Description | Since
+ |--- | ------- | ----------- | -----
+ spark.clickhouse.ignoreUnsupportedTransform|false|ClickHouse supports using complex expressions as sharding keys or partition values, e.g. `cityHash64(col_1, col_2)`, which are not currently supported by Spark. If `true`, ignore the unsupported expressions, otherwise fail fast w/ an exception. Note: when `spark.clickhouse.write.distributed.convertLocal` is enabled, ignoring unsupported sharding keys may corrupt the data.|0.4.0
+ spark.clickhouse.read.compression.codec|lz4|The codec used to decompress data for reading. Supported codecs: none, lz4.|0.5.0
+ spark.clickhouse.read.distributed.convertLocal|true|When reading a Distributed table, read its local tables instead of itself. If `true`, ignore `spark.clickhouse.read.distributed.useClusterNodes`.|0.1.0
+ spark.clickhouse.read.format|json|Serialize format for reading. Supported formats: json, binary.|0.6.0
+ spark.clickhouse.read.splitByPartitionId|true|If `true`, construct input partition filters by the virtual column `_partition_id` instead of by partition value. There are known bugs when assembling SQL predicates by partition value. This feature requires ClickHouse Server v21.6+.|0.4.0
+ spark.clickhouse.write.batchSize|10000|The number of records per batch when writing to ClickHouse.|0.1.0
+ spark.clickhouse.write.compression.codec|lz4|The codec used to compress data for writing. Supported codecs: none, lz4.|0.3.0
+ spark.clickhouse.write.distributed.convertLocal|false|When writing a Distributed table, write to its local tables instead of itself. If `true`, ignore `spark.clickhouse.write.distributed.useClusterNodes`.|0.1.0
+ spark.clickhouse.write.distributed.useClusterNodes|true|Write to all nodes of the cluster when writing a Distributed table.|0.1.0
+ spark.clickhouse.write.format|arrow|Serialize format for writing. Supported formats: json, arrow.|0.4.0
+ spark.clickhouse.write.localSortByKey|true|If `true`, do local sort by sort keys before writing.|0.3.0
+ spark.clickhouse.write.localSortByPartition|<value of spark.clickhouse.write.repartitionByPartition>|If `true`, do local sort by partition before writing. If not set, it equals `spark.clickhouse.write.repartitionByPartition`.|0.3.0
+ spark.clickhouse.write.maxRetry|3|The maximum number of retries for a single batch write that failed with retryable codes.|0.1.0
+ spark.clickhouse.write.repartitionByPartition|true|Whether to repartition data by ClickHouse partition keys to meet the distribution of the ClickHouse table before writing.|0.3.0
+ spark.clickhouse.write.repartitionNum|0|Repartitioning data to meet the distribution of the ClickHouse table is required before writing; use this conf to specify the repartition number. Values less than 1 mean no requirement.|0.1.0
+ spark.clickhouse.write.repartitionStrictly|false|If `true`, Spark will strictly distribute incoming records across partitions to satisfy the required distribution before passing the records to the data source table on write. Otherwise, Spark may apply certain optimizations to speed up the query but break the distribution requirement. Note: this configuration requires SPARK-37523; w/o that patch, it always acts as `true`.|0.3.0
+ spark.clickhouse.write.retryInterval|10s|The interval in seconds between write retries.|0.1.0
+ spark.clickhouse.write.retryableErrorCodes|241|The retryable error codes returned by ClickHouse server when a write fails.|0.1.0
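
For illustration, a minimal sketch in Scala of how the keys above are typically applied when building a Spark session. The application name and the chosen values are placeholders for this example, not part of this change; only the configuration keys come from the table:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: apply connector configurations at session-build time.
// Values shown are illustrative; defaults are listed in the table above.
val spark = SparkSession.builder()
  .appName("clickhouse-example") // placeholder name
  // Write-path tuning
  .config("spark.clickhouse.write.batchSize", "10000")
  .config("spark.clickhouse.write.compression.codec", "lz4")
  .config("spark.clickhouse.write.repartitionByPartition", "true")
  // Tolerate sharding expressions Spark cannot evaluate, e.g. cityHash64(col_1, col_2)
  .config("spark.clickhouse.ignoreUnsupportedTransform", "true")
  .getOrCreate()

// These are session-scoped confs, so they can also be changed at runtime:
spark.conf.set("spark.clickhouse.write.maxRetry", "5")
spark.sql("SET spark.clickhouse.read.format=binary")
```

Note the interaction the table calls out: enabling `spark.clickhouse.ignoreUnsupportedTransform` together with `spark.clickhouse.write.distributed.convertLocal` may corrupt the data.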