Commit 20af57c
[SPARK-54147][SQL] Set OMP_NUM_THREADS to spark.task.cpus by default in BaseScriptTransformationExec
### What changes were proposed in this pull request?
Set OMP_NUM_THREADS to spark.task.cpus by default in BaseScriptTransformationExec
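The defaulting behaviour can be sketched outside Spark. The Python snippet below is not the code from this patch (which is Scala inside Spark); it only illustrates the idea of launching a child process whose `OMP_NUM_THREADS` falls back to the task's CPU count. The names `build_script_env` and `run_script`, the `task_cpus` parameter (standing in for `spark.task.cpus`), and the rule that an already-set value is left alone are all assumptions made for illustration.

```python
import os
import subprocess
import sys


def build_script_env(task_cpus, base_env=None):
    """Return a child-process environment with OMP_NUM_THREADS defaulted.

    task_cpus is a hypothetical stand-in for the value of spark.task.cpus.
    An OMP_NUM_THREADS value already present in base_env is kept as-is
    (an assumption about what "by default" means here).
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("OMP_NUM_THREADS", str(task_cpus))
    return env


def run_script(task_cpus, base_env=None):
    """Launch a child Python process and return the OMP_NUM_THREADS it sees."""
    child_code = "import os; print(os.environ['OMP_NUM_THREADS'])"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=build_script_env(task_cpus, base_env),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Using `setdefault` keeps an explicit user setting in place, so the default only applies when nothing else has chosen a thread count.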
### Why are the changes needed?
When the TRANSFORM function is used to invoke a Python script that relies on packages such as PyTorch or NumPy, those libraries by default start a number of intra-op threads equal to the number of CPU cores available on the node. Because each script process does this independently, running several tasks on the same node can lead to CPU overload. For example:
```
ADD ARCHIVE s3://example-bucket/udf/emotion/emotion_predict.zip;
ADD ARCHIVE s3://example-bucket/udf/emotion/python_env.zip;

INSERT OVERWRITE TABLE demo_db.text_emotion_result PARTITION (dt = 'XXX')
SELECT
  TRANSFORM(
    id,
    title,
    content
  )
  USING './python_env.zip/python_env/bin/python emotion_predict.zip/emotion_predict/predict.py'
  AS (id, title, content, emotion_label, emotion_score)
FROM (
  SELECT /*+ REPARTITION(1000) */
    id, title, content
  FROM demo_db.text_input_data
  WHERE dt = 'XXX'
) src;
```
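On the script side, a program such as `predict.py` could read the thread budget from its environment instead of defaulting to every core. The sketch below is hypothetical and is not the script from this report: `thread_budget`, `transform_rows`, and `placeholder_predict` are made-up names, the model call is a stub, and rows are assumed to arrive as tab-separated lines with `content` as the last field.

```python
import os


def thread_budget(env=None, fallback=1):
    """Parse OMP_NUM_THREADS as a positive integer, else return fallback."""
    env = os.environ if env is None else env
    try:
        value = int(env.get("OMP_NUM_THREADS", ""))
    except ValueError:
        # Unset, empty, or not a plain integer.
        return fallback
    return value if value > 0 else fallback


def placeholder_predict(text):
    """Hypothetical stand-in for a real model call; returns (label, score)."""
    return ("neutral", 0.0)


def transform_rows(lines, predict=placeholder_predict):
    """Yield each tab-separated input row with a label and score appended."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        label, score = predict(fields[-1])  # last field assumed to be `content`
        yield "\t".join(fields + [label, str(score)])
```

A real script would loop over `sys.stdin` and print each row from `transform_rows`. With PyTorch, the value from `thread_budget()` could additionally be passed to `torch.set_num_threads` before the model is loaded.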
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually.
Closes #52850 from TongWei1105/SPARK-54147.
Authored-by: TongWei1105 <vvtwow@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
1 parent 3f41adc commit 20af57c
1 file changed, 4 additions and 0 deletions, under sql/core/src/main/scala/org/apache/spark/sql/execution. The 4 added lines are lines 87-90 of the changed file.