[VL] Fix duplicate File prefix in FileSourceScanExecTransformer node name on Spark 4.1 #11968

Open
srichetar wants to merge 1 commit into apache:main from srichetar:fix/file-source-scan-duplicate-node-name-prefix

@srichetar

What changes were proposed in this pull request?

Override nodeNamePrefix to "" in FileSourceScanExecTransformerBase to prevent the scan node from displaying as FileFileSourceScanExecTransformer instead of FileSourceScanExecTransformer in physical plans on Spark 4.1.

Root cause

  1. The Spark 4.1 shim AbstractFileSourceScanExec sets nodeNamePrefix = "File"
  2. FileSourceScanExecTransformerBase overrides nodeName using getClass.getSimpleName which already starts with "File"
  3. simpleString concatenates nodeNamePrefix + nodeName producing "File" + "FileSourceScanExecTransformer ..." = "FileFileSourceScanExecTransformer ..."
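The three steps above can be reproduced in isolation. The following is a minimal sketch, not the actual Spark or Gluten code: `ScanLike`, `BuggyScanTransformer`, and `DuplicatePrefixDemo` are hypothetical stand-ins, the one-line `simpleString` is an assumed simplification of the real method, and the node name is hard-coded where the real class derives it from `getClass.getSimpleName`.

```scala
// Hypothetical stand-ins; these are NOT the real Spark or Gluten classes.
trait ScanLike {
  // Mirrors step 1: the Spark 4.1 shim is described as setting the prefix to "File".
  def nodeNamePrefix: String = "File"

  def nodeName: String

  // Mirrors step 3 (assumed simplification): prefix followed by node name.
  def simpleString: String = s"$nodeNamePrefix$nodeName"
}

class BuggyScanTransformer extends ScanLike {
  // Mirrors step 2: in the real class this value comes from
  // getClass.getSimpleName; it is hard-coded here so the sketch is deterministic.
  override def nodeName: String = "FileSourceScanExecTransformer"
}

object DuplicatePrefixDemo {
  // Renders the node the way the sketch's simpleString would.
  def render(): String = new BuggyScanTransformer().simpleString
}
```

Under these assumptions, `DuplicatePrefixDemo.render()` yields the doubled `FileFileSourceScanExecTransformer` name, because the inherited prefix is prepended to a name that already starts with `File`.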

The existing bug report #11402 also incidentally shows FileFileSourceScanExecTransformer in its plan output, confirming this is not environment-specific.

Fix

Add override val nodeNamePrefix: String = "" in FileSourceScanExecTransformerBase to prevent the inherited "File" prefix from being prepended to the already-complete nodeName.
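As an illustration of the override, here is a minimal sketch under the same simplifying assumptions. `PrefixedScan` and `FixedScanTransformer` are hypothetical stand-ins for the shim and for `FileSourceScanExecTransformerBase`; only the `override val nodeNamePrefix: String = ""` line corresponds to the change described in this PR.

```scala
// Hypothetical stand-ins; these are NOT the real Spark or Gluten classes.
trait PrefixedScan {
  // Inherited default, as the Spark 4.1 shim is described to provide.
  def nodeNamePrefix: String = "File"

  def nodeName: String

  // Assumed simplification of simpleString: prefix followed by node name.
  def simpleString: String = s"$nodeNamePrefix$nodeName"
}

class FixedScanTransformer extends PrefixedScan {
  // The fix: blank out the inherited prefix, since nodeName is already complete.
  override val nodeNamePrefix: String = ""

  // Hard-coded stand-in for the getClass.getSimpleName based nodeName.
  override def nodeName: String = "FileSourceScanExecTransformer"
}

object FixedPrefixDemo {
  // Renders the node with the prefix override in place.
  def render(): String = new FixedScanTransformer().simpleString
}
```

With the override in place, `FixedPrefixDemo.render()` returns `FileSourceScanExecTransformer`. Overriding the prefix rather than trimming `nodeName` keeps the class-name-based node name intact and leaves the inherited prefix untouched for other subclasses.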

Does this PR introduce any user-facing change?

Yes (cosmetic). The physical plan display now shows the correct node name (FileSourceScanExecTransformer) instead of FileFileSourceScanExecTransformer.

How was this tested?

Existing tests. The test class TestFileSourceScanExecTransformer already demonstrates the nodeNamePrefix override pattern.

…e on Spark 4.1

The simpleString method in FileSourceScanExecTransformerBase prepends
nodeNamePrefix to nodeName. On Spark 4.1, AbstractFileSourceScanExec
sets nodeNamePrefix='File', but nodeName already contains the full class
name (via getClass.getSimpleName) which starts with 'File'. This causes
the physical plan to display 'FileFileSourceScanExecTransformer' instead
of 'FileSourceScanExecTransformer'.

Override nodeNamePrefix to empty string in FileSourceScanExecTransformerBase
to prevent the double prefix.
@github-actions github-actions Bot added the CORE works for Gluten Core label Apr 21, 2026
@github-actions

Run Gluten Clickhouse CI on x86
