
[SPARK-55658][PYTHON] SparkSessionBuilder.create in PySpark classic should mirror getOrCreate path as much as possible#54429

Open
jonmio wants to merge 6 commits into apache:master from jonmio:jonmio-unify-create-and-getOrCreate

Conversation

jonmio (Contributor) commented Feb 23, 2026

What changes were proposed in this pull request?

As titled, this is a minor hygiene improvement that brings the create and getOrCreate codepaths closer together. Prior to this PR, the create codepath would always build a new SparkConf and pass it to SparkContext.getOrCreate. When a SparkSession (and hence a SparkContext) already exists, creating that SparkConf is not required at all: the SparkContext can be fetched from the instantiated session.

Note that this change only modifies SparkSessionBuilder.create in PySpark classic, which was recently added here.
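The shape of the change can be sketched with stand-in classes (illustrative only, not the actual PySpark source; names mirror the discussion, and the counter on SparkConf exists purely to make the saved allocation visible):

```python
class SparkConf:
    """Stand-in for pyspark.SparkConf; counts how many confs get built."""
    created = 0

    def __init__(self):
        SparkConf.created += 1
        self._opts = {}

    def set(self, key, value):
        self._opts[key] = str(value)
        return self


class SparkContext:
    """Stand-in: only one context per driver, as in real Spark."""
    _active = None

    @classmethod
    def getOrCreate(cls, conf=None):
        if cls._active is None:
            cls._active = cls()
        return cls._active


class SparkSession:
    _instantiatedSession = None

    def __init__(self, sc):
        self.sparkContext = sc
        SparkSession._instantiatedSession = self


def create(options):
    """Mirrors the PR: only build a SparkConf when no session exists yet."""
    existing = SparkSession._instantiatedSession
    if existing is not None:
        # Reuse the SparkContext of the already-instantiated session;
        # no SparkConf needs to be constructed at all.
        sc = existing.sparkContext
    else:
        conf = SparkConf()
        for key, value in options.items():
            conf.set(key, value)
        sc = SparkContext.getOrCreate(conf)
    return SparkSession(sc)  # create still returns a *new* session


first = create({"spark.app.name": "demo"})
second = create({"spark.app.name": "demo"})
assert first is not second                        # a new session each call
assert first.sparkContext is second.sparkContext  # same context reused
assert SparkConf.created == 1                     # conf built only once
```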

Why are the changes needed?

Code hygiene

Does this PR introduce any user-facing change?

No

How was this patch tested?

We mainly rely on existing tests for the create codepath. We also add a test to verify that a SparkConf is not created when there is an existing session.
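Such a test can be sketched with unittest.mock (again with stand-ins, since the actual test added in the PR isn't shown here): patch SparkConf and assert it is never constructed once a session already exists.

```python
from unittest import mock


class SparkConf:
    """Stand-in for the real pyspark.SparkConf."""
    def set(self, key, value):
        return self


_existing_context = None  # plays the role of SparkSession._instantiatedSession


def create():
    """Models the new codepath: build a SparkConf only when nothing exists."""
    global _existing_context
    if _existing_context is None:
        SparkConf().set("spark.app.name", "demo")
        _existing_context = object()  # stands in for SparkContext.getOrCreate(...)
    return _existing_context


create()  # first call constructs a SparkConf and the "context"

# The test pattern: patch SparkConf and verify the second call never builds one.
with mock.patch(f"{__name__}.SparkConf") as conf_cls:
    create()
    conf_cls.assert_not_called()
```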

Was this patch authored or co-authored using generative AI tooling?

No

holdenk (Contributor) commented Feb 23, 2026

This is not minor; changing the core constructor path needs a JIRA at least, and some testing. I don't see a benefit to this change.

Review thread on this hunk of SparkSessionBuilder.create:

    sparkConf.set(key, str(value))

    sc = SparkContext.getOrCreate(sparkConf)
    session = SparkSession._instantiatedSession
A Contributor commented:

what does this mean? I think def create should always create a new session?

jonmio (Author) replied:

Yes, create always returns a new session. We are just reusing the SparkContext of the instantiated session if it exists. I renamed the variables to make this clearer.

@jonmio jonmio changed the title [Minor] Update SparkSessionBuilder.create to use existing SparkContext if it already exists [SPARK-55658] Update SparkSessionBuilder.create to use existing SparkContext if it already exists Feb 24, 2026
Add test to ensure SparkConf is not called when a session exists.
@jonmio jonmio changed the title [SPARK-55658] Update SparkSessionBuilder.create to use existing SparkContext if it already exists [SPARK-55658] SparkSessionBuilder.create in PySpark classic should mirror getOrCreate path as much as possible Feb 24, 2026
jonmio (Author) commented Feb 24, 2026

This is not minor, changing the core constructor path needs a JIRA at least and some testing.

@holdenk I created a JIRA ticket and linked it to this PR, and added a test verifying the change made here. While this does change the core create codepath, it only modifies a very recently added branch for PySpark classic, which landed around six weeks ago.

I don't see a benefit to this change.

The main benefits here are minimizing divergence from the getOrCreate codepath and removing the unnecessary SparkConf creation.

@jonmio jonmio requested a review from cloud-fan February 24, 2026 15:46
gengliangwang (Member) commented:

@holdenk Just want to confirm before merging — are you okay with the latest changes in this PR?

@HyukjinKwon HyukjinKwon changed the title [SPARK-55658] SparkSessionBuilder.create in PySpark classic should mirror getOrCreate path as much as possible [SPARK-55658][PYTHON] SparkSessionBuilder.create in PySpark classic should mirror getOrCreate path as much as possible Feb 24, 2026
holdenk (Contributor) commented Feb 25, 2026

I am concerned about changing the behavior when a user has some dynamic conf specified; I would love to see test coverage for that first.

cloud-fan (Contributor) commented Feb 25, 2026

@holdenk we only allow one SparkContext instance per driver JVM, so even if we tried to create a SparkContext before this PR, it would return the existing one too. I think this PR just makes the code more explicit about that, plus it saves the creation of a SparkConf. It should not cause any behavior change.
