[SPARK-55658][PYTHON] SparkSessionBuilder.create in PySpark classic should mirror getOrCreate path as much as possible#54429
Conversation
|
This is not minor, changing the core constructor path needs a JIRA at least and some testing. I don't see a benefit to this change. |
python/pyspark/sql/session.py
Outdated
| sparkConf.set(key, str(value)) | ||
|
|
||
| sc = SparkContext.getOrCreate(sparkConf) | ||
| session = SparkSession._instantiatedSession |
There was a problem hiding this comment.
what does this mean? I think def create should always create a new session?
There was a problem hiding this comment.
Yes create always returns a new session. We are just reusing the sc of the instantiated session if it exists. I renamed the variables to make this more clear
Add test to ensure SparkConf is not called when a session exists.
@holdenk I created a JIRA ticket and linked it to this PR. Also added a test verifying the change made here. While this does change the core
The main benefits here are minimizing divergence with the |
|
@holdenk Just want to confirm before merging — are you okay with the latest changes in this PR? |
|
I am concerned about changing the behavior when a user has some dynamic conf specified, would love to see test coverage for that first. |
|
@holdenk we only allow one |
What changes were proposed in this pull request?
As titled this is a minor hygiene improvement which brings the create and getOrCreate codepaths closer together. Prior to this PR, the create codepath would always create a new SparkConf and pass that to SparkContext.getOrCreate. When a SparkSession/Context already exists, the creation of the SparkConf is not required at all and the SparkContext can be fetched from the instantiated session.
Note that this change only modifies SparkSessionBuilder.create in PySpark classic which was recently added here.
Why are the changes needed?
Code hygiene
Does this PR introduce any user-facing change?
No
How was this patch tested?
We mainly rely on existing tests for the create codepath. We also add a test to verify that a SparkConf is not created when there is an existing session.
Was this patch authored or co-authored using generative AI tooling?
No