[DOCS] Clarify DataFrames in quickstart#54428
Open
celestehorgan wants to merge 1 commit into apache:master
holdenk (Contributor) reviewed on Feb 23, 2026 and left a comment:
I know it's still WIP, but I left a comment about keeping the transformation language.
docs/quick-start.md
Outdated
  {% endhighlight %}

- You can get values from DataFrame directly, by calling some actions, or transform the DataFrame to get a new one. For more details, please read the _[API doc](api/python/index.html#pyspark.sql.DataFrame)_.
+ Once you've created the DataFrame, you can perform actions against it. For more details see the [API doc](api/python/index.html#pyspark.sql.DataFrame).
Contributor
Actions and transformations are separate concepts in Spark, so it would be better to mention both.
Broadly speaking, a transformation gives you back another DataFrame/Dataset/RDD, while an action collects, writes out, or otherwise forces evaluation of a DataFrame/Dataset/RDD.
The distinction is a bit blurrier for a few specific transformations, but that's beyond the scope of getting started.
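To make the distinction above concrete, here is a toy model in plain Python. It is only a sketch of the idea and is not Spark's API: the `LazyFrame` class, its `evaluations` counter, and the sample strings are hypothetical names invented for illustration. The transformation returns a new object and only records work; the actions are what force evaluation.

```python
from typing import Callable, Iterable, List, Optional


class LazyFrame:
    """Toy stand-in for a DataFrame (hypothetical, not Spark's API).

    It records a plan of filters and runs it only when an action is called.
    """

    def __init__(self, source: Iterable[str],
                 plan: Optional[List[Callable[[str], bool]]] = None):
        self._source = list(source)
        self._plan = list(plan or [])
        self.evaluations = 0  # how many times this object's plan actually ran

    # Transformation: returns a NEW LazyFrame and does no work yet.
    def filter(self, predicate: Callable[[str], bool]) -> "LazyFrame":
        return LazyFrame(self._source, self._plan + [predicate])

    def _run(self) -> List[str]:
        # Apply every recorded filter, in order, to the source rows.
        self.evaluations += 1
        rows = self._source
        for predicate in self._plan:
            rows = [r for r in rows if predicate(r)]
        return rows

    # Actions: force evaluation and return plain Python values.
    def count(self) -> int:
        return len(self._run())

    def collect(self) -> List[str]:
        return self._run()


lines = LazyFrame(["Apache Spark", "quick start", "Spark SQL"])

# Transformation: builds a new LazyFrame, nothing is evaluated.
spark_lines = lines.filter(lambda s: "Spark" in s)
evaluations_before = spark_lines.evaluations

# Action: the recorded plan runs now and a plain integer comes back.
n = spark_lines.count()
```

In this sketch `filter` plays the role of a transformation, and `count` and `collect` play the role of actions; the real Spark methods of the same names behave analogously, but the details of Spark's planning and execution are not modelled here.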
Author
Is this not addressed by the next section in L#76?
ecebd5f to 3c5c7ab
What changes were proposed in this pull request?
This pull request clarifies some of the language around DataFrames and Datasets in the Python Quickstart, and corrects some grammar/sentence structure in the first section of the Quickstart guide. No breaking changes are introduced.
Why are the changes needed?
The Quickstart is one of the most heavily trafficked pages on any documentation website. The original authors saw fit to introduce the idea of DataFrames vs. Datasets in the Python quickstart, but the user needs to understand why that matters: other languages they might use Spark in implement things differently. Indeed, the Scala quickstart one tab over sticks entirely with the concept of Datasets.
Does this PR introduce any user-facing change?
Yes! Some language in https://spark.apache.org/docs/latest/quick-start.html changes.
How was this patch tested?
The docs were built locally to confirm that the website still builds.
Was this patch authored or co-authored using generative AI tooling?
No