[DOCS] Clarify DataFrames in quickstart by celestehorgan · Pull Request #54428 · apache/spark

celestehorgan · 2026-02-23T16:49:17Z

What changes were proposed in this pull request?

This pull request clarifies some of the language around DataFrames and Datasets in the Python Quickstart, and corrects some grammar/sentence structure in the first section of the Quickstart guide. No breaking changes are introduced.

Why are the changes needed?

The Quickstart is one of the highest-traffic pages on any documentation website. The original authors chose to introduce the distinction between DataFrames and Datasets in the Python quickstart, but readers need to understand why that distinction matters: other languages they might use with Spark implement things differently. Indeed, the Scala quickstart, one tab over, sticks entirely with the concept of Datasets.

Does this PR introduce any user-facing change?

Yes! Some language in https://spark.apache.org/docs/latest/quick-start.html changes.

How was this patch tested?

The documentation was built locally to confirm that the website still builds.

Was this patch authored or co-authored using generative AI tooling?

No

holdenk

I know it's still WIP, but I left a comment about keeping the transformation language.

holdenk · 2026-02-23T19:57:47Z

docs/quick-start.md

 {% endhighlight %}

-You can get values from DataFrame directly, by calling some actions, or transform the DataFrame to get a new one. For more details, please read the _[API doc](api/python/index.html#pyspark.sql.DataFrame)_.
+Once you've created the DataFrame, you can perform actions against it. For more details see the [API doc](api/python/index.html#pyspark.sql.DataFrame).


Actions and transformations are separate concepts in Spark, so it would be better to mention both.

Broadly speaking, a transformation gives you back another DataFrame/Dataset/RDD, while an action collects, writes out, or otherwise forces evaluation of a DataFrame/Dataset/RDD.

The distinction is a bit more blurry with a few specific transformations but that's beyond the scope of getting started.
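To make the transformation/action split concrete, here is a minimal sketch of lazy evaluation in plain Python. The `LazyFrame` class and `EXECUTION_LOG` list are hypothetical and are NOT part of PySpark; they only mimic the idea that transformations (in PySpark, methods such as `filter` and `select`) return a new object without running anything, while actions (such as `count`, `collect`, or `first`) force the accumulated plan to execute. This toy does not model Spark's query optimizer, caching, or distributed execution.

```python
from typing import Callable, List, Optional

# Hypothetical helper: records every time a plan is actually executed.
EXECUTION_LOG: List[str] = []

Step = Callable[[List[str]], List[str]]


class LazyFrame:
    """Toy stand-in for a DataFrame (hypothetical; not the PySpark API)."""

    def __init__(self, rows: List[str], plan: Optional[List[Step]] = None):
        self._rows = rows
        self._plan = plan or []

    # Transformation: returns a NEW LazyFrame with one more step; nothing runs.
    def filter(self, predicate: Callable[[str], bool]) -> "LazyFrame":
        step = lambda rows: [r for r in rows if predicate(r)]
        return LazyFrame(self._rows, self._plan + [step])

    # Transformation: also just extends the plan.
    def map(self, fn: Callable[[str], str]) -> "LazyFrame":
        step = lambda rows: [fn(r) for r in rows]
        return LazyFrame(self._rows, self._plan + [step])

    def _execute(self) -> List[str]:
        EXECUTION_LOG.append(f"ran {len(self._plan)} step(s)")
        rows = self._rows
        for step in self._plan:
            rows = step(rows)
        return rows

    # Action: forces evaluation and returns a plain Python value.
    def count(self) -> int:
        return len(self._execute())

    # Action: forces evaluation and returns the resulting rows.
    def collect(self) -> List[str]:
        return self._execute()


lines = LazyFrame(["Apache Spark", "pandas", "Spark SQL"])
spark_lines = lines.filter(lambda s: "Spark" in s)  # transformation
upper = spark_lines.map(str.upper)                   # transformation

log_before_action = list(EXECUTION_LOG)  # still empty: nothing has run yet
n_spark = upper.count()                  # action: runs both steps
rows = upper.collect()                   # action: runs the plan again

print(log_before_action)  # []
print(n_spark)            # 2
print(rows)               # ['APACHE SPARK', 'SPARK SQL']
```

Note that each action re-runs the whole plan from the source rows, and that `lines` itself is never modified: every transformation produced a new object.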

celestehorgan · 2026-02-24

Isn't this addressed by the next section, at L#76?

holdenk reviewed Feb 23, 2026


Clarify DataFrames in quickstart

3c5c7ab

celestehorgan force-pushed the update-quickstart branch from ecebd5f to 3c5c7ab on February 24, 2026 18:40

celestehorgan changed the title ~~[WIP][DOCS] Clarify DataFrames in quickstart~~ [DOCS] Clarify DataFrames in quickstart Feb 24, 2026

celestehorgan marked this pull request as ready for review February 24, 2026 18:41

celestehorgan wants to merge 1 commit into apache:master from celestehorgan:update-quickstart
