DuckDB Spark API is incompatible with the PySpark API's spark.createDataFrame(list of dict) #183

@asddfl

Description

What happens?

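The DuckDB Spark API does not match PySpark's spark.createDataFrame(list of dict) behavior. PySpark turns each dict key into a column name, so the query returns a column c0 with the value 1969-12-21. DuckDB's experimental Spark API instead returns a single VARCHAR column named col0 whose value is the key c0, not the data.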
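For reference, the PySpark behavior this report expects can be sketched in plain Python: each dict becomes one row, and its keys become the column names. The helper rows_from_dicts below is hypothetical and is not part of PySpark or DuckDB. It keeps keys in first-seen order and fills missing keys with None; PySpark's own schema inference may order or type columns differently.

```python
from __future__ import annotations

from typing import Any


def rows_from_dicts(data: list[dict[str, Any]]) -> tuple[list[str], list[tuple[Any, ...]]]:
    """Hypothetical helper: split a list of dicts into column names and row tuples."""
    columns: list[str] = []
    for record in data:
        for key in record:
            if key not in columns:
                columns.append(key)  # keep first-seen key order
    # A key missing from a record becomes None, i.e. a NULL cell.
    rows = [tuple(record.get(col) for col in columns) for record in data]
    return columns, rows


data = [{"c0": "1969-12-21"}]
columns, rows = rows_from_dicts(data)
print(columns)  # ['c0']
print(rows)     # [('1969-12-21',)]
```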

To Reproduce

from duckdb.experimental.spark.sql import SparkSession as DuckdbSparkSession
from pyspark.sql import SparkSession

sql_text = "SELECT * FROM t0"
# One row passed as a list of dicts: the key "c0" should become the column name.
data = [
    {"c0": "1969-12-21"}
]

# PySpark: build the DataFrame and query it through a temp view.
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(data)
df.createOrReplaceTempView("t0")

print("PySpark SQL result:")
pyspark_result = spark.sql(sql_text)
pyspark_result.show()

# DuckDB's experimental Spark API: same data, same calls.
duckdb_spark = DuckdbSparkSession.builder.getOrCreate()
df = duckdb_spark.createDataFrame(data)
df.createOrReplaceTempView("t0")

print("Duckdb Spark SQL result: ")
duckdb_spark_result = duckdb_spark.sql(sql_text)
duckdb_spark_result.show()

Output:

PySpark SQL result:
+----------+
|        c0|
+----------+
|1969-12-21|
+----------+

Duckdb Spark SQL result: 
┌─────────┐
│  col0   │
│ varchar │
├─────────┤
│ c0      │
└─────────┘
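The DuckDB output above is consistent with each dict being treated as a plain sequence: iterating a Python dict yields its keys, not its values, so converting the row {"c0": "1969-12-21"} to a tuple gives ("c0",). The snippet below only illustrates that standard Python behavior. That DuckDB's createDataFrame converts rows this way is an assumption drawn from the output, not something confirmed from its source.

```python
record = {"c0": "1969-12-21"}

# Iterating a dict yields its keys, so tuple() keeps the column name and drops the value.
as_sequence = tuple(record)
print(as_sequence)  # ('c0',)

# Taking the values explicitly yields the data that PySpark shows.
as_values = tuple(record.values())
print(as_values)  # ('1969-12-21',)
```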

OS:

x86_64 Ubuntu 24.04 Linux-6.14.0-35-generic-x86_64-with-glibc2.39

DuckDB Version:

1.4.2

DuckDB Client:

Python

Hardware:

No response

Full Name:

asddfl

Affiliation:

xxx

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

  • Yes, I have

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant data sets for reproducing the issue?

  • Yes