feat(lance): Ensure Spark-SQL works correctly when using lance as base file format #18375

Draft
wombatu-kun wants to merge 1 commit into apache:master from wombatu-kun:lance-test-spark-sql

Conversation

@wombatu-kun
Contributor

@wombatu-kun wombatu-kun commented Mar 24, 2026

Describe the issue this Pull Request addresses

Closes #18268

Summary and Changelog

Added HoodieLanceInputFormat, HoodieLanceRealtimeInputFormat, and HoodieLanceRecordReader (similar to the existing HFile format classes).
Parameterized the Spark SQL tests TestSparkSqlCoreFlow, TestSqlStatement, and TestQueryMergeOnReadOptimizedTable with baseFileFormat (parquet/lance) and fixed bugs.
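
For illustration only, here is a minimal sketch of what running the same SQL against both base file formats can look like. It is not code from this PR. It assumes the format is selected through the `hoodie.table.base.file.format` table property and that `LANCE` is an accepted value alongside `PARQUET`; the class name, helper name, and table schema are hypothetical.

```java
import java.util.List;
import java.util.Locale;

public class BaseFileFormatSqlSketch {

    // Hypothetical helper: builds a CREATE TABLE statement for one base file format.
    // Assumption: the format is passed via the 'hoodie.table.base.file.format' property.
    static String createTableSql(String tableName, String baseFileFormat) {
        return String.format(
            "CREATE TABLE %s (id INT, name STRING, ts BIGINT) USING hudi "
                + "TBLPROPERTIES (primaryKey = 'id', "
                + "'hoodie.table.base.file.format' = '%s')",
            tableName, baseFileFormat);
    }

    public static void main(String[] args) {
        // The same statement template is generated once per format,
        // mirroring a test parameterized over parquet and lance.
        for (String format : List.of("PARQUET", "LANCE")) {
            String table = "t_" + format.toLowerCase(Locale.ROOT);
            System.out.println(createTableSql(table, format));
        }
    }
}
```

In a real test the generated statement would be passed to `spark.sql(...)`, with the format supplied by the test framework's parameter source rather than a hand-written loop.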

Impact

Spark SQL works correctly when Lance is used as the base file format.

Risk Level

low

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@wombatu-kun wombatu-kun changed the title Ensure Spark-SQL works correctly when using lance as base file format feat(lance): Ensure Spark-SQL works correctly when using lance as base file format Mar 24, 2026
@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Mar 24, 2026
@wombatu-kun wombatu-kun force-pushed the lance-test-spark-sql branch 3 times, most recently from 0a1f3c8 to 48b2d92 Compare March 25, 2026 05:22
…e file format

parametrized TestSqlStatement with lance/parquet baseFileFormat
fixed TestAvroFileWriterFactory and TestSecondaryIndexPruning
fixed initializeWriter
parametrized TestQueryMergeOnReadOptimizedTable for lance
parametrized TestSparkSqlCoreFlow with baseFileFormat
added HoodieLanceInputFormat and HoodieLanceRealtimeInputFormat
@wombatu-kun wombatu-kun force-pushed the lance-test-spark-sql branch from 48b2d92 to 0558b63 Compare March 25, 2026 08:29
@wombatu-kun wombatu-kun marked this pull request as ready for review March 25, 2026 08:39
@wombatu-kun wombatu-kun marked this pull request as draft March 25, 2026 09:09
@codecov-commenter

Codecov Report

❌ Patch coverage is 36.95652% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.14%. Comparing base (817b3ad) to head (0558b63).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rg/apache/hudi/hadoop/HoodieLanceRecordReader.java | 0.00% | 31 Missing ⚠️ |
| ...adoop/realtime/HoodieLanceRealtimeInputFormat.java | 0.00% | 20 Missing ⚠️ |
| ...n/java/org/apache/hudi/index/HoodieIndexUtils.java | 59.25% | 2 Missing and 9 partials ⚠️ |
| ...ache/hudi/hadoop/utils/HoodieInputFormatUtils.java | 25.00% | 9 Missing ⚠️ |
| ...ution/datasources/lance/SparkLanceReaderBase.scala | 0.00% | 6 Missing ⚠️ |
| ...he/hudi/SparkFileFormatInternalRecordContext.scala | 70.58% | 2 Missing and 3 partials ⚠️ |
| ...org/apache/hudi/hadoop/HoodieLanceInputFormat.java | 0.00% | 4 Missing ⚠️ |
| ...main/java/org/apache/hudi/io/HoodieReadHandle.java | 80.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18375      +/-   ##
============================================
- Coverage     68.38%   66.14%   -2.25%     
+ Complexity    27505    26822     -683     
============================================
  Files          2428     2436       +8     
  Lines        132926   133340     +414     
  Branches      16000    16044      +44     
============================================
- Hits          90902    88192    -2710     
- Misses        34964    38190    +3226     
+ Partials       7060     6958     -102     
| Flag | Coverage Δ | |
|---|---|---|
| common-and-other-modules | 37.24% <10.14%> (-7.11%) | ⬇️ |
| hadoop-mr-java-client | 45.08% <10.81%> (+0.01%) | ⬆️ |
| spark-client-hadoop-common | 48.50% <12.12%> (+0.29%) | ⬆️ |
| spark-java-tests | 48.64% <31.15%> (-0.06%) | ⬇️ |
| spark-scala-tests | 45.54% <35.50%> (+0.13%) | ⬆️ |
| utilities | 38.48% <7.97%> (-0.08%) | ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...c/main/java/org/apache/hudi/table/HoodieTable.java | 89.52% <100.00%> (+0.03%) | ⬆️ |
| ...torage/row/HoodieInternalRowFileWriterFactory.java | 89.65% <100.00%> (+0.76%) | ⬆️ |
| ...org/apache/hudi/common/model/HoodieFileFormat.java | 90.90% <100.00%> (+9.95%) | ⬆️ |
| ...apache/hudi/common/table/read/UpdateProcessor.java | 96.82% <100.00%> (+0.05%) | ⬆️ |
| ...pache/hudi/io/storage/HoodieFileWriterFactory.java | 75.00% <100.00%> (+3.12%) | ⬆️ |
| ...rg/apache/hudi/io/lance/HoodieBaseLanceWriter.java | 66.66% <100.00%> (ø) | |
| ...main/java/org/apache/hudi/io/HoodieReadHandle.java | 70.58% <80.00%> (-4.42%) | ⬇️ |
| ...org/apache/hudi/hadoop/HoodieLanceInputFormat.java | 0.00% <0.00%> (ø) | |
| ...he/hudi/SparkFileFormatInternalRecordContext.scala | 81.08% <70.58%> (-8.92%) | ⬇️ |
| ...ution/datasources/lance/SparkLanceReaderBase.scala | 78.18% <0.00%> (-7.82%) | ⬇️ |

... and 4 more

... and 166 files with indirect coverage changes

@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Labels

size:L PR with lines of changes in (300, 1000]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Followup] Ensure Spark-SQL works correctly when using lance as base file format

3 participants