[GH-3089] Dispose rasters in RS_DWithin join path and fix empty-envelope filter #3090

Draft
jiayuasu wants to merge 1 commit into apache:master from jiayuasu:fix/rs-dwithin-raster-disposal

@jiayuasu (Member)

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

This PR fixes three related issues in the raster distance-join machinery built around RS_DWithin, reported in #3089. All changes are confined to spark/common.

  1. Native raster memory leak. RS_DWithin.eval and the WGS84 envelope builders used by the join planner deserialize GridCoverage2D rasters but never dispose them, so a raster distance join leaks off-heap memory on every row, and the leak accumulates on long-running executors. The affected sites are:

    • RS_DWithin.eval (RasterPredicates.scala)
    • TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD
    • both raster branches in BroadcastIndexJoinExec that build the expanded WGS84 envelope

    Each deserialized raster is now wrapped in try/finally and released with raster.dispose(true), mirroring the discipline already used by RS_Predicate.evaluator.

  2. Empty-envelope false positive. When a raster or geometry input is NULL, the join substitutes an empty GeometryCollection. expandRasterFilterEnvelope then expanded that geometry's degenerate envelope by distance, yielding a non-empty filter geometry that spuriously matched rows the per-row predicate would reject. It now returns the empty shape unchanged so the coarse R-tree filter excludes it.

  3. Misleading EXPLAIN output. The raster distance branch of BroadcastIndexJoinExec.simpleString printed RS_Distance(left, right) < r, which names a function that does not exist in Sedona. It now prints RS_DWithin(left, right, r), naming the actual predicate that drives the join.
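The disposal discipline from item 1 can be sketched in isolation. This is a minimal illustration, not the code in this PR: `FakeRaster` and `withRaster` are hypothetical names, and `FakeRaster` only models the `dispose(force)` call that a deserialized GridCoverage2D would receive.

```scala
// Hypothetical stand-in for a deserialized GridCoverage2D.
// Only the dispose(force) call is modeled; `disposed` records that it ran.
final class FakeRaster(val name: String) {
  var disposed: Boolean = false
  def dispose(force: Boolean): Boolean = { disposed = true; true }
}

object RasterDisposal {
  // Runs `body` on the raster and always releases it afterwards,
  // whether `body` returns normally or throws.
  def withRaster[A](raster: FakeRaster)(body: FakeRaster => A): A =
    try body(raster)
    finally raster.dispose(true)
}
```

Because the release sits in `finally`, an exception thrown while evaluating a row still frees the raster, which is the property that matters for a per-row join predicate.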

How was this patch tested?

sedona-spark-common compiles cleanly with the changes. The fixes are small and confined to existing code paths that are already exercised by RasterJoinSuite and BroadcastIndexJoinSuite.
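The intended behavior of the empty-envelope fix (item 2 above) can be illustrated with a self-contained sketch. It uses hypothetical stand-in types (`Shape`, `Empty`, `Box`, `FilterEnvelope`) rather than Sedona or JTS classes, and it is not a test shipped with this PR.

```scala
// Hypothetical stand-in for a filter shape. `Empty` models the empty
// GeometryCollection substituted for a NULL raster or geometry.
sealed trait Shape
case object Empty extends Shape
final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double) extends Shape

object FilterEnvelope {
  // Grows a box by `distance` on every side. The empty shape is passed
  // through unchanged so it stays empty after expansion.
  def expand(shape: Shape, distance: Double): Shape = shape match {
    case Empty => Empty
    case Box(x0, y0, x1, y1) =>
      Box(x0 - distance, y0 - distance, x1 + distance, y1 + distance)
  }

  // Coarse bounding-box test; an empty shape intersects nothing.
  def intersects(a: Shape, b: Shape): Boolean = (a, b) match {
    case (Box(ax0, ay0, ax1, ay1), Box(bx0, by0, bx1, by1)) =>
      ax0 <= bx1 && bx0 <= ax1 && ay0 <= by1 && by0 <= ay1
    case _ => false
  }
}
```

With this shape of guard, `FilterEnvelope.expand(Empty, d)` remains `Empty` for any `d`, so the coarse filter agrees with a per-row predicate that rejects NULL inputs.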

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the documentation.

…-envelope filter

Three related issues in the raster distance-join machinery introduced with
RS_DWithin (GH-3089):

1. Off-heap raster leak: RS_DWithin.eval and the WGS84 envelope builders in
   TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD and
   BroadcastIndexJoinExec.streamShapeToExpandedEnvelope deserialize
   GridCoverage2D rasters but never dispose them. On long-running executors a
   raster distance join leaks native memory per row. Wrap each deserialization
   in try/finally and call raster.dispose(true), mirroring the discipline
   already used by RS_Predicate.evaluator.

2. Empty-envelope false positive: expandRasterFilterEnvelope expanded the
   degenerate envelope of the empty GeometryCollection substituted for a NULL
   raster/geometry, producing a non-empty filter geometry that spuriously
   matched rows the per-row predicate would reject. Return the empty shape
   unchanged so the coarse R-tree filter excludes it.

3. Misleading EXPLAIN output: the raster distance branch of
   BroadcastIndexJoinExec.simpleString printed "RS_Distance(left, right) < r",
   a non-existent function. Print "RS_DWithin(left, right, r)" to match the
   actual predicate.

@jiayuasu marked this pull request as draft on June 30, 2026, 21:32
Development

Successfully merging this pull request may close these issues.

RS_DWithin / raster distance join leaks GridCoverage2D rasters (plus empty-envelope and EXPLAIN issues)