Skip to content

[AURON #2229] bugfix Silent fallback to empty partitions may cause data loss#2230

Open
guixiaowen wants to merge 5 commits into
apache:masterfrom
guixiaowen:foriceberg_2229
Open

[AURON #2229] bugfix Silent fallback to empty partitions may cause data loss#2230
guixiaowen wants to merge 5 commits into
apache:masterfrom
guixiaowen:foriceberg_2229

Conversation

@guixiaowen

Copy link
Copy Markdown
Contributor

#2229

Which issue does this PR close?

Closes #2229

Rationale for this change

The current implementation silently falls back to Seq.empty when both DataSource V2 and reflective partition retrieval fail. This may lead to empty input partitions without any failure signal, potentially causing silent data loss or empty query results.

Problem Description

In inputPartitions(exec: BatchScanExec), when partition planning fails:

V2 API failure → caught and ignored
Reflection failure → caught and ignored
Final fallback → Seq.empty

catch { case t: Throwable => logWarning(...) Seq.empty }

and

catch { case t: Throwable => logWarning(...) Seq.empty }

Risk

Returning an empty partition list is dangerous because:

  1. Silent data loss
    Downstream execution may interpret empty partitions as:

  2. no data available
    valid empty scan result

→ leads to incorrect empty query results without failure

Hard to detect in production
Only warning logs are emitted
No exception or job failure
Easy to miss in large pipelines

  1. Misleading semantics
    A scan failure (cannot plan partitions) is semantically different from:

dataset is actually empty

But both are currently indistinguishable.

What changes are included in this PR?

Expected Behavior

When both:

V2 planInputPartitions
reflection-based partition retrieval

fail, the system should:

Fail fast with a clear exception:

throw new IllegalStateException( s"Failed to plan input partitions for ${exec.getClass.getName}")

Are there any user-facing changes?

Nothing.

How was this patch tested?

Depends on existing unit tests.

@guixiaowen guixiaowen changed the title [AURON #2229] Silent fallback to empty partitions may cause data loss… [AURON #2229] Silent fallback to empty partitions may cause data loss May 3, 2026
s"Failed to obtain input partitions via reflection for ${exec.getClass.getName}.",
t)
Seq.empty
throw new IllegalStateException(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are throwing an exception anyways, do we need to catch it in L189?

try {
inputPartitions(exec)
} catch {
case e: IllegalStateException =>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to catch any unexpected exceptions (like fs or metadata errors)?

@guixiaowen guixiaowen changed the title [AURON #2229] Silent fallback to empty partitions may cause data loss [AURON #2229] bugfix Silent fallback to empty partitions may cause data loss May 15, 2026

@weiqingy weiqingy left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up. Two questions on the failure path inline.

Comment on lines +82 to +88
try {
inputPartitions(exec)
} catch {
case e: Throwable =>
logWarning(s"Get Partition error: ${e.getMessage}")
return None
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This catch converts the IllegalStateException thrown at L200 into return None, so on a planning failure the effective behavior is a fallback to Spark's scan path (plus two WARN logs) rather than the "fail fast with a clear exception" the description mentions — no exception reaches the caller. Falling back to Spark is arguably the safer outcome, since the query still returns correct results, so this may well be what you want — worth confirming it's intentional rather than a hard failure.

If Spark-fallback is the intent, the throw-at-L200-then-catch-here round-trip is using an exception purely for control flow. Would signaling failure through the return type read more directly — e.g. inputPartitions returning Option[Seq[InputPartition]], with None = planning failed and Some(empty) = genuinely empty table, and plan() matching on it? That also answers the question on the other thread about whether this catch is removable: it isn't — dropping it would route a failure back into the empty-partition branch and reintroduce the empty-plan case.

Minor: this catches Throwable, while the sibling catch at L196 uses NonFatal. Matching NonFatal here would avoid folding fatal errors like OutOfMemoryError into a silent Spark fallback.

Comment on lines 197 to 199
logWarning(
s"Failed to obtain input partitions via reflection for ${exec.getClass.getName}.",
t)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This catch now throws on a reflective failure, which is what routes plan() to the Spark fallback. The V2 batch path earlier in this method (the exec.scan.toBatch / planInputPartitions() block, ~L158-165) still catches Throwable and returns Seq.empty, then falls through to reflection. So if V2 planning fails for a real reason — an fs or metadata error — that failure is masked, and if reflection then yields empty, plan() builds a native scan with empty partitions: the same silent-empty outcome #2229 is about, one level up. Is the V2 swallow-to-empty intentional, or should a V2 planning failure surface the same way this one now does?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Silent fallback to empty partitions may cause data loss

4 participants