[GLUTEN-10134][VL] Fix ANSI workflow: AI token limit, analyze-only artifact lookup, and --run support #11987
baibaichen wants to merge 6 commits into apache:main from
Conversation
/ansi-test

🔄 ANSI mode analysis started by @baibaichen. View run
ANSI Mode Test Analysis Report (Spark 4.1)
Note: Expression-level ANSI mode offload coverage analysis.
ANSI Offload suites: 498 tests, 43258 records | Other suites: 17706 tests
Overview (ANSI Offload Expression Records)
Per-Suite Summary
Failure Cause Analysis (53 failures)
Other (23 failures)
… limit Group failures by (cause, suite), use compact JSON, drop redundant fields. Reduces AI prompt from ~6200+ tokens to ~3700 tokens (limit: 8000). Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
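As a rough illustration of the compression this commit describes, here is a minimal Python sketch of grouping failures by (cause, suite) and emitting compact JSON; the function and field names are hypothetical, not the actual analyze-ansi.py code.

```python
# Hypothetical sketch of the compression idea (not the real _build_ai_context):
# group failure records by (cause, suite), keep a bounded list of test names,
# and serialize without whitespace to shrink the AI prompt.
import json
from collections import defaultdict

def build_compact_context(failures, max_tests_per_group=10):
    groups = defaultdict(list)
    for f in failures:
        groups[(f["cause"], f["suite"])].append(f["test"])
    compact = [
        {"cause": cause, "suite": suite, "n": len(tests),
         "tests": tests[:max_tests_per_group]}
        for (cause, suite), tests in sorted(groups.items())
    ]
    # compact separators drop the spaces after ',' and ':' in the JSON payload
    return json.dumps(compact, separators=(",", ":"))
```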
🔄 ANSI mode analysis started by @baibaichen. View run
…in analyze-only mode analyze-only mode now finds the correct full-test run by reading hidden markers (<!-- ansi-mode:full ansi-run:ID -->) from PR comments instead of searching by branch name. Also adds --run <ID> support for both issue_comment and workflow_dispatch triggers. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
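For reference, a small Python sketch of reading such a hidden marker back out of a comment body; the regex mirrors the marker format quoted in the commit, while the function name is illustrative (the workflow itself does this in shell/jq).

```python
# Illustrative only: extract the run ID from the hidden marker
# <!-- ansi-mode:full ansi-run:ID --> embedded in a PR comment body.
import re

MARKER_RE = re.compile(r"<!--\s*ansi-mode:full\s+ansi-run:(\d+)\s*-->")

def extract_full_run_id(comment_body: str):
    """Return the embedded full-test run ID, or None if no marker is present."""
    match = MARKER_RE.search(comment_body)
    return int(match.group(1)) if match else None
```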
…nt spoofing in analyze-only mode - Fix workflow_dispatch analyze-only being misidentified as full mode - Fix paginated jq producing multiple run_id lines (use tail -1) - Restrict marker search to github-actions[bot] comments only - Validate --run input as numeric, take only first match - Add explicit permissions to check-comment job Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…lback grep -m1 stops after matching 1 line, but multiple --run tokens on the same line still produce multiple outputs. Use head -1 to guarantee a single value, with explicit empty check for inputs.run_id fallback (pipeline exit code from head would mask grep failure). Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
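To make the head -1 vs grep -m1 distinction concrete, here is a hedged Python model of the intended parsing rule (the real logic lives in the workflow's shell steps; the function name and run_id_input parameter are assumptions): take only the first numeric --run token, and fall back to an explicit run_id input only when it is present and numeric.

```python
# Hypothetical Python model of the shell parsing rule described in this commit.
import re

def resolve_run_id(comment_body: str, run_id_input: str = "") -> str:
    """Return the run ID to analyze, or "" if none can be determined."""
    tokens = re.findall(r"--run\s+(\d+)", comment_body)
    if tokens:
        return tokens[0]            # first match only, even if several tokens appear
    if run_id_input.isdigit():      # explicit numeric check for the dispatch input
        return run_id_input
    return ""                       # empty result is checked explicitly by the caller
```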
…ency Cap test names per group to 10, remove redundant failure_count field, and use reverse-sorted single-page comment lookup instead of paginating all comments. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
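A hedged sketch of the lookup strategy mentioned here: scan the most recent comments newest-first and trust only github-actions[bot] authors. The dict shape follows GitHub's REST issue-comment objects, and the function name is illustrative (the workflow implements this in jq).

```python
# Illustrative only: find the newest bot-authored comment carrying the marker.
import re

MARKER_RE = re.compile(r"<!--\s*ansi-mode:full\s+ansi-run:(\d+)\s*-->")

def find_latest_full_run(comments):
    """comments: iterable of GitHub issue-comment dicts (user.login, body, created_at)."""
    newest_first = sorted(comments, key=lambda c: c["created_at"], reverse=True)
    for comment in newest_first:
        if comment.get("user", {}).get("login") != "github-actions[bot]":
            continue  # anti-spoofing: only trust bot-authored comments
        match = MARKER_RE.search(comment.get("body", ""))
        if match:
            return int(match.group(1))
    return None
```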
/ansi-analyze --run 24949958444

🔄 ANSI mode analysis started by @baibaichen. View run

/ansi-analyze --run 24949958444

🔄 ANSI mode analysis started by @baibaichen. View run

🔄 ANSI analyze-only started by @baibaichen. View run
ANSI Mode Test Analysis Report (Spark 4.1)
Note: Expression-level ANSI mode offload coverage analysis.
ANSI Offload suites: 498 tests, 43258 records | Other suites: 17706 tests
Overview (ANSI Offload Expression Records)
Per-Suite Summary
Failure Cause Analysis (53 failures)
Other (23 failures)
🤖 AI Deep Analysis
Key Findings
Fallback Analysis (🔴): Highest-Priority — Where Velox Is Not Handling Core ANSI Expressions
Summary: Breakdown by Expression Category
Detailed Root Cause Attribution
Review of core files shows these root causes:
Conclusion:
Failure Hotspot Table
Note: Several other suites have 1–2 failures each of varied causes.
| Type | Count | % of Failures | Interpretation |
|---|---|---|---|
| WRONG_EXCEPTION | 41* | 43.6% | Velox error thrown, but Spark's wrapper rethrows as SparkException—ANSI error type context lost |
| NO_EXCEPTION | 44 | 46.8% | Expected ANSI-mode error (overflow/invalid cast) never thrown by Velox, result returned |
| OTHER | 19 | 20.2% | Includes correctness mismatches, error summary/output mismatches, miscellaneous |
*Counting de-duped multi-test records in JSON sample
Deep Analysis: WRONG_EXCEPTION Pathways
- Typical Stack Path:
  - Velox C++ throws an std::overflow_error or similar (e.g., via VELOX_FAIL("[ARITHMETIC_OVERFLOW] ...")).
  - → The JNI/Java bridge in ColumnarBatchOutIterator.java catches it and wraps it as a generic SparkException.
  - → Error in test: "Expected ArithmeticException but got SparkException".
- Key Code Sites:
  - C++: velox/functions/sparksql/Arithmetic.cpp (throws the native exception; includes error context)
  - Java JNI Bridge: gluten-arrow/src/main/java/org/apache/gluten/vectorized/ColumnarBatchOutIterator.java
  - Scala: Tests use findCause to unwrap exceptions (but the type is lost once SparkException wraps it).
- Impact:
Exception type and context (error message with SQL text) are lost. This leads to failure of tests asserting overflow/cast error context, even if the underlying error occurs as expected.
NO_EXCEPTION: Root Cause Breakdown
| Category | Failures | Root Cause | Files |
|---|---|---|---|
| Cast | 25 | isAnsiSupported() in Velox false for most cast pairs. Falls back to try_cast; errors suppressed | SparkCastExpr.cpp (C++) |
| Arithmetic | 9 | Velox arithmetic not triggering overflow pathway due to non-ANSI code | Arithmetic.cpp (C++), fallback/EvalCtx |
| Date/Datetime | 6 | Date/timestamp parsing errors not detected in ANSI path | DateTimeFunctions.cpp (C++ Spark SQL funcs) |
| Decimal | 1 | Decimal overflow paths not ANSI-complete or not checked in Velox | Decimals.cpp, SparkCastExpr.cpp (C++) |
| Collection | 1 | Invalid element index not detected | CollectionFunctions.cpp |
| DataFrameAgg | 3 | SUM/AVG overflow not signaled | Aggregation.cpp, Decimal/Sum path |
Conclusion:
- NO_EXCEPTION for Casts = Velox not checking (or delegating) the cast error path for that pair; falls through and Spark test expects error.
- For Arithmetic/Decimal = Missing or incomplete overflow detection under ANSI mode.
- For Date/Datetime = No error on malformed input because code path missing strict throw.
Failed + Fallback (🟠) Anomalies
- None reported — This is expected; fallback to Spark should always preserve full ANSI behavior.
Fix Recommendations
1. [P0] Implement/Un-gate ANSI Cast Error Paths for Additional Cast Pairs in Velox
- Symptom: All NO_EXCEPTION failures in GlutenCastWithAnsiOnSuite (25 tests), plus many Fallbacks: validation for illegal/overflowing casts (e.g., String→Decimal, Decimal→Numeric) is bypassed — Velox currently only honors the ANSI error check for String→{Boolean, Date, Integral}.
- Root Cause: The function isAnsiSupported() in velox/functions/sparksql/specialforms/SparkCastExpr.cpp restricts the ANSI-mode branch to a tiny whitelist. All other casts revert to try_cast (which returns default/null instead of an error).
- Fix Point:
  - File: velox/functions/sparksql/specialforms/SparkCastExpr.cpp
  - Action:
    - Expand the isAnsiSupported() whitelist to include more Spark-supported ANSI cast pairs:
      - String→Decimal
      - Decimal→{Short, Int, Long, Double, etc.}
      - Numeric→Decimal (and vice versa)
      - Any pair required by failed tests
    - Ensure the casting logic throws on invalid input instead of silently using try_cast on these pairs when kSparkAnsiEnabled is true.
- Representative Tests:
  - ANSI mode: Throw exception on casting out-of-range value to byte type
  - ANSI mode: Throw exception on casting out-of-range value to decimal type
  - Many more listed under the "GlutenCastWithAnsiOnSuite" NO_EXCEPTION group
- Estimated Impact: At least 25 NO_EXCEPTION test failures, plus a major reduction in Fallback count for the cast category — an estimated ~1,500–2,000 records would turn green, dramatically raising ANSI coverage for Cast.
- Priority Rationale: Highest-impact group (25+ direct failures, ~2k fallbacks); the fix is C++-side but tightly scoped to ~1–2 files and a function list; does not require major Velox architectural change.
2. [P1] Fix Exception Wrapping Chain to Preserve Exception Type/context (WRONG_EXCEPTION)
- Symptom: For arithmetic overflows and invalid casts, Spark tests expect e.g. ArithmeticException/NumberFormatException with the original SQL context, but see only SparkException (type loss), causing 32+ failures in arithmetic/cast.
- Root Cause: The JNI/Java bridge (ColumnarBatchOutIterator.java) wraps or rethrows all native errors as a generic SparkException, losing cause/type. This violates Spark's contract for the exception class on overflow/cast errors.
- Fix Point:
  - File: gluten-arrow/src/main/java/org/apache/gluten/vectorized/ColumnarBatchOutIterator.java (translateException)
  - Action: Enhance error translation to:
    - Recognize certain Velox error signatures and wrap them as the correct Java exception type (ArithmeticException, NumberFormatException, etc.) with the SQL context preserved.
    - (Optionally) propagate the cause (Throwable) if the error message/source matches the expected Spark error.
    - (May also require tweaks to C++ error throw points to set an error code class for translation.)
- Representative Tests:
  - Add: Overflow exception should contain SQL text context
  - Divide: divide by 0 exception should contain SQL text context
  - All "Expected ArithmeticException but got SparkException"
- Estimated Impact: 32+ direct failures (WRONG_EXCEPTION); major improvement in error-context correctness for ANSI.
- Priority Rationale: Medium-high impact; the fix is single-file translation logic, but requires careful mapping of error codes and messages; Java↔native bridging is not trivial but well-bounded.
3. [P1] Surface Support for Additional Types (INTERVAL, TIMESTAMP_NTZ, Complex) in TypeNode and Validators
- Symptom: Fallbacks (🔴) for arithmetic/datetime/collection over unsupported types: INTERVAL, TIMESTAMP_NTZ, complex/nested arrays/maps. Velox never sees these due to early Scala-side rejection.
- Root Cause:
  - gluten-substrait/src/main/scala/org/apache/gluten/expression/ConverterUtils.scala#getTypeNode blocks these types (e.g., "Type INTERVAL_YEAR_MONTH not supported yet"), and Validators.scala disables offload for any expression involving such types.
  - Often Velox does support the primitive type but the function registration/whitelisting is missing; e.g., INTERVAL arithmetic in Velox may work C++-side but isn't exposed by the current validators.
- Fix Point:
  - File(s):
    - gluten-substrait/src/main/scala/org/apache/gluten/expression/ConverterUtils.scala
    - gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/validator/Validators.scala
    - If C++ Velox support is missing, e.g., for interval math, register the C++ implementation first.
  - Action:
    - Enumerate fallback cases triggered by getTypeNode and Validators.
    - Where Velox C++ supports the type, relax the "unsupported" checks and allow Substrait translation.
    - Where missing, enumerate the upstream gaps.
- Representative Tests: Not listed individually, but inferred from the high Fallback rate in datetime, arithmetic on unsupported types, and intervals.
- Estimated Impact: Could clear hundreds of currently Fallback records in datetime/arithmetic/collection.
- Priority Rationale: Moderate-high impact (hundreds of Fallbacks); the fix is split across the Scala validation and type-mapping layers, and possibly requires an upstream PR for Velox function/type registration in a few cases; not as concentrated as P0.
Summary Prioritization Table
| Tier | Fix Recommendation | Est. Impact | Fix Scope | Rationale |
|---|---|---|---|---|
| P0 | Add ANSI Cast support for more cast pairs (Velox C++) | 1,500+ | SparkCastExpr.cpp only | Very high, single-source, unblocks both fail and fallback |
| P1 | Java/C++ error type pass-through for Arithmetic/Cast errors | 32+ | Java JNI bridge | Concentrated, needed for correctness, requires precise error decode |
| P1 | Remove type gates for INTERVAL/etc. in Validators, TypeNode | 500+ | Scala + (some) C++ | Wide impact, somewhat split fix; C++ may already support some types |
Prioritized Action Steps
- [P0] Focus first on adding Cast error-path support in Velox for String→Decimal, Decimal↔Numeric, etc. (isAnsiSupported) — this alone will slash both fail and fallback rates for Cast.
- [P1] Fix the exception-wrapping chain in the Java JNI bridge, making sure error type is surfaced correctly to Spark (ArithmeticException, NumberFormatException, etc).
- [P1] Loosen Scala-side blocking of INTERVAL/TIMESTAMP_NTZ types where Velox supports the primitive/function, especially in arithmetic and datetime. If absent in Velox, file upstream or stage for later.
No Failed+Fallback (🟠) anomalies found.
References for Source Location Verification
- Confirmed from reviewing SparkCastExpr.cpp (isAnsiSupported); most cast pairs not yet covered:
  bool isAnsiSupported(...) {
    // Only (String, Boolean), (String, Date), (String, TINYINT/SMALLINT/INT/BIGINT?) supported as true
    // All other (String, Decimal), (Decimal -> X), etc. not supported — defaults to try_cast
  }
- Type fallback for INTERVAL types on the Scala side:
  case _: IntervalType => throw new GlutenNotSupportException("Type INTERVAL...")
- Validators for fallback:
  def fallbackByTimestampNTZ(...) = ...
  def fallbackComplexExpressions(...) = ...
- Exception wrapping via translateException:
  private Throwable findCause(Throwable t, ...) { ... throw new SparkException(msg, cause) }
This action plan, if followed, will raise ANSI coverage for Velox by thousands of tests and ensure correctness for critical error-path behavior.
Generated by gpt-4.1. AI analysis may not be fully accurate — please verify before acting on recommendations.
🔄 ANSI full test started by @baibaichen. View run
ANSI Mode Test Analysis Report (Spark 4.1)
Note: Expression-level ANSI mode offload coverage analysis.
ANSI Offload suites: 498 tests, 43258 records | Other suites: 17706 tests
Overview (ANSI Offload Expression Records)
Per-Suite Summary
Failure Cause Analysis (53 failures)
Other (23 failures)
🤖 AI Deep Analysis
Key Findings
Fallback (🔴) Analysis – Highest Priority
Fallback indicates expressions are not offloaded to Velox at all (Spark executes them). This is the most critical issue — Gluten appears to "pass" ANSI semantics but provides no native acceleration.
Fallback Breakdown by Expression Type
Root Causes for Fallbacks
Based on code review of
Root-cause Grouping in Code
Failure Hotspot Table (Suite/Cause Concentration)
failCause Type Statistics
Root Cause Deep Analysis: WRONG_EXCEPTION
Observed pattern:
Exception Wrapping Chain:
Key Code Location:
Breakdown of NO_EXCEPTION by Root Cause
Failed+Fallback (🟠) Analysis
No Failed+Fallback (🟠) records reported in this dataset.
Fix Recommendations (max 3)
1. Correct Cast Offloading: Implement ANSI cast semantics for all on-CPU cast pairs
Symptom:
Root Cause:
Fix Point:
Representative Tests:
Estimated Impact:
Priority Rationale:
2. Exception Unwrapping: Map Velox native exceptions to matching Spark exception types in JNI bridge
Symptom:
Root Cause:
Fix Point:
Representative Tests:
Estimated Impact:
Priority Rationale:
3. Decimal Arithmetic Overflow Handling: Implement missing overflow checks for Decimal expressions in Velox
Symptom:
Root Cause:
Fix Point:
Representative Tests:
Estimated Impact:
Priority Rationale:
Summary Table of Recommendations
In summary:
If these three changes are made, the majority of all currently failing and fallback records (cast, arithmetic, decimal, and related) are likely to go green.
Generated by gpt-4.1. AI analysis may not be fully accurate — please verify before acting on recommendations.
🔄 ANSI analyze-only started by @baibaichen. View run
ANSI Mode Test Analysis Report (Spark 4.1)
Note: Expression-level ANSI mode offload coverage analysis.
ANSI Offload suites: 498 tests, 43258 records | Other suites: 17706 tests
Overview (ANSI Offload Expression Records)
Per-Suite Summary
Failure Cause Analysis (53 failures)
Other (23 failures)
🤖 AI Deep Analysis
Key Findings
Fallback Analysis (🔴 - Highest Priority)
Total Fallback Records: 5,419 (12.5% of 43,258)
Root Causes by Category
Category Summary Table
Failure Hotspot Table
The largest hotspots by far are Arithmetic and Cast under ANSI mode. failCause Type Statistics
Sample interpretation:
WRONG_EXCEPTION — Deep Analysis
Symptom:
Code Path:
Fix Point:
NO_EXCEPTION Breakdown by Root Cause
Failed+Fallback (🟠) Records
None detected — as expected.
Fix Recommendations
P0: Proper Java Exception Wrapping for Native-side Failures in ANSI Mode
P1: Expand Velox ANSI Cast Support Beyond Current Whitelist
P2: Backfill Spark→Substrait Type Node Support for "Fallen Back" Cast/Arithmetic/DateTime Expressions
No Failed+Fallback (🟠) records detected — the system is, correctly, never both falling back and failing.
Summary Table: Recommendation Overview
Appendix: Code/Root Cause Evidence
End of Key Findings & Recommendations.
Generated by gpt-4.1. AI analysis may not be fully accurate — please verify before acting on recommendations.
Summary
Fixes three bugs in the ANSI workflow (velox_backend_ansi.yml) discovered after #11975 was merged:
1. AI analysis 413 token limit: _build_ai_context() exceeded the GitHub Models API 8000-token limit. Compressed by grouping failures by (cause, suite), using compact JSON, capping test lists to 10 per group, and removing redundant fields.
2. analyze-only mode could not find artifacts: issue_comment-triggered runs register head_branch=main with no PR association in metadata. The old find-run searched by PR branch name and always got zero results. Replaced with Plan E: full-mode runs now embed a hidden marker (<!-- ansi-mode:full ansi-run:ID -->) in the PR starting comment; analyze-only reads the marker back to locate the correct run.
3. --run <ID> support: Both issue_comment (/ansi-analyze --run <ID>) and workflow_dispatch (run_id input) now accept an explicit run ID, bypassing the marker lookup.
Additional hardening (from code review)
- Fix workflow_dispatch with mode=analyze-only being misidentified as full (marker pollution)
- Restrict marker search to github-actions[bot] comments only (anti-spoofing)
- Validate --run input as numeric, take only first match
- Add explicit permissions block to check-comment job
Files changed
- .github/workflows/velox_backend_ansi.yml — artifact lookup, --run support, hidden markers, permissions
- .github/skills/ansi-analysis/analyze-ansi.py — AI context compression
Test plan
- workflow_dispatch analyze-only + --run 24949958444 → success (run)
- workflow_dispatch full mode → hidden marker written, analyze-results success (run)
- workflow_dispatch analyze-only (no run_id) → Plan E marker lookup found run 24960461588 → success (run)
- /ansi-test via issue_comment (requires merge to main first — workflow YAML is read from the default branch)
Known remaining issue
backends-velox test jobs run all suites instead of ANSI-specific ones, causing timeouts (~20h). Will be fixed in a follow-up.
🤖 Generated with Claude Code
Related issue: #10134