Integrate syntax errors with error report #5371

Draft
ritvibhatt wants to merge 20 commits into opensearch-project:main from
ritvibhatt:syntax-exception-error-message

@ritvibhatt
Contributor

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 20, 2026

PR Reviewer Guide 🔍

(Review updated until commit f02c2a9)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Introduce structured ErrorReport for syntax errors with suggestion registry

Relevant files:

  • common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java
  • common/src/main/java/org/opensearch/sql/common/antlr/suggestion/ExpectedTokensSuggestionProvider.java
  • common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorContext.java
  • common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorSuggestionProvider.java
  • common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorSuggestionRegistry.java
  • common/src/main/java/org/opensearch/sql/common/antlr/suggestion/UnquotedTableNameSuggestionProvider.java
  • sql/src/test/java/org/opensearch/sql/sql/antlr/suggestion/ContextFactory.java
  • sql/src/test/java/org/opensearch/sql/sql/antlr/suggestion/ExpectedTokensSuggestionProviderTest.java
  • sql/src/test/java/org/opensearch/sql/sql/antlr/suggestion/SyntaxErrorSuggestionRegistryTest.java
  • sql/src/test/java/org/opensearch/sql/sql/antlr/suggestion/UnquotedTableNameSuggestionProviderTest.java

Sub-PR theme: Propagate ErrorReport through callers and update all test assertions

Relevant files:

  • api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java
  • api/src/test/java/org/opensearch/sql/api/parser/UnifiedQueryParserTest.java
  • async-query-core/src/main/java/org/opensearch/sql/spark/utils/SQLQueryUtils.java
  • legacy/src/main/java/org/opensearch/sql/legacy/plugin/RestSQLQueryAction.java
  • legacy/src/test/java/org/opensearch/sql/legacy/plugin/RestSQLQueryActionTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/antlr/PPLSyntaxParserTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReplaceTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstExpressionBuilderTest.java
  • sql/src/test/java/org/opensearch/sql/common/antlr/SyntaxParserTestBase.java
  • sql/src/test/java/org/opensearch/sql/sql/antlr/SQLSyntaxParserTest.java
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4872.yml

⚡ Recommended focus areas for review

Duplicate Logic

The syntaxError method now builds token-based suggestions both inline (lines 84-94) and delegates to SyntaxErrorSuggestionRegistry. However, ExpectedTokensSuggestionProvider (registered in the registry) also handles expected-token suggestions. If the registry returns results from ExpectedTokensSuggestionProvider, the inline fallback block (lines 82-94) is never reached, but if the registry returns empty, the inline block duplicates the same logic. This creates redundant code paths that may diverge over time.

if (!customSuggestions.isEmpty()) {
  // Use the first suggestion from the registry
  reportBuilder.suggestion(customSuggestions.get(0));
} else if (e != null) {
  // Fall back to expected tokens as suggestion if no pattern matches
  IntervalSet possibleContinuations = e.getExpectedTokens();
  List<String> suggestions = topSuggestions(recognizer, possibleContinuations);
  if (!suggestions.isEmpty()) {
    String suggestionText =
        possibleContinuations.size() > SUGGESTION_TRUNCATION_THRESHOLD
            ? String.format(
                "Expected one of %d possible tokens. Examples: %s",
                possibleContinuations.size(), String.join(", ", suggestions))
            : "Expected tokens: " + String.join(", ", suggestions);
    reportBuilder.suggestion(suggestionText);
  }
}

throw reportBuilder.build();

Global Mutable State

PROVIDERS is a static CopyOnWriteArrayList that is mutated by register(). Tests calling SyntaxErrorSuggestionRegistry.register() will permanently add providers to the global list for the entire JVM lifetime, potentially causing test pollution and non-deterministic behavior across test runs. There is no unregister or reset mechanism.

private static final CopyOnWriteArrayList<SyntaxErrorSuggestionProvider> PROVIDERS =
    new CopyOnWriteArrayList<>();

static {
  register(new UnquotedTableNameSuggestionProvider(), new ExpectedTokensSuggestionProvider());
}

private SyntaxErrorSuggestionRegistry() {}

public static void register(SyntaxErrorSuggestionProvider... providers) {
  PROVIDERS.addAll(Arrays.asList(providers));
  PROVIDERS.sort(Comparator.comparingInt(SyntaxErrorSuggestionProvider::getPriority));
}

Fragile Instanceof Check

The fallback condition checks e instanceof ErrorReport && ((ErrorReport) e).getCode() == ErrorCode.SYNTAX_ERROR. If future code throws an ErrorReport with a different error code (e.g., semantic errors), it will NOT fall back to the legacy handler, potentially changing existing behavior. Consider whether all ErrorReport instances should trigger fallback, or only those with SYNTAX_ERROR.

if (e instanceof SyntaxCheckException
    || e instanceof UnsupportedCursorRequestException
    || (e instanceof ErrorReport
        && ((ErrorReport) e).getCode() == ErrorCode.SYNTAX_ERROR)) {
  fallBackHandler.accept(channel, e);
} else {
  next.onFailure(e);
}

Regex Fragility

The regex [^A-Za-z0-9_`'"\s.,()*] used to detect special characters in table names may be too narrow or too broad. For example, it excludes - (hyphen) and @, which can appear in index names. Additionally, the followsFromClause walk-back only checks a fixed set of SQL keywords and may not correctly handle PPL syntax, leading to false positives or missed suggestions.

if (!offending.matches("[^A-Za-z0-9_`'\"\\s.,()*]")) return List.of();
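To make the behavior of this pattern concrete, here is a small standalone sketch (the class and method names are illustrative, not from the PR). Because String.matches requires the whole input to match, the negated class can only accept a single-character token; any one character outside the listed set, including - and @, is reported as special.

```java
import java.util.List;

/** Illustrative check of which tokens the negated character class accepts. */
final class SpecialCharCheck {
  // Same pattern string as in the snippet above.
  static final String SPECIAL_CHAR = "[^A-Za-z0-9_`'\"\\s.,()*]";

  // True only when tokenText is exactly one character that is NOT in the listed set.
  static boolean isSpecial(String tokenText) {
    return tokenText.matches(SPECIAL_CHAR);
  }

  public static void main(String[] args) {
    for (String token : List.of("+", "-", "@", "_", ".", "a", "+-")) {
      System.out.println("'" + token + "' -> " + isSpecial(token));
    }
  }
}
```

Multi-character tokens such as "+-" never match, so whether the check fires depends on how the lexer splits the offending text.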

Test Pollution

The test lowerPriorityProviderWinsOverHigherPriorityProvider registers stub providers into the global static SyntaxErrorSuggestionRegistry. Since there is no cleanup, these stubs persist for subsequent tests in the same JVM run, potentially affecting other tests that rely on the registry's state.

SyntaxErrorSuggestionRegistry.register(highPrioritySuggestion, lowPrioritySuggestion);

// Provide a context that both will match (both stubs ignore the context).
SyntaxErrorContext ctx = ContextFactory.contextFor("SELECT FROM t");
List<String> suggestions = SyntaxErrorSuggestionRegistry.findSuggestions(ctx);

assertEquals("low-wins", suggestions.get(0));

@github-actions
Contributor

github-actions Bot commented Apr 20, 2026

PR Code Suggestions ✨

Latest suggestions up to f02c2a9

Explore these optional code suggestions:

Category    Suggestion    Impact
Possible issue
Isolate static registry state between tests

This test registers providers into the shared static SyntaxErrorSuggestionRegistry
without cleanup, which will pollute the registry for all subsequent tests in the
same JVM run. The stub providers always return suggestions regardless of context, so
they will interfere with other tests that rely on findSuggestions. Add an @AfterEach
or @BeforeEach reset step using a package-private reset method.

sql/src/test/java/org/opensearch/sql/sql/antlr/suggestion/SyntaxErrorSuggestionRegistryTest.java [20-23]

+@org.junit.jupiter.api.BeforeEach
+void resetRegistry() {
+  SyntaxErrorSuggestionRegistry.reset(); // package-private helper
+}
+
+@Test
 void lowerPriorityProviderWinsOverHigherPriorityProvider() {
   StubProvider lowPrioritySuggestion = new StubProvider("low-wins", 1);
   StubProvider highPrioritySuggestion = new StubProvider("high-loses", Integer.MAX_VALUE - 1);
   SyntaxErrorSuggestionRegistry.register(highPrioritySuggestion, lowPrioritySuggestion);
Suggestion importance[1-10]: 7

__

Why: The test registers stub providers into the shared static SyntaxErrorSuggestionRegistry without cleanup, which pollutes the registry for subsequent tests. Since the stubs always return suggestions regardless of context, this is a real test isolation issue that could cause flaky tests.

Medium
Prevent duplicate provider accumulation in registry

The register method adds providers to a shared static CopyOnWriteArrayList without
any deduplication check. In tests (e.g., SyntaxErrorSuggestionRegistryTest), calling
register multiple times will keep accumulating providers, potentially causing stale
or duplicate suggestions across test runs. Consider adding a guard or providing a
way to reset the registry in tests.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorSuggestionRegistry.java [25-28]

 public static void register(SyntaxErrorSuggestionProvider... providers) {
-  PROVIDERS.addAll(Arrays.asList(providers));
+  for (SyntaxErrorSuggestionProvider p : providers) {
+    if (!PROVIDERS.contains(p)) {
+      PROVIDERS.add(p);
+    }
+  }
   PROVIDERS.sort(Comparator.comparingInt(SyntaxErrorSuggestionProvider::getPriority));
 }
 
+/** Visible for testing only. Resets the registry to its initial state. */
+static void reset() {
+  PROVIDERS.clear();
+  register(new UnquotedTableNameSuggestionProvider(), new ExpectedTokensSuggestionProvider());
+}
+
Suggestion importance[1-10]: 6

__

Why: The static PROVIDERS list accumulates providers on every register() call without deduplication, which can cause test pollution and duplicate suggestions. Adding a reset() method and deduplication guard would improve test isolation and correctness.

Low
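The pollution and its fix can be reproduced without the real classes. The sketch below is a simplified stand-in, not the PR's registry: Provider, DEFAULTS, firstSuggestion, and size are hypothetical names introduced for illustration. It uses CopyOnWriteArrayList.addIfAbsent for deduplication and a reset() that restores the default providers.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class ResettableRegistry {
  /** Hypothetical stand-in for SyntaxErrorSuggestionProvider. */
  static final class Provider {
    private final String suggestion;
    private final int priority;

    Provider(String suggestion, int priority) {
      this.suggestion = suggestion;
      this.priority = priority;
    }

    int priority() { return priority; }
    String suggestion() { return suggestion; }
  }

  // Assumed default set; the real registry registers two providers in a static block.
  private static final List<Provider> DEFAULTS =
      List.of(new Provider("expected-tokens", Integer.MAX_VALUE));
  private static final CopyOnWriteArrayList<Provider> PROVIDERS =
      new CopyOnWriteArrayList<>(DEFAULTS);

  static void register(Provider... providers) {
    for (Provider p : providers) {
      PROVIDERS.addIfAbsent(p); // skips instances that are already registered
    }
    PROVIDERS.sort(Comparator.comparingInt(Provider::priority));
  }

  /** Intended for tests only: drop everything and restore the defaults. */
  static void reset() {
    PROVIDERS.clear();
    PROVIDERS.addAll(DEFAULTS);
  }

  static String firstSuggestion() { return PROVIDERS.get(0).suggestion(); }
  static int size() { return PROVIDERS.size(); }

  public static void main(String[] args) {
    register(new Provider("low-wins", 1));
    System.out.println(firstSuggestion()); // the stub now shadows the default
    reset();
    System.out.println(firstSuggestion()); // the default is visible again
  }
}
```

Without the reset() call, the stub with priority 1 would keep winning for every later caller in the same JVM, which is the failure mode described above.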
General
Remove duplicated fallback suggestion logic

The SyntaxErrorSuggestionRegistry already includes ExpectedTokensSuggestionProvider
as a fallback (with Integer.MAX_VALUE priority), so the manual fallback logic in
SyntaxAnalysisErrorListener duplicates that behavior. This duplication can lead to
inconsistent suggestion formatting and makes the registry pattern redundant. Remove
the manual else if (e != null) fallback block and rely solely on the registry.

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [79-95]

 if (!customSuggestions.isEmpty()) {
   // Use the first suggestion from the registry
   reportBuilder.suggestion(customSuggestions.get(0));
-} else if (e != null) {
-  // Fall back to expected tokens as suggestion if no pattern matches
-  IntervalSet possibleContinuations = e.getExpectedTokens();
-  List<String> suggestions = topSuggestions(recognizer, possibleContinuations);
-  if (!suggestions.isEmpty()) {
-    String suggestionText =
-        possibleContinuations.size() > SUGGESTION_TRUNCATION_THRESHOLD
-            ? String.format(
-                "Expected one of %d possible tokens. Examples: %s",
-                possibleContinuations.size(), String.join(", ", suggestions))
-            : "Expected tokens: " + String.join(", ", suggestions);
-    reportBuilder.suggestion(suggestionText);
-  }
 }
Suggestion importance[1-10]: 6

__

Why: The ExpectedTokensSuggestionProvider is already registered in SyntaxErrorSuggestionRegistry as a fallback, making the manual else if (e != null) block in SyntaxAnalysisErrorListener redundant. However, the formatting differs slightly between the two implementations, so removing it requires verifying behavioral equivalence.

Low
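One way to keep the two code paths from diverging is to route both through a single formatting helper. The sketch below assumes the message formats shown in the diff above; the class name, method name, and the threshold parameter (standing in for SUGGESTION_TRUNCATION_THRESHOLD, whose value is not shown in this PR) are hypothetical.

```java
import java.util.List;
import java.util.Optional;

final class ExpectedTokensMessage {
  /**
   * Builds the expected-token hint in one place.
   *
   * @param totalExpected size of the full expected-token set
   * @param examples the (possibly truncated) token names to display
   * @param threshold above this size, the message reports a count plus examples
   */
  static Optional<String> format(int totalExpected, List<String> examples, int threshold) {
    if (examples.isEmpty()) {
      return Optional.empty(); // nothing useful to suggest
    }
    String joined = String.join(", ", examples);
    return Optional.of(
        totalExpected > threshold
            ? String.format(
                "Expected one of %d possible tokens. Examples: %s", totalExpected, joined)
            : "Expected tokens: " + joined);
  }

  public static void main(String[] args) {
    System.out.println(format(3, List.of("FROM", "WHERE", "LIMIT"), 5).orElse("<none>"));
    System.out.println(format(40, List.of("FROM", "WHERE"), 5).orElse("<none>"));
  }
}
```

If the listener and the provider both called such a helper, removing the inline fallback would not change the message text.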
Align fallback handler with new exception hierarchy

Since SyntaxAnalysisErrorListener now always throws ErrorReport (wrapping
SyntaxCheckException as the cause) instead of SyntaxCheckException directly, the e
instanceof SyntaxCheckException branch will never be true for parser-generated
errors. The SyntaxCheckException check should be removed or replaced with a check on
the cause to avoid dead code and potential missed fallbacks.

legacy/src/main/java/org/opensearch/sql/legacy/plugin/RestSQLQueryAction.java [136-143]

-if (e instanceof SyntaxCheckException
-    || e instanceof UnsupportedCursorRequestException
+if (e instanceof UnsupportedCursorRequestException
     || (e instanceof ErrorReport
-        && ((ErrorReport) e).getCode() == ErrorCode.SYNTAX_ERROR)) {
+        && ((ErrorReport) e).getCode() == ErrorCode.SYNTAX_ERROR)
+    || (e instanceof SyntaxCheckException)) {
   fallBackHandler.accept(channel, e);
 }
Suggestion importance[1-10]: 4

__

Why: The e instanceof SyntaxCheckException check may be dead code since SyntaxAnalysisErrorListener now throws ErrorReport instead of SyntaxCheckException directly. However, the improved code in the suggestion keeps SyntaxCheckException as a separate branch, which doesn't meaningfully change the logic and the original code already handles this via ErrorReport.

Low

Previous suggestions

Suggestions up to commit 2f16563
Category    Suggestion    Impact
General
Remove duplicated fallback suggestion logic

The ExpectedTokensSuggestionProvider is already registered in
SyntaxErrorSuggestionRegistry as a fallback provider, so the else if (e != null)
branch in SyntaxAnalysisErrorListener duplicates the fallback logic. This means the
expected-tokens suggestion could be generated twice or the registry's
ExpectedTokensSuggestionProvider is redundant. The fallback logic should be handled
exclusively by the registry to avoid duplication.

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [79-95]

+List<String> customSuggestions = SyntaxErrorSuggestionRegistry.findSuggestions(context);
 if (!customSuggestions.isEmpty()) {
-  // Use the first suggestion from the registry
   reportBuilder.suggestion(customSuggestions.get(0));
-} else if (e != null) {
-  // Fall back to expected tokens as suggestion if no pattern matches
-  IntervalSet possibleContinuations = e.getExpectedTokens();
-  List<String> suggestions = topSuggestions(recognizer, possibleContinuations);
-  ...
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that the else if (e != null) branch in SyntaxAnalysisErrorListener duplicates the ExpectedTokensSuggestionProvider logic already registered in SyntaxErrorSuggestionRegistry. This redundancy could lead to inconsistent behavior and should be consolidated.

Low
Fix non-atomic concurrent registration race condition

The register method uses CopyOnWriteArrayList for thread safety during iteration,
but addAll followed by sort is not atomic. Concurrent calls to register could
interleave, resulting in a partially sorted or inconsistent list. Consider
synchronizing the register method to ensure atomicity of the add-and-sort operation.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorSuggestionRegistry.java [25-28]

-public static void register(SyntaxErrorSuggestionProvider... providers) {
+public static synchronized void register(SyntaxErrorSuggestionProvider... providers) {
   PROVIDERS.addAll(Arrays.asList(providers));
   PROVIDERS.sort(Comparator.comparingInt(SyntaxErrorSuggestionProvider::getPriority));
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies a potential race condition between addAll and sort in the register method. However, this is a static registry that is only populated at startup (via the static block), so concurrent registration is unlikely in practice, making this a low-priority concern.

Low
Remove unreachable dead code in catch clause

Since ErrorReport is now thrown instead of SyntaxCheckException by the parser,
catching SyntaxCheckException here is dead code (it will never be thrown by the
parser anymore). If SyntaxCheckException can still be thrown from other code paths,
this should be documented; otherwise it should be removed to avoid confusion.

api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java [65-69]

-} catch (SyntaxCheckException | UnsupportedOperationException | ErrorReport e) {
+} catch (UnsupportedOperationException | ErrorReport e) {
   throw e;
 } catch (Exception e) {
   throw new IllegalStateException("Failed to plan query", e);
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly notes that SyntaxCheckException may be dead code in the catch clause since the parser now throws ErrorReport. However, SyntaxCheckException could still be thrown from other code paths, so removing it without full analysis could introduce regressions.

Low
Possible issue
Remove unreachable dead code condition

The fallback condition only checks if the ErrorReport's direct cause is a
SyntaxCheckException, but since ErrorReport now wraps SyntaxCheckException as the
underlying cause, the check e instanceof SyntaxCheckException will never be true
anymore (as the parser now throws ErrorReport instead). The first condition e
instanceof SyntaxCheckException is now dead code and the logic should rely solely on
the ErrorReport check. Consider simplifying to just check for ErrorReport wrapping a
SyntaxCheckException, or verify whether SyntaxCheckException can still be thrown
directly elsewhere.

legacy/src/main/java/org/opensearch/sql/legacy/plugin/RestSQLQueryAction.java [135-139]

-if (e instanceof SyntaxCheckException
-    || e instanceof UnsupportedCursorRequestException
+if (e instanceof UnsupportedCursorRequestException
     || (e instanceof ErrorReport
         && ((ErrorReport) e).getCause() instanceof SyntaxCheckException)) {
   fallBackHandler.accept(channel, e);
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that e instanceof SyntaxCheckException may be dead code since the parser now throws ErrorReport wrapping SyntaxCheckException. However, SyntaxCheckException might still be thrown from other code paths not changed in this PR, so this is a moderate concern rather than a critical bug.

Low
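If both the direct and the wrapped form need to trigger the fallback, a cause-chain walk covers them with one predicate. This is a sketch under stated assumptions: the two exception classes below are hypothetical stand-ins for the project's types, and isSyntaxFailure is an illustrative helper name, not part of the PR.

```java
final class FallbackCheck {
  /** Hypothetical stand-in for the project's SyntaxCheckException. */
  static class SyntaxCheckException extends RuntimeException {
    SyntaxCheckException(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for ErrorReport, assumed to carry the original error as its cause. */
  static class ErrorReport extends RuntimeException {
    ErrorReport(Throwable cause) {
      super(cause);
    }
  }

  /** Returns true if e, or any exception in its cause chain, is a SyntaxCheckException. */
  static boolean isSyntaxFailure(Throwable e) {
    for (Throwable t = e; t != null; t = t.getCause()) {
      if (t instanceof SyntaxCheckException) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Throwable direct = new SyntaxCheckException("bad token");
    Throwable wrapped = new ErrorReport(direct);
    System.out.println(isSyntaxFailure(direct));
    System.out.println(isSyntaxFailure(wrapped));
    System.out.println(isSyntaxFailure(new IllegalStateException("other")));
  }
}
```

This keeps behavior stable whether or not other code paths still throw SyntaxCheckException directly, at the cost of also matching deeper wrappers.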
Suggestions up to commit c88bd44
Category    Suggestion    Impact
General
Remove duplicated fallback suggestion logic

The ExpectedTokensSuggestionProvider is already registered in
SyntaxErrorSuggestionRegistry and handles the fallback expected-tokens logic. The
else if (e != null) branch in SyntaxAnalysisErrorListener.syntaxError duplicates
this logic, which means the fallback suggestion can be generated twice or
inconsistently. Remove the duplicate else if branch and rely solely on the registry
(which includes ExpectedTokensSuggestionProvider as a fallback).

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [74-97]

 // Use the suggestion registry to find pattern-based suggestions
 SyntaxErrorContext context =
     new SyntaxErrorContext(recognizer, offendingToken, tokens, query, e);
 List<String> customSuggestions = SyntaxErrorSuggestionRegistry.findSuggestions(context);
 
 if (!customSuggestions.isEmpty()) {
-  // Use the first suggestion from the registry
   reportBuilder.suggestion(customSuggestions.get(0));
-} else if (e != null) {
-  // Fall back to expected tokens as suggestion if no pattern matches
-  IntervalSet possibleContinuations = e.getExpectedTokens();
-  List<String> suggestions = topSuggestions(recognizer, possibleContinuations);
-  if (!suggestions.isEmpty()) {
-    String suggestionText =
-        possibleContinuations.size() > SUGGESTION_TRUNCATION_THRESHOLD
-            ? String.format(
-                "Expected one of %d possible tokens. Examples: %s",
-                possibleContinuations.size(), String.join(", ", suggestions))
-            : "Expected tokens: " + String.join(", ", suggestions);
-    reportBuilder.suggestion(suggestionText);
-  }
 }
Suggestion importance[1-10]: 6

__

Why: The ExpectedTokensSuggestionProvider is already registered in the registry and handles the fallback logic. The else if (e != null) branch duplicates this, potentially causing inconsistent behavior. However, the duplicate logic uses topSuggestions() which may differ slightly from ExpectedTokensSuggestionProvider, so this needs careful verification.

Low
Prevent global registry pollution between tests

The test registers stub providers into the global static
SyntaxErrorSuggestionRegistry, which is shared across all tests. This pollutes the
registry for other tests running in the same JVM, potentially causing flaky
failures. The test should use a local registry instance or clean up after itself.

sql/src/test/java/org/opensearch/sql/sql/antlr/suggestion/SyntaxErrorSuggestionRegistryTest.java [23-29]

 SyntaxErrorSuggestionRegistry.register(highPrioritySuggestion, lowPrioritySuggestion);
+try {
+  SyntaxErrorContext ctx = ContextFactory.contextFor("SELECT FROM t");
+  List<String> suggestions = SyntaxErrorSuggestionRegistry.findSuggestions(ctx);
+  assertEquals("low-wins", suggestions.get(0));
+} finally {
+  // Clean up registered stubs to avoid polluting other tests
+  // (requires exposing an unregister/reset method on the registry)
+  SyntaxErrorSuggestionRegistry.reset();
+}
 
-// Provide a context that both will match (both stubs ignore the context).
-SyntaxErrorContext ctx = ContextFactory.contextFor("SELECT FROM t");
-List<String> suggestions = SyntaxErrorSuggestionRegistry.findSuggestions(ctx);
-
-assertEquals("low-wins", suggestions.get(0));
-
Suggestion importance[1-10]: 4

__

Why: The test registers stub providers into the global static registry without cleanup, which can pollute other tests. However, the improved_code references a SyntaxErrorSuggestionRegistry.reset() method that doesn't exist in the PR, making the suggestion incomplete as-is.

Low
Include actual token in suggestion message

The suggestion hardcodes the example hello+world regardless of the actual
offending token or table name in the query. The suggestion should include the actual
offending context (e.g., the token text or surrounding identifier) to be more
actionable and accurate for users.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/UnquotedTableNameSuggestionProvider.java [20-21]

 return List.of(
-    "Quote table names containing special characters with backticks, e.g. `hello+world`");
+    "Quote table names containing special characters with backticks, e.g. `table" + offending + "name`");
Suggestion importance[1-10]: 3

__

Why: The hardcoded example `hello+world` is not contextual. However, the improved_code is syntactically incorrect (string concatenation with an undefined offending variable) and doesn't accurately reflect a valid implementation, making this suggestion unreliable.

Low
Possible issue
Fix race condition in provider registration

CopyOnWriteArrayList.sort() is not atomic with respect to addAll(), so concurrent
calls to register() can result in a partially-sorted or inconsistent list. Since
register() is called both from the static initializer and from tests, this is a real
race condition. Use a synchronized block or replace with a thread-safe sorted
structure.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorSuggestionRegistry.java [25-28]

-public static void register(SyntaxErrorSuggestionProvider... providers) {
+public static synchronized void register(SyntaxErrorSuggestionProvider... providers) {
   PROVIDERS.addAll(Arrays.asList(providers));
   PROVIDERS.sort(Comparator.comparingInt(SyntaxErrorSuggestionProvider::getPriority));
 }
Suggestion importance[1-10]: 5

__

Why: The addAll and sort operations on CopyOnWriteArrayList are not atomic, creating a potential race condition during concurrent register() calls. Adding synchronized is a valid fix, though in practice this is only called from static initializers and tests, limiting real-world impact.

Low
Suggestions up to commit 1f9b08b
Category    Suggestion    Impact
Possible issue
Guard against negative stop index for EOF tokens

Token.getStopIndex() returns -1 for EOF tokens, so end would be 0 and
query.substring(0) would return the entire query instead of an empty string. Add a
guard for the EOF case (token type Token.EOF or stopIndex < 0).

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorContext.java [31-35]

 public String getRemainingQuery() {
   if (offendingToken == null) return "";
-  int end = offendingToken.getStopIndex() + 1;
+  int stopIndex = offendingToken.getStopIndex();
+  if (stopIndex < 0) return ""; // EOF token
+  int end = stopIndex + 1;
   return end >= query.length() ? "" : query.substring(end);
 }
Suggestion importance[1-10]: 7

__

Why: Token.getStopIndex() returns -1 for EOF tokens, which would cause getRemainingQuery() to return the entire query instead of an empty string. This is a real edge case bug that could produce incorrect suggestions for queries ending at EOF.

Medium
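The arithmetic behind this guard can be checked in isolation. The sketch below uses plain ints instead of ANTLR tokens and assumes, as the suggestion states, that a token can report a stop index of -1; the class and method names are illustrative.

```java
final class RemainingQuery {
  /** Mirrors the unguarded logic: stopIndex is the token's inclusive end offset. */
  static String unguarded(String query, int stopIndex) {
    int end = stopIndex + 1;
    return end >= query.length() ? "" : query.substring(end);
  }

  /** Same logic, but a negative stop index is treated as "nothing remains". */
  static String guarded(String query, int stopIndex) {
    if (stopIndex < 0) {
      return "";
    }
    return unguarded(query, stopIndex);
  }

  public static void main(String[] args) {
    String query = "SELECT * FROM";
    System.out.println("[" + unguarded(query, -1) + "]"); // substring(0): the whole query
    System.out.println("[" + guarded(query, -1) + "]");   // empty
    System.out.println("[" + guarded(query, 7) + "]");    // text after the '*' at offset 7
  }
}
```

Whether a given ANTLR token actually reports a negative stop index depends on how it was created, so it is worth confirming against the runtime before relying on either branch.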
Fix non-atomic add-then-sort race condition

CopyOnWriteArrayList.sort() replaces the list's contents atomically, but addAll
followed by sort is not atomic — a concurrent findSuggestions call between the two
operations could observe a partially-updated, unsorted list. Use a synchronized
block or replace with a lock-based list to make the two operations atomic.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorSuggestionRegistry.java [25-28]

-public static void register(SyntaxErrorSuggestionProvider... providers) {
+public static synchronized void register(SyntaxErrorSuggestionProvider... providers) {
   PROVIDERS.addAll(Arrays.asList(providers));
   PROVIDERS.sort(Comparator.comparingInt(SyntaxErrorSuggestionProvider::getPriority));
 }
Suggestion importance[1-10]: 5

__

Why: The race condition between addAll and sort is a valid concern for concurrent usage, but in practice register is called only during static initialization and test setup, making this a low-risk issue. Adding synchronized is a reasonable defensive improvement.

Low
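An alternative to synchronizing around a CopyOnWriteArrayList is to publish an immutable, already-sorted snapshot with a single volatile write, so readers never see the intermediate unsorted state. This is a different technique from the one in the diff above, sketched with hypothetical names (SnapshotRegistry, Provider, snapshot).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SnapshotRegistry {
  /** Hypothetical stand-in for SyntaxErrorSuggestionProvider. */
  static final class Provider {
    private final String name;
    private final int priority;

    Provider(String name, int priority) {
      this.name = name;
      this.priority = priority;
    }

    String name() { return name; }
    int priority() { return priority; }
  }

  // Readers only ever observe a complete, sorted, immutable list.
  private static volatile List<Provider> providers = List.of();

  /** Writers are serialized; the add and the sort happen on a private copy. */
  static synchronized void register(Provider... added) {
    List<Provider> next = new ArrayList<>(providers);
    next.addAll(List.of(added));
    next.sort(Comparator.comparingInt(Provider::priority));
    providers = List.copyOf(next); // one volatile write publishes the new snapshot
  }

  static List<Provider> snapshot() {
    return providers;
  }

  public static void main(String[] args) {
    register(new Provider("fallback", Integer.MAX_VALUE), new Provider("specific", 1));
    for (Provider p : snapshot()) {
      System.out.println(p.priority() + " " + p.name());
    }
  }
}
```

The trade-off is one extra list copy per registration, which is negligible if registration only happens at startup and in tests.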
General
Remove duplicated fallback suggestion logic

The ExpectedTokensSuggestionProvider is already registered in
SyntaxErrorSuggestionRegistry as a fallback (with Integer.MAX_VALUE priority), so
the else if (e != null) branch in SyntaxAnalysisErrorListener duplicates that logic.
This means the expected-tokens suggestion can be generated twice or the
registry-based one is always shadowed. Remove the duplicate else if branch and rely
solely on the registry.

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [77-95]

 List<String> customSuggestions = SyntaxErrorSuggestionRegistry.findSuggestions(context);
-
 if (!customSuggestions.isEmpty()) {
-  // Use the first suggestion from the registry
   reportBuilder.suggestion(customSuggestions.get(0));
-} else if (e != null) {
-  // Fall back to expected tokens as suggestion if no pattern matches
-  IntervalSet possibleContinuations = e.getExpectedTokens();
-  List<String> suggestions = topSuggestions(recognizer, possibleContinuations);
-  ...
 }
 
+throw reportBuilder.build();
+
Suggestion importance[1-10]: 6

__

Why: The else if (e != null) branch in SyntaxAnalysisErrorListener duplicates the logic already handled by ExpectedTokensSuggestionProvider registered in the registry, creating redundant code and potential inconsistency. Removing the duplicate branch would simplify the code and rely on the registry as the single source of suggestions.

Low
Prevent static registry pollution between tests

SyntaxErrorSuggestionRegistry.PROVIDERS is a static field shared across all tests.
Registering stub providers here permanently pollutes the registry for all subsequent
tests in the same JVM run, potentially causing flaky failures in other tests that
rely on the registry's default state. The test should save and restore the registry
state, or the registry should expose a reset/unregister mechanism for testing.

sql/src/test/java/org/opensearch/sql/sql/antlr/suggestion/SyntaxErrorSuggestionRegistryTest.java [20-29]

+// Consider adding a package-private reset method to SyntaxErrorSuggestionRegistry for tests:
+// static void resetToDefaults() { ... }
+// Then call it in @BeforeEach / @AfterEach to isolate test state.
 void lowerPriorityProviderWinsOverHigherPriorityProvider() {
   StubProvider lowPrioritySuggestion = new StubProvider("low-wins", 1);
   StubProvider highPrioritySuggestion = new StubProvider("high-loses", Integer.MAX_VALUE - 1);
   SyntaxErrorSuggestionRegistry.register(highPrioritySuggestion, lowPrioritySuggestion);
+  // ... rest of test ...
+  SyntaxErrorSuggestionRegistry.resetToDefaults(); // restore after test
+}
Suggestion importance[1-10]: 5

__

Why: Registering stub providers into the static SyntaxErrorSuggestionRegistry without cleanup can cause test pollution across the test suite. However, the improved_code only adds comments rather than actual implementation, making the suggestion incomplete.

Low
Suggestions up to commit c6c83cc
Category    Suggestion    Impact
General
Remove duplicate fallback suggestion logic

The ExpectedTokensSuggestionProvider is already registered in
SyntaxErrorSuggestionRegistry as a fallback (with Integer.MAX_VALUE priority), so
the else if (e != null) branch in SyntaxAnalysisErrorListener duplicates that logic.
This means the expected-tokens suggestion can be generated twice or the registry's
fallback is bypassed. Remove the duplicate else if block and rely solely on the
registry for all suggestion generation.

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [79-95]

 // Use the suggestion registry to find pattern-based suggestions
 SyntaxErrorContext context =
     new SyntaxErrorContext(recognizer, offendingToken, tokens, query, e);
 List<String> customSuggestions = SyntaxErrorSuggestionRegistry.findSuggestions(context);
 
 if (!customSuggestions.isEmpty()) {
-  // Use the first suggestion from the registry
   reportBuilder.suggestion(customSuggestions.get(0));
-} else if (e != null) {
-  // Fall back to expected tokens as suggestion if no pattern matches
-  IntervalSet possibleContinuations = e.getExpectedTokens();
-  List<String> suggestions = topSuggestions(recognizer, possibleContinuations);
-  ...
 }
Suggestion importance[1-10]: 7

__

Why: The ExpectedTokensSuggestionProvider is already registered in the registry with Integer.MAX_VALUE priority as a fallback, so the else if (e != null) block in SyntaxAnalysisErrorListener duplicates that logic. This could lead to inconsistent behavior where the registry's fallback is bypassed. Removing the duplicate block would simplify the code and ensure all suggestion logic flows through the registry.

Medium
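
To illustrate the single-source-of-truth arrangement this suggestion argues for, here is a minimal sketch in which the catch-all fallback is just the last provider in priority order, so the caller never needs its own `else` branch. The names `FallbackRegistrySketch`, `Provider`, `of`, and `firstSuggestion` are hypothetical stand-ins, the context is reduced to a plain query string, and none of this is the PR's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Simplified, hypothetical stand-in for the suggestion registry; not the PR's classes. */
final class FallbackRegistrySketch {

  /** Assumed provider shape: lower priority values are consulted first. */
  interface Provider {
    int priority();

    Optional<String> suggest(String query);
  }

  private final List<Provider> providers = new ArrayList<>();

  /** Builds a provider from a priority and a lambda (illustrative helper). */
  static Provider of(int prio, Function<String, Optional<String>> fn) {
    return new Provider() {
      @Override
      public int priority() {
        return prio;
      }

      @Override
      public Optional<String> suggest(String query) {
        return fn.apply(query);
      }
    };
  }

  void register(Provider provider) {
    providers.add(provider);
    providers.sort(Comparator.comparingInt(Provider::priority));
  }

  /** First non-empty suggestion in priority order; the fallback is simply the last provider. */
  Optional<String> firstSuggestion(String query) {
    for (Provider provider : providers) {
      Optional<String> suggestion = provider.suggest(query);
      if (suggestion.isPresent()) {
        return suggestion;
      }
    }
    return Optional.empty();
  }

  static FallbackRegistrySketch withDefaults() {
    FallbackRegistrySketch registry = new FallbackRegistrySketch();
    // Catch-all fallback, analogous to a provider registered at Integer.MAX_VALUE.
    registry.register(of(Integer.MAX_VALUE, q -> Optional.of("expected-tokens fallback")));
    // Pattern-specific provider that only answers for an empty select list.
    registry.register(
        of(10, q -> q.startsWith("SELECT FROM")
            ? Optional.of("add a column list after SELECT")
            : Optional.<String>empty()));
    return registry;
  }

  public static void main(String[] args) {
    FallbackRegistrySketch registry = withDefaults();
    System.out.println(registry.firstSuggestion("SELECT FROM t").orElse("<none>"));
    System.out.println(registry.firstSuggestion("SELEC 1").orElse("<none>"));
  }
}
```

With this shape the listener only has to ask the registry once. Whether the real `ExpectedTokensSuggestionProvider` always returns a suggestion (for example when the exception is null) would still need to be checked against the PR.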
Verify and clarify redundant exception type check

Since the parser now throws ErrorReport instead of SyntaxCheckException (as shown
in the PR), the SyntaxCheckException check here may be redundant for syntax errors.
If ErrorReport wraps the SyntaxCheckException as its cause, checking ErrorReport
alone is enough. Verify whether SyntaxCheckException can still be thrown
independently in other code paths; if it cannot, remove the check to avoid dead
code and confusion.

legacy/src/main/java/org/opensearch/sql/legacy/plugin/RestSQLQueryAction.java [135-138]

-if (e instanceof SyntaxCheckException
-    || e instanceof ErrorReport
+if (e instanceof ErrorReport
+    || e instanceof SyntaxCheckException
     || e instanceof UnsupportedCursorRequestException) {
   fallBackHandler.accept(channel, e);
 }
Suggestion importance[1-10]: 2

Why: The suggestion asks to verify whether SyntaxCheckException is still independently thrown, but the improved_code is essentially identical to the existing_code (just reordered), making this a low-value observation. The PR intentionally keeps SyntaxCheckException for backward compatibility with other code paths.

Low
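
One way to make the question above moot is a helper that recognises a syntax error whether it arrives bare or wrapped. The sketch below is hypothetical: its `SyntaxCheckException` and `ErrorReport` classes are simplified stand-ins defined locally, `isSyntaxError` is an invented name, and the assumption that `ErrorReport` carries the original exception as its cause comes from the suggestion text rather than from verified PR code.

```java
/** Hypothetical sketch; the exception classes below are simplified local stand-ins. */
final class SyntaxErrorCheckSketch {

  static class SyntaxCheckException extends RuntimeException {
    SyntaxCheckException(String message) {
      super(message);
    }
  }

  static class ErrorReport extends RuntimeException {
    ErrorReport(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** True if the throwable is a SyntaxCheckException or wraps one in its cause chain. */
  static boolean isSyntaxError(Throwable error) {
    for (Throwable current = error; current != null; current = current.getCause()) {
      if (current instanceof SyntaxCheckException) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Throwable wrapped = new ErrorReport("syntax error", new SyntaxCheckException("bad token"));
    Throwable unrelated = new ErrorReport("other failure", new IllegalStateException("boom"));
    System.out.println(isSyntaxError(wrapped));
    System.out.println(isSyntaxError(unrelated));
  }
}
```

A check like this keeps working if some code paths still throw `SyntaxCheckException` directly, at the cost of treating every `ErrorReport` with a non-syntax cause as a different kind of failure.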
Possible issue
Prevent global state pollution between tests

This test mutates the global static SyntaxErrorSuggestionRegistry.PROVIDERS list by
registering stub providers, which will persist across tests and can cause
interference with other tests (e.g., UnquotedTableNameSuggestionProviderTest or
ExpectedTokensSuggestionProviderTest). The test should either reset the registry
after the test or use a local registry instance to avoid polluting the shared state.

sql/src/test/java/org/opensearch/sql/sql/antlr/suggestion/SyntaxErrorSuggestionRegistryTest.java [20-29]

 void lowerPriorityProviderWinsOverHigherPriorityProvider() {
   StubProvider lowPrioritySuggestion = new StubProvider("low-wins", 1);
   StubProvider highPrioritySuggestion = new StubProvider("high-loses", Integer.MAX_VALUE - 1);
   SyntaxErrorSuggestionRegistry.register(highPrioritySuggestion, lowPrioritySuggestion);
+  try {
+    SyntaxErrorContext ctx = ContextFactory.contextFor("SELECT FROM t");
+    List<String> suggestions = SyntaxErrorSuggestionRegistry.findSuggestions(ctx);
+    assertEquals("low-wins", suggestions.get(0));
+  } finally {
+    // Reset registry to avoid polluting other tests
+    SyntaxErrorSuggestionRegistry.reset();
+  }
+}
Suggestion importance[1-10]: 6

Why: The test registers stub providers into the global static SyntaxErrorSuggestionRegistry.PROVIDERS list without cleanup, which can pollute state for other tests. However, the improved_code references a SyntaxErrorSuggestionRegistry.reset() method that doesn't exist in the PR, making the suggestion partially invalid as-is.

Low
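
Because the `reset()` method used in the suggested diff does not exist in the PR, a snapshot-and-restore helper is one possible alternative. The following sketch is hypothetical throughout: `RegistryIsolationSketch` and `withTemporaryProviders` are invented names, and providers are reduced to strings so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of restoring shared registry state after a test body runs. */
final class RegistryIsolationSketch {

  // Stand-in for a static, shared provider list with one default entry.
  private static final List<String> PROVIDERS = new ArrayList<>(List.of("expected-tokens"));

  static void register(String provider) {
    PROVIDERS.add(provider);
  }

  static List<String> providers() {
    return List.copyOf(PROVIDERS);
  }

  /** Registers the extra providers, runs the body, then restores the previous list. */
  static void withTemporaryProviders(Runnable body, String... extra) {
    List<String> saved = new ArrayList<>(PROVIDERS);
    try {
      for (String provider : extra) {
        register(provider);
      }
      body.run();
    } finally {
      // Runs even if the body throws, so later tests see the original state.
      PROVIDERS.clear();
      PROVIDERS.addAll(saved);
    }
  }

  public static void main(String[] args) {
    withTemporaryProviders(() -> System.out.println("inside: " + providers()), "stub");
    System.out.println("after: " + providers());
  }
}
```

An instance-based registry that each test constructs locally would avoid shared state entirely; the helper above is the smaller change if the static list has to stay.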
Fix thread-safety of provider registration and sorting

CopyOnWriteArrayList.sort() is not atomic with respect to addAll(), so concurrent
calls to register() can produce a partially-sorted or inconsistent list. Since the
static initializer already registers the default providers, consider making
register() synchronized or using a different thread-safe approach to ensure the sort
is always consistent after concurrent modifications.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorSuggestionRegistry.java [25-28]

-public static void register(SyntaxErrorSuggestionProvider... providers) {
+public static synchronized void register(SyntaxErrorSuggestionProvider... providers) {
   PROVIDERS.addAll(Arrays.asList(providers));
   PROVIDERS.sort(Comparator.comparingInt(SyntaxErrorSuggestionProvider::getPriority));
 }
Suggestion importance[1-10]: 5

Why: The addAll and sort operations on CopyOnWriteArrayList are not atomic, so concurrent register() calls could produce an inconsistently sorted list. Adding synchronized would fix this race condition, though in practice register() is typically only called during initialization.

Low
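
As an alternative to adding `synchronized` on top of a `CopyOnWriteArrayList`, the add-then-sort race can also be avoided by building a sorted copy and publishing it in one step. The sketch below swaps in that technique; `AtomicRegisterSketch`, its nested `Provider`, and `snapshot` are hypothetical names, and the provider is reduced to a name and a priority.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of an atomically published, always-sorted provider list. */
final class AtomicRegisterSketch {

  /** Provider reduced to a name and a priority for illustration. */
  static final class Provider {
    final String name;
    final int priority;

    Provider(String name, int priority) {
      this.name = name;
      this.priority = priority;
    }
  }

  // Readers see either the previous list or the new, fully sorted one.
  private volatile List<Provider> providers = List.of();

  /** Copies, appends, sorts, then publishes; synchronized so writers cannot interleave. */
  synchronized void register(Provider... added) {
    List<Provider> next = new ArrayList<>(providers);
    next.addAll(Arrays.asList(added));
    next.sort(Comparator.comparingInt((Provider p) -> p.priority));
    providers = List.copyOf(next);
  }

  List<Provider> snapshot() {
    return providers;
  }

  static boolean isSorted(List<Provider> list) {
    for (int i = 1; i < list.size(); i++) {
      if (list.get(i - 1).priority > list.get(i).priority) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicRegisterSketch registry = new AtomicRegisterSketch();
    Thread[] writers = new Thread[4];
    for (int t = 0; t < writers.length; t++) {
      final int base = t;
      writers[t] = new Thread(() -> {
        for (int i = 0; i < 100; i++) {
          registry.register(new Provider("p" + base + "-" + i, (i * 31 + base) % 97));
        }
      });
      writers[t].start();
    }
    for (Thread writer : writers) {
      writer.join();
    }
    List<Provider> result = registry.snapshot();
    System.out.println(result.size() + " providers, sorted=" + isSorted(result));
  }
}
```

Readers never observe a list that has been appended to but not yet sorted, because the unsorted intermediate state lives only in the local `next` variable.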
Suggestions up to commit 79f9b23
Category | Suggestion | Impact
General
Remove accidentally committed scratch file

This file (test_error_format.java) appears to be a temporary scratch/debug file
committed accidentally to the repository. It does not belong in the codebase and
should be removed before merging.

test_error_format.java [1-21]

-public class test_error_format {
-  public static void main(String[] args) {
-    ...
-  }
-}
+// File should be deleted from the repository
Suggestion importance[1-10]: 7

Why: The file test_error_format.java is a temporary debug/scratch file that should not be committed to the repository. It has no test framework, no package declaration, and serves no production or test purpose.

Medium
Remove duplicated fallback suggestion logic

The fallback logic for expected tokens in SyntaxAnalysisErrorListener duplicates the
functionality already implemented in ExpectedTokensSuggestionProvider, which is
registered in SyntaxErrorSuggestionRegistry with Integer.MAX_VALUE priority. Since
findSuggestions already falls through to ExpectedTokensSuggestionProvider when no
other provider matches, this duplicate fallback block is redundant and should be
removed to keep a single source of truth.

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [79-95]

+List<String> customSuggestions = SyntaxErrorSuggestionRegistry.findSuggestions(context);
 if (!customSuggestions.isEmpty()) {
-  // Use the first suggestion from the registry
   reportBuilder.suggestion(customSuggestions.get(0));
-} else if (e != null) {
-  // Fall back to expected tokens as suggestion if no pattern matches
-  IntervalSet possibleContinuations = e.getExpectedTokens();
-  List<String> suggestions = topSuggestions(recognizer, possibleContinuations);
-  ...
 }
 
+throw reportBuilder.build();
+
Suggestion importance[1-10]: 5

Why: The fallback expected-tokens logic in SyntaxAnalysisErrorListener duplicates what ExpectedTokensSuggestionProvider already does via the registry, creating a maintenance burden and potential inconsistency. Removing it would consolidate suggestion logic in one place.

Low
Remove potentially dead exception catch branch

Since ErrorReport is now the exception thrown by the parser (wrapping
SyntaxCheckException as a cause), catching both independently is correct. However,
SyntaxCheckException may no longer be thrown directly from the parser, making that
branch dead code. Verify whether SyntaxCheckException can still be thrown standalone
here; if not, remove it to avoid misleading code.

async-query-core/src/main/java/org/opensearch/sql/spark/utils/SQLQueryUtils.java [89-91]

-} catch (SyntaxCheckException | ErrorReport syntaxCheckException) {
+} catch (ErrorReport syntaxCheckException) {
   return false;
 }
Suggestion importance[1-10]: 4

Why: Since the parser now throws ErrorReport instead of SyntaxCheckException, catching SyntaxCheckException may be dead code. However, it's a low-risk defensive catch and may still be needed if SyntaxCheckException is thrown from other code paths.

Low
Add missing trailing newline at end of file

The file is missing a trailing newline at the end, which can cause issues with some
Markdown renderers and version control tools. Add a newline character after the last
line.

cluster_demo_with_explanations.md [236]

+The cluster command transforms log analysis from "searching for known patterns" to "discovering unknown patterns automatically."
 
-
Suggestion importance[1-10]: 1

Why: The existing_code and improved_code are identical, meaning no actual change is demonstrated. Adding a trailing newline is a very minor style concern with negligible impact.

Low
Possible issue
Prevent duplicate provider registrations

The register method adds providers to the global static list without checking for
duplicates. Calling register multiple times (e.g., in tests via
SyntaxErrorSuggestionRegistry.register(...)) will keep appending the same providers,
causing duplicate suggestions and unpredictable ordering. Add a deduplication check
or clear-and-replace strategy before adding.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorSuggestionRegistry.java [25-28]

 public static void register(SyntaxErrorSuggestionProvider... providers) {
-  PROVIDERS.addAll(Arrays.asList(providers));
+  for (SyntaxErrorSuggestionProvider provider : providers) {
+    if (!PROVIDERS.contains(provider)) {
+      PROVIDERS.add(provider);
+    }
+  }
   PROVIDERS.sort(Comparator.comparingInt(SyntaxErrorSuggestionProvider::getPriority));
 }
Suggestion importance[1-10]: 6

Why: The register method can accumulate duplicate providers across multiple calls (e.g., in tests), leading to duplicate suggestions. The fix adds deduplication logic, but contains uses equals(), which is reference equality unless the providers override it, so two separate instances of the same provider class would still both be added.

Low
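
Since `List.contains` compares with `equals()`, which defaults to identity, one possible way to deduplicate is to key on the provider's concrete class instead. The sketch below is hypothetical: `DedupRegisterSketch`, `TokensProvider`, and `TableNameProvider` are invented stand-ins, not the PR's classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of class-based deduplication for registered providers. */
final class DedupRegisterSketch {

  interface Provider {
    int priority();
  }

  static final class TokensProvider implements Provider {
    @Override
    public int priority() {
      return Integer.MAX_VALUE;
    }
  }

  static final class TableNameProvider implements Provider {
    @Override
    public int priority() {
      return 10;
    }
  }

  private final List<Provider> providers = new ArrayList<>();

  /** Adds the provider unless one of the same concrete class is already registered. */
  boolean register(Provider candidate) {
    for (Provider existing : providers) {
      if (existing.getClass() == candidate.getClass()) {
        return false;
      }
    }
    providers.add(candidate);
    providers.sort(Comparator.comparingInt(Provider::priority));
    return true;
  }

  int size() {
    return providers.size();
  }

  public static void main(String[] args) {
    DedupRegisterSketch registry = new DedupRegisterSketch();
    System.out.println(registry.register(new TokensProvider()));
    System.out.println(registry.register(new TokensProvider()));
    System.out.println(registry.register(new TableNameProvider()));
  }
}
```

The trade-off is that two instances of the same class can no longer coexist, which would affect a test that registers two stub providers of one class with different priorities; such a test would need a different key or distinct stub classes.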
Fix ambiguous field name after stats aggregation

The count field referenced here is the result of stats count() by cluster_label, but
the actual field name generated by stats count() in PPL is count(), not count. This
will cause the filter to silently fail or produce an error. Use an alias in the
stats command to make the field name unambiguous.

cluster_demo_with_explanations.md [198]

-| where count > 100
+| stats count() as event_count by cluster_label
+| where event_count > 100
Suggestion importance[1-10]: 6

Why: The count field referenced in the where clause is indeed ambiguous - in PPL, stats count() produces a field named count(), not count. Using an alias like count() as event_count would make the query unambiguous and correct.

Low

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 14475e5

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit c9ad1f2

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit e473f7f

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit f7cbe56

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 575202d

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 8e8ab9e

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 71ad3bb

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit bd46b1e

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 3cd01e4

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 24, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 79f9b23.

| Path | Line | Severity | Description |
| --- | --- | --- | --- |
| scripts/docs_exporter/__pycache__/export_to_docs_website.cpython-314.pyc | 1 | medium | Binary compiled Python bytecode file committed to the repository. Its contents cannot be inspected in the diff. Python 3.14 is pre-release, making this unusual. Committing .pyc files is not standard practice and the file could contain arbitrary obfuscated logic not visible in code review. |
| test_cluster_output.java | 1 | low | Ad-hoc scratch test file committed to the repository root (not under src/ or test/). Likely an accidentally committed developer artifact. No malicious code found in its content, but its presence is anomalous. |
| test_error_format.java | 1 | low | Ad-hoc scratch test file committed to the repository root (not under src/ or test/). Likely an accidentally committed developer artifact. No malicious code found in its content, but its presence is anomalous. |

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 1 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Contributor

Persistent review updated to latest commit 078dc07

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 940310d

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 51952fb

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 283ebd9

- Add back ErrorReport handling in SQLQueryUtils.isFlintExtensionQuery()
- Fix API test expectations to use ErrorReport instead of SyntaxCheckException

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 79f9b23

@github-actions
Contributor

Persistent review updated to latest commit c6c83cc

Remove debug/temporary files that were accidentally committed:
- cluster_demo_data.json
- cluster_demo_with_explanations.md
- playground_urls.md
- test_cluster_output.java
- test_error_format.java
- export_to_docs_website.cpython-314.pyc

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@ritvibhatt force-pushed the syntax-exception-error-message branch from c6c83cc to 1f9b08b on April 28, 2026 16:24
@github-actions
Contributor

Persistent review updated to latest commit 1f9b08b

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit c88bd44

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 2f16563

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit f02c2a9
