[IOTDB-17179] CLI: support automatic reconnection when connection is lost#17181
[IOTDB-17179] CLI: support automatic reconnection when connection is lost#17181miantalha45 wants to merge 2 commits intoapache:masterfrom
Conversation
- Detect connection-related SQLExceptions (refused, timeout, closed, reset, etc.) - In AbstractCli: rethrow connection-related SQLException from executeQuery, setTimeZone, showTimeZone so CLI can handle them - In Cli: on connection loss, close current connection, retry reconnect up to 3 times with 1s interval, then retry the failed command; print 'Connection lost. Reconnected. Retrying command.' on success; exit with clear message after all retries fail - Add isConnectionRelated() and matchesConnectionFailure() in AbstractCli for shared detection; openConnection(), setupConnection(), closeConnectionQuietly() and ReadLineResult in Cli for reconnect flow - Update AbstractCliTest to declare throws SQLException for handleInputCmd calls Co-authored-by: Cursor <cursoragent@cursor.com>
When reconnect succeeds but the retried command fails with a session/statement error (e.g. StatementId doesn't exist in this session), show a friendly message instead of the raw exception. Apply the same handling in AbstractCli.executeQuery so the message is shown both during reconnect-retry and when the user runs the next command. Add isSessionOrStatement
There was a problem hiding this comment.
Pull request overview
Adds automatic reconnection + retry behavior to the IoTDB CLI interactive session when connection-related failures occur, aligning CLI behavior with other IoTDB clients.
Changes:
- Introduces shared connection-failure/session-state error detection in
AbstractCliand propagates connection-relatedSQLExceptions to the main loop. - Refactors
Cli.receiveCommands()/readerReadLine()to attempt reconnect (bounded retries + backoff) and retry the failed input line. - Updates
AbstractCliTestfor the newthrows SQLExceptionmethod signatures.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
iotdb-client/cli/src/main/java/org/apache/iotdb/cli/Cli.java |
Implements reconnect/retry loop, connection open/setup/close helpers, and new read-loop result signaling. |
iotdb-client/cli/src/main/java/org/apache/iotdb/cli/AbstractCli.java |
Adds connection/session error classifiers and changes command execution to rethrow connection-related SQLExceptions. |
iotdb-client/cli/src/test/java/org/apache/iotdb/cli/AbstractCliTest.java |
Adjusts unit test compilation due to updated method signatures. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try { | ||
| connection = openConnection(); | ||
| setupConnection(connection); | ||
| ctx.getPrinter().println("Connection lost. Reconnected. Retrying command."); | ||
| processCommand(ctx, result.failedCommand, connection); | ||
| reconnected = true; | ||
| break; |
There was a problem hiding this comment.
The reconnect retry try { connection = openConnection(); setupConnection(connection); ... } can throw TException from setupConnection(), but the retry loop only catches SQLException. This means some connection-loss cases can bypass the reconnect logic and bubble out immediately. Suggestion: include TException (or a common supertype) in the reconnect-attempt catch and handle it as a reconnect failure (with the same retry/backoff), while still distinguishing non-connection SQL errors from command execution.
| if (isSessionOrStatementError(e)) { | ||
| ctx.getPrinter() | ||
| .println( | ||
| "Reconnected, but the previous command could not be completed. Please run your command again."); |
There was a problem hiding this comment.
executeQuery prints "Reconnected, but the previous command could not be completed..." for session/statement-state errors, but this method cannot know whether a reconnect actually happened. If the same server-side error occurs without any reconnect, this message is misleading. Suggestion: make the message not assume reconnection (e.g., "Session state was reset..."), or only print this in the reconnect-and-retry path where reconnection is known to have occurred.
| "Reconnected, but the previous command could not be completed. Please run your command again."); | |
| "Session state was reset and the previous command could not be completed. Please run your command again."); |
| } catch (SQLException e) { | ||
| if (isConnectionRelated(e)) { | ||
| return ReadLineResult.reconnectAndRetry(s); | ||
| } |
There was a problem hiding this comment.
The reconnection trigger/loop is a significant behavior change but there are no unit tests exercising it (e.g., simulating a connection-related SQLException from processCommand, verifying retry count/backoff, and ensuring non-connection SQLExceptions do not trigger reconnect or an incorrect "Could not reconnect" exit). Consider adding focused tests around this new flow to prevent regressions.
| ctx.getPrinter().println("Connection lost. Reconnected. Retrying command."); | ||
| processCommand(ctx, result.failedCommand, connection); | ||
| reconnected = true; | ||
| break; |
There was a problem hiding this comment.
In the reconnect-and-retry path, the return value of processCommand(...) is ignored. If the retried input contains an exit/quit statement (or otherwise returns false), the CLI will continue the main loop instead of stopping as it normally would. Suggestion: capture the boolean return from processCommand and convert it into a result.stop / break out of the outer loop when it indicates the user requested exit.
| } catch (SQLException e) { | ||
| if (isSessionOrStatementError(e)) { | ||
| // Reconnect succeeded but retry failed due to session/statement state; ask user to | ||
| // run the command again. | ||
| ctx.getPrinter() | ||
| .println( | ||
| "Reconnected, but the previous command could not be completed. Please run your command again."); | ||
| reconnected = true; | ||
| break; | ||
| } | ||
| if (attempt == RECONNECT_RETRY_NUM) { | ||
| ctx.getErr() | ||
| .printf( | ||
| "%s: Could not reconnect after %d attempts. Please check that the server is running and try again.%n", | ||
| IOTDB_ERROR_PREFIX, RECONNECT_RETRY_NUM); | ||
| ctx.exit(CODE_ERROR); | ||
| } |
There was a problem hiding this comment.
In the reconnect loop, any SQLException thrown while retrying the failed command is treated as a reconnect failure. This can incorrectly trigger further reconnect attempts (and eventually exit with "Could not reconnect") for non-connection SQL errors (e.g., syntax/permission errors) that happen after the reconnect succeeds. Suggestion: only retry when isConnectionRelated(e) is true; otherwise report the SQL error from the retried command and continue the main loop (or stop) without attempting further reconnects.
| try { | ||
| connection = openConnection(); | ||
| setupConnection(connection); | ||
| ctx.getPrinter().println("Connection lost. Reconnected. Retrying command."); | ||
| processCommand(ctx, result.failedCommand, connection); | ||
| reconnected = true; | ||
| break; | ||
| } catch (SQLException e) { | ||
| if (isSessionOrStatementError(e)) { | ||
| // Reconnect succeeded but retry failed due to session/statement state; ask user to | ||
| // run the command again. | ||
| ctx.getPrinter() | ||
| .println( | ||
| "Reconnected, but the previous command could not be completed. Please run your command again."); | ||
| reconnected = true; | ||
| break; | ||
| } | ||
| if (attempt == RECONNECT_RETRY_NUM) { | ||
| ctx.getErr() | ||
| .printf( | ||
| "%s: Could not reconnect after %d attempts. Please check that the server is running and try again.%n", | ||
| IOTDB_ERROR_PREFIX, RECONNECT_RETRY_NUM); | ||
| ctx.exit(CODE_ERROR); | ||
| } | ||
| } |
There was a problem hiding this comment.
On reconnect attempts, if openConnection() succeeds but setupConnection() or processCommand() throws, the newly created connection is not closed before the next attempt (the variable is overwritten on the next iteration). This can leak sockets/sessions across retries. Suggestion: in the catch path, call closeConnectionQuietly(connection) and set connection = null before continuing to the next attempt.
| private static void setupConnection(IoTDBConnection connection) | ||
| throws java.sql.SQLException, org.apache.thrift.TException { | ||
| connection.setQueryTimeout(queryTimeout); | ||
| properties = connection.getServerProperties(); |
There was a problem hiding this comment.
setupConnection() appends to AGGREGRATE_TIME_LIST every time a reconnect happens, but the list is static and never cleared/deduped. During repeated reconnects this will grow unbounded and may introduce duplicate entries. Suggestion: clear the list before adding, or change it to a Set (if ordering is not required).
| properties = connection.getServerProperties(); | |
| properties = connection.getServerProperties(); | |
| AGGREGRATE_TIME_LIST.clear(); |
| s = ctx.getLineReader().readLine(cliPrefix + "> ", null); | ||
| boolean continues = processCommand(ctx, s, connection); | ||
| if (!continues) { | ||
| return true; | ||
| try { | ||
| boolean continues = processCommand(ctx, s, connection); | ||
| if (!continues) { | ||
| return ReadLineResult.stopLoop(); | ||
| } | ||
| } catch (SQLException e) { | ||
| if (isConnectionRelated(e)) { | ||
| return ReadLineResult.reconnectAndRetry(s); | ||
| } |
There was a problem hiding this comment.
When a connection-related SQLException occurs, the CLI retries the entire raw input line s. If the line contains multiple statements separated by ;, some statements may already have executed successfully before the failure, and retrying the whole line can re-run those statements (duplicate writes / side effects). Suggestion: either only retry the specific statement that failed (track progress inside processCommand), or disable auto-retry for multi-statement input and ask the user to rerun manually.
| closeConnectionQuietly(connection); | ||
| connection = null; | ||
| boolean reconnected = false; | ||
| for (int attempt = 1; attempt <= RECONNECT_RETRY_NUM; attempt++) { |
There was a problem hiding this comment.
Test is always true, because of this condition.
Test is always true, because of this condition.
|
@miantalha45 Copilot has identified some interesting issues, and there are conflicts with the current PR. |
|
Sure. I am on it |
Description
Add automatic reconnection to the IoTDB CLI when the connection to the server is lost during an interactive session (e.g. server restart, network blip, or idle timeout). The CLI no longer exits immediately on connection-related errors; it attempts to reconnect with the same parameters and retries the failed command, aligning behavior with the Session API, JDBC, and C++/Python clients.
Content1 — Detection and reconnection flow
SQLException. We treat an exception as connection-related if its message (or cause message, lowercased) contains any of:connection,refused,timeout,closed,reset,network,broken pipe. This logic lives inAbstractCli.isConnectionRelated(SQLException)andmatchesConnectionFailure(String)so it can be shared and reused.DriverManager.getConnectionand the existinginfoproperties. Helper methodsopenConnection(),setupConnection(), andcloseConnectionQuietly()inCliencapsulate open/setup/close so the main loop stays clear.RECONNECT_RETRY_NUMandRECONNECT_RETRY_INTERVAL_MSinClicontrol this; they are not yet user-configurable.Connection lost. Reconnected. Retrying command.If all reconnection attempts fail we print:IoTDB: Could not reconnect after 3 attempts. Please check that the server is running and try again.and exit with error code.Content2 — Class and method organization
isConnectionRelated(SQLException)(package-private static) andmatchesConnectionFailure(String)(private static) for shared detection. InexecuteQuery,setTimeZone, andshowTimeZone, we catchSQLException(orExceptionwhere the API does not throwSQLException) and rethrow whenisConnectionRelated(e); otherwise we keep the existing "print error and return error code" behavior.handleInputCmdandprocessCommandnow declarethrows SQLExceptionso connection failures propagate to the CLI loop instead of being swallowed.ReadLineResult(inner class withstop,failedCommand) and factory methodscontinueLoop(),stopLoop(),reconnectAndRetry(String)so the read-eval loop can signal "continue", "exit", or "reconnect and retry this command".receiveCommands()no longer uses try-with-resources for the connection; it holds the connection in a variable, and whenreaderReadLine()returns a result withfailedCommand != null, it runs the reconnect loop (close → retry open/setup → print message → retry command).readerReadLine()wrapsprocessCommand()in a try-catch; on connection-relatedSQLExceptionit returnsreconnectAndRetry(s)with the current line; on otherSQLExceptionit prints and returnsstopLoop().testHandleInputInputCmd()now declaresthrows SQLExceptionand importsjava.sql.SQLExceptionso it compiles with the updatedhandleInputCmdsignature.Content3 — Corner cases and alternatives
reconnectAndRetryand run the same reconnect/retry flow again (each time with up to 3 reconnect attempts). Non-connectionSQLExceptions still print the error and stop the loop (exit) as before. Interrupt and EOF handling inreaderReadLine()are unchanged.isSessionOrStatementError()in AbstractCli and show: "Reconnected, but the previous command could not be completed. Please run your command again." so the user is not shown the raw exception. This handling is applied both in the reconnect-retry path in Cli and in AbstractCli.executeQuery for the normal command path.This PR has:
Key changed/added classes (or packages if there are too many classes) in this PR
org.apache.iotdb.cli.AbstractCli—isConnectionRelated,matchesConnectionFailure,isSessionOrStatementError,matchesSessionOrStatementFailure; rethrow connection-relatedSQLExceptioninexecuteQuery,setTimeZone,showTimeZone; inexecuteQueryshow friendly message for session/statement errors;handleInputCmd,processCommandnowthrows SQLExceptionorg.apache.iotdb.cli.Cli—ReadLineResult,openConnection(),setupConnection(),closeConnectionQuietly(); refactoredreceiveCommands()andreaderReadLine()for reconnect-and-retry flow; handle session/statement error in reconnect-retry pathorg.apache.iotdb.cli.AbstractCliTest—testHandleInputInputCmd()updated forthrows SQLExceptionCloses #17179