feat(script): add SCRIPT KILL and lua-time-limit support#3505
Conversation
This commit merges all changes of the current dev branch compared to unstable into a single commit. It includes implementation of the lua-time-limit configuration option, SCRIPT KILL command to safely interrupt long-running scripts, connection guard optimizations, worker event loop recursive event handling, and macOS TCP listener socket sharing.
There was a problem hiding this comment.
Pull request overview
Adds Redis-compatible SCRIPT KILL command and a lua-time-limit configuration to interrupt long-running Lua scripts. The implementation registers a Lua count hook that detects timeouts and disconnected clients, tracks running script contexts on the Server, lets SCRIPT KILL/SHUTDOWN bypass concurrency locks, and (on macOS) shares listener fds across workers via dup() so idle workers can still accept connections while a worker is blocked in a script. The PR also enables LUAJIT_ENABLE_CHECKHOOK so hooks fire from JIT-compiled code and adds Worker::PollEventLoop() to drain output buffers (TLS-aware) during long script execution.
Changes:
- Add
ScriptRunCtxGuard+LuaMaskCountHookto enforce timeout/kill semantics, withServer-side running-script registry andScriptKill()returning NOTBUSY/UNKILLABLE/OK. - Intercept commands with a BUSY reply when a script timed out and bypass concurrency guards for
SCRIPT KILL/SHUTDOWN; addWorker::PollEventLoopfor periodic event/output flushing, including TLS handling. - macOS: share listener fds across workers and pass
-DLUAJIT_ENABLE_CHECKHOOK/-Wl,-no_deduplicatebuild flags; add Go tests and sanitize test name slashes for temp dirs.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/storage/scripting.h | Adds LuaMaskCountHook decl and timing/kill fields to ScriptRunCtx. |
| src/storage/scripting.cc | Implements ScriptRunCtxGuard, LuaMaskCountHook, KillScript, write-dirty tracking, and wires guard into EvalGenericCommand/FunctionCall. |
| src/server/server.h / server.cc | Adds running-script registry, ScriptKill, BUSY flag, kills scripts on shutdown, and shares macOS listener fds when constructing/adjusting workers. |
| src/server/worker.h / worker.cc | New ctor taking shared TCP listen fds, tracks tcp_listen_fds_, adds PollEventLoop with TLS-aware flushing. |
| src/server/redis_connection.cc | Async-close handling while command runs, OnRead reentrancy guard, BUSY interception, and lock bypass for SCRIPT KILL/SHUTDOWN. |
| src/commands/cmd_script.cc | Adds SCRIPT KILL subcommand and allows it on replicas. |
| src/config/config.h / config.cc | Adds mutable lua-time-limit integer config. |
| kvrocks.conf | Documents lua-time-limit. |
| CMakeLists.txt | Adds Apple+LuaJIT linker flag and ENABLE_LUAJIT compile definition. |
| cmake/luajit.cmake | Switches to XCFLAGS, adds LUAJIT_ENABLE_CHECKHOOK, preserves LuaJIT's own CFLAGS. |
| tests/gocase/unit/scripting/scripting_test.go | Adds SCRIPT KILL, BUSY, UNKILLABLE, and disconnected-client tests. |
| tests/gocase/util/server.go | Sanitizes / in t.Name() for the temp directory pattern. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| bool is_script_kill = (util::EqualICase(cmd_tokens.front(), "script") && cmd_tokens.size() >= 2 && | ||
| util::EqualICase(cmd_tokens[1], "kill")); | ||
| bool is_shutdown = util::EqualICase(cmd_tokens.front(), "shutdown"); | ||
|
|
||
| if (srv_->IsScriptTimedOut()) { | ||
| if (!is_script_kill && !is_shutdown) { | ||
| Reply(redis::Error({Status::RedisErrorNoPrefix, | ||
| "BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE."})); | ||
| continue; | ||
| } | ||
| } |
There was a problem hiding this comment.
Fixed. I've tightened the matching condition to check for exactly two tokens (cmd_tokens.size() == 2) for SCRIPT KILL commands. Furthermore, I moved the IsScriptTimedOut() check to occur after the authentication and namespace checks, but still before acquiring any concurrency/exclusivity guards, preventing unauthenticated clients from bypassing the BUSY check.
| } else { | ||
| auto *output = bufferevent_get_output(bev); | ||
| if (evbuffer_get_length(output) > 0) { | ||
| evbuffer_write(output, conn->GetFD()); | ||
| } |
There was a problem hiding this comment.
Fixed. I've updated the logic to check the return value of evbuffer_write and log non-recoverable errors (i.e. those other than EAGAIN, EWOULDBLOCK, and EINTR) to prevent silent failures. Bypassing write watermarks, callbacks, and rate limiting is acceptable here because client bufferevents in Kvrocks do not utilize these features.
| }; | ||
|
|
||
| static void KillScript(lua_State *lua) { | ||
| lua_sethook(lua, LuaMaskCountHook, LUA_MASKLINE, 0); |
There was a problem hiding this comment.
Clarified with comments. We keep the LUA_MASKLINE mask active on the hook so that if a script attempts to catch the raised error using pcall or xpcall to continue running, the hook will immediately trigger again on the next line and re-raise the error. This ensures the script is terminated. Added inline documentation to clarify this intent.
| bool Server::IsScriptTimedOut() const { return is_script_timeout_.load(std::memory_order_relaxed); } | ||
|
|
There was a problem hiding this comment.
Fixed. Added ReevaluateScriptTimeout() method to Server and registered a configuration callback for lua-time-limit in config.cc. When lua-time-limit is changed via CONFIG SET, ReevaluateScriptTimeout() is called to dynamically update is_script_timeout_. Also adjusted the default value of lua-time-limit to 5000 in both config.cc and kvrocks.conf.
|
Hi @xtco3o Thanks for your PR. Please read our AI guidelines at https://kvrocks.apache.org/community/contributing#guidelines-for-ai-assisted-contributions. This is a highly technical PR. I’ll need to study it carefully. |
|
|
||
| if (should_poll) { | ||
| auto *worker = script_run_ctx->conn->Owner(); | ||
| worker->PollEventLoop(); |
There was a problem hiding this comment.
I find this poll a bit strange. Are you sure it won’t introduce any security issues?
References:
How to Configure Redis lua-time-limit for Script Safety
https://oneuptime.com/blog/post/2026-03-31-redis-lua-time-limit-script-safety/view
This PR adds support for the lua-time-limit configuration option and the SCRIPT KILL command to interrupt long-running Lua scripts.
Changes and Technical Rationale:
Lua Hook with JIT Execution
Lock Bypass for SCRIPT KILL
macOS Listener Socket Sharing
Lock-Free Fast Path for Timeout Check
SSL Output Flushing in Event Loop
Tests