fix(primaryStorage): rollback persisted records on controller build failure #3868

Open
zstack-robot-1 wants to merge 1 commit into 5.5.6 from sync/jin.ma/fix/ZSTAC-84817
Conversation

@zstack-robot-1
Collaborator

Summary

AddExternalPrimaryStorage with malformed JSON config left dirty rows in DB, breaking PrimaryStorage service permanently (QueryPrimaryStorage 503).

Root Cause

ExternalPrimaryStorageFactory.createPrimaryStorage persists ExternalPrimaryStorageVO/PrimaryStorageVO/PrimaryStorageOutputProtocolRefVO before invoking saveControllerIfNeed → buildControllerSvc → ZbsStorageController.reloadDbInfo → JSONObjectUtil.toObject. A RuntimeException from JSON parsing left those rows persisted with no rollback. The dirty VO then made every subsequent buildPsController() throw the same parse error, so the PrimaryStorage service stayed unhealthy and QueryPrimaryStorage kept returning 503.
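
The failure mode described above can be reproduced in miniature. The sketch below is not ZStack code: the class `DirtyRowSketch`, the in-memory `rows` list, and the toy `buildController` check are all hypothetical stand-ins for the persisted VOs, the database, and the JSON parsing step. It only illustrates why a row persisted before a failing build keeps breaking every later rebuild, and how removing it before rethrowing avoids that.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the persist-then-build sequence. */
final class DirtyRowSketch {
    /** Stand-in for the database table holding persisted configs. */
    static final List<String> rows = new ArrayList<>();

    /** Toy replacement for JSON parsing: accepts only brace-delimited text. */
    static void buildController(String config) {
        String c = config.trim();
        if (!(c.startsWith("{") && c.endsWith("}"))) {
            throw new IllegalArgumentException("malformed config: " + config);
        }
    }

    /** Pre-fix shape: the row is persisted first and stays behind on failure. */
    static void addWithoutRollback(String config) {
        rows.add(config);
        buildController(config);
    }

    /** Post-fix shape: remove the row that was just persisted, then rethrow. */
    static void addWithRollback(String config) {
        rows.add(config);
        try {
            buildController(config);
        } catch (RuntimeException e) {
            rows.remove(rows.size() - 1);
            throw e;
        }
    }

    /** Rebuild path: re-parses every persisted row, so one dirty row fails each call. */
    static void rebuildAll() {
        for (String config : rows) {
            buildController(config);
        }
    }

    /** Returns true when the action throws a RuntimeException. */
    static boolean throwsOn(Runnable action) {
        try {
            action.run();
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String bad = "{this is not valid json";

        throwsOn(() -> addWithoutRollback(bad));
        System.out.println("rows left without rollback: " + rows.size());
        System.out.println("rebuild fails: " + throwsOn(DirtyRowSketch::rebuildAll));

        rows.clear();
        throwsOn(() -> addWithRollback(bad));
        System.out.println("rows left with rollback: " + rows.size());
        System.out.println("rebuild fails: " + throwsOn(DirtyRowSketch::rebuildAll));
    }
}
```

In this sketch the rollback removes the last element because it is the row the same call just added; the real change removes the specific VO instances it persisted.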

Changes

  • storage/ExternalPrimaryStorageFactory.java: wrap saveControllerIfNeed in try/catch; on Throwable, dbf.remove(ref) + dbf.remove(lvo), clear controllers/nodes map entries, then rethrow.
  • test/.../ZbsPrimaryStorageCase.groovy: new SubCase testAddExternalPrimaryStorageWithMalformedJsonShouldRollback asserts no leftover rows after malformed-JSON Add and that QueryPrimaryStorage still works.

Testing

  • mvn compile -pl storage -am -Dmaven.test.skip passes locally.
  • CI: ZbsPrimaryStorageCase runs the new SubCase.
  • Manual: tested via reporter on MN 172.24.227.139.

Resolves: ZSTAC-84817

sync from gitlab !9743

… failure

When AddExternalPrimaryStorage receives an invalid JSON config, ExternalPrimaryStorageFactory.createPrimaryStorage persisted ExternalPrimaryStorageVO/PrimaryStorageVO/PrimaryStorageOutputProtocolRefVO before invoking saveControllerIfNeed -> buildControllerSvc. An exception thrown during JSON parsing left those rows in the DB. The dirty VO then made buildPsController fail every time, breaking the PS service so QueryPrimaryStorage returned 503 permanently. Wrap saveControllerIfNeed in try/catch and remove the persisted records before rethrowing.

Resolves: ZSTAC-84817

Change-Id: Icf6c648133d7866edf35940d56a28f74f4c64817
@coderabbitai

coderabbitai Bot commented Apr 27, 2026

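Overview

Adds error handling to controller initialization in the external primary storage factory. When the controller build fails, the factory catches the exception, cleans up the persisted database records, clears the in-memory caches, and rethrows the exception. An integration test is also added to verify correct behavior and database rollback in this failure scenario.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| External primary storage factory error handling: storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java | Adds a try/catch (Throwable) block in createPrimaryStorage to catch controller save failures; performs cleanup (removing the PrimaryStorageOutputProtocolRefVO and ExternalPrimaryStorageVO records); clears the in-memory controller and node caches; logs cleanup failures and rethrows the original exception. |
| ZBS primary storage integration test: test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy | Adds an integration test case that verifies exception handling when addExternalPrimaryStorage is called with a malformed JSON config; confirms that the database changes are rolled back (no ExternalPrimaryStorageVO record is created); ensures that queryPrimaryStorage still works. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 The error is caught, the data swept clean,
Caches are cleared, the exception rethrown,
Tests stand guard, the transaction rolls back,
Storage stays sturdy, the rabbit cheers! 🎉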


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❌ Error | The PR title follows the '[scope]: ' format correctly, but at 75 characters it exceeds the 72-character limit. | Shorten the title to 72 characters or fewer, for example: 'fix(primaryStorage): rollback records on controller build failure' |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The PR description details the root cause, the specific changes, the testing performed, and the linked issue number, and is fully relevant to the code changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy`:
- Around line 729-753: Replace the Chinese comment block above the test method
testAddExternalPrimaryStorageWithMalformedJsonShouldRollback with an English
comment that explains: this test verifies AddExternalPrimaryStorage rolls back
when config contains malformed JSON (JSONObjectUtil.toObject throws), and no
ExternalPrimaryStorageVO/PrimaryStorageVO/PrimaryStorageOutputProtocolRefVO
records remain to avoid dirty VO causing buildPsController failures and
QueryPrimaryStorage 503s; update any inline Chinese in the same comment region
to concise, correct English while keeping references to ExternalPrimaryStorageVO
and the failure expectations intact.
- Around line 733-755: The test only checks ExternalPrimaryStorageVO but misses
asserting rollback for PrimaryStorageVO and PrimaryStorageOutputProtocolRefVO;
capture counts for PrimaryStorageVO and PrimaryStorageOutputProtocolRefVO (using
Q.New(PrimaryStorageVO.class).count() and
Q.New(PrimaryStorageOutputProtocolRefVO.class).count()) before the failing
addExternalPrimaryStorage call, then after the expect(AssertionError.class)
block assert those counts are unchanged and also assert no rows exist with name
"zbs-bad-json" in PrimaryStorageVO and no protocol refs tied to that primary
storage in PrimaryStorageOutputProtocolRefVO, so failures don't leave residual
rows across those related tables.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 1e13331a-5bab-4deb-bbc0-e3ac9ea06519

📥 Commits

Reviewing files that changed from the base of the PR and between cf48ab8 and f0dc8d3.

📒 Files selected for processing (2)
  • storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy

Comment on lines +323 to +337
try {
    saveControllerIfNeed(lvo);
} catch (Throwable t) {
    logger.warn(String.format("failed to build controller for primary storage[uuid:%s, identity:%s], rolling back persisted records",
            lvo.getUuid(), identity), t);
    try {
        dbf.remove(ref);
        dbf.remove(lvo);
    } catch (Throwable cleanupEx) {
        logger.warn(String.format("failed to roll back persisted records for primary storage[uuid:%s]", lvo.getUuid()), cleanupEx);
    }
    controllers.remove(lvo.getUuid());
    nodes.remove(lvo.getUuid());
    throw t;
}

⚠️ Potential issue | 🟠 Major

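This compensation only covers dirty records added by the current call; bad data that was persisted before the fix will still keep breaking the service.

When line 324 fails, only the records just written by this createPrimaryStorage() call are rolled back, but buildPsController() (lines 180-187) and the rebuild path triggered by nodeLeft() (line 1077) still consume the ExternalPrimaryStorageVO rows already in the database. If a deployment was left with dirty records by the same kind of malformed JSON before this fix, those paths will keep throwing on startup or management-node failover after the upgrade, and QueryPrimaryStorage may remain unavailable. Consider quarantining or skipping bad records at startup, or adding explicit data-repair logic.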
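
One possible shape for the startup-time skip suggested above is sketched here. It is an illustration under stated assumptions, not the project's code: `QuarantineSketch`, `Result`, `rebuildAll`, and `parseConfig` are hypothetical names, the config rows are modelled as a plain map keyed by UUID, and the toy parser stands in for the real JSON parsing. The idea is that one unparsable row is recorded and skipped rather than aborting the whole rebuild.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: rebuild controllers while skipping rows that fail to parse. */
final class QuarantineSketch {
    /** Outcome of one rebuild pass. */
    static final class Result<C> {
        /** Controllers that were built successfully, keyed by storage UUID. */
        final Map<String, C> controllers = new LinkedHashMap<>();
        /** UUIDs whose persisted config could not be used. */
        final List<String> quarantined = new ArrayList<>();
    }

    /** Builds a controller per row; a failing row is recorded instead of aborting the loop. */
    static <C> Result<C> rebuildAll(Map<String, String> configByUuid,
                                    Function<String, C> builder) {
        Result<C> result = new Result<>();
        for (Map.Entry<String, String> row : configByUuid.entrySet()) {
            try {
                result.controllers.put(row.getKey(), builder.apply(row.getValue()));
            } catch (RuntimeException e) {
                // Real code would log the error and flag the row for repair.
                result.quarantined.add(row.getKey());
            }
        }
        return result;
    }

    /** Toy replacement for JSON parsing: accepts only brace-delimited text. */
    static String parseConfig(String config) {
        String c = config.trim();
        if (!(c.startsWith("{") && c.endsWith("}"))) {
            throw new IllegalArgumentException("malformed config: " + config);
        }
        return c;
    }

    public static void main(String[] args) {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("ps-good", "{}");
        rows.put("ps-dirty", "{this is not valid json");

        Result<String> result = rebuildAll(rows, QuarantineSketch::parseConfig);
        System.out.println("built: " + result.controllers.keySet());
        System.out.println("quarantined: " + result.quarantined);
    }
}
```

Whether a skipped storage should be marked disconnected, hidden, or repaired is a policy decision the sketch leaves open.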

Comment on lines +729 to +753
    // AddExternalPrimaryStorage 收到非法 JSON config 导致 buildControllerSvc 抛异常时,
    // ExternalPrimaryStorageVO/PrimaryStorageVO/PrimaryStorageOutputProtocolRefVO 不应残留在 DB 中。
    // 否则 buildPsController 会被脏 VO 持续打挂,QueryPrimaryStorage 永久 503。
    void testAddExternalPrimaryStorageWithMalformedJsonShouldRollback() {
        long psCountBefore = Q.New(ExternalPrimaryStorageVO.class).count()

        // malformed JSON — JSONObjectUtil.toObject(config, Config.class) 会抛 RuntimeException
        expect(AssertionError.class) {
            addExternalPrimaryStorage {
                zoneUuid = zone.uuid
                name = "zbs-bad-json"
                identity = "zbs"
                defaultOutputProtocol = "CBD"
                config = "{this is not valid json"
                url = ""
            }
        }

        // 失败后不应在 DB 中遗留任何 zbs-bad-json 相关记录
        assert Q.New(ExternalPrimaryStorageVO.class).count() == psCountBefore
        assert !Q.New(ExternalPrimaryStorageVO.class)
                .eq(ExternalPrimaryStorageVO_.name, "zbs-bad-json")
                .isExists()

        // 失败之后 QueryPrimaryStorage 仍可正常返回(不被脏数据打挂)

⚠️ Potential issue | 🟡 Minor

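Please change the newly added comments to English.

Lines 729-753 add comments in Chinese, which does not comply with the repository conventions.

English comments that can be substituted directly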
-    // AddExternalPrimaryStorage 收到非法 JSON config 导致 buildControllerSvc 抛异常时,
-    // ExternalPrimaryStorageVO/PrimaryStorageVO/PrimaryStorageOutputProtocolRefVO 不应残留在 DB 中。
-    // 否则 buildPsController 会被脏 VO 持续打挂,QueryPrimaryStorage 永久 503。
+    // When AddExternalPrimaryStorage receives malformed JSON config and buildControllerSvc throws,
+    // ExternalPrimaryStorageVO / PrimaryStorageVO / PrimaryStorageOutputProtocolRefVO must not remain in the database.
+    // Otherwise buildPsController keeps failing on the dirty VO and QueryPrimaryStorage can return 503 permanently.
@@
-        // malformed JSON — JSONObjectUtil.toObject(config, Config.class) 会抛 RuntimeException
+        // Malformed JSON: JSONObjectUtil.toObject(config, Config.class) should throw RuntimeException.
@@
-        // 失败后不应在 DB 中遗留任何 zbs-bad-json 相关记录
+        // No zbs-bad-json related records should remain in the database after the failure.
@@
-        // 失败之后 QueryPrimaryStorage 仍可正常返回(不被脏数据打挂)
+        // QueryPrimaryStorage should still return successfully after the failure.
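As per coding guidelines `Code should not contain Chinese; error messages, comments, and the like should all be written in correct English without spelling mistakes`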
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    // AddExternalPrimaryStorage 收到非法 JSON config 导致 buildControllerSvc 抛异常时,
-    // ExternalPrimaryStorageVO/PrimaryStorageVO/PrimaryStorageOutputProtocolRefVO 不应残留在 DB 中。
-    // 否则 buildPsController 会被脏 VO 持续打挂,QueryPrimaryStorage 永久 503。
-    void testAddExternalPrimaryStorageWithMalformedJsonShouldRollback() {
-        long psCountBefore = Q.New(ExternalPrimaryStorageVO.class).count()
-        // malformed JSON — JSONObjectUtil.toObject(config, Config.class) 会抛 RuntimeException
-        expect(AssertionError.class) {
-            addExternalPrimaryStorage {
-                zoneUuid = zone.uuid
-                name = "zbs-bad-json"
-                identity = "zbs"
-                defaultOutputProtocol = "CBD"
-                config = "{this is not valid json"
-                url = ""
-            }
-        }
-        // 失败后不应在 DB 中遗留任何 zbs-bad-json 相关记录
-        assert Q.New(ExternalPrimaryStorageVO.class).count() == psCountBefore
-        assert !Q.New(ExternalPrimaryStorageVO.class)
-                .eq(ExternalPrimaryStorageVO_.name, "zbs-bad-json")
-                .isExists()
-        // 失败之后 QueryPrimaryStorage 仍可正常返回(不被脏数据打挂)
+    // When AddExternalPrimaryStorage receives malformed JSON config and buildControllerSvc throws,
+    // ExternalPrimaryStorageVO / PrimaryStorageVO / PrimaryStorageOutputProtocolRefVO must not remain in the database.
+    // Otherwise buildPsController keeps failing on the dirty VO and QueryPrimaryStorage can return 503 permanently.
+    void testAddExternalPrimaryStorageWithMalformedJsonShouldRollback() {
+        long psCountBefore = Q.New(ExternalPrimaryStorageVO.class).count()
+        // Malformed JSON: JSONObjectUtil.toObject(config, Config.class) should throw RuntimeException.
+        expect(AssertionError.class) {
+            addExternalPrimaryStorage {
+                zoneUuid = zone.uuid
+                name = "zbs-bad-json"
+                identity = "zbs"
+                defaultOutputProtocol = "CBD"
+                config = "{this is not valid json"
+                url = ""
+            }
+        }
+        // No zbs-bad-json related records should remain in the database after the failure.
+        assert Q.New(ExternalPrimaryStorageVO.class).count() == psCountBefore
+        assert !Q.New(ExternalPrimaryStorageVO.class)
+                .eq(ExternalPrimaryStorageVO_.name, "zbs-bad-json")
+                .isExists()
+        // QueryPrimaryStorage should still return successfully after the failure.
Comment on lines +733 to +755
        long psCountBefore = Q.New(ExternalPrimaryStorageVO.class).count()

        // malformed JSON — JSONObjectUtil.toObject(config, Config.class) 会抛 RuntimeException
        expect(AssertionError.class) {
            addExternalPrimaryStorage {
                zoneUuid = zone.uuid
                name = "zbs-bad-json"
                identity = "zbs"
                defaultOutputProtocol = "CBD"
                config = "{this is not valid json"
                url = ""
            }
        }

        // 失败后不应在 DB 中遗留任何 zbs-bad-json 相关记录
        assert Q.New(ExternalPrimaryStorageVO.class).count() == psCountBefore
        assert !Q.New(ExternalPrimaryStorageVO.class)
                .eq(ExternalPrimaryStorageVO_.name, "zbs-bad-json")
                .isExists()

        // 失败之后 QueryPrimaryStorage 仍可正常返回(不被脏数据打挂)
        def psList = queryPrimaryStorage {} as List
        assert psList != null

⚠️ Potential issue | 🟡 Minor

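This regression case does not yet cover rollback of the base table and the protocol mapping table.

Right now only ExternalPrimaryStorageVO is verified. If someone later removes only the child-table row and misses PrimaryStorageVO or PrimaryStorageOutputProtocolRefVO, this case will still pass while dirty data remains in the database. Since this fix explicitly compensates for these records, consider also asserting the before and after counts of the related tables.

Suggested additional assertions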
 void testAddExternalPrimaryStorageWithMalformedJsonShouldRollback() {
     long psCountBefore = Q.New(ExternalPrimaryStorageVO.class).count()
+    long primaryStorageCountBefore = Q.New(org.zstack.header.storage.primary.PrimaryStorageVO.class).count()
+    long protocolRefCountBefore = Q.New(org.zstack.header.storage.primary.PrimaryStorageOutputProtocolRefVO.class).count()

     expect(AssertionError.class) {
         addExternalPrimaryStorage {
             zoneUuid = zone.uuid
             name = "zbs-bad-json"
@@
     }

     assert Q.New(ExternalPrimaryStorageVO.class).count() == psCountBefore
+    assert Q.New(org.zstack.header.storage.primary.PrimaryStorageVO.class).count() == primaryStorageCountBefore
+    assert Q.New(org.zstack.header.storage.primary.PrimaryStorageOutputProtocolRefVO.class).count() == protocolRefCountBefore
     assert !Q.New(ExternalPrimaryStorageVO.class)
             .eq(ExternalPrimaryStorageVO_.name, "zbs-bad-json")
             .isExists()