feat: dataset error list by shortlight5980 · Pull Request #7084 · labring/FastGPT

shortlight5980 · 2026-06-09T08:52:25Z

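Background

Dataset training errors can currently be viewed mainly per collection, and there is no unified entry point at the knowledge base level. In addition, the distinction between transient failures during training, final errors, and blocked errors is not clearly defined, so users cannot easily tell which tasks will still be retried automatically and which require manual handling.

Changes

Added a knowledge-base-level training error list API that groups final/blocked errors by collection and supports loading more error chunks within a collection.
Added a frontend error list modal that supports:
- Viewing error collections at the knowledge base level.
- Viewing error chunks at the collection level.
- Retrying a single item, retraining after editing, and deleting errored training data.
- Batch retrying final errors within a collection or dataset scope.
Improved the queue-locking logic for insufficient balance:
- A task that a worker has already claimed and whose retryCount has been decremented to 0 is now also marked as a blocked error, so it no longer disappears from the error list.
Added the related OpenAPI schemas, i18n entries, status utility functions, and unit tests.

Testing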
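
The three error states described above (transient failure that will be retried, final error, blocked error) can be sketched as a small classifier. This is only an illustration under assumptions: the record shape, the `errorMsg` field, the state names, and the rule that a blocked task is one locked until `BLOCKED_LOCK_TIME` are guesses, not the status utility functions added by this PR.

```typescript
// Hypothetical record shape; only retryCount and BLOCKED_LOCK_TIME are
// identifiers that actually appear in this PR.
type TrainingRecord = {
  retryCount: number;
  lockTime: Date;
  errorMsg?: string;
};

type TrainingErrorState = 'none' | 'retrying' | 'final' | 'blocked';

// Sentinel lock time, same value as the constant shown in the diff.
const BLOCKED_LOCK_TIME = new Date('2050-01-01');

function classifyTrainingError(record: TrainingRecord): TrainingErrorState {
  // Assumption: a task locked until the sentinel date is blocked
  // (for example by insufficient balance) and needs manual action.
  if (record.lockTime.getTime() >= BLOCKED_LOCK_TIME.getTime()) return 'blocked';
  // No recorded error: nothing to show in the error list.
  if (!record.errorMsg) return 'none';
  // Retries left: transient failure. None left: final error.
  return record.retryCount > 0 ? 'retrying' : 'final';
}
```

Checking the blocked state first reflects the case described in the change list: a claimed task whose retryCount is already 0 still shows up as blocked instead of dropping out of the list.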

pnpm --filter @fastgpt/service exec vitest run -c vitest.config.ts test/core/dataset/training/controller.test.ts test/core/dataset/training/query.test.ts

pnpm --filter @fastgpt/app exec vitest run -c vitest.config.ts test/api/core/dataset/training/getDatasetTrainingError.test.ts test/api/core/dataset/collection/trainingStatus.test.ts test/api/core/dataset/training/updateTrainingData.test.ts test/pages/api/core/dataset/training/updateTrainingData.test.ts

git diff --check

github-actions · 2026-06-09T08:55:33Z

Coverage Report

Status	Category	Percentage	Covered / Total
🔵	Lines	14.8%	1221 / 8250
🔵	Statements	14.75%	1277 / 8652
🔵	Functions	12.95%	256 / 1976
🔵	Branches	12.43%	564 / 4536

File Coverage

No changed files found.

Generated in workflow #851 for commit f9d9669 by the Vitest Coverage Report Action

github-actions · 2026-06-09T09:02:47Z

✅ Admin Preview Image Ready!

ghcr.io/labring/fastgpt-pr:admin_f9d9669eff083ad40f66838b36ee5ded276f3d7c

🕒 Time: 2026-06-16 12:22:39 (UTC+8)

github-actions · 2026-06-09T09:04:26Z

✅ Build Successful - Preview fastgpt Image for this PR:

ghcr.io/labring/fastgpt-pr:fastgpt_f9d9669eff083ad40f66838b36ee5ded276f3d7c

🕒 Time: 2026-06-16 12:21:50 (UTC+8)

FinleyGe · 2026-06-10T10:20:32Z

Review finding

The new dataset-level training error endpoint should validate and cap its nested pagination inputs before passing them into Mongo aggregation.

GetDatasetTrainingErrorBodySchema accepts itemOffset / itemPageSize as any string or number, but the handler converts them with Number() and sends the result into $skip / $limit.
Invalid strings can become NaN and surface as a Mongo 500 instead of a request validation error.
Very large pageSize / itemPageSize values can amplify the dataset-level path into many concurrent aggregate calls plus very large responses, because the handler runs one item aggregate per returned collection group.

Suggested fix: use z.coerce.number().int().min(...).max(...) for itemOffset and itemPageSize, clamp the parsed pageSize in the handler, and add a negative API test for invalid/oversized pagination values.

Refs:

packages/global/openapi/core/dataset/training/api.ts
projects/app/src/pages/api/core/dataset/training/getDatasetTrainingError.ts
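
To illustrate the checks that the suggested `z.coerce.number().int().min(...).max(...)` chain would enforce, here is a dependency-free sketch of the same idea. The helper names, the default values, and the `MAX_ITEM_PAGE_SIZE` cap of 100 are assumptions for illustration, not values from this PR.

```typescript
type PageInput = string | number | undefined;

// Hypothetical cap; the real limit is for the maintainers to choose.
const MAX_ITEM_PAGE_SIZE = 100;

// Coerce one pagination value and reject anything that is not an
// integer inside [min, max], so NaN never reaches $skip / $limit.
function parseBoundedInt(
  value: PageInput,
  name: string,
  min: number,
  max: number,
  fallback: number
): number {
  if (value === undefined) return fallback;
  // Number('') is 0, so treat blank strings as invalid explicitly.
  const parsed =
    typeof value === 'number' ? value : value.trim() === '' ? NaN : Number(value);
  if (!Number.isInteger(parsed) || parsed < min || parsed > max) {
    throw new RangeError(`${name} must be an integer between ${min} and ${max}`);
  }
  return parsed;
}

function parseItemPagination(body: { itemOffset?: PageInput; itemPageSize?: PageInput }) {
  return {
    itemOffset: parseBoundedInt(body.itemOffset, 'itemOffset', 0, Number.MAX_SAFE_INTEGER, 0),
    itemPageSize: parseBoundedInt(body.itemPageSize, 'itemPageSize', 1, MAX_ITEM_PAGE_SIZE, 10)
  };
}
```

With this shape, an input such as `{ itemPageSize: 'abc' }` fails at the request boundary instead of surfacing later as a Mongo error.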

FinleyGe · 2026-06-11T03:19:27Z

      if (hasTrainingData) {
        getData(pageNum);
      }
+      await refreshDatasetTrainingError().catch(() => undefined);


This polling path now calls the full dataset training-error aggregation every 6 seconds even when the collection list is otherwise idle. getDatasetTrainingError({ pageSize: 1, itemPageSize: 1 }) still runs the grouped error aggregate plus the total aggregate, and when errors exist it also fetches collection metadata and first error items. Consider replacing this badge probe with a lighter exists/count endpoint or only refreshing it when there is active training / dataset status is not active / a user action just changed training state.
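
One way to express the conditional refresh suggested here is a small guard evaluated on each polling tick. This is a sketch only: the context type, its field names, and the `'active'` status string are assumptions, not code from the PR.

```typescript
// Hypothetical inputs available to the polling callback.
type BadgeRefreshContext = {
  hasTrainingData: boolean; // some collection is still training
  datasetStatus: string; // assumed: 'active' means idle and healthy
  trainingStateJustChanged: boolean; // user retried/edited/deleted an item
};

// Skip the error aggregation unless something could have changed
// the badge since the previous tick.
function shouldRefreshErrorBadge(ctx: BadgeRefreshContext): boolean {
  return (
    ctx.hasTrainingData ||
    ctx.datasetStatus !== 'active' ||
    ctx.trainingStateJustChanged
  );
}
```

The polling callback would then call `refreshDatasetTrainingError()` only when the guard returns true, leaving an idle collection list free of aggregation calls.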

FinleyGe · 2026-06-11T03:19:27Z

+        }
+      }
+    },
+    { $sort: { modeRank: 1, chunkIndex: 1, _id: 1 } },


This error-list pagination sorts on computed modeRank, so the new { teamId, datasetId, collectionId } index can only narrow the match; it cannot support the sort. For collections with many final-error chunks, each page has to compute the $switch, sort the matched set in the aggregation, then skip/limit. Longer term, consider materializing an error status / mode rank field and indexing the list order, or otherwise avoid computed-sort pagination on this hot path.
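
A rough sketch of the materialization idea: compute the rank once at write time and store it, so the list order matches an index key order. The mode names and rank values below are placeholders (the actual `$switch` branches are not shown in this thread), and the index only supports the sort if the error filter is an equality match on its leading fields.

```typescript
type TrainingDoc = { _id: string; mode: string; chunkIndex: number };
type RankedTrainingDoc = TrainingDoc & { modeRank: number };

// Placeholder rank table; real modes and ranks come from the $switch.
const MODE_RANK: Record<string, number> = { chunk: 0, qa: 1 };
const DEFAULT_MODE_RANK = 99;

// Run when a training record is written, instead of on every page read.
function withModeRank(doc: TrainingDoc): RankedTrainingDoc {
  return { ...doc, modeRank: MODE_RANK[doc.mode] ?? DEFAULT_MODE_RANK };
}

// Candidate compound index: equality fields first, then the sort keys
// in the same order as { modeRank: 1, chunkIndex: 1, _id: 1 }.
const errorListIndex = {
  teamId: 1,
  datasetId: 1,
  collectionId: 1,
  modeRank: 1,
  chunkIndex: 1,
  _id: 1
} as const;

// In-memory equivalent of the sort stage, useful for unit tests.
function compareErrorListOrder(a: RankedTrainingDoc, b: RankedTrainingDoc): number {
  return (
    a.modeRank - b.modeRank ||
    a.chunkIndex - b.chunkIndex ||
    (a._id < b._id ? -1 : a._id > b._id ? 1 : 0)
  );
}
```

The trade-off is an extra stored field that has to be kept in sync whenever a record's mode changes, in exchange for pagination that no longer sorts the whole matched set per page.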

c121914yu · 2026-06-10T06:14:17Z

      }
      // Check team points and lock(No mistakes will be thrown here)
-      if (!(await checkTeamAiPointsAndLock(data.teamId))) {
+      if (!(await checkTeamAiPointsAndLock(data.teamId, String(data._id)))) {


Why is dataId needed here?

c121914yu · 2026-06-10T06:14:53Z

 export default NextAPI(handler);

 export { handler };
+export type updateTrainingDataBody = UpdateTrainingDataBody;


New APIs no longer export types; everything is defined as zod schemas in openapi.

c121914yu · 2026-06-10T06:15:56Z

 }

 export default NextAPI(handler);
+export type getTrainingDataDetailBody = GetTrainingDataDetailBody;


Same here.

c121914yu · 2026-06-10T06:18:09Z

-  trainingAmount: z.number().meta({ description: '训练数量' }),
-  hasError: z.boolean().optional().meta({ description: '是否错误' })
-});
+export const DatasetCollectionsListItemSchema = z


Make sure to test that the old endpoints do not start failing; this must stay backward compatible.

c121914yu · 2026-06-10T06:19:20Z

  fileCustom = 'fileCustom'
 }
+
+export const BLOCKED_LOCK_TIME = new Date('2050-01-01');


Move this into query.ts as well.

pull-request-size Bot added the size/XXL label Jun 9, 2026

shortlight5980 force-pushed the error_list branch from f57e244 to 00753ec Compare June 10, 2026 09:49

FinleyGe self-assigned this Jun 10, 2026

FinleyGe reviewed Jun 10, 2026

Comment thread projects/app/src/pages/api/core/dataset/training/getDatasetTrainingError.ts Outdated

shortlight5980 force-pushed the error_list branch from 00753ec to 0688c9a Compare June 10, 2026 11:21

FinleyGe reviewed Jun 11, 2026

shortlight5980 force-pushed the error_list branch from 0688c9a to 8893a39 Compare June 11, 2026 07:03

FinleyGe previously approved these changes Jun 11, 2026

shortlight5980 dismissed FinleyGe’s stale review via de3cc27 June 12, 2026 13:18

shortlight5980 force-pushed the error_list branch 6 times, most recently from b7c5919 to f34379e Compare June 15, 2026 12:16

c121914yu reviewed Jun 16, 2026

feat: dataset error list

f9d9669

shortlight5980 force-pushed the error_list branch from f34379e to f9d9669 Compare June 16, 2026 04:11

c121914yu merged commit b3ba5de into labring:main Jun 16, 2026
10 checks passed
