feat: dataset error list #7084

Merged
c121914yu merged 1 commit into labring:main from shortlight5980:error_list on Jun 16, 2026
Conversation

@shortlight5980

Collaborator

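Background

Dataset training errors can currently be viewed mainly per collection, and there is no unified entry point at the knowledge-base level. In addition, the status definitions for temporary failures during training, final errors, and blocked errors are not clear enough, so users cannot easily tell which tasks will still be retried automatically and which ones need manual handling.

Changes

  • Add a knowledge-base-level training error list endpoint that groups final/blocked errors by collection and supports loading more error chunks within a collection.
  • Add a frontend error list modal that supports:
    • Viewing error collections at the knowledge-base level.
    • Viewing error chunks at the collection level.
    • Retrying a single item, retraining after editing, and deleting error training data.
    • Batch-retrying final errors within a collection or dataset scope.
  • Improve the queue-locking logic for insufficient balance:
    • The current task that a worker has already claimed and whose retryCount has been decremented to 0 is also marked as a blocked error, so it does not disappear from the error list.
  • Add the related OpenAPI schema, i18n, status utility functions, and unit tests.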
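
To make the three states above concrete, here is a minimal sketch of how a status utility might tell them apart. It is not the PR's implementation: the `TrainingItem` shape, the `errorMsg` and `lockTime` field names, and the rule ordering are assumptions for illustration. Only `retryCount` and the `BLOCKED_LOCK_TIME` sentinel appear in this PR, and how the sentinel is actually compared is also assumed.

```typescript
// Hypothetical item shape: only retryCount is named in the PR description.
type TrainingItem = {
  retryCount: number;
  lockTime: Date;
  errorMsg?: string;
};

type TrainingErrorState = 'none' | 'temporary' | 'final' | 'blocked';

// Sentinel value taken from the PR diff; its use as a "blocked" marker is assumed here.
const BLOCKED_LOCK_TIME = new Date('2050-01-01');

function classifyTrainingError(item: TrainingItem): TrainingErrorState {
  // Assumed rule: blocked takes priority, so a claimed task whose retryCount
  // already reached 0 is still reported as blocked rather than dropped.
  if (item.lockTime.getTime() >= BLOCKED_LOCK_TIME.getTime()) return 'blocked';
  // No recorded error means the item is simply waiting or training.
  if (!item.errorMsg) return 'none';
  // Retries remaining: the queue is expected to pick the item up again.
  // No retries remaining: the user has to retry, edit, or delete it.
  return item.retryCount > 0 ? 'temporary' : 'final';
}
```

Under these assumptions, only `final` and `blocked` items would show up in the error list, while `temporary` items stay hidden because they are still retried automatically.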

Testing

pnpm --filter @fastgpt/service exec vitest run -c vitest.config.ts test/core/dataset/training/controller.test.ts test/core/dataset/training/query.test.ts

pnpm --filter @fastgpt/app exec vitest run -c vitest.config.ts test/api/core/dataset/training/getDatasetTrainingError.test.ts test/api/core/dataset/collection/trainingStatus.test.ts test/api/core/dataset/training/updateTrainingData.test.ts test/pages/api/core/dataset/training/updateTrainingData.test.ts

git diff --check

@github-actions

github-actions Bot commented Jun 9, 2026


Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 14.8% | 1221 / 8250 |
| 🔵 | Statements | 14.75% | 1277 / 8652 |
| 🔵 | Functions | 12.95% | 256 / 1976 |
| 🔵 | Branches | 12.43% | 564 / 4536 |

File Coverage: No changed files found.
Generated in workflow #851 for commit f9d9669 by the Vitest Coverage Report Action

@github-actions

github-actions Bot commented Jun 9, 2026


Admin Preview Image Ready!

ghcr.io/labring/fastgpt-pr:admin_f9d9669eff083ad40f66838b36ee5ded276f3d7c

🕒 Time: 2026-06-16 12:22:39 (UTC+8)

@github-actions

github-actions Bot commented Jun 9, 2026


Build Successful - Preview fastgpt Image for this PR:

ghcr.io/labring/fastgpt-pr:fastgpt_f9d9669eff083ad40f66838b36ee5ded276f3d7c

🕒 Time: 2026-06-16 12:21:50 (UTC+8)

@FinleyGe

Collaborator

Review finding

The new dataset-level training error endpoint should validate and cap its nested pagination inputs before passing them into Mongo aggregation.

  • GetDatasetTrainingErrorBodySchema accepts itemOffset / itemPageSize as any string or number, but the handler converts them with Number() and sends the result into $skip / $limit.
  • Invalid strings can become NaN and surface as a Mongo 500 instead of a request validation error.
  • Very large pageSize / itemPageSize values can amplify the dataset-level path into many concurrent aggregate calls plus very large responses, because the handler runs one item aggregate per returned collection group.

Suggested fix: use z.coerce.number().int().min(...).max(...) for itemOffset and itemPageSize, clamp the parsed pageSize in the handler, and add a negative API test for invalid/oversized pagination values.
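
As a dependency-free illustration of the behaviour that suggested fix is after, the sketch below rejects non-integer input before it can reach `$skip` / `$limit` and clamps the outer page size. The function names, the default, and the bounds used in the usage note are hypothetical; the real change would express the same rules in the zod schema.

```typescript
// Parse a raw request value into an integer within [min, max], or throw.
// Throwing here stands in for a request validation error (not a Mongo 500).
function parseBoundedInt(raw: unknown, name: string, min: number, max: number): number {
  if (typeof raw !== 'string' && typeof raw !== 'number') {
    throw new Error(`${name} must be a string or number`);
  }
  // Number('') is 0, so reject blank strings explicitly.
  if (typeof raw === 'string' && raw.trim() === '') {
    throw new Error(`${name} must not be empty`);
  }
  const value = Number(raw);
  // Number.isInteger is false for NaN, Infinity, and fractional values.
  if (!Number.isInteger(value)) {
    throw new Error(`${name} must be an integer`);
  }
  if (value < min || value > max) {
    throw new Error(`${name} must be between ${min} and ${max}`);
  }
  return value;
}

// Clamp an already-parsed page size instead of rejecting it.
function clampPageSize(value: number, max: number, fallback = 10): number {
  if (!Number.isFinite(value)) return fallback;
  return Math.min(Math.max(1, Math.trunc(value)), max);
}
```

For example, `parseBoundedInt('abc', 'itemOffset', 0, 10000)` throws, and `clampPageSize(5000, 50)` returns `50`, so one request cannot fan out into an unbounded number of per-collection aggregates.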

Refs:

  • packages/global/openapi/core/dataset/training/api.ts
  • projects/app/src/pages/api/core/dataset/training/getDatasetTrainingError.ts

Comment thread projects/app/src/pages/api/core/dataset/training/getDatasetTrainingError.ts Outdated
if (hasTrainingData) {
getData(pageNum);
}
await refreshDatasetTrainingError().catch(() => undefined);

Collaborator


This polling path now calls the full dataset training-error aggregation every 6 seconds even when the collection list is otherwise idle. getDatasetTrainingError({ pageSize: 1, itemPageSize: 1 }) still runs the grouped error aggregate plus the total aggregate, and when errors exist it also fetches collection metadata and first error items. Consider replacing this badge probe with a lighter exists/count endpoint or only refreshing it when there is active training / dataset status is not active / a user action just changed training state.
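
One way to read the second option is a small guard that runs before each poll tick. The sketch below is only an assumption about how such a guard could look: the input names, the `'active'` status string as the idle state, and the 30-second window are all hypothetical and not taken from the PR.

```typescript
// Hypothetical inputs for deciding whether the error badge needs a refresh.
type BadgeRefreshInput = {
  hasActiveTraining: boolean; // any collection currently training
  datasetStatus: string; // assumed: 'active' means the dataset is idle
  lastTrainingActionAt?: number; // epoch ms of the last retry/edit/delete
  now: number; // epoch ms of the current poll tick
};

// Assumed window during which a user action keeps the badge "hot".
const RECENT_ACTION_WINDOW_MS = 30000;

function shouldRefreshErrorBadge(input: BadgeRefreshInput): boolean {
  // Training in progress can produce new errors at any time.
  if (input.hasActiveTraining) return true;
  // A non-idle dataset status may still change the error set.
  if (input.datasetStatus !== 'active') return true;
  // A recent user action (retry, edit, delete) may have changed training state.
  if (input.lastTrainingActionAt !== undefined) {
    return input.now - input.lastTrainingActionAt <= RECENT_ACTION_WINDOW_MS;
  }
  // Idle list with no recent action: skip the aggregation entirely.
  return false;
}
```

The poller would then call `refreshDatasetTrainingError()` only when this guard returns `true`, leaving an idle collection list without the 6-second aggregation cost.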

}
}
},
{ $sort: { modeRank: 1, chunkIndex: 1, _id: 1 } },

Collaborator


This error-list pagination sorts on computed modeRank, so the new { teamId, datasetId, collectionId } index can only narrow the match; it cannot support the sort. For collections with many final-error chunks, each page has to compute the $switch, sort the matched set in the aggregation, then skip/limit. Longer term, consider materializing an error status / mode rank field and indexing the list order, or otherwise avoid computed-sort pagination on this hot path.
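
A rough sketch of the "materialize the rank" idea follows. The mode names and rank values are placeholders, because the PR's actual `$switch` branches are not shown here, and `withModeRank`, `errorListIndexKeys`, and `compareErrorItems` are hypothetical names. The point is only that the rank is stored at write time, so a compound index can cover both the match and the sort.

```typescript
// Placeholder modes and ranks: the real $switch branches are not shown in this PR.
const MODE_RANK: Record<string, number> = { modeA: 0, modeB: 1 };
const DEFAULT_MODE_RANK = 99;

type ErrorItem = { _id: string; mode: string; chunkIndex: number };
type RankedErrorItem = ErrorItem & { modeRank: number };

// Compute the rank once when the document is written, so reads need no $switch.
function withModeRank(item: ErrorItem): RankedErrorItem {
  return { ...item, modeRank: MODE_RANK[item.mode] ?? DEFAULT_MODE_RANK };
}

// Compound index key pattern: equality fields first, then the sort fields in
// the same order as { modeRank: 1, chunkIndex: 1, _id: 1 }.
const errorListIndexKeys = {
  teamId: 1,
  datasetId: 1,
  collectionId: 1,
  modeRank: 1,
  chunkIndex: 1,
  _id: 1
} as const;

// In-memory equivalent of the sort stage, useful for checking the ordering.
function compareErrorItems(a: RankedErrorItem, b: RankedErrorItem): number {
  if (a.modeRank !== b.modeRank) return a.modeRank - b.modeRank;
  if (a.chunkIndex !== b.chunkIndex) return a.chunkIndex - b.chunkIndex;
  return a._id < b._id ? -1 : a._id > b._id ? 1 : 0;
}
```

With the rank stored on the document, the list query can sort on indexed fields directly. The trade-off is that every write path that changes `mode` must keep `modeRank` in sync.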

FinleyGe previously approved these changes Jun 11, 2026
}
// Check team points and lock(No mistakes will be thrown here)
- if (!(await checkTeamAiPointsAndLock(data.teamId))) {
+ if (!(await checkTeamAiPointsAndLock(data.teamId, String(data._id)))) {

Collaborator


Why is dataId needed here?

export default NextAPI(handler);

export { handler };
export type updateTrainingDataBody = UpdateTrainingDataBody;

Collaborator


New APIs no longer export types; everything is defined as zod schemas in openapi.

}

export default NextAPI(handler);
export type getTrainingDataDetailBody = GetTrainingDataDetailBody;

Collaborator


Same here.

trainingAmount: z.number().meta({ description: '训练数量' }),
hasError: z.boolean().optional().meta({ description: '是否错误' })
});
export const DatasetCollectionsListItemSchema = z

Collaborator


Make sure to test that the old endpoints do not throw errors; this has to stay backward compatible.

fileCustom = 'fileCustom'
}

export const BLOCKED_LOCK_TIME = new Date('2050-01-01');

Collaborator


Move this into query.ts as well.

@c121914yu c121914yu merged commit b3ba5de into labring:main Jun 16, 2026
10 checks passed
3 participants