Merged
2 changes: 1 addition & 1 deletion .claude-plugin/plugin.json
@@ -1,6 +1,6 @@
{
"name": "autocode",
-"version": "0.7.0",
+"version": "0.8.0",
"description": "Claude Code plugin for competitive programming problem-setting workflows.",
"author": {
"name": "SummerOneTwo",
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,14 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

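+## [0.8.0] - 2026-04-28
+
+### Improvements
+
+- **Final test data ratio constraint**: the `problem_generate_tests` sampling strategy now prioritizes ensuring that `type=3/4` (extreme + tle) cases make up at least half of the final test set (best effort when candidates are insufficient), and returns the statistics fields `limit_case_count`, `limit_case_minimum_required`, and `limit_case_quota_met`.
+- **Hard constraint at verification**: `problem_verify_tests` adds a `limit_ratio` check (enabled by default) that uses the generation manifest to enforce that `type=3/4` cases make up at least half of the final tests; if they do not, verification fails outright. The check can be disabled explicitly with `enable_limit_ratio=false`.
+- **Docs and workflow sync**: updated the README, workflow skill, agent prompt, and prompts text so that they consistently describe the quality bar "at least half of the final tests are limit cases".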

## [0.7.0] - 2026-04-27

### Features
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -102,7 +102,7 @@ AutoCode/
5. Build the generator (`generator_build`)
6. Run stress tests (`stress_test_run`, completed_rounds == total_rounds)
7. Build a checker if needed (`checker_build`, accuracy >= 0.9)
-8. Generate test data (`problem_generate_tests`, generated_test_count > 0)
+8. Generate test data (`problem_generate_tests`, generated_test_count > 0, and extreme/tle cases make up at least half of the final tests; best effort when candidates are insufficient)
9. Verify test data (`problem_verify_tests`, passed)
10. Package for Polygon (`problem_pack_polygon`)

7 changes: 6 additions & 1 deletion README.md
@@ -246,7 +246,8 @@ AutoCode provides 15 atomic tools in 7 groups. All tools return a unified
| Tool | Description | Key parameters |
|------|------|----------|
| `problem_create` | Initialize the problem directory | `problem_dir`, `problem_name` |
-| `problem_generate_tests` | Generate final test data | `problem_dir`, `test_count` |
+| `problem_generate_tests` | Generate final test data (extreme/tle cases make up at least half of the final set; best effort when candidates are insufficient) | `problem_dir`, `test_count` |
+| `problem_verify_tests` | Verify test data quality (including a hard check on the extreme/tle ratio) | `problem_dir`, `tests_dir`, `verify_types` |
| `problem_pack_polygon` | Package in Polygon format | `problem_dir`, `time_limit`, `memory_limit` |

## Workflow tutorial: the A+B problem
@@ -378,6 +379,8 @@ problem_generate_tests(
)
```

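+Note: among the tests that are finally written, `extreme` (type=3) and `tle` (type=4) together make up at least half. If the candidates contain too few limit cases, the quota is met as far as the available candidates allow, and the corresponding statistics fields are returned.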

### Step 7: Package in Polygon format

```python
@@ -477,6 +480,8 @@ problem_pack_polygon(
| `extreme` | 3 | Edge cases: overflow, precision, hash collisions |
| `tle` | 4 | Performance tests designed to induce TLE |

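+The default sampling strategy of `problem_generate_tests` first ensures that `extreme` + `tle` cases make up at least 50% of the final test set; the remaining slots are then distributed in a balanced way according to the configuration (or filled in a deterministic order).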
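
The quota-first strategy described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `problem_generate_tests`: the candidate shape (dicts with a `type` key, already in a deterministic order), the function name `sample_final_tests`, and the choice to compute the minimum as `ceil(test_count / 2)` are hypothetical. The three returned statistics fields use the names given in the changelog.

```python
import math

LIMIT_TYPES = {3, 4}  # type=3 extreme, type=4 tle


def sample_final_tests(candidates, test_count):
    """Pick up to test_count candidates, filling the limit-case quota first.

    candidates: list of dicts with an integer "type" key (assumed shape),
    already deduplicated, validated, and in a deterministic order.
    """
    required = math.ceil(test_count / 2)
    limit = [c for c in candidates if c["type"] in LIMIT_TYPES]
    other = [c for c in candidates if c["type"] not in LIMIT_TYPES]

    # Step 1: reserve the quota for limit cases (best effort if too few exist).
    chosen_limit = limit[:required]

    # Step 2: fill the remaining slots with other cases, in order.
    remaining = test_count - len(chosen_limit)
    chosen_other = other[:remaining]

    # Step 3: if slots are still free, top up with leftover limit cases.
    remaining -= len(chosen_other)
    extra_limit = limit[len(chosen_limit):len(chosen_limit) + remaining]

    limit_count = len(chosen_limit) + len(extra_limit)
    return {
        "tests": chosen_limit + chosen_other + extra_limit,
        "limit_case_count": limit_count,
        "limit_case_minimum_required": required,
        "limit_case_quota_met": limit_count >= required,
    }
```

Step 2 here fills slots in plain order; a real implementation would presumably apply the configured balancing across the remaining types at that point.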

### File structure

```
2 changes: 2 additions & 0 deletions agents/autocode-workflow.md
@@ -25,4 +25,6 @@ Always work through this sequence unless the task is explicitly outside problem

When the user asks for a later step directly, explain which prerequisite step is missing and complete the missing work first.

+When running `problem_generate_tests`, enforce test quality: final test data should contain at least half limit-oriented cases (`type=3` extreme + `type=4` tle) when candidate availability allows.

Treat hook feedback as authoritative. If a hook denies a tool call, fix the workflow gap instead of retrying the same call.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "autocode-mcp"
-version = "0.7.0"
+version = "0.8.0"
description = "MCP Server for competitive programming problem creation, based on AutoCode paper"
readme = "README.md"
requires-python = ">=3.10"
2 changes: 1 addition & 1 deletion scripts/workflow_guard.py
@@ -270,7 +270,7 @@ def session_start() -> int:
"stress_test_run(completed_rounds == total_rounds) -> "
"checker_build if needed (accuracy >= 0.9) -> "
"problem_validate(validation_passed) -> "
-"problem_generate_tests(generated_test_count > 0) -> "
+"problem_generate_tests(generated_test_count > 0, and prefer >=50% type3/type4 in final tests when candidates are sufficient) -> "
"problem_verify_tests(passed) -> problem_pack_polygon. "
"If a hook blocks a step, complete the missing prerequisite instead of retrying blindly."
)
6 changes: 4 additions & 2 deletions skills/autocode-workflow/SKILL.md
@@ -61,7 +61,7 @@ Based on the paper "AutoCode: LLMs as Problem Setters for Competitive Programmin
│ Phase 8: Test Generation │
│ ┌────────────────────┴────────────────────┐ │
│ │ problem_generate_tests │ Generate final test data │
-│ │ (dedup + validator filter + balance) │ │
+│ │ (dedup + validator filter + extreme>=50%)│ │
│ └────────────────────┬────────────────────┘ │
│ │ │
│ Phase 9: Packaging │
@@ -235,6 +235,7 @@ Required: problem_dir
Recommended: test_count=50, enable_dedup=true, enable_validator_filter=true
Output: tests/01.in ~ tests/50.in + corresponding .ans files
Verify: Check generated_tests count matches test_count
+Quality Gate: In final tests, type 3/4 (extreme + tle) should be >= ceil(test_count/2) when candidates are sufficient
```
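
As a small worked example of the `ceil(test_count/2)` threshold in the quality gate above, the snippet below shows how rounding up behaves for odd and even test counts. The helper name `limit_case_minimum` is made up for this illustration and is not part of the tool's API.

```python
def limit_case_minimum(test_count: int) -> int:
    """Smallest number of type 3/4 tests that is at least half of test_count."""
    # (n + 1) // 2 equals ceil(n / 2) for non-negative integers and avoids floats.
    return (test_count + 1) // 2


for n in (7, 50, 51):
    print(n, limit_case_minimum(n))
# prints:
# 7 4
# 50 25
# 51 26
```

With the recommended `test_count=50`, the gate therefore asks for at least 25 extreme/tle tests.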

### Phase 9: Packaging
@@ -283,7 +284,7 @@ Generate 3-5 mutant solutions with common bugs:
| 5 | `stress_test_run` | Step 4 | `"All N rounds passed"` |
| 6 | `checker_build` (optional) | Step 5 | `accuracy >= 0.9` |
| 7 | `problem_validate` | Step 5 or 6 | `success=true`, all samples passed |
-| 8 | `problem_generate_tests` | Step 7 | `generated_tests == test_count` |
+| 8 | `problem_generate_tests` | Step 7 | `generated_tests == test_count` and `type3+type4 >= ceil(test_count/2)` (if candidates sufficient) |
| 9 | `problem_pack_polygon` | Step 8 | `success=true` |

### FORBIDDEN Actions
@@ -335,6 +336,7 @@ Before considering the problem complete:
- [ ] Statement samples validated (problem_validate passed)
- [ ] Sample files validated (problem_validate passed)
- [ ] Final test data generated (50+ tests)
+- [ ] Final test data has at least 50% extreme/tle cases when candidate pool allows
- [ ] Polygon package created

## Example Complete Workflow
2 changes: 1 addition & 1 deletion src/autocode_mcp/__init__.py
@@ -6,7 +6,7 @@
"""
import os

-__version__ = "0.7.0"
+__version__ = "0.8.0"

# Get the templates directory path (a directory inside the package)
_PACKAGE_DIR = os.path.dirname(__file__)
8 changes: 5 additions & 3 deletions src/autocode_mcp/prompts/__init__.py
@@ -62,7 +62,8 @@
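## 3. Post-processing
- Filter invalid inputs with the Validator
- Deduplicate (by signature)
-- Balance the distribution
+- First ensure that at least half of the final tests are extreme/tle (type=3/4; best effort when candidates are insufficient)
+- Then balance the distribution
- Sample

## Quality metrics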
@@ -141,8 +142,9 @@
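### Post-processing
1. Validator filtering
2. Deduplication (MD5 signature)
-3. Balance the distribution
-4. Sample
+3. First ensure that extreme/tle cases (type=3/4) make up at least half of the final tests (best effort when candidates are insufficient)
+4. Balance the distribution over the remaining slots
+5. Sample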
"""

# Checker build prompt