119 changes: 119 additions & 0 deletions BATCH_REVIEW_IMPLEMENTATION.md
@@ -0,0 +1,119 @@
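# Batch Code Review: Implementation Notes

## Overview
This update adds file-batched code review. Previously, all modified code was sent to the AI in a single request, which could overflow the model's context window and cause it to lose track of the prompt template's requirements.

## Main Changes

### 1. New functionality in code_reviewer.py

#### New method: `review_changes_in_batches`
- **Purpose**: reviews code changes in batches of files, then merges all of the review results
- **Parameters**:
  - `changes`: list of code changes; each element is a dict describing one file
  - `commits_text`: the commit messages
- **Returns**: the merged review result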

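**Workflow**:
1. Return immediately if `changes` is empty; if `BATCH_REVIEW_ENABLED` is not `1`, fall back to the single-pass `review_and_strip_code` over all changes
2. Split the changes into batches of `BATCH_REVIEW_FILES_PER_BATCH` files and call `review_code` once per batch
3. If a batch exceeds `REVIEW_MAX_TOKENS` tokens, truncate it automatically
4. Collect the review results of all batches; a batch that raises an error is recorded as a failure entry instead of aborting the run
5. If there is only one batch, return its review result directly
6. If there are multiple batches, call `_summarize_reviews` to merge the results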
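The batch-then-merge flow can be sketched in plain Python. This is a simplified, hypothetical stand-in for `review_changes_in_batches`: the names `review_in_batches`, `review` and `summarize` are illustrative, the LLM calls are replaced by callables passed in, and environment variables and token truncation are left out.

```python
from typing import Any, Callable, Dict, List


def review_in_batches(
    changes: List[Dict[str, Any]],
    review: Callable[[str], str],
    summarize: Callable[[List[str]], str],
    files_per_batch: int = 1,
) -> str:
    """Hypothetical sketch of the batch-then-merge flow (no env vars, no truncation)."""
    if not changes:
        return ""
    files_per_batch = max(1, files_per_batch)  # range() raises ValueError on a step of 0
    partial_reviews: List[str] = []
    for start in range(0, len(changes), files_per_batch):
        batch = changes[start:start + files_per_batch]
        paths = [c.get("new_path") or c.get("old_path") or "unknown" for c in batch]
        header = f"### Batch {start // files_per_batch + 1} (files: {', '.join(paths)})"
        # Each batch is serialized with str(), as review_changes_in_batches does
        partial_reviews.append(f"{header}\n{review(str(batch))}")
    if len(partial_reviews) == 1:
        # A single batch needs no summary call: drop the header line and return the body
        return partial_reviews[0].split("\n", 1)[1]
    return summarize(partial_reviews)
```

With five files and `files_per_batch=2`, this makes three `review` calls and one `summarize` call; with a single file it makes one `review` call and skips the summary.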

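#### New method: `_summarize_reviews`
- **Purpose**: merges multiple review results using the `summary_merge_review_prompt` prompt
- **Parameters**:
  - `partial_reviews`: list of per-batch review results
- **Returns**: the merged overall review report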

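**Workflow**:
1. Load the `summary_merge_review_prompt` prompt configuration for the current `REVIEW_STYLE`
2. Join all per-batch review results with a `---` separator
3. Call the LLM to merge them
4. Strip a surrounding `` ```markdown `` fence, if present, and return the merged result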
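The message-building and fence-stripping steps can be sketched as standalone helpers. This assumes, as `_summarize_reviews` implies, that the loaded prompt is a dict with a `system_message` message and a `user_message` whose `content` contains a `{partial_reviews_text}` placeholder; `build_summary_messages` and `strip_markdown_fence` are hypothetical names.

```python
from typing import Any, Dict, List

SEPARATOR = "\n\n---\n\n"


def build_summary_messages(prompts: Dict[str, Any], partial_reviews: List[str]) -> List[Dict[str, str]]:
    """Join per-batch reviews and fill the user template (assumed prompt shape)."""
    joined = SEPARATOR.join(partial_reviews)
    return [
        prompts["system_message"],
        {
            "role": "user",
            # str.format fills the {partial_reviews_text} placeholder
            "content": prompts["user_message"]["content"].format(partial_reviews_text=joined),
        },
    ]


def strip_markdown_fence(text: str) -> str:
    """Remove a ```markdown ... ``` wrapper that some models add around their answer."""
    text = text.strip()
    if text.startswith("```markdown") and text.endswith("```"):
        text = text[len("```markdown"):-3].strip()
    return text
```

Because the template is filled with `str.format`, any other literal `{` or `}` in the prompt text (a JSON sample, for instance) must be doubled as `{{` / `}}`, otherwise `format` raises an exception such as `KeyError`.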

### 2. Changes to worker.py

#### `handle_merge_request_event` function
**Before**:
```python
review_result = CodeReviewer().review_and_strip_code(str(changes), commits_text)
```

**After**:
```python
code_reviewer = CodeReviewer()
review_result = code_reviewer.review_changes_in_batches(changes, commits_text)
```

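**Change**: instead of converting all of `changes` to a single string and reviewing it in one call, the `changes` list is passed to the batch review method.

#### `handle_push_event` function
Same change as in `handle_merge_request_event`.

#### `handle_github_push_event` function
Same change as in the GitLab push event handler.

#### `handle_github_pull_request_event` function
Same change as in the GitLab merge request handler.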

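## Environment Variables

The following environment variables control batch review behavior:

### BATCH_REVIEW_ENABLED
- **Description**: whether batch review is enabled
- **Values**:
  - `1`: enable batch review (default)
  - `0`: disable batch review and use the traditional single-pass review (any value other than `1` also disables it)
- **Default**: `1`
- **Use cases**:
  - Enabled: code changes are reviewed in batches and the results are then merged
  - Disabled: all code changes are sent to the AI in one request (may hit context-window limits)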
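A minimal sketch of how the flag is read, mirroring the `os.getenv("BATCH_REVIEW_ENABLED", "1") == "1"` check in the diff. The helper name `batch_review_enabled` and its optional `env` mapping (so it can be tested without touching `os.environ`) are hypothetical.

```python
import os
from typing import Mapping, Optional


def batch_review_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless BATCH_REVIEW_ENABLED is set to something other than "1"."""
    source = os.environ if env is None else env
    # Unset defaults to "1" (enabled); only the exact string "1" enables batching
    return source.get("BATCH_REVIEW_ENABLED", "1") == "1"
```

So `BATCH_REVIEW_ENABLED=true` or `yes` turns batching off, which can be surprising if other flags in `conf/.env` accept such spellings.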

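### BATCH_REVIEW_FILES_PER_BATCH
- **Description**: number of files reviewed per batch
- **Values**: integer; `1-5` recommended
- **Default**: `1` (each file reviewed on its own) when the variable is unset; `conf/.env.dist` sets it to `3`
- **Recommendations**:
  - `1`: each file is reviewed independently; highest precision, but the most LLM calls
  - `2-3`: suits medium-sized changes, balancing precision against the number of calls
  - `4-5`: suits changes that touch many small files, reducing the number of LLM calls
  - Larger values: a single batch may become too long for the context window; not recommended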
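The batch count is a ceiling division, written `(total_files + files_per_batch - 1) // files_per_batch` in the diff. A sketch of reading the setting and counting batches; `files_per_batch_from_env` and `batch_count` are hypothetical helpers, and the clamp to at least 1 is an assumption added here because `range()` raises `ValueError` for a step of 0.

```python
import os
from typing import Mapping, Optional


def files_per_batch_from_env(env: Optional[Mapping[str, str]] = None) -> int:
    source = os.environ if env is None else env
    # Default of 1 matches the diff; values below 1 are clamped (assumption, not in the PR)
    return max(1, int(source.get("BATCH_REVIEW_FILES_PER_BATCH", "1")))


def batch_count(total_files: int, files_per_batch: int) -> int:
    # Integer ceiling division, as in review_changes_in_batches
    return (total_files + files_per_batch - 1) // files_per_batch
```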

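### REVIEW_MAX_TOKENS
- **Description**: maximum number of tokens per review batch
- **Default**: `10000`
- **Effect**: keeps a single batch from growing too long; content beyond the limit is truncated automatically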
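The check-then-truncate step can be sketched as follows. The project's `count_tokens` and `truncate_text_by_tokens` helpers are not shown in this diff, so this hypothetical `truncate_to_token_budget` substitutes a naive whitespace split for a real tokenizer; actual LLM token counts will differ.

```python
def truncate_to_token_budget(text: str, max_tokens: int) -> str:
    """Hypothetical sketch: whitespace-separated words stand in for LLM tokens."""
    tokens = text.split()
    if len(tokens) <= max_tokens:
        # Within budget: send the batch unchanged
        return text
    # Over budget: keep only the first max_tokens "tokens" and drop the rest
    return " ".join(tokens[:max_tokens])
```

If the real helper also keeps the beginning of the text, as this sketch does, the last files in an oversized batch may not be reviewed at all; a smaller `BATCH_REVIEW_FILES_PER_BATCH` reduces that risk.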

### Configuration Example

Add the following to `conf/.env`:

```bash
# Enable batch review (0 = disabled, 1 = enabled)
BATCH_REVIEW_ENABLED=1

# Files per review batch (1-5 recommended)
BATCH_REVIEW_FILES_PER_BATCH=2

```

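## Performance Comparison

| Configuration | LLM calls for a 10-file change | Precision | Use case |
|---------------|--------------------------------|-----------|----------|
| `BATCH_REVIEW_ENABLED=0` | 1 | Low | Small changes |
| `BATCH_REVIEW_FILES_PER_BATCH=1` | 11 (10 batches + 1 summary) | Highest | Maximum review quality |
| `BATCH_REVIEW_FILES_PER_BATCH=3` | 5 (4 batches + 1 summary) | High | **Recommended** |
| `BATCH_REVIEW_FILES_PER_BATCH=5` | 3 (2 batches + 1 summary) | Medium | Many small files |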
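The call counts in the table follow from one review call per batch plus one summary call whenever there is more than one batch (a single batch is returned without summarizing). A small hypothetical helper, `llm_calls`, reproduces the rows:

```python
import math


def llm_calls(num_files: int, files_per_batch: int, batch_enabled: bool = True) -> int:
    """Count LLM calls for one review run, following review_changes_in_batches."""
    if num_files == 0:
        return 0  # empty change list: returns early without calling the LLM
    if not batch_enabled:
        return 1  # single-pass review of all changes
    batches = math.ceil(num_files / files_per_batch)
    # The summary call is only made when there are at least two batches
    return batches if batches == 1 else batches + 1


for fpb in (1, 3, 5):
    print(f"files_per_batch={fpb}: {llm_calls(10, fpb)} calls")
```

Failed batches still count here, since their review call was attempted before the error was recorded.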

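## Prompt Configuration

The summary step uses the `summary_merge_review_prompt` prompt newly added to the project, which is designed to:
- Consolidate multiple per-batch review reports
- Remove duplicate issues
- Recalculate the overall score
- Produce a structured overall review report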
16 changes: 10 additions & 6 deletions biz/queue/worker.py
```diff
@@ -38,7 +38,8 @@ def handle_push_event(webhook_data: dict, gitlab_token: str, gitlab_url: str, gi
 
             if len(changes) > 0:
                 commits_text = ';'.join(commit.get('message', '').strip() for commit in commits)
-                review_result = CodeReviewer().review_and_strip_code(str(changes), commits_text)
+                code_reviewer = CodeReviewer()
+                review_result = code_reviewer.review_changes_in_batches(changes, commits_text)
                 score = CodeReviewer.parse_review_score(review_text=review_result)
                 for item in changes:
                     additions += item['additions']
@@ -131,9 +132,10 @@ def handle_merge_request_event(webhook_data: dict, gitlab_token: str, gitlab_url
             logger.error('Failed to get commits')
             return
 
-        # Review the code
+        # Review the code using the batch review method
         commits_text = ';'.join(commit['title'] for commit in commits)
-        review_result = CodeReviewer().review_and_strip_code(str(changes), commits_text)
+        code_reviewer = CodeReviewer()
+        review_result = code_reviewer.review_changes_in_batches(changes, commits_text)
 
         # Post the review result to GitLab notes
         handler.add_merge_request_notes(f'Auto Review Result: \n{review_result}')
@@ -188,7 +190,8 @@ def handle_github_push_event(webhook_data: dict, github_token: str, github_url:
 
             if len(changes) > 0:
                 commits_text = ';'.join(commit.get('message', '').strip() for commit in commits)
-                review_result = CodeReviewer().review_and_strip_code(str(changes), commits_text)
+                code_reviewer = CodeReviewer()
+                review_result = code_reviewer.review_changes_in_batches(changes, commits_text)
                 score = CodeReviewer.parse_review_score(review_text=review_result)
                 for item in changes:
                     additions += item.get('additions', 0)
@@ -271,9 +274,10 @@ def handle_github_pull_request_event(webhook_data: dict, github_token: str, gith
             logger.error('Failed to get commits')
             return
 
-        # Review the code
+        # Review the code using the batch review method
         commits_text = ';'.join(commit['title'] for commit in commits)
-        review_result = CodeReviewer().review_and_strip_code(str(changes), commits_text)
+        code_reviewer = CodeReviewer()
+        review_result = code_reviewer.review_changes_in_batches(changes, commits_text)
 
         # Post the review result to GitHub notes
         handler.add_pull_request_notes(f'Auto Review Result: \n{review_result}')
```
108 changes: 108 additions & 0 deletions biz/utils/code_reviewer.py
```diff
@@ -98,6 +98,114 @@ def review_code(self, diffs_text: str, commits_text: str = "") -> str:
         ]
         return self.call_llm(messages)
 
+    def review_changes_in_batches(self, changes: List[Dict[str, Any]], commits_text: str = "") -> str:
+        """
+        Review code changes in batches of files, then merge all review results.
+        :param changes: list of code changes; each element is a dict describing one file
+        :param commits_text: commit messages
+        :return: the merged review result
+        """
+        if not changes:
+            logger.info("No code changes to review")
+            return "代码为空"  # "Code is empty"
+
+        # Check whether batch review is enabled
+        batch_review_enabled = os.getenv("BATCH_REVIEW_ENABLED", "1") == "1"
+
+        # If batch review is disabled, fall back to the original single-pass review
+        if not batch_review_enabled:
+            logger.info("Batch review is disabled; using the traditional single-pass review")
+            return self.review_and_strip_code(str(changes), commits_text)
+
+        review_max_tokens = int(os.getenv("REVIEW_MAX_TOKENS", 10000))
+        # Files per batch, clamped to at least 1 because range() rejects a step of 0
+        files_per_batch = max(1, int(os.getenv("BATCH_REVIEW_FILES_PER_BATCH", 1)))
+        logger.info(f"Batch review enabled: {files_per_batch} file(s) per batch")
+
+        partial_reviews = []
+        total_files = len(changes)
+
+        # Review in batches of the configured size
+        for batch_start in range(0, total_files, files_per_batch):
+            batch_end = min(batch_start + files_per_batch, total_files)
+            batch_changes = changes[batch_start:batch_end]
+            batch_num = (batch_start // files_per_batch) + 1
+            total_batches = (total_files + files_per_batch - 1) // files_per_batch
+
+            logger.info(f"Reviewing batch {batch_num}/{total_batches} (files {batch_start + 1}-{batch_end}/{total_files})")
+
+            # Collect the file paths in the current batch (a None old_path also falls back to 'unknown')
+            batch_file_paths = [
+                change.get('new_path') or change.get('old_path') or 'unknown'
+                for change in batch_changes
+            ]
+
+            # Serialize the batch's changes to text
+            batch_text = str(batch_changes)
+
+            # Count tokens and truncate if over the limit
+            tokens_count = count_tokens(batch_text)
+            if tokens_count > review_max_tokens:
+                logger.warning(f"Batch {batch_num} exceeds {review_max_tokens} tokens and will be truncated")
+                batch_text = truncate_text_by_tokens(batch_text, review_max_tokens)
+
+            # Review the current batch
+            try:
+                review_result = self.review_code(batch_text, commits_text).strip()
+                if review_result.startswith("```markdown") and review_result.endswith("```"):
+                    review_result = review_result[11:-3].strip()
+
+                # Prepend a batch header
+                batch_header = f"### Batch {batch_num} (files: {', '.join(batch_file_paths)})\n"
+                partial_reviews.append(f"{batch_header}{review_result}")
+                logger.info(f"Batch {batch_num} review complete")
+            except Exception as e:
+                logger.error(f"Error while reviewing batch {batch_num}: {e}")
+                partial_reviews.append(f"### Batch {batch_num}\nReview failed: {str(e)}")
+
+        # With a single batch, return its result directly (without the batch header)
+        if len(partial_reviews) == 1:
+            # Drop the batch header line
+            result = partial_reviews[0]
+            lines = result.split('\n', 1)
+            return lines[1] if len(lines) > 1 else result
+
+        # Merge the review results of multiple batches
+        logger.info(f"Merging review results from {len(partial_reviews)} batches")
+        summary_result = self._summarize_reviews(partial_reviews)
+        return summary_result
+
+    def _summarize_reviews(self, partial_reviews: List[str]) -> str:
+        """
+        Merge multiple review results using the summary_merge_review_prompt prompt.
+        :param partial_reviews: list of per-batch review results
+        :return: the merged overall review report
+        """
+        # Load the summary prompt
+        summary_prompts = self._load_prompts("summary_merge_review_prompt", os.getenv("REVIEW_STYLE", "professional"))
+
+        # Join all per-batch review results
+        partial_reviews_text = "\n\n---\n\n".join(partial_reviews)
+
+        # Build the summary request messages
+        messages = [
+            summary_prompts["system_message"],
+            {
+                "role": "user",
+                "content": summary_prompts["user_message"]["content"].format(
+                    partial_reviews_text=partial_reviews_text
+                ),
+            },
+        ]
+
+        # Call the LLM to merge the reviews
+        summary_result = self.call_llm(messages).strip()
+        if summary_result.startswith("```markdown") and summary_result.endswith("```"):
+            summary_result = summary_result[11:-3].strip()
+
+        logger.info("Review results merged")
+        return summary_result
+
     @staticmethod
     def parse_review_score(review_text: str) -> int:
         """Parse the AI review result and return the score"""
```
6 changes: 6 additions & 0 deletions conf/.env.dist
```diff
@@ -38,6 +38,12 @@ REVIEW_MAX_TOKENS=10000
 #Review style options: professional | sarcastic | gentle | humorous
 REVIEW_STYLE=professional
 
+# Batch review configuration
+# Whether to enable batch review (1 = enabled, 0 = disabled, using the traditional single-pass review)
+BATCH_REVIEW_ENABLED=1
+# Number of files per review batch (1-5 recommended; 1 reviews each file on its own and then merges the results)
+BATCH_REVIEW_FILES_PER_BATCH=3
+
 #DingTalk configuration
 DINGTALK_ENABLED=0
 DINGTALK_WEBHOOK_URL=https://oapi.dingtalk.com/robot/send?access_token=xxx
```