
Conversation

Contributor

Copilot AI commented Dec 15, 2025

The average token count is hardcoded in rate_limiter.py, making concurrency estimates inaccurate. Token consumption differs widely between Insert and Query scenarios, so a fixed conservative value caps system throughput.

Changes

  • src/config.py: each service's config class gains an avg_tokens_per_request field (see the sketch after this list)

    • LLM/DS_OCR: 3500 (default)
    • Embedding: 20000
    • Rerank: 500
  • src/rate_limiter.py:

    • get_rate_limiter() gains an avg_tokens_per_request parameter
    • default values extracted into a SERVICE_DEFAULTS dictionary
  • src/tenant_config.py: each service's merge method passes avg_tokens_per_request through

  • src/multi_tenant.py / src/deepseek_ocr_client.py: call sites updated to pass the new parameter

  • env.example: document the new environment variables
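
A minimal sketch of the new field and the extracted defaults, assuming a dataclass-based config. Only the field name, the default values, and the SERVICE_DEFAULTS name come from the change list above; the surrounding class structure is illustrative, not the repository's actual code:

import os
from dataclasses import dataclass, field

# Defaults extracted from rate_limiter.py into a single table
# (per the change list above).
SERVICE_DEFAULTS = {
    "llm": {"avg_tokens_per_request": 3500},
    "ds_ocr": {"avg_tokens_per_request": 3500},
    "embedding": {"avg_tokens_per_request": 20000},
    "rerank": {"avg_tokens_per_request": 500},
}

def _env_int(name: str, default: int) -> int:
    # An environment variable, when set, wins over the code default.
    raw = os.getenv(name)
    return int(raw) if raw else default

@dataclass
class LLMConfig:
    avg_tokens_per_request: int = field(
        default_factory=lambda: _env_int(
            "LLM_AVG_TOKENS_PER_REQUEST",
            SERVICE_DEFAULTS["llm"]["avg_tokens_per_request"],
        )
    )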

Configuration priority

  1. Tenant config (API)
  2. Environment variable (LLM_AVG_TOKENS_PER_REQUEST)
  3. Code default
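
A hypothetical helper showing this resolution order; the function name and signature are illustrative, not taken from the repository:

import os

def resolve_avg_tokens_per_request(tenant_value, env_var, code_default):
    # 1. Tenant config set via the API wins outright.
    if tenant_value is not None:
        return tenant_value
    # 2. Otherwise use the environment variable, if set.
    raw = os.getenv(env_var)
    if raw:
        return int(raw)
    # 3. Finally fall back to the code default.
    return code_default

# resolve_avg_tokens_per_request(4000, 'LLM_AVG_TOKENS_PER_REQUEST', 3500) -> 4000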

Usage example

# Insert-heavy workload: lower the average token estimate to raise concurrency
LLM_AVG_TOKENS_PER_REQUEST=2500
// Tenant API
PUT /tenants/{tenant_id}/config
{
  "llm_config": {
    "avg_tokens_per_request": 4000
  }
}
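
As a rough check on the effect, using the SiliconFlow TPM of 40000 cited in the review thread below: lowering the LLM average from 3500 to 2500 raises the startup concurrency estimate from int(40000 / 3500) = 11 to 40000 / 2500 = 16 requests.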
Original prompt

This section details the original issue you should resolve

<issue_title>Average tokens per request hardcoded - inaccurate concurrency calculation</issue_title>
<issue_description>## Problem description
The average token count is hardcoded, making concurrency calculation inaccurate.

## Affected files

  • src/rate_limiter.py, lines 420-423

## Hardcoded values

  • llm: 3500
  • embedding: 500
  • rerank: 500
  • ds_ocr: 3500

## Problem

These values vary by scenario (Query vs. Insert) but are fixed at conservative levels, causing unnecessary rate limiting and reducing system throughput.

## Solution

The average token count should be configurable per tenant/service.</issue_description>
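
For context, the hardcoded block being replaced (rate_limiter.py, lines 420-423) plausibly resembled the following reconstruction; it is inferred from the values listed in the issue, not copied from the actual source:

# Reconstruction, not the actual source: fixed per-service averages.
AVG_TOKENS_PER_REQUEST = {
    "llm": 3500,
    "embedding": 500,
    "rerank": 500,
    "ds_ocr": 3500,
}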

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits December 15, 2025 09:43
Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update average tokens per request for better concurrency calculation" to "feat: support per-tenant/per-service average token configuration to optimize concurrency calculation" Dec 15, 2025
Copilot AI requested a review from BukeLy December 15, 2025 09:48
Owner

@BukeLy BukeLy left a comment

ds_ocr.avg_tokens_per_request
ds_ocr.tokens_per_minute
What are these two parameters for? Are they used for estimation when TPM/RPM is not set? Also, I recall I introduced the Transformers tokenizer for token counting, so why are these two parameters still needed?

Comment on lines -83 to +84
- tokens_per_minute=getattr(self.config, 'tokens_per_minute', config.ds_ocr.tokens_per_minute)
+ tokens_per_minute=getattr(self.config, 'tokens_per_minute', config.ds_ocr.tokens_per_minute),
+ avg_tokens_per_request=getattr(self.config, 'avg_tokens_per_request', config.ds_ocr.avg_tokens_per_request)
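
The getattr fallback in this hunk lets a tenant-level avg_tokens_per_request on self.config take precedence over the global config.ds_ocr default, matching the priority order described in the PR body above.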

Contributor Author

These two parameters serve different purposes:

  1. tokens_per_minute - the API provider's TPM limit (e.g. SiliconFlow's 40000), used by the RateLimiter to cap token consumption per minute and prevent 429 errors.

  2. avg_tokens_per_request - used at startup to estimate a safe concurrency level. calculate_optimal_concurrent() computes: concurrent = TPM / avg_tokens / max_in_flight

Although count_tokens() uses tiktoken to count each request's actual tokens precisely (deducted from the TPM quota at runtime), avg_tokens_per_request estimates at initialization how many concurrent requests should be allowed. These are two distinct uses:

  • Runtime: count_tokens() measures actual consumption precisely
  • Initialization: avg_tokens_per_request estimates the concurrency ceiling

Average token counts differ widely between scenarios (Insert vs. Query), so this estimate is now configurable to optimize concurrency.
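
A sketch of the startup estimate described in this reply, following the quoted formula; the function name appears in the reply, but the exact signature and the max_in_flight semantics are assumptions:

def calculate_optimal_concurrent(tokens_per_minute: int,
                                 avg_tokens_per_request: int,
                                 max_in_flight: float = 1.0) -> int:
    # Requests that fit into one minute's token budget (TPM / avg_tokens)...
    requests_per_minute = tokens_per_minute / avg_tokens_per_request
    # ...divided by the assumed in-flight factor, per the formula above.
    return max(1, int(requests_per_minute / max_in_flight))

# With the reply's SiliconFlow example: 40000 TPM and the 3500 default
# give int(40000 / 3500) = 11 concurrent requests.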

