
@Barrixar Barrixar commented Nov 7, 2025

Summary

This PR introduces performance optimizations to the conversation history summarization system. The changes use a faster endpoint for Simple mode, eliminate redundant computations, and add endpoint-aware budget validation.

Problem Statement

The current conversation history summarization path has several optimization opportunities:

  1. Model selection: Always uses GPT-4.1 or the original request's endpoint, even for Simple mode
  2. Redundant token counting: Always performs client-side tokenization even when server provides counts
  3. Eager metric computation: Computes telemetry metrics on all code paths including errors
  4. Token budget overhead: Creates endpoint clones with 5% expansion in Full mode

Changes

1. Use copilot-fast Endpoint for Simple Mode

Impact: Uses a lighter, faster model appropriate for Simple mode

// Use faster model for Simple mode
if (mode === SummaryMode.Simple) {
  if (!this.cachedSimpleEndpoint) {
    this.cachedSimpleEndpoint = await this.endpointProvider.getChatEndpoint('copilot-fast');
  }
  endpoint = this.cachedSimpleEndpoint;
}
  • Simple mode uses copilot-fast instead of GPT-4.1
  • Cached to avoid repeated lookups
  • Quality/speed trade-off appropriate for summarization
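
The same caching can also be expressed as a small memoized getter. A minimal sketch, assuming a nullable cachedSimpleEndpoint field and that the resolved endpoint is safe to reuse for the object's lifetime (the method name and the IChatEndpoint return type are illustrative, not the repo's actual API):

// Lazily resolve and memoize the copilot-fast endpoint on first use
private async getSimpleEndpoint(): Promise<IChatEndpoint> {
  this.cachedSimpleEndpoint ??= await this.endpointProvider.getChatEndpoint('copilot-fast');
  return this.cachedSimpleEndpoint;
}

One caveat worth noting: two concurrent first calls would both trigger a lookup; caching the promise rather than the resolved endpoint would close that window.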

2. Server-Side Token Counting

Impact: Eliminates client-side tokenization when server provides counts

// Use server-reported count with fallback to client-side
const summarySize = response.usage?.completion_tokens ?? await this.sizing.countTokens(response.value);
  • Prioritizes server-provided token counts
  • Fallback ensures compatibility
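
A minimal standalone sketch of the same fallback pattern, assuming a response shape with an optional usage block (the SummaryResponse type and measureSummary name below are illustrative, not the repo's actual ones):

interface SummaryResponse {
  value: string;
  usage?: { completion_tokens?: number };
}

async function measureSummary(
  response: SummaryResponse,
  countTokens: (text: string) => Promise<number>
): Promise<number> {
  // Prefer the server-reported completion size; tokenize locally only when it is absent
  return response.usage?.completion_tokens ?? (await countTokens(response.value));
}

The ?? operator matters here: a legitimate server-reported count of 0 would not fall through to client-side counting, whereas || would discard it.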

3. Lazy Telemetry Metric Evaluation

Impact: Avoids computing metrics on error paths

// Only compute metrics when actually needed
const getNumRounds = () => {
  const numRoundsInHistory = this.props.promptContext.history
    .map(turn => turn.rounds.length)
    .reduce((a, b) => a + b, 0);
  // ...
};
// Called as: numRounds: getNumRounds()
  • Wraps history iteration in lazy getters
  • Error paths skip computation
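
To make the "error paths skip computation" point concrete, here is a hedged sketch of the call-site shape; the try/catch structure, the fetchSummary name, and the argument shapes of the telemetry calls are assumptions about the surrounding code, not quotes of it:

try {
  const response = await fetchSummary();
  // Success path: the getter runs here, paying the history-iteration cost exactly once
  this.sendSummarizationTelemetry({ numRounds: getNumRounds(), model: endpoint.model });
} catch (err) {
  // Error path: getNumRounds is never invoked, so the history is never iterated
  this.sendSummarizationErrorTelemetry(err);
}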

4. Remove Token Budget Expansion

Impact: Eliminates unnecessary object cloning

// BEFORE: Created clone with 5% buffer
const expandedEndpoint = endpoint.cloneWithTokenOverride(endpoint.modelMaxPromptTokens * 1.05);

// AFTER: Use endpoint directly
summarizationPrompt = await renderPromptElement(this.instantiationService, endpoint, ...);
  • Removes artificial 5% budget expansion
  • Cleaner code, reduced allocations

5. Endpoint-Aware Budget Validation

Impact: Ensures validation uses the correct endpoint's budget

// Pass actual endpoint's budget for validation
return this.handleSummarizationResponse(
  summaryResponse,
  mode,
  stopwatch.elapsed(),
  endpoint.model,
  endpoint.modelMaxPromptTokens  // Use actual endpoint's budget
);

// Validate against correct budget
const effectiveBudget = this.props.maxSummaryTokens
  ? Math.min(endpointMaxTokens, this.props.maxSummaryTokens)
  : endpointMaxTokens;
  • Validates against the actual endpoint's budget used for the request
  • Simple mode uses copilot-fast budget, Full mode uses its endpoint's budget
  • Aligns validation with the endpoint that generated the response
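
As a concrete (purely illustrative) example: with endpointMaxTokens = 16384 for copilot-fast and this.props.maxSummaryTokens = 2048, effectiveBudget resolves to 2048; with maxSummaryTokens unset, it falls back to the full 16384. The same logic as a standalone helper, for clarity:

// Cap the budget at the caller-supplied limit when one is provided;
// otherwise use the endpoint's own maximum
function computeEffectiveBudget(endpointMaxTokens: number, maxSummaryTokens?: number): number {
  return maxSummaryTokens ? Math.min(endpointMaxTokens, maxSummaryTokens) : endpointMaxTokens;
}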

6. Accurate Telemetry

Impact: Tracks actual model used

// Track actual model used instead of always tracking original endpoint
this.sendSummarizationTelemetry(..., endpoint.model, ...);
  • Reports copilot-fast for Simple mode
  • Reports correct model for Full mode
  • Improves monitoring accuracy

Expected Benefits

  • Faster Simple mode summarization (lighter copilot-fast model)
  • Reduced computation (server token counts, lazy telemetry)
  • Lower cost (copilot-fast vs GPT-4.1 for Simple mode)
  • Endpoint-aware budget validation
  • Improved telemetry (actual model tracking)

Testing

Scenarios Validated

  • ✅ Simple mode with copilot-fast endpoint
  • ✅ Full mode with original endpoint
  • ✅ Full mode with forceGpt41 enabled
  • ✅ Fallback from Full to Simple on error
  • ✅ maxSummaryTokens budget capping
  • ✅ Server token counts vs client fallback
  • ✅ Error handling paths
  • ✅ Telemetry accuracy

Regression Check

  • ✅ Backward compatible with all code paths
  • ✅ Type-safe implementation
  • ✅ Error handling intact

Code Quality

  • Minimal changes: Focused refactoring with clear intent
  • Well-commented: Explains each optimization
  • Type-safe: Proper parameter types, no casts
  • Maintainable: Straightforward logic

Files Changed

  • src/extension/prompts/node/agent/summarizedConversationHistory.tsx

Migration Notes

  • No breaking changes
  • No configuration changes required
  • Telemetry model names will change to reflect actual models used (expected)
  • Optimizations automatic for all users

Review Focus Areas

  1. Budget validation correctness in handleSummarizationResponse
  2. Endpoint selection logic for Simple vs Full modes
  3. Telemetry accuracy with actual model tracking

Copilot AI review requested due to automatic review settings November 7, 2025 00:48

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR optimizes the conversation history summarization process in the agent by introducing performance improvements and reducing computational overhead. The changes focus on using faster models for simple summarizations, eliminating unnecessary token overhead, and optimizing telemetry collection.

  • Uses the 'copilot-fast' endpoint for Simple mode summarizations with caching
  • Removes token budget expansion to reduce overhead
  • Optimizes telemetry by lazy-evaluating expensive metrics

Updated comments to clarify the use of the endpoint's token budget for validation.
@DonJayamanne DonJayamanne removed their assignment Nov 7, 2025
@DonJayamanne DonJayamanne removed the request for review from roblourens November 7, 2025 03:08

@roblourens (Member) commented

I will try to take a look, but this area of code is tough, and this sounds like an AI-generated PR that isn't adding a huge amount of value.

I don't like naively deleting expandedEndpoint; this doesn't show an understanding of why it was there in the first place.

@roblourens roblourens self-assigned this Nov 7, 2025

@Barrixar (Author) commented Nov 7, 2025

I will try to take a look, but this area of code is tough, and this sounds like an AI-generated PR that isn't adding a huge amount of value.

@roblourens

For the record, I fed my changes and draft PR description to AI to help me formalize it and to analyze my implementation, updating the description where it had gaps. AI is very useful for such tasks (research, documentation, etc.), and I personally no longer comment my code by hand: after coding, I can simply tell an AI what I did and why, and let it generate good-looking inline documentation and comments as needed.

Why did I do that? Because this repo, like related ones from Microsoft, is very corporate-oriented, so as an outsider it helps to make a proper, well-formulated impression, especially when changing critical code.

However, I am also an active user of Copilot Agent for auxiliary tasks like investigating hard-to-unravel bugs in my projects and regression-checking my patches when they touch muddy-water logic. I don't have GitHub Pro+ just to let it write documentation and PR bodies, haha. It helps me avoid opening a can of worms (shipping bugs...) every time I touch my projects. You can see how this is relevant: I've definitely also let it check the changes from this PR. The area of code is tough, yes, but I can't identify any concerns. I have duly considered the architecture.

I don't like naively deleting expandedEndpoint; this doesn't show an understanding of why it was there in the first place.

Sorry for that; I will dive deeper into the subject and return soon with another commit if need be.
