
@Barrixar Barrixar commented Nov 7, 2025

Summary

This PR introduces performance optimizations to the conversation history summarization system. The changes use a faster endpoint for Simple mode, eliminate redundant computations, and add endpoint-aware budget validation.

Problem Statement

The current conversation history summarization path has several optimization opportunities:

  1. Model selection: Always uses GPT-4.1 or the original request's endpoint, even for Simple mode
  2. Redundant token counting: Always performs client-side tokenization even when server provides counts
  3. Eager metric computation: Computes telemetry metrics on all code paths including errors
  4. Token budget overhead: Creates endpoint clones with 5% expansion in Full mode

Changes

1. Use copilot-fast Endpoint for Simple Mode

Impact: Uses a lighter, faster model appropriate for Simple mode

// Use faster model for Simple mode
if (mode === SummaryMode.Simple) {
  if (!this.cachedSimpleEndpoint) {
    this.cachedSimpleEndpoint = await this.endpointProvider.getChatEndpoint('copilot-fast');
  }
  endpoint = this.cachedSimpleEndpoint;
}
  • Simple mode uses copilot-fast instead of GPT-4.1
  • Cached to avoid repeated lookups
  • Quality/speed trade-off appropriate for summarization
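
The same caching can also be expressed as a small memoized getter. A minimal sketch, assuming a nullable cachedSimpleEndpoint field and that the resolved endpoint is safe to reuse for the object's lifetime (the method name and the IChatEndpoint return type are illustrative, not the repo's actual API):

// Lazily resolve and memoize the copilot-fast endpoint on first use
private async getSimpleEndpoint(): Promise<IChatEndpoint> {
  this.cachedSimpleEndpoint ??= await this.endpointProvider.getChatEndpoint('copilot-fast');
  return this.cachedSimpleEndpoint;
}

One caveat worth noting: two concurrent first calls would both trigger a lookup; caching the promise rather than the resolved endpoint would close that window.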

2. Server-Side Token Counting

Impact: Eliminates client-side tokenization when server provides counts

// Use server-reported count with fallback to client-side
const summarySize = response.usage?.completion_tokens ?? await this.sizing.countTokens(response.value);
  • Prioritizes server-provided token counts
  • Fallback ensures compatibility
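
A minimal standalone sketch of the same fallback pattern, assuming a response shape with an optional usage block (the SummaryResponse type and measureSummary name below are illustrative, not the repo's actual ones):

interface SummaryResponse {
  value: string;
  usage?: { completion_tokens?: number };
}

async function measureSummary(
  response: SummaryResponse,
  countTokens: (text: string) => Promise<number>
): Promise<number> {
  // Prefer the server-reported completion size; tokenize locally only when it is absent
  return response.usage?.completion_tokens ?? (await countTokens(response.value));
}

The ?? operator matters here: a legitimate server-reported count of 0 would not fall through to client-side counting, whereas || would discard it.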

3. Lazy Telemetry Metric Evaluation

Impact: Avoids computing metrics on error paths

// Only compute metrics when actually needed
const getNumRounds = () => {
  const numRoundsInHistory = this.props.promptContext.history
    .map(turn => turn.rounds.length)
    .reduce((a, b) => a + b, 0);
  // ...
};
// Called as: numRounds: getNumRounds()
  • Wraps history iteration in lazy getters
  • Error paths skip computation
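
To make the "error paths skip computation" point concrete, here is a hedged sketch of the call-site shape; the try/catch structure, the fetchSummary name, and the argument shapes of the telemetry calls are assumptions about the surrounding code, not quotes of it:

try {
  const response = await fetchSummary();
  // Success path: the getter runs here, paying the history-iteration cost exactly once
  this.sendSummarizationTelemetry({ numRounds: getNumRounds(), model: endpoint.model });
} catch (err) {
  // Error path: getNumRounds is never invoked, so the history is never iterated
  this.sendSummarizationErrorTelemetry(err);
}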

4. Remove Token Budget Expansion

Impact: Eliminates unnecessary object cloning

// BEFORE: Created clone with 5% buffer
const expandedEndpoint = endpoint.cloneWithTokenOverride(endpoint.modelMaxPromptTokens * 1.05);

// AFTER: Use endpoint directly
summarizationPrompt = await renderPromptElement(this.instantiationService, endpoint, ...);
  • Removes artificial 5% budget expansion
  • Cleaner code, reduced allocations

5. Endpoint-Aware Budget Validation

Impact: Ensures validation uses the correct endpoint's budget

// Pass actual endpoint's budget for validation
return this.handleSummarizationResponse(
  summaryResponse,
  mode,
  stopwatch.elapsed(),
  endpoint.model,
  endpoint.modelMaxPromptTokens  // Use actual endpoint's budget
);

// Validate against correct budget
const effectiveBudget = this.props.maxSummaryTokens
  ? Math.min(endpointMaxTokens, this.props.maxSummaryTokens)
  : endpointMaxTokens;
  • Validates against the actual endpoint's budget used for the request
  • Simple mode uses copilot-fast budget, Full mode uses its endpoint's budget
  • Aligns validation with the endpoint that generated the response
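
As a concrete (purely illustrative) example: with endpointMaxTokens = 16384 for copilot-fast and this.props.maxSummaryTokens = 2048, effectiveBudget resolves to 2048; with maxSummaryTokens unset, it falls back to the full 16384. The same logic as a standalone helper, for clarity:

// Cap the budget at the caller-supplied limit when one is provided;
// otherwise use the endpoint's own maximum
function computeEffectiveBudget(endpointMaxTokens: number, maxSummaryTokens?: number): number {
  return maxSummaryTokens ? Math.min(endpointMaxTokens, maxSummaryTokens) : endpointMaxTokens;
}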

6. Accurate Telemetry

Impact: Tracks actual model used

// Track actual model used instead of always tracking original endpoint
this.sendSummarizationTelemetry(..., endpoint.model, ...);
  • Reports copilot-fast for Simple mode
  • Reports correct model for Full mode
  • Improves monitoring accuracy

Expected Benefits

  • Faster Simple mode summarization (lighter copilot-fast model)
  • Reduced computation (server token counts, lazy telemetry)
  • Lower cost (copilot-fast vs GPT-4.1 for Simple mode)
  • Endpoint-aware budget validation
  • Improved telemetry (actual model tracking)

Testing

Scenarios Validated

  • ✅ Simple mode with copilot-fast endpoint
  • ✅ Full mode with original endpoint
  • ✅ Full mode with forceGpt41 enabled
  • ✅ Fallback from Full to Simple on error
  • ✅ maxSummaryTokens budget capping
  • ✅ Server token counts vs client fallback
  • ✅ Error handling paths
  • ✅ Telemetry accuracy

Regression Check

  • ✅ Backward compatible with all code paths
  • ✅ Type-safe implementation
  • ✅ Error handling intact

Code Quality

  • Minimal changes: Focused refactoring with clear intent
  • Well-commented: Explains each optimization
  • Type-safe: Proper parameter types, no casts
  • Maintainable: Straightforward logic

Files Changed

  • src/extension/prompts/node/agent/summarizedConversationHistory.tsx

Migration Notes

  • No breaking changes
  • No configuration changes required
  • Telemetry model names will change to reflect actual models used (expected)
  • Optimizations automatic for all users

Review Focus Areas

  1. Budget validation correctness in handleSummarizationResponse
  2. Endpoint selection logic for Simple vs Full modes
  3. Telemetry accuracy with actual model tracking

Copilot AI review requested due to automatic review settings November 7, 2025 00:48

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR optimizes the conversation history summarization process in the agent by introducing performance improvements and reducing computational overhead. The changes focus on using faster models for simple summarizations, eliminating unnecessary token overhead, and optimizing telemetry collection.

  • Uses the 'copilot-fast' endpoint for Simple mode summarizations with caching
  • Removes token budget expansion to reduce overhead
  • Optimizes telemetry by lazy-evaluating expensive metrics

Updated comments to clarify the use of the endpoint's token budget for validation.
@DonJayamanne DonJayamanne removed their assignment Nov 7, 2025
@DonJayamanne DonJayamanne removed the request for review from roblourens November 7, 2025 03:08

@roblourens (Member) commented

I will try to take a look, but this area of code is tough, and this sounds like an AI-generated PR that isn't adding a huge amount of value.

I don't like naively deleting expandedEndpoint; this doesn't show an understanding of why it was there in the first place.

@roblourens roblourens self-assigned this Nov 7, 2025

@Barrixar (Author) commented Nov 7, 2025

I will try to take a look, but this area of code is tough, and this sounds like an AI-generated PR that isn't adding a huge amount of value.

@roblourens

For the record, I fed my changes and draft PR description to AI to help me formalize it and to analyze my implementation, updating the description where it had gaps. AI is very useful for such tasks (research, documentation, etc.), and I personally no longer comment my code by hand: after coding, I can simply tell an AI what I did and why, and let it generate good-looking inline documentation and comments as needed.

Why did I do that? Because this repo, like related ones from Microsoft, is very corporate-oriented, so as an outsider it helps to make a proper, well-formulated impression, especially when changing critical code.

However, I am also an active user of Copilot Agent for auxiliary tasks like investigating hard-to-unravel bugs in my projects and regression-checking my patches when they touch muddy-water logic. I don't have GitHub Pro+ just to let it write documentation and PR bodies, haha. It helps me avoid opening a can of worms (shipping bugs...) every time I touch my projects. You can see how this is relevant: I've definitely also let it check the changes from this PR. The area of code is tough, yes, but I can't identify any concerns. I have duly considered the architecture.

I don't like naively deleting expandedEndpoint; this doesn't show an understanding of why it was there in the first place.

Sorry for that; I will dive deeper into the subject and return soon with another commit if need be.
