
feat: Enhance LLM configuration and routing with model profile attachment #869

Merged
MODSetter merged 1 commit into main from dev on Mar 11, 2026
Conversation

MODSetter (Owner) commented Mar 11, 2026

  • Added _attach_model_profile function to attach model context metadata to ChatLiteLLM.
  • Updated create_chat_litellm_from_config and create_chat_litellm_from_agent_config to utilize the new profile attachment.
  • Improved context profile caching in llm_router_service.py to include both minimum and maximum input tokens, along with token model names for better context management.
  • Introduced new methods for token counting and context trimming based on model profiles.

Description

Motivation and Context

FIX #

Screenshots

API Changes

  • This PR includes API changes

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring
  • Documentation
  • Dependency/Build system
  • Breaking change
  • Other (specify):

Testing Performed

  • Tested locally
  • Manual/QA verification

Checklist

  • Follows project coding standards and conventions
  • Documentation updated as needed
  • Dependencies updated as needed
  • No lint/build errors or new warnings
  • All relevant tests are passing

High-level PR Summary

This PR enhances LLM configuration and routing by introducing model profile attachment and context-aware message trimming. The changes add an _attach_model_profile function that captures model context metadata (such as max_input_tokens) from LiteLLM's model info and attaches it to ChatLiteLLM instances. The router service is updated to cache both minimum and maximum input token limits across all deployments, enabling smarter context management. Most significantly, the PR introduces context trimming logic that uses binary search to truncate large messages (especially tool responses and document context) when they exceed the model's context window. The trimmer prefers XML document boundaries for cleaner cuts and preserves system messages so that agent instructions stay intact.
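The binary-search trimming described above can be sketched as follows. This is an illustrative assumption, not the PR's implementation: a whitespace split stands in for the model-profile token counter, and `</document>` stands in for whatever XML boundary tag the real code prefers.

```python
# Hypothetical sketch of binary-search context trimming. count_tokens is a
# whitespace-split stand-in for a real tokenizer; "</document>" is an
# assumed XML boundary tag.
def count_tokens(text: str) -> int:
    return len(text.split())


def trim_to_budget(text: str, max_tokens: int) -> str:
    """Binary-search the longest prefix of `text` within the token
    budget, then back off to the last closing document tag so the cut
    lands on an XML document boundary when possible."""
    if count_tokens(text) <= max_tokens:
        return text
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_tokens(text[:mid]) <= max_tokens:
            lo = mid  # prefix fits; try a longer one
        else:
            hi = mid - 1  # prefix too large; shrink
    cut = text[:lo]
    # Prefer a clean cut at the last complete document, if one fits.
    boundary = cut.rfind("</document>")
    if boundary != -1:
        cut = cut[: boundary + len("</document>")]
    return cut


doc = ("<document> " + "word " * 50 + "</document> "
       "<document> " + "word " * 50 + "</document>")
trimmed = trim_to_budget(doc, 60)
print(trimmed.endswith("</document>"))  # True
```

Because the search runs over character offsets with a token count at each probe, it finds the largest fitting prefix in O(log n) tokenizer calls rather than re-counting after every removed chunk; system messages would simply be excluded from the trimmable text before this routine runs.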

⏱️ Estimated Review Time: 30-90 minutes

💡 Review Order Suggestion
  1. surfsense_backend/app/agents/new_chat/llm_config.py
  2. surfsense_backend/app/services/llm_router_service.py


vercel bot commented Mar 11, 2026

The latest updates on your projects:
  • Project: surf-sense-frontend
  • Deployment: Building (Preview)
  • Updated (UTC): Mar 11, 2026 1:19am


@MODSetter MODSetter merged commit 1ab5640 into main Mar 11, 2026
10 of 13 checks passed
recurseml bot left a comment

Review by RecurseML

🔍 Review performed on 5571e8a..eec4db4

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (1)

surfsense_backend/app/services/llm_router_service.py
