feat(plugins): introduce middleware token proxy plugin suite (reorder, dedup, lookup, skill index) #50

Open
SNM-SNM wants to merge 12 commits into main from feature/msc-cache-optimization

SNM-SNM (Collaborator) commented Jun 11, 2026

Overview

This PR introduces a suite of Middleware Token Proxy Plugins designed to sit between agent frameworks (e.g., OpenClaw) and LLM engines (e.g., SGLang) to optimize KV cache sharing, minimize context budgets, and perform cache-aware routing.

These modules implement the core optimizations detailed in the MSc cache optimization project (Direction A).


Key Features

1. Static Context Optimization Plugins (WP1)

  • ContextReorderPlugin: Processes OpenAI-formatted request batches to group similar contexts, maximizing RadixAttention prefix sharing on SGLang.
  • ContextDedupPlugin: Conversation-aware history compressor. It tracks session turns through the ConversationTracker and replaces redundant historical messages with lightweight reference hints (e.g., [Reference to Turn 1]).
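
As a rough illustration of the two WP1 ideas (not the plugin code in this PR), the sketch below orders a batch so that requests with identical leading messages sit next to each other, and replaces repeated long message bodies with a reference hint. The names `reorder_by_prefix`, `dedup_history`, and the `min_chars` threshold are hypothetical, and counting every message as one turn is an assumption; the real plugins may group by similarity rather than exact match.

```python
from typing import Dict, List

Message = Dict[str, str]  # OpenAI-style {"role": ..., "content": ...}


def reorder_by_prefix(batch: List[List[Message]]) -> List[int]:
    """Return request indices sorted so equal leading messages become adjacent.

    Assumption: lexicographic order over (role, content) pairs stands in for
    the plugin's notion of "similar contexts".
    """
    def key(i: int):
        return [(m["role"], m["content"]) for m in batch[i]]

    return sorted(range(len(batch)), key=key)


def dedup_history(messages: List[Message], min_chars: int = 40) -> List[Message]:
    """Replace a repeated long message body with a hint to its first turn.

    Assumption: each message counts as one turn, numbered from 1.
    """
    first_seen: Dict[str, int] = {}
    out: List[Message] = []
    for turn, msg in enumerate(messages, start=1):
        content = msg["content"]
        if len(content) >= min_chars and content in first_seen:
            hint = f"[Reference to Turn {first_seen[content]}]"
            out.append({**msg, "content": hint})
        else:
            first_seen.setdefault(content, turn)
            out.append(msg)
    return out
```

Short messages are left alone here because a reference hint would not be shorter than the text it replaces.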

2. Dynamic Routing & Tool Filtering Plugins (WP2)

  • KVCacheLookupPlugin: Subscribes to SGLang worker event streams via ZeroMQ (ZMQ) to build a real-time, in-memory Shadow Radix Tree representing the workers' GPU KV cache. Routes incoming requests to the worker with the longest prefix match.
  • SkillAwareContextPlugin: Dynamically filters and injects tool schemas into the request's tools array based on the _required_skills list, trimming unused tool definitions to save context budget.
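
The routing and filtering logic described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation in this PR: `ShadowPrefixTree`, `best_worker`, `filter_tools`, and the `skill_to_tools` mapping are hypothetical names, the tree is a plain per-token trie rather than a compressed radix tree, the ZMQ event handling and cache eviction are omitted, and only the filtering half of the skill plugin is shown.

```python
from typing import Dict, List, Optional, Sequence, Set


class _Node:
    def __init__(self) -> None:
        self.children: Dict[int, "_Node"] = {}
        self.workers: Set[str] = set()  # workers caching the prefix ending here


class ShadowPrefixTree:
    """Token-level trie mirroring which worker holds which cached prefix."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, worker: str, tokens: Sequence[int]) -> None:
        # Mark every node on the path, so each prefix knows its holders.
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, _Node())
            node.workers.add(worker)

    def best_worker(self, tokens: Sequence[int]) -> Optional[str]:
        # Walk as deep as the request matches; the deepest node's workers
        # are exactly those with the longest cached prefix.
        node, best = self.root, None
        for tok in tokens:
            node = node.children.get(tok)
            if node is None or not node.workers:
                break
            best = min(node.workers)  # deterministic tie-break (assumption)
        return best


def filter_tools(request: dict, skill_to_tools: Dict[str, List[str]]) -> dict:
    """Return a copy of the request keeping only tools needed by its skills."""
    required = request.get("_required_skills", [])
    allowed = {name for s in required for name in skill_to_tools.get(s, [])}
    kept = [t for t in request.get("tools", [])
            if t["function"]["name"] in allowed]
    return {**request, "tools": kept}
```

A request whose tokens match no cached prefix gets `None` from `best_worker`, at which point a caller would presumably fall back to another routing policy.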

3. Core Engine Optimizations & Fixes

  • Multiprocessing Bypass: Added an execution bypass in compute_distance_cpu.py when num_workers == 1 to eliminate multiprocessing.Pool initialization and IPC serialization overhead.
  • Windows Terminal Compatibility: Replaced the Unicode checkmark character with a plain ASCII + in all logging/printing calls to prevent UnicodeEncodeError crashes on non-UTF-8 consoles.
  • Dependencies: Added pytest-asyncio, pyzmq, and msgspec to dependencies to ensure successful integration and testing.
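
The multiprocessing bypass follows a common pattern, sketched below under assumptions: `map_chunks` is a hypothetical helper, not the function in compute_distance_cpu.py. With a single worker the work runs in-process, so no pool is created and nothing is pickled across processes.

```python
from multiprocessing import Pool
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_chunks(fn: Callable[[T], R], chunks: Sequence[T], num_workers: int) -> List[R]:
    """Apply fn to each chunk, skipping the process pool for a single worker."""
    if num_workers == 1:
        # Bypass: plain loop, no Pool start-up and no IPC serialization.
        return [fn(chunk) for chunk in chunks]
    with Pool(processes=num_workers) as pool:
        return pool.map(fn, chunks)
```

For example, `map_chunks(sum, [[1, 2], [3]], num_workers=1)` computes both sums in the calling process.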

Verification & Testing

  • Mock Proxy Test: Added a complete pipeline mock test in evaluation/core_merge/mock_proxy.py to demonstrate the end-to-end integration and telemetry collection of all four plugins.
  • Unit Tests: Added tests/test_kv_lookup.py and tests/test_skill_index.py.
  • CI status: All 184 CPU tests passed.

SNM-SNM requested review from Chivier and SecretSettler on June 11, 2026, 19:00