feat: GoZen v3.0.0 - Context Compression & Middleware Pipeline #11

Open
john-zhh wants to merge 36 commits into main from feature/v3.0-platform

Overview

This PR implements GoZen v3.0.0 with two major BETA features:

  1. Context Compression - Transparent context compression for large conversations
  2. Middleware Pipeline - Pluggable middleware architecture for extensibility

All features are disabled by default and marked as BETA.

Features

Context Compression (BETA)

Transparent context compression that intercepts large conversation histories, summarizes them with a cheap model, and forwards compressed requests upstream.

  • Token estimation for messages
  • Configurable threshold and target tokens
  • Preserves recent messages uncompressed
  • Web API for configuration and statistics
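
A minimal sketch of how the threshold and the preserve-recent setting could interact, assuming a rough estimate of about four characters per token. The names `Message`, `estimateTokens`, and `splitForCompression` are hypothetical and are not taken from the PR's code; the real estimator and message type may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical chat message; the real GoZen type may differ.
type Message struct {
	Role    string
	Content string
}

// estimateTokens uses a rough heuristic of about 4 characters per token.
// This is an assumption, not the estimator used in the PR.
func estimateTokens(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += (len(m.Content) + 3) / 4 // round up
	}
	return total
}

// splitForCompression returns the older messages that would be summarized and
// the recent messages that stay verbatim. toSummarize is nil when the
// conversation is at or under the threshold, or too short to split.
func splitForCompression(msgs []Message, thresholdTokens, preserveRecent int) (toSummarize, keep []Message) {
	if estimateTokens(msgs) <= thresholdTokens || len(msgs) <= preserveRecent {
		return nil, msgs
	}
	cut := len(msgs) - preserveRecent
	return msgs[:cut], msgs[cut:]
}

func main() {
	msgs := make([]Message, 0, 10)
	for i := 0; i < 10; i++ {
		msgs = append(msgs, Message{Role: "user", Content: strings.Repeat("x", 400)})
	}
	old, recent := splitForCompression(msgs, 500, 4)
	fmt.Printf("summarize %d messages, keep %d verbatim\n", len(old), len(recent))
}
```

In this sketch the summary step itself (the call to the cheap model) is left out; only the decision of what to compress is shown.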

Middleware Pipeline (BETA)

Transform GoZen into a programmable AI API gateway with a pluggable middleware chain.

Architecture:

  • Middleware interface for custom middleware development
  • Pipeline executor with priority-based ordering
  • Registry for middleware lifecycle management
  • PluginLoader for local (.so) and remote plugin support

Built-in Middleware:

Name               Priority  Description
context-injection  10        Auto-inject .cursorrules, CLAUDE.md
session-memory     15        Cross-session intelligence (v3.1)
request-logger     20        Log requests and responses
orchestration      50        Multi-model orchestration (v3.2)
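
The priority-based ordering can be illustrated with a small sketch. The `Middleware` interface shape, the `Request` type, and the rule that lower priority values run first are all assumptions made for illustration; the PR's actual interface is documented in docs/middleware-development.md and may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Request is a hypothetical stand-in for the proxied request.
type Request struct {
	Trace []string // records which middleware ran, in order
}

// Middleware is an assumed interface shape, not the one defined in the PR.
type Middleware interface {
	Name() string
	Priority() int
	Process(req *Request) error
}

// named is a trivial middleware that only records its own name.
type named struct {
	name     string
	priority int
}

func (n named) Name() string  { return n.name }
func (n named) Priority() int { return n.priority }
func (n named) Process(req *Request) error {
	req.Trace = append(req.Trace, n.name)
	return nil
}

// Pipeline runs middleware in ascending priority order (assumption:
// a lower number means the middleware runs earlier).
type Pipeline struct {
	chain []Middleware
}

// NewPipeline copies the input and sorts it by priority; the stable sort
// keeps registration order for equal priorities.
func NewPipeline(mws ...Middleware) *Pipeline {
	chain := append([]Middleware(nil), mws...)
	sort.SliceStable(chain, func(i, j int) bool {
		return chain[i].Priority() < chain[j].Priority()
	})
	return &Pipeline{chain: chain}
}

// Run stops at the first middleware that returns an error.
func (p *Pipeline) Run(req *Request) error {
	for _, mw := range p.chain {
		if err := mw.Process(req); err != nil {
			return fmt.Errorf("middleware %s: %w", mw.Name(), err)
		}
	}
	return nil
}

func main() {
	p := NewPipeline(
		named{"orchestration", 50},
		named{"request-logger", 20},
		named{"context-injection", 10},
		named{"session-memory", 15},
	)
	req := &Request{}
	if err := p.Run(req); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Trace)
}
```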

Third-Party Plugin Support

  • Local plugins: Load Go plugins (.so files) from disk
  • Remote plugins: Download from URL with SHA256 verification
  • Middleware development guide for third-party developers
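
The SHA256 check for remote plugins might look roughly like the following. This is a sketch using only the Go standard library; `verifySHA256` is a hypothetical helper name, and how the PR's PluginLoader obtains the expected digest is not shown here.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
)

// verifySHA256 checks that data hashes to the expected hex-encoded digest.
// Hypothetical helper; the PR's loader may structure this differently.
func verifySHA256(data []byte, expectedHex string) error {
	want, err := hex.DecodeString(expectedHex)
	if err != nil {
		return fmt.Errorf("invalid expected digest: %w", err)
	}
	got := sha256.Sum256(data)
	// ConstantTimeCompare returns 0 for a mismatch, including a length mismatch.
	if subtle.ConstantTimeCompare(got[:], want) != 1 {
		return errors.New("sha256 mismatch: refusing to load plugin")
	}
	return nil
}

func main() {
	payload := []byte("plugin bytes")
	sum := sha256.Sum256(payload)
	digest := hex.EncodeToString(sum[:])

	fmt.Println("original:", verifySHA256(payload, digest))
	fmt.Println("tampered:", verifySHA256([]byte("tampered"), digest))
}
```

The verification should happen on the downloaded bytes before they are written to the plugin cache or opened, so that a mismatched file is never loaded.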

Configuration

{
  "compression": {
    "enabled": false,
    "threshold_tokens": 50000,
    "target_tokens": 20000,
    "summary_model": "claude-3-haiku-20240307",
    "preserve_recent": 4
  },
  "middleware": {
    "enabled": false,
    "middlewares": [
      {
        "name": "context-injection",
        "enabled": true,
        "source": "builtin"
      }
    ]
  }
}
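
For readers who want to see how this JSON maps onto Go, here is a sketch of matching struct definitions. The JSON keys come from the example above; the Go type and field names (`Config`, `MiddlewareEntry`, and so on) are assumptions and may not match the types added in internal/config/config.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CompressionConfig mirrors the "compression" object shown above.
type CompressionConfig struct {
	Enabled         bool   `json:"enabled"`
	ThresholdTokens int    `json:"threshold_tokens"`
	TargetTokens    int    `json:"target_tokens"`
	SummaryModel    string `json:"summary_model"`
	PreserveRecent  int    `json:"preserve_recent"`
}

// MiddlewareEntry mirrors one element of "middlewares" (hypothetical name).
type MiddlewareEntry struct {
	Name    string `json:"name"`
	Enabled bool   `json:"enabled"`
	Source  string `json:"source"`
}

// MiddlewareConfig mirrors the "middleware" object shown above.
type MiddlewareConfig struct {
	Enabled     bool              `json:"enabled"`
	Middlewares []MiddlewareEntry `json:"middlewares"`
}

// Config holds only the two sections from the example (hypothetical wrapper).
type Config struct {
	Compression *CompressionConfig `json:"compression"`
	Middleware  *MiddlewareConfig  `json:"middleware"`
}

// parseConfig decodes the JSON document into Config.
func parseConfig(raw []byte) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	raw := []byte(`{
	  "compression": {"enabled": false, "threshold_tokens": 50000,
	    "target_tokens": 20000, "summary_model": "claude-3-haiku-20240307",
	    "preserve_recent": 4},
	  "middleware": {"enabled": false, "middlewares": [
	    {"name": "context-injection", "enabled": true, "source": "builtin"}]}
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n%+v\n", *cfg.Compression, *cfg.Middleware)
}
```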

Web API

Compression

  • GET/PUT /api/v1/compression - Configuration
  • GET /api/v1/compression/stats - Statistics

Middleware

  • GET/PUT /api/v1/middleware - Configuration
  • GET /api/v1/middleware/{name} - Details
  • POST /api/v1/middleware/{name}/enable - Enable
  • POST /api/v1/middleware/{name}/disable - Disable
  • POST /api/v1/middleware/reload - Reload all

Files Changed

New Files

  • internal/proxy/compression.go - Context compressor
  • internal/middleware/*.go - Middleware package
  • internal/web/api_compression.go - Compression API
  • internal/web/api_middleware.go - Middleware API
  • docs/middleware-development.md - Development guide

Modified Files

  • internal/config/config.go - New config types
  • internal/config/store.go - New getters/setters
  • internal/proxy/server.go - Integration
  • internal/daemon/server.go - Initialization
  • cmd/root.go - Version bump to 3.0.0

Testing

All existing tests pass. New tests added for:

  • Context compression
  • Middleware pipeline
  • Built-in middlewares
  • Registry and loader

go test ./...

Breaking Changes

None. All features are opt-in and disabled by default.

john-zhh and others added 27 commits February 18, 2026 08:58
…ing (v2.2.0)

This release adds comprehensive observability and smart routing capabilities:

- Usage Tracking: Record API usage with cost calculation based on model pricing
- Budget Control: Set daily/weekly/monthly limits with warn/downgrade/block actions
- Provider Health: Monitor provider health with success rate and latency metrics
- Smart Load Balancing: Support failover, round-robin, least-latency, least-cost strategies
- Session Insights: Track per-session usage with turn-by-turn details
- Webhook Notifications: Send alerts for budget warnings, provider status, failovers
- Web UI: New Usage tab with cost summary, budget status, and provider health

New files:
- internal/proxy/usage.go, budget.go, healthcheck.go, loadbalancer.go, metrics.go
- internal/notify/webhook.go
- internal/web/api_usage.go, api_health.go, api_sessions.go, api_webhooks.go, api_pricing.go

Config version: 7 → 8
SQLite schema version: 2 → 3

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
… Mistral, Qwen models

Expand default model pricing to cover common programming models:
- OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini
- DeepSeek: deepseek-chat, deepseek-coder, deepseek-reasoner
- MiniMax: abab6.5s/6.5/6.5t/5.5-chat
- GLM (Zhipu): glm-4-plus/0520/air/airx/long/flash/flashx, codegeex-4
- Google Gemini: gemini-2.0-flash, gemini-1.5-pro/flash
- Mistral: mistral-large/small, codestral, ministral, pixtral
- Qwen (Alibaba): qwen-max/plus/turbo/long, qwen-coder-plus/turbo

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
[BETA] Context Compression:
- Add CompressionConfig for transparent context compression
- Implement ContextCompressor with token estimation and summarization
- Add compression Web API endpoints

[BETA] Middleware Pipeline:
- Add pluggable middleware architecture with Middleware interface
- Implement Pipeline executor with priority-based ordering
- Add Registry for middleware lifecycle management
- Add PluginLoader for local (.so) and remote plugin support

Built-in Middleware:
- context-injection: Auto-inject .cursorrules, CLAUDE.md
- request-logger: Log all requests and responses
- session-memory: Cross-session intelligence (v3.1 feature)
- orchestration: Multi-model orchestration - voting, chain, review (v3.2 feature)

Web API:
- GET/PUT /api/v1/compression - Compression config
- GET /api/v1/compression/stats - Compression statistics
- GET/PUT /api/v1/middleware - Middleware config
- POST /api/v1/middleware/{name}/enable|disable
- POST /api/v1/middleware/reload

Documentation:
- Add middleware development guide for third-party developers

All features are disabled by default and marked as BETA.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- Observatory: session monitoring, stuck detection, idle timeout
- Guardrails: spending caps, rate limiting, sensitive operation detection
- Coordinator: file locking, change awareness, context warnings
- TaskQueue: priority-based task management with retry support
- Runtime: autonomous agent execution with planning/execution/validation phases
- Web API endpoints for all agent components

All features are BETA and disabled by default.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- Add tests for proxy package (metrics, usage, budget, healthcheck, loadbalancer, session, compression, logger)
- Add tests for web package (API v2 endpoints, server helpers)
- Add tests for config and daemon packages
- Achieve 82% coverage for proxy package (target: ≥80%)
- Achieve 80.1% coverage for web package (target: ≥80%)

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- proxy/session.go: Fix race condition in GetSessionUsage, GetSessionInsight,
  and GetContextWarning by holding lock during sync.Map access
- proxy/healthcheck.go: Fix double-close panic in Stop() by tracking stopped state
- agent/runtime.go: Fix ignored rand.Read error, add lock for task.Plan assignment
- agent/observatory.go: Fix data race by reading config.StuckThreshold under lock
- config/migrate.go: Clean up incomplete file on copy failure
- web/auth.go: Add graceful shutdown for sessionCleanupLoop, fix rand.Read error
- web/server.go: Add sync.RWMutex for syncMgr access
- web/api_sync.go: Use lock when accessing/modifying syncMgr

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- config/config.go: Add nil check in ScenarioRoute.UnmarshalJSON to prevent
  panic when providers array contains null elements
- config/config.go: Add nil checks in ProviderNames and ModelForProvider methods
- proxy/logdb.go: Handle stmt.Exec errors in flushBatch, rollback on failure
- web/auth.go: Fix IP spoofing in clientIP by properly parsing X-Forwarded-For
  header (extract first IP from comma-separated list)
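
The header parsing described for web/auth.go can be sketched as follows. `firstForwardedIP` is a hypothetical helper, not the PR's `clientIP` function, and validating the entry with `net.ParseIP` is an added assumption.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// firstForwardedIP returns the first entry of an X-Forwarded-For value if it
// is a valid IP address, and "" otherwise. Hypothetical helper.
func firstForwardedIP(header string) string {
	first, _, _ := strings.Cut(header, ",")
	first = strings.TrimSpace(first)
	if net.ParseIP(first) == nil {
		return ""
	}
	return first
}

func main() {
	fmt.Println(firstForwardedIP("203.0.113.7, 10.0.0.1, 10.0.0.2"))
	fmt.Println(firstForwardedIP("garbage, 10.0.0.1") == "")
}
```

Note that X-Forwarded-For is supplied by the client unless a trusted proxy overwrites it, so a caller would normally fall back to the connection's remote address when the header is absent or invalid.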

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- agent/taskqueue.go: Handle rand.Read error with timestamp fallback
- daemon/server.go: Fix randomID to properly check os.Open and Read errors
- notify/webhook.go: Handle json.Marshal errors in format functions
- proxy/logdb.go: Add explicit error ignoring with comments for best-effort
  operations (os.Chmod, os.Remove, setSchemaVersion)
- update/check.go: Add explicit error ignoring for cache operations

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- cmd/web.go: ignore exec.Command().Start() errors for browser open
- internal/daemon/daemon.go: ignore os.Remove errors in cleanup functions
- internal/middleware/loader.go: ignore os.Remove errors in cache operations

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- proxy/server.go: safe type assertion for message role
- proxy/session.go: use fmt.Sprintf for duration formatting (fixes overflow)
- web/server.go: explicitly ignore JSON encode errors (best-effort)
- middleware/loader.go: ensure temp file closed via defer

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- proxy/logdb.go: explicitly ignore tx.Rollback/Commit errors (best-effort)
- daemon/server.go: call pullCancel() immediately after Pull() returns

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- proxy/profile_proxy.go: ignore JSON encode error in writeError
- daemon/api.go: close request body after JSON decode

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- Add tests for sessionCleanupLoop, StopCleanup, clientIP
- Add tests for HandleFunc, SetSyncManager
- Web coverage: 79.7% -> 80.7%
- Add disclaimer to usage page: data is for reference only

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- Add tests for Shutdown resource cleanup (syncCancel, pushTimer, watcher)
- Add tests for session cleanup logic (stale session removal)
- Add tests for initSync cancellation of existing sync
- Add tests for DaemonSysProcAttr, IsDaemonRunning, StopDaemonProcess
- Add test for startProxy
- Update CI coverage requirement: daemon 40% -> 50%

These tests specifically target memory leak prevention by verifying:
- Context cancellation on shutdown
- Timer cleanup
- Goroutine termination paths
- Stale session cleanup

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- Update version from 4.0.0 to 3.0.0
- Consolidate all features (v2.2-v4.0) into single v3.0 release
- Create unified release plan document (.dev/v3.0-release-plan.md)

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Website updates:
- Upgrade docs version from 2.1 to 3.0
- Add Japanese (ja) and Korean (ko) locale support
- Add v3.0 feature documentation:
  - Usage Tracking & Budget Control
  - Health Monitoring
  - Load Balancing
  - Webhooks
  - Context Compression
  - Middleware Pipeline
  - Agent Infrastructure

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- Add TDD requirement for new feature development
- Add formal release checklist:
  1. Bug check
  2. Version number verification
  3. Website documentation review
  4. README files update
- Add v3.0.0 to version history

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Add v3.0 new features section covering usage tracking, budget control,
provider health monitoring, smart load balancing, webhooks, context
compression, middleware pipeline, and agent infrastructure.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Replace vanilla JS frontend with modern React stack:
- React 18 + TypeScript + Vite build system
- shadcn/ui components (Radix UI + Tailwind CSS)
- TanStack Query for server state, Zustand for UI state
- React Router v6 for navigation
- react-i18next with 6 languages (en, zh-CN, zh-TW, es, ja, ko)
- Dark/light/system theme support
- Type-safe API client with React Query hooks

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Implement Bot Agent system (Phase 1-4):
- Add bot gateway with IPC communication via Unix socket
- Support 5 chat platforms: Telegram, Discord, Slack, Lark, FB Messenger
- Natural language intent parsing for commands
- Process registry with auto-generated unique names
- Session management and approval workflow
- Bot configuration in zen.json with platform-specific settings

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- Add GET/PUT /api/v1/bot API endpoints with token masking
- Create Bot config page with 5 tabs: General, Platforms, Interaction, Aliases, Notifications
- Support 5 chat platforms: Telegram, Discord, Slack, Lark, Facebook Messenger
- Add Collapsible UI component for platform config sections
- Add i18n translations for all 6 languages (en, zh-CN, zh-TW, es, ja, ko)

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Add comprehensive unit and integration tests for the bot package:
- gateway_test.go: Start/Stop, handleConnection, IPC message handling
- handlers_test.go: intent processing, message handling, approvals
- client_test.go: client initialization and error cases
- nlu_test.go: NLU parser for various intents and languages
- registry_test.go: process registry operations
- session_test.go: session management
- protocol_test.go: IPC protocol types
- adapters/adapter_test.go: adapter config helpers

Use short socket paths (/tmp/zen-test-*.sock) for macOS compatibility
with Unix socket 104-byte path limit.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
The custom UnmarshalJSON was missing Sync, Pricing, Budgets, Webhooks,
HealthCheck, Compression, Middleware, Agent, and Bot fields, causing
them to be nil after JSON parsing.

Also adds comprehensive tests for the bot API endpoints, bringing
internal/web coverage from 73.8% to 81.2%.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- Add Bot Gateway section to all README files (EN, zh-CN, zh-TW, es)
- Create comprehensive bot.md documentation for website with:
  - Platform setup guides (Telegram, Discord, Slack, Lark, FB Messenger)
  - Bot commands and natural language support
  - Configuration examples
  - Security best practices
- Update sidebars to include bot documentation
- Bump version to 3.0.0-alpha.3

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
john-zhh force-pushed the feature/v3.0-platform branch from 87f657b to 02d2e09 on February 19, 2026 12:51
john-zhh and others added 2 commits February 19, 2026 21:03
…oval

When IsDaemonRunning() checked if the daemon was listening on the expected
port, it would remove the PID file if the port check failed (e.g., timeout).
This made it impossible to stop the daemon later, causing upgrade and
restart commands to fail silently while the old daemon kept running.

Now IsDaemonRunning() returns the PID even when port check fails (as long
as the process is alive), and StopDaemonProcess() will attempt to stop
any alive process found in the PID file.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Add comprehensive integration tests for the daemon module covering:
- Daemon start: PID file creation, port listening, status API
- Daemon stop: process termination, PID file removal, port release
- Daemon restart: old process cleanup, PID file update
- Upgrade scenario: stopping daemon even when port check fails
- Stale PID file handling
- Graceful shutdown with active requests

These tests run against the actual binary and verify real-world behavior
that unit tests cannot catch (like the PID file removal bug fixed in
the previous commit).

Run with: go test -tags=integration ./test/integration/...

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
john-zhh and others added 7 commits February 19, 2026 21:25
Run daemon integration tests as part of CI to catch real-world
issues that unit tests cannot detect.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Proxy tests (8 tests):
- Basic routing to provider
- Failover to backup provider when primary fails
- Error handling when all providers fail
- Session persistence across requests
- Profile-based routing
- Streaming response handling
- Slow provider handling
- Invalid profile rejection

Web API tests (12 tests):
- Health endpoint
- Providers list and get
- Profiles list
- Settings get
- Daemon status
- Config reload
- Static files serving
- CORS handling
- Bindings list
- Logs endpoint

These tests use mock HTTP servers to simulate provider behavior,
allowing us to test failover, error handling, and streaming without
hitting real APIs.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
The daemon's Shutdown() runs in a goroutine when handling SIGTERM.
Previously, the process could exit before Shutdown() completed,
leaving the PID file behind. This caused flaky integration tests
in CI environments.

Now we wait for the shutdown goroutine to complete before returning
from runDaemonForeground().

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Add support for running a separate dev daemon with isolated config:

- GOZEN_CONFIG_DIR env var to specify custom config directory
- scripts/dev.sh for managing dev daemon (ports 29840/29841)
- vite.config.ts supports VITE_API_PORT env var

Usage:
  ./scripts/dev.sh          # Start dev daemon
  ./scripts/dev.sh stop     # Stop dev daemon
  ./scripts/dev.sh web      # Start frontend dev server
  ./scripts/dev.sh all      # Start both

The dev daemon uses ~/.zen-dev for config, completely isolated from
the production daemon at ~/.zen. Frontend hot-reloads while API
requests proxy to the dev daemon.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
1. Fix daemon PID file cleanup race condition:
   - Wait for shutdown goroutine to complete before exiting
   - Add fallback cleanup if Start() returns for other reasons

2. Add Nunito as primary font for Web UI and website:
   - Import from Google Fonts
   - Set as default sans-serif in Tailwind config
   - Update website SCSS

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
The auto-generated changelog from GitHub releases is not readable.
Hide the releases page until we have a better changelog generation method.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Settings page improvements:
- General tab: Add default profile/client selection (was just placeholder text)
- Bindings tab: Add "Add Binding" button with dialog
- Sync tab: Move "Sync Enabled" switch to top, add backend configuration
- Password tab: Rename to "Web UI Password" with description

Bot page fixes:
- Remove leftover __CONTINUE_*__ comments from code generation
- Fix Discord section display

Navigation:
- Swap Usage and Logs order in sidebar

Also update types and API functions to match actual backend responses.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>