Telemetry

Lisa edited this page Dec 25, 2025 · 8 revisions

Runtime Telemetry (v6.4)

CKB integrates with OpenTelemetry to answer the question static analysis can't answer: "Is this code actually used in production?"

By ingesting runtime metrics, CKB can:

  • Detect dead code with high confidence
  • Show actual call counts for any symbol
  • Enrich impact analysis with observed callers
  • Distinguish between "no static references" and "truly unused"

Quick Start

1. Enable Telemetry

Add to .ckb/config.json:

{
  "telemetry": {
    "enabled": true,
    "serviceMap": {
      "my-api-service": "my-repo"
    }
  }
}

2. Configure Your OpenTelemetry Collector

Point your collector's exporter at CKB:

# otel-collector-config.yaml
exporters:
  otlphttp:
    endpoint: "http://localhost:9120/v1/metrics"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]

3. Verify It's Working

ckb telemetry status

You should see:

Telemetry Status
  Enabled: true
  Last Sync: 2 minutes ago

Coverage:
  Symbol: 78% (23,456 of 30,123 symbols have telemetry)
  Service: 100% (3 of 3 repos mapped)

Coverage Level: HIGH

Coverage Levels

CKB requires sufficient telemetry coverage to enable certain features:

| Coverage | Symbol % | What's Available |
|---|---|---|
| High | ≥ 70% | Full dead code detection, high-confidence verdicts |
| Medium | 40-69% | Dead code detection with caveats, observed usage |
| Low | 10-39% | Basic observed usage only |
| Insufficient | < 10% | Telemetry features disabled |

Check your coverage:

ckb telemetry status

Finding Dead Code

With telemetry enabled, you can find code that's never called in production:

# Find dead code candidates (requires medium+ coverage)
ckb dead-code --min-confidence 0.7

# Scope to a specific module
ckb dead-code --scope internal/legacy

# Include low-confidence results
ckb dead-code --min-confidence 0.5

Understanding Confidence

Dead code confidence combines:

  • Static analysis — No references found in code
  • Telemetry — Zero calls observed over the period
  • Match quality — How well telemetry maps to symbols

| Confidence | Meaning |
|---|---|
| 0.9+ | High confidence dead code — safe to remove |
| 0.7-0.9 | Likely dead — verify before removing |
| 0.5-0.7 | Possibly dead — investigate further |
| < 0.5 | Uncertain — may have dynamic callers |
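This page doesn't publish CKB's exact scoring formula, so the following is only a minimal sketch of how the three signals might blend into one score. The function name and weights are illustrative assumptions, not CKB's implementation.

```go
package main

import "fmt"

// deadCodeConfidence sketches blending the three signals described above.
// The 0.5/0.5 weights are illustrative only; CKB's actual scoring is internal.
func deadCodeConfidence(noStaticRefs, zeroCallsObserved bool, matchQuality float64) float64 {
	score := 0.0
	if noStaticRefs {
		score += 0.5 // static analysis found no references
	}
	if zeroCallsObserved {
		score += 0.5 // telemetry saw zero calls over the observation period
	}
	// Discount by match quality (0.0-1.0): with a weak match, the observed
	// "zero calls" may actually belong to a different symbol.
	return score * matchQuality
}

func main() {
	// No references, no observed calls, exact telemetry match: high confidence.
	fmt.Println(deadCodeConfidence(true, true, 1.0)) // 1
	// Same signals, but a weak match drags the score into "investigate further".
	fmt.Println(deadCodeConfidence(true, true, 0.6)) // 0.6
}
```

The key property this models is that telemetry can only *lower* uncertainty when the symbol-to-span match is trustworthy, which is why the match quality levels below matter.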

Checking Symbol Usage

Get observed usage for any symbol:

ckb telemetry usage --symbol "internal/api/handler.go:HandleRequest"

Output:

Symbol: HandleRequest
Period: Last 90 days

Observed Usage:
  Total Calls: 1,247,832
  Daily Average: 13,864
  Trend: stable

Match Quality: exact
Last Seen: 2 hours ago

Match Quality Levels

| Quality | Meaning |
|---|---|
| exact | Symbol name matches telemetry span exactly |
| strong | High-confidence fuzzy match |
| weak | Low-confidence match — verify manually |

Service Mapping

CKB needs to know which telemetry service corresponds to which repository.

Explicit Mapping (Recommended)

{
  "telemetry": {
    "serviceMap": {
      "api-gateway": "api-repo",
      "user-service": "users-repo",
      "payment-worker": "payments-repo"
    }
  }
}

Pattern-Based Mapping

For services that follow naming conventions, use regex patterns:

{
  "telemetry": {
    "servicePatterns": [
      {
        "pattern": "^order-.*$",
        "repo": "repo-orders"
      },
      {
        "pattern": "^inventory-.*$",
        "repo": "repo-inventory"
      }
    ]
  }
}

Resolution Order

  1. Explicit serviceMap entry
  2. Pattern match in servicePatterns (first match wins)
  3. ckb_repo_id attribute in telemetry payload
  4. Service name matches repo name exactly
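The four-step resolution order above can be sketched as a lookup chain. All names and parameter shapes here are illustrative; CKB's internal resolver may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

type servicePattern struct {
	Pattern string // regex matched against the service name
	Repo    string
}

// resolveRepo mirrors the documented resolution order: explicit serviceMap,
// then servicePatterns (first match wins), then a ckb_repo_id attribute
// from the telemetry payload, then an exact service-name == repo-name match.
func resolveRepo(service string, serviceMap map[string]string,
	patterns []servicePattern, repoIDAttr string, knownRepos map[string]bool) (string, bool) {

	if repo, ok := serviceMap[service]; ok {
		return repo, true // 1. explicit mapping
	}
	for _, p := range patterns {
		if matched, _ := regexp.MatchString(p.Pattern, service); matched {
			return p.Repo, true // 2. first pattern match wins
		}
	}
	if repoIDAttr != "" {
		return repoIDAttr, true // 3. ckb_repo_id attribute in the payload
	}
	if knownRepos[service] {
		return service, true // 4. service name equals a repo name
	}
	return "", false // unmapped: would appear in `ckb telemetry unmapped`
}

func main() {
	patterns := []servicePattern{{Pattern: "^order-.*$", Repo: "repo-orders"}}
	repo, _ := resolveRepo("order-worker", nil, patterns, "", nil)
	fmt.Println(repo) // repo-orders
}
```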

Debugging Unmapped Services

# See which services aren't mapped
ckb telemetry unmapped

# Test if a service name would map
ckb telemetry test-map "my-service-name"

Telemetry-Enhanced Impact Analysis

When telemetry is enabled, analyzeImpact includes observed callers:

# Via CLI
ckb impact <symbol-id> --include-telemetry

# Via MCP
analyzeImpact({ symbolId: "...", includeTelemetry: true })

The response includes:

  • observedCallers — Services that call this symbol at runtime
  • blendedConfidence — Combines static + observed confidence
  • observedOnly — Callers found only via telemetry (not in code)

This catches cases where static analysis misses dynamic dispatch or reflection.


Configuration Reference

Full configuration options:

{
  "telemetry": {
    "enabled": true,

    "serviceMap": {
      "service-name": "repo-id"
    },

    "servicePatterns": [
      { "pattern": "^order-.*$", "repo": "repo-orders" }
    ],

    "aggregation": {
      "bucketSize": "weekly",
      "retentionDays": 180,
      "minCallsToStore": 1,
      "storeCallers": false,
      "maxCallersPerSymbol": 20
    },

    "deadCode": {
      "enabled": true,
      "minObservationDays": 90,
      "excludePatterns": ["**/test/**", "**/migrations/**"],
      "excludeFunctions": ["*Migration*", "Test*", "*Scheduled*"]
    },

    "privacy": {
      "redactCallerNames": false,
      "logUnmatchedEvents": true
    }
  }
}

| Setting | Default | Description |
|---|---|---|
| enabled | false | Enable telemetry features |
| serviceMap | {} | Maps service names to repo IDs |
| servicePatterns | [] | Regex patterns for service mapping |
| aggregation.bucketSize | "weekly" | Aggregation bucket size ("daily", "weekly", "monthly") |
| aggregation.retentionDays | 180 | Days to retain telemetry data |
| aggregation.minCallsToStore | 1 | Minimum calls to store (filter noise) |
| aggregation.storeCallers | false | Store caller service names |
| aggregation.maxCallersPerSymbol | 20 | Max callers to store per symbol |
| deadCode.enabled | true | Enable dead code detection |
| deadCode.minObservationDays | 90 | Minimum days of data before reporting |
| deadCode.excludePatterns | [...] | Path glob patterns to exclude |
| deadCode.excludeFunctions | [...] | Function name patterns to exclude |
| privacy.redactCallerNames | false | Redact caller service names in storage |
| privacy.logUnmatchedEvents | true | Log events that couldn't be matched |

OTEL Collector Configuration

CKB accepts telemetry via OTLP. Configure your OpenTelemetry Collector:

# otel-collector-config.yaml
exporters:
  otlphttp/ckb:
    endpoint: "http://localhost:9120"
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/ckb]

Required Metric: calls counter with these attributes:

  • code.function (required) — Function name
  • code.filepath (recommended) — Source file path
  • code.namespace (recommended) — Package/namespace
  • code.lineno (optional) — Line number for exact matching

Resource Attributes:

  • service.name (required) — Maps to repo via serviceMap
  • service.version (optional) — For trend analysis
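As a sanity check before wiring a collector to CKB, the required-vs-recommended split can be expressed as a small validator over a flattened attribute map. This is a hypothetical helper for illustration, not part of CKB or the OpenTelemetry SDK.

```go
package main

import "fmt"

// checkAttributes reports which of the attributes CKB requires are missing
// from a data point. service.name is a resource attribute; code.function is
// a metric attribute — both are flattened into one map here for simplicity.
func checkAttributes(attrs map[string]string) []string {
	var missing []string
	for _, required := range []string{"service.name", "code.function"} {
		if attrs[required] == "" {
			missing = append(missing, required)
		}
	}
	return missing
}

func main() {
	point := map[string]string{
		"service.name":  "user-service",
		"code.function": "HandleRequest",
		// Recommended extras sharpen symbol matching but aren't required:
		"code.filepath": "internal/api/handler.go",
	}
	fmt.Println(len(checkAttributes(point))) // 0
}
```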

CLI Commands

# Check status and coverage
ckb telemetry status

# Get usage for a symbol
ckb telemetry usage --symbol "pkg/handler.go:HandleRequest"

# List unmapped services
ckb telemetry unmapped

# Test service name mapping
ckb telemetry test-map "my-service"

# Find dead code
ckb dead-code [--min-confidence 0.7] [--scope module]

MCP Tools

| Tool | Purpose |
|---|---|
| getTelemetryStatus | Coverage metrics and sync status |
| getObservedUsage | Runtime usage for a symbol |
| findDeadCodeCandidates | Symbols with zero runtime calls |

Enhanced tools:

  • analyzeImpact — Add includeTelemetry: true for observed callers
  • getHotspots — Includes observedUsage when telemetry enabled

HTTP API

# Get status
curl http://localhost:8080/telemetry/status

# Get symbol usage
curl "http://localhost:8080/telemetry/usage/SYMBOL_ID?period=30d"

# Find dead code
curl "http://localhost:8080/telemetry/dead-code?minConfidence=0.7"

# List unmapped services
curl http://localhost:8080/telemetry/unmapped

# OTLP ingest endpoint (for collectors)
POST http://localhost:9120/v1/metrics

Troubleshooting

"Telemetry not enabled"

Add to .ckb/config.json:

{ "telemetry": { "enabled": true } }

"Coverage insufficient"

Your instrumentation may not cover enough symbols. Check:

  • Are all services sending telemetry?
  • Is serviceMap configured correctly?
  • Run ckb telemetry unmapped to find gaps

"No data for symbol"

Possible causes:

  • Symbol isn't called at runtime (it may actually be dead)
  • Service mapping is wrong
  • Telemetry span names don't match symbol names

Debug with:

ckb telemetry test-map "your-service-name"

High latency on telemetry queries

Reduce retention or use monthly aggregation:

{
  "telemetry": {
    "aggregation": {
      "retentionDays": 90,
      "bucketSize": "monthly"
    }
  }
}

Best Practices

  1. Start with explicit serviceMap — Don't rely on auto-detection
  2. Check coverage before trusting dead-code — Medium+ coverage required
  3. Use 90-day periods — Catches infrequent code paths (monthly jobs, etc.)
  4. Verify before deleting — Even high-confidence dead code should be reviewed
  5. Monitor unmapped services — New services need to be added to serviceMap

Wide-Result Metrics (v7.4)

In addition to runtime telemetry, CKB tracks internal metrics for MCP wide-result tools. This helps identify which tools experience heavy truncation and may benefit from Frontier mode.

What's Tracked

For each wide-result tool invocation:

  • Tool name — findReferences, searchSymbols, analyzeImpact, getCallGraph, getHotspots, summarizePr
  • Total results — How many results were found
  • Returned results — How many were returned after truncation
  • Truncation count — How many were dropped
  • Response bytes — Actual JSON response size in bytes
  • Estimated tokens — Approximate token cost (~4 bytes per token)
  • Execution time — Latency in milliseconds

Metrics are stored in SQLite (.ckb/ckb.db) and persist across MCP sessions.
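A sketch of how the byte and token numbers can be derived per invocation, assuming the ~4 bytes-per-token estimate stated above. The struct and function names are illustrative, not CKB's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wideResult captures the per-invocation numbers described above.
// Field names are illustrative, not CKB's actual schema.
type wideResult struct {
	Tool      string
	Total     int // results found
	Returned  int // results actually returned after truncation
	Bytes     int // marshaled response size
	EstTokens int // estimated at ~4 bytes per token
}

// measure marshals the response exactly as it would be sent, so the byte
// count reflects the real payload entering the LLM context window.
func measure(tool string, total, returned int, response any) wideResult {
	payload, _ := json.Marshal(response)
	return wideResult{
		Tool:      tool,
		Total:     total,
		Returned:  returned,
		Bytes:     len(payload),
		EstTokens: len(payload) / 4, // ~4 bytes per token
	}
}

func main() {
	resp := []string{"symA", "symB"} // stand-in for a truncated result slice
	r := measure("searchSymbols", 10, 2, resp)
	fmt.Println(r.Bytes, r.EstTokens) // 15 3
}
```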

CLI: ckb metrics

# Last 7 days (default)
ckb metrics

# Last 30 days
ckb metrics --days=30

# Filter to specific tool
ckb metrics --tool=findReferences

# Human-readable format
ckb metrics --format=human

# Export for version comparison
ckb metrics export --version=v7.4 > benchmarks/baseline.json
ckb metrics export --version=v7.5 --output=benchmarks/v7.5.json

Exporting for Version Comparison

Use ckb metrics export to create versioned snapshots for comparing across releases:

# Before v7.5 Frontier release
ckb metrics export --version=v7.4 > benchmarks/v7.4-baseline.json

# After Frontier implementation
ckb metrics export --version=v7.5 > benchmarks/v7.5-frontier.json

# Compare
diff benchmarks/v7.4-baseline.json benchmarks/v7.5-frontier.json

Export includes:

  • version — Your custom version tag
  • ckbVersion — Actual CKB version (e.g., "7.4.0")
  • exportedAt — ISO 8601 timestamp
  • period / since — Time window for the data

Example export output:

{
  "version": "v7.4",
  "ckbVersion": "7.4.0",
  "exportedAt": "2025-12-23T13:20:30Z",
  "period": "last 30 days",
  "since": "2025-11-23",
  "tools": [
    {
      "name": "searchSymbols",
      "queryCount": 312,
      "totalResults": 15234,
      "totalReturned": 8456,
      "totalTruncated": 6778,
      "truncationRate": 0.445,
      "totalBytes": 4780000,
      "avgBytes": 15321,
      "avgTokens": 3830,
      "avgLatencyMs": 125,
      "needsFrontier": true
    },
    {
      "name": "getCallGraph",
      "queryCount": 189,
      "totalResults": 2341,
      "totalReturned": 2341,
      "totalTruncated": 0,
      "truncationRate": 0,
      "totalBytes": 890000,
      "avgBytes": 4708,
      "avgTokens": 1177,
      "avgLatencyMs": 32,
      "needsFrontier": false
    }
  ]
}

The needsFrontier flag is true when truncation rate exceeds 30%.
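The flag is reproducible from the export fields; here is a minimal sketch using the searchSymbols numbers from the example export (the helper name is illustrative):

```go
package main

import "fmt"

// needsFrontier applies the documented threshold: flag a tool when more
// than 30% of its results are dropped by truncation.
func needsFrontier(totalResults, totalReturned int) (rate float64, flagged bool) {
	rate = float64(totalResults-totalReturned) / float64(totalResults)
	return rate, rate > 0.30
}

func main() {
	// searchSymbols numbers from the example export above.
	rate, flagged := needsFrontier(15234, 8456)
	fmt.Printf("%.3f %v\n", rate, flagged) // 0.445 true
}
```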

v7.4 Telemetry Findings

Initial telemetry data from CKB's own usage shows:

| Tool | Truncation Rate | Needs Frontier? |
|---|---|---|
| searchSymbols | 45% | Yes |
| getHotspots | 50% | Yes |
| findReferences | 18% | No |
| getCallGraph | 0% | No |
| analyzeImpact | 0% | No |

Conclusion: Frontier mode is worth implementing for searchSymbols and getHotspots only. The other tools rarely or never truncate with current limits.

Byte Tracking

Response bytes are measured by JSON-marshaling the response data before sending. This captures the actual payload size consumed by the LLM context window.

Typical response sizes observed:

| Tool | Avg Response | Avg Tokens |
|---|---|---|
| searchSymbols | 15-43 KB | 4,000-11,000 |
| getHotspots | 20-40 KB | 5,000-10,000 |
| findReferences | 8-15 KB | 2,000-4,000 |
| getCallGraph | 5-10 KB | 1,250-2,500 |
| analyzeImpact | 2-5 KB | 500-1,250 |

This data helps measure the actual impact of Frontier mode by comparing bytes before/after pagination.

MCP Tool

getWideResultMetrics

Returns the same aggregated metrics via MCP. Useful for AI-driven analysis of tool performance.

Interpreting Results

| Truncation Rate | Recommendation |
|---|---|
| < 10% | Tool is performing well, no action needed |
| 10-30% | Monitor usage patterns |
| > 30% | Consider Frontier mode for this tool |
| > 50% | Frontier mode strongly recommended |

Data Retention

Metrics are retained for 90 days by default. Old records are cleaned up automatically.

Architecture

RecordWideResult()
        ↓
In-memory aggregator + SQLite persistence
        ↓
GetWideResultAggregates()
    ↙               ↘
ckb metrics (CLI)    getWideResultMetrics (MCP)

Persistence is non-blocking (async writes) to avoid impacting tool latency.
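A common way to get non-blocking persistence is a buffered channel drained by a background goroutine; this sketch shows that pattern in isolation (it is not CKB's actual recorder, and the names are invented for illustration).

```go
package main

import (
	"fmt"
	"sync"
)

// asyncRecorder sketches non-blocking persistence: tool handlers enqueue
// records and return immediately; a background goroutine drains the channel
// and performs the slow write (SQLite, in CKB's case).
type asyncRecorder struct {
	ch   chan string
	done sync.WaitGroup
}

func newAsyncRecorder(persist func(string)) *asyncRecorder {
	r := &asyncRecorder{ch: make(chan string, 256)}
	r.done.Add(1)
	go func() {
		defer r.done.Done()
		for rec := range r.ch {
			persist(rec) // slow write happens off the hot path
		}
	}()
	return r
}

func (r *asyncRecorder) Record(rec string) {
	select {
	case r.ch <- rec: // normal case: enqueue without blocking
	default: // buffer full: drop rather than stall the tool call
	}
}

// Close flushes pending records and stops the writer goroutine.
func (r *asyncRecorder) Close() {
	close(r.ch)
	r.done.Wait()
}

func main() {
	var persisted []string
	r := newAsyncRecorder(func(s string) { persisted = append(persisted, s) })
	r.Record("searchSymbols")
	r.Record("getHotspots")
	r.Close()
	fmt.Println(len(persisted)) // 2
}
```

Dropping on a full buffer trades completeness for latency, which matches the stated goal: metrics collection must never slow down the tool call itself.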

