Document direct kubectl logs as fallback for /loki-query by jcschaff · Pull Request #1682 · virtualcell/vcell

jcschaff · 2026-05-06T21:05:10Z

Summary

Adds a "Direct kubectl logs (when Loki isn't enough)" section to .claude/commands/loki-query.md covering: investigations after a pod crash (--previous), real-time tail of a single pod, windows aged out of Loki retention, and the broker pods.
Documents a real operational gap discovered during the 2026-05-06 JMS-wedge investigation: activemqint, activemqsim, and artemismq all run supervisord as PID 1. The actual broker logs to a file inside the pod (/var/log/activemq/activemq.log for ActiveMQ Classic), so kubectl logs deployment/activemqint returns ~10 lines of supervisord lifecycle frozen at pod startup, and Loki's promtail isn't scraping these pods either. Documents the kubectl exec ... cat / awk workaround and notes the ~14h in-pod log rotation horizon as a gap worth fixing by reconfiguring the brokers to log to stdout.
Mirrors the Loki kubeconfig convention (LOKI_KUBECONFIG → ~/.kube/kubeconfig_vxrails.yaml), provides a discovery command for actual deployment/statefulset names, and adds a "Putting it together for a JMS-wedge incident" subsection cross-referencing client-side Loki with broker-side kubectl exec.
Note for reviewers: the same commit is also on #fix-jms-failover-wedge (cherry-picked so that the JMS-debugging session had the up-to-date doc). Whichever PR merges first absorbs it; the second will show no overlap.
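The access paths listed above can be sketched as a few shell helpers. This is an illustration, not the text added to loki-query.md: the function names, the `KCFG` and `KUBECTL` variables, and the discovery command are hypothetical, and the sketch assumes that `LOKI_KUBECONFIG`, when set, overrides the default kubeconfig path and that the kubeconfig context already selects the right namespace. Only the ActiveMQ Classic log path is taken from the summary; the log location for artemismq is not stated here, so it is left out.

```shell
# Assumption: LOKI_KUBECONFIG overrides the default kubeconfig path when set.
KCFG="${LOKI_KUBECONFIG:-$HOME/.kube/kubeconfig_vxrails.yaml}"
# Overridable so the helpers can be dry-run, e.g. KUBECTL=echo.
KUBECTL="${KUBECTL:-kubectl}"

# Thin wrapper: every call goes through the same kubeconfig.
kc() { "$KUBECTL" --kubeconfig "$KCFG" "$@"; }

# Discover the actual deployment/statefulset names (hypothetical form of
# the discovery command; the documented one may differ).
list_workloads() { kc get deployments,statefulsets -o name; }

# Logs of the previous container instance, for use after a crash or recycle.
# $1 is a pod or workload reference such as pod/<name> or deployment/<name>.
previous_logs() { kc logs "$1" --previous; }

# Real-time tail of a single pod or workload.
follow_logs() { kc logs "$1" -f --tail=100; }

# ActiveMQ Classic broker log, which lives in a file inside the pod and is
# therefore invisible to `kubectl logs`. $1 is the deployment name,
# $2 the number of lines (default 50).
broker_log_tail() {
  kc exec "deployment/$1" -- tail -n "${2:-50}" /var/log/activemq/activemq.log
}
```

For example, `broker_log_tail activemqint 200` would read the last 200 lines of the in-pod broker log, while `KUBECTL=echo broker_log_tail activemqint` only prints the arguments that would be passed to kubectl.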

Test plan

Read the rendered diff in the PR view to verify markdown structure (no broken code fences, no orphan list items)
Try one of the documented commands against prod (e.g. kubectl exec deployment/activemqint -- tail -50 /var/log/activemq/activemq.log) to confirm the path and command pattern match reality
Slash-command invocation: /loki-query reads its own definition, so any future incident-investigation session will pick up the new section automatically
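Before trying the exec command against prod, the awk half of the workaround can be rehearsed locally on sample lines. The helper below is a sketch under one assumption not confirmed by this PR: that each broker log entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp, so plain string comparison orders entries correctly. The name `log_window` is hypothetical. Lines without a leading timestamp (stack-trace continuations, for instance) inherit the decision made for the last timestamped line.

```shell
# Keep only log entries whose leading timestamp falls in [from, to).
# $1 = from, $2 = to, both as "YYYY-MM-DD HH:MM:SS". Reads stdin.
log_window() {
  awk -v from="$1" -v to="$2" '
    # A timestamped line starts a new entry: decide whether to keep it.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      ts = substr($0, 1, 19)
      keep = (ts >= from && ts < to)
    }
    # Print the line if the current entry is inside the window.
    keep
  '
}

# Possible use once the path is confirmed:
#   kubectl exec deployment/activemqint -- cat /var/log/activemq/activemq.log \
#     | log_window "2026-05-06 14:00:00" "2026-05-06 15:00:00"
```

Filtering on the client side keeps the command run inside the pod down to a plain `cat`, which avoids depending on which tools the broker image ships.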

🤖 Generated with Claude Code

Loki is good at multi-pod sweeps, structured filters, and historical queries, but not every investigation fits that. Add a section to the /loki-query slash command covering when and how to read pod logs directly:

- After a pod crash/exit (--previous), e.g. when JmsFailoverWatchdog recycles a wedged pod and we need the prior container's stack trace.
- For real-time tail of a single pod, or windows that have aged out of Loki retention.
- For the broker pods (activemqint, activemqsim, artemismq): verified during the 2026-05-06 wedge investigation that these run supervisord as PID 1 and the actual broker logs to a file inside the pod (/var/log/activemq/activemq.log for ActiveMQ Classic). Neither `kubectl logs` nor Loki sees those events; the only access path is `kubectl exec ... cat`.

Document this gap explicitly so future investigators don't waste time on a `kubectl logs` that returns ten supervisord lifecycle lines and call it a dead end.

Mirrors the Loki kubeconfig convention (LOKI_KUBECONFIG → ~/.kube/kubeconfig_vxrails.yaml), provides a discovery command for actual deployment/statefulset names, and notes the in-pod log rotation horizon (~14h on activemqint) as a real operational gap worth fixing by reconfiguring these brokers to log to stdout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jcschaff merged commit 24dd01b into master May 7, 2026
13 checks passed

jcschaff deleted the loki-query-kubectl-direct branch May 7, 2026 04:53

jcschaff merged 1 commit into master from loki-query-kubectl-direct
