Document direct kubectl logs as fallback for /loki-query#1682

Merged
jcschaff merged 1 commit into master from loki-query-kubectl-direct
May 7, 2026

@jcschaff (Member) commented May 6, 2026

Summary

  • Adds a "Direct kubectl logs (when Loki isn't enough)" section to .claude/commands/loki-query.md covering: investigations after a pod crash (--previous), real-time tail of a single pod, windows aged out of Loki retention, and the broker pods.
  • Documents a real operational gap discovered during the 2026-05-06 JMS-wedge investigation: activemqint, activemqsim, and artemismq all run supervisord as PID 1. The actual broker logs to a file inside the pod (/var/log/activemq/activemq.log for ActiveMQ Classic), so kubectl logs deployment/activemqint returns ~10 lines of supervisord lifecycle frozen at pod startup, and Loki's promtail isn't scraping these pods either. Documents the kubectl exec ... cat / awk workaround and notes the ~14h in-pod log rotation horizon as a gap worth fixing by reconfiguring the brokers to log to stdout.
  • Mirrors the Loki kubeconfig convention (LOKI_KUBECONFIG → ~/.kube/kubeconfig_vxrails.yaml), provides a discovery command for actual deployment/statefulset names, and adds a "Putting it together for a JMS-wedge incident" subsection cross-referencing client-side Loki with broker-side kubectl exec.
  • Note for reviewers: the same commit is also on #fix-jms-failover-wedge (cherry-picked so the JMS-debugging session had the up-to-date doc). Whichever PR merges first absorbs it; the second will then show no changes for that commit.
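
The pod-crash and real-time-tail cases above map onto standard `kubectl logs` flags. The following is a minimal sketch, assuming the kubeconfig convention described in this PR; the function names are hypothetical, the pod name is whatever the caller passes in, and the exact commands in `loki-query.md` may differ.

```shell
# Resolve the kubeconfig the same way the Loki convention does:
# LOKI_KUBECONFIG if set, otherwise ~/.kube/kubeconfig_vxrails.yaml.
kubeconfig_path() {
  printf '%s\n' "${LOKI_KUBECONFIG:-$HOME/.kube/kubeconfig_vxrails.yaml}"
}

# Logs from the previous container instance of a pod, e.g. after a
# crash or a watchdog-triggered restart. $1 is the pod name.
# No namespace is passed; kubectl uses the context default (assumption).
previous_logs() {
  kubectl --kubeconfig "$(kubeconfig_path)" logs "$1" --previous
}

# Follow a single pod in real time, starting from its last 100 lines.
tail_pod() {
  kubectl --kubeconfig "$(kubeconfig_path)" logs "$1" -f --tail=100
}
```

For example, `previous_logs <pod>` would be the first thing to try after a pod has been recycled, since the previous container's output is only kept until the next restart.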

Test plan

  • Read the rendered diff in the PR view to verify markdown structure (no broken code fences, no orphan list items)
  • Try one of the documented commands against prod (e.g. kubectl exec deployment/activemqint -- tail -50 /var/log/activemq/activemq.log) to confirm the path and command pattern match reality
  • Slash-command invocation: /loki-query reads its own definition, so any future incident-investigation session will pick up the new section automatically

🤖 Generated with Claude Code

Loki is good at multi-pod sweeps, structured filters, and historical
queries — but not every investigation fits that. Add a section to the
/loki-query slash command covering when and how to read pod logs
directly:

- After a pod crash/exit (--previous), e.g. when JmsFailoverWatchdog
  recycles a wedged pod and we need the prior container's stack trace.
- For real-time tail of a single pod, or windows that have aged out of
  Loki retention.
- For the broker pods (activemqint, activemqsim, artemismq) — verified
  during the 2026-05-06 wedge investigation that these run supervisord
  as PID 1 and the actual broker logs to a file inside the pod
  (/var/log/activemq/activemq.log for ActiveMQ Classic). Neither
  `kubectl logs` nor Loki sees those events; the only access path is
  `kubectl exec ... cat`. Document this gap explicitly so future
  investigators don't waste time on a `kubectl logs` that returns ten
  supervisord lifecycle lines and call it a dead end.
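
The `kubectl exec ... cat` access path for the broker pods can be sketched roughly as below. This is an illustration only: the function names are hypothetical, the log path is the ActiveMQ Classic one quoted above, and the time-window filter assumes each log entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which may not match the brokers' actual log pattern.

```shell
# Path quoted in this commit message for ActiveMQ Classic; other
# brokers (e.g. artemismq) may log elsewhere.
BROKER_LOG=/var/log/activemq/activemq.log

# Last N lines (default 50) of the broker's in-pod log file.
# $1 is the deployment name, e.g. activemqint.
broker_tail() {
  kubectl exec "deployment/$1" -- tail "-${2:-50}" "$BROKER_LOG"
}

# Print entries whose leading timestamp lies in [start, end].
# $2 and $3 are "YYYY-MM-DD HH:MM:SS" strings (assumed log format).
# Lines without a timestamp, such as stack-trace continuations,
# follow the decision made for the entry they belong to.
broker_window() {
  kubectl exec "deployment/$1" -- cat "$BROKER_LOG" |
    awk -v s="$2" -v e="$3" '
      /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
        ts = substr($0, 1, 19)
        keep = (ts >= s && ts <= e)
      }
      keep'
}
```

Filtering on the client side with `awk` keeps the command run inside the pod down to a plain `cat`, so it does not depend on which tools the broker image ships.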

Mirrors the Loki kubeconfig convention (LOKI_KUBECONFIG →
~/.kube/kubeconfig_vxrails.yaml), provides a discovery command for
actual deployment/statefulset names, and notes the in-pod log rotation
horizon (~14h on activemqint) as a real operational gap worth fixing
by reconfiguring these brokers to log to stdout.
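
As a rough illustration of the discovery step and of checking how far back the in-pod log reaches, something like the following could be used. The function names are hypothetical, no namespace is passed (the context default is assumed), and the actual discovery command in `loki-query.md` may be different.

```shell
# List the real deployment and statefulset names in the current
# namespace, one "kind/name" per line.
list_workloads() {
  kubectl get deployments,statefulsets -o name
}

# Print the first line of the broker's in-pod log file. Its timestamp
# shows the oldest event still available before rotation.
# $1 is the deployment name, e.g. activemqint.
broker_log_oldest() {
  kubectl exec "deployment/$1" -- head -n 1 /var/log/activemq/activemq.log
}
```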

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jcschaff jcschaff merged commit 24dd01b into master May 7, 2026
13 checks passed
@jcschaff jcschaff deleted the loki-query-kubectl-direct branch May 7, 2026 04:53