Skip to content

fix(gcp_pubsub source): treat expected stream closures as non-errors#25149

Open
andylibrian wants to merge 1 commit intovectordotdev:masterfrom
andylibrian:fix/gcp-pubsub-expected-stream-closure
Open

fix(gcp_pubsub source): treat expected stream closures as non-errors#25149
andylibrian wants to merge 1 commit intovectordotdev:masterfrom
andylibrian:fix/gcp-pubsub-expected-stream-closure

Conversation

@andylibrian
Copy link
Copy Markdown

@andylibrian andylibrian commented Apr 9, 2026

Summary

The gcp_pubsub source emits ERROR-level logs and increments component_errors_total when Google's Pub/Sub server closes a StreamingPull connection for an expected reason. Google's documentation describes these periodic closures as routine behavior:

"The Pub/Sub servers recurrently close the connection after a time period to avoid a long-running sticky connection. The client library automatically reopens a StreamingPull connection."

https://cloud.google.com/pubsub/docs/pull#streamingpull

The observed error:

ERROR vector::internal_events::gcp_pubsub: Failed to fetch events.
  error=status: Unavailable,
  message: "The StreamingPull stream closed for an expected reason and should be recreated..."

This causes false alerts, inflates component_errors_total, and introduces an unnecessary retry_delay_secs (default 1s) pause before reconnecting.

Root cause: translate_error() only special-cases HTTP/2-level resets via is_reset(), which inspects the hyper::Errorh2::Error source chain. The expected closure arrives as a gRPC-level tonic::Status with Code::Unavailable, which is_reset() cannot detect. It falls through to the else branch, emitting GcpPubsubReceiveError (ERROR + metric) and returning State::RetryDelay.

Fix: Add an is_expected_closure() predicate that checks for Code::Unavailable with a message starting with "The StreamingPull stream closed for an expected reason". When matched, log at debug! level and return State::RetryNow for immediate reconnection — following the same pattern as the existing is_reset() handler.

If Google ever changes the message text, detection stops and behavior safely regresses to the current ERROR + delay — not a new failure mode.

Vector configuration

sources:
  gcp_pubsub_source:
    type: gcp_pubsub
    project: *redacted*
    subscription: vector

transforms:
  parse_logs:
    type: remap
    inputs:
      - gcp_pubsub_source
    source: |
      . = parse_json!(.message)

sinks:
  vmlogs:
    type: elasticsearch
    inputs:
      - parse_logs
    endpoints:
      - http://vlinsert:9481/insert/elasticsearch/
    api_version: v8

How did you test this PR?

Unit tests — 4 new tests in src/sources/gcp_pubsub.rs:

  • expected_closure_matches_unavailable_with_known_message — predicate matches expected closure
  • expected_closure_does_not_match_different_message — predicate rejects other Unavailable messages
  • expected_closure_does_not_match_different_code — predicate rejects non-Unavailable codes
  • translate_error_retries_now_on_expected_closure — translate_error returns State::RetryNow

Real GKE test — two-phase test on a GKE cluster consuming from GCP Pub/Sub:

  1. Phase 1 (baseline): Deployed unmodified master (v0.55.0).
    Confirmed ERROR logs and component_errors_total increments continued at the expected rate. Also observed a genuine Unavailable error ("The service was unable to fulfill your request") which is correctly NOT matched by this fix.
  2. Phase 2 (fix): Deployed this branch. After 1+ hour of monitoring: zero Failed to fetch events ERROR logs from the fix pod, while streams continued scaling up normally (concurrency 1→3), confirming expected closures are handled silently with immediate reconnect.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Closes #25151

Files to create/modify

  • changelog.d/22304_gcp_pubsub_expected_closure.fix.md
  • src/sources/gcp_pubsub.rs

Verification

make fmt
make check-fmt
make check-clippy
./scripts/check_changelog_fragments.sh
cargo test -p vector --lib sources::gcp_pubsub --no-default-features --features sources-gcp_pubsub

@andylibrian andylibrian requested a review from a team as a code owner April 9, 2026 06:53
Copilot AI review requested due to automatic review settings April 9, 2026 06:53
@github-actions github-actions bot added the domain: sources Anything related to the Vector's sources label Apr 9, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the gcp_pubsub source error handling so routine Pub/Sub StreamingPull stream closures are treated as expected behavior (avoiding ERROR logs, component_errors_total increments, and retry delays), aligning Vector’s behavior with Google’s documented StreamingPull lifecycle.

Changes:

  • Added is_expected_closure() to detect expected Pub/Sub StreamingPull closures (gRPC Unavailable with known message prefix).
  • Updated translate_error() to immediately reconnect (State::RetryNow) and log at debug! level for expected closures.
  • Added unit tests covering the new predicate and translate_error() behavior, plus a changelog fragment.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/sources/gcp_pubsub.rs Detects expected StreamingPull closures and retries immediately without emitting error events; adds unit tests.
changelog.d/22304_gcp_pubsub_expected_closure.fix.md Documents the user-facing change in logging/metrics/retry behavior for expected closures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@andylibrian andylibrian force-pushed the fix/gcp-pubsub-expected-stream-closure branch from 98087f9 to 86eac3a Compare April 9, 2026 07:05
@andylibrian
Copy link
Copy Markdown
Author

andylibrian commented Apr 9, 2026

recheck

@andylibrian
Copy link
Copy Markdown
Author

I have read the CLA Document and I hereby sign the CLA

@andylibrian andylibrian force-pushed the fix/gcp-pubsub-expected-stream-closure branch from 86eac3a to cb81dc6 Compare April 9, 2026 07:43
GCP Pub/Sub sends UNAVAILABLE status with message "The StreamingPull
stream closed for an expected reason" during routine stream management
(subscription changes, load balancing). Previously this was emitted as
a GcpPubsubReceiveError and retried with a delay. Now it is recognized
as benign and retried immediately, matching the existing is_reset()
behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@andylibrian andylibrian force-pushed the fix/gcp-pubsub-expected-stream-closure branch from cb81dc6 to 3d97686 Compare April 13, 2026 10:05
@pront
Copy link
Copy Markdown
Member

pront commented Apr 13, 2026

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: sources Anything related to the Vector's sources

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gcp_pubsub source: expected StreamingPull stream closures logged as ERROR

3 participants