bug: GraphQL subscription schema mismatch breaks bot startups #307

@flexus-teams

Description

Original Logs

20260410 11:16:25.721 fclnt [INFO] FlexusClient service_name=bob_60008_r_ api_key=None http://backend-v1-service.flexus.svc/v1/jailed-bot
20260410 11:16:25.722 stexe [INFO] Connecting ws://backend-v1-service.flexus.svc/v1/jailed-bot
20260410 11:16:25.781 btexe [ERROR] 🛑 That looks bad, my key doesn't work: {'message': "403: Whoops your key didn't work (2).", 'locations': [{'line': 2, 'column': 3}], 'path': ['bot_confirm_exists']}
20260410 11:16:25.786 stexe [INFO] got TransportQueryError (attempt 1/3), sleep 60...
20260410 11:17:25.787 stexe [INFO] Connecting ws://backend-v1-service.flexus.svc/v1/jailed-bot
20260410 11:17:25.819 btexe [INFO] i_am_still_alive bob:60008 group_id=None
20260410 11:17:25.841 stexe [INFO] got TransportQueryError (attempt 2/3), sleep 60...
20260410 11:18:25.843 stexe [INFO] Connecting ws://backend-v1-service.flexus.svc/v1/jailed-bot
20260410 11:18:25.885 btexe [ERROR] 🛑 3 exceptions in 5 min, exiting
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/flexus_client_kit/ckit_service_exec.py", line 26, in run_typical_single_subscription_with_restart_on_network_errors
    await subscribe_and_do_something(fclient, ws_client, *func_args, **func_kwargs)
  File "/usr/local/lib/python3.11/site-packages/flexus_client_kit/ckit_bot_exec.py", line 412, in subscribe_and_produce_callbacks
    async for r in ws.subscribe(
  File "/usr/local/lib/python3.11/site-packages/gql/client.py", line 1426, in subscribe
    async for result in inner_generator:
  File "/usr/local/lib/python3.11/site-packages/gql/client.py", line 1337, in _subscribe
    async for result in inner_generator:
  File "/usr/local/lib/python3.11/site-packages/gql/transport/common/base.py", line 298, in subscribe
    answer_type, execution_result = await listener.get()
                                    ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gql/transport/common/listener_queue.py", line 35, in get
    raise item
  File "/usr/local/lib/python3.11/site-packages/gql/transport/common/base.py", line 221, in _receive_data_loop
    answer_type, answer_id, execution_result = self._parse_answer(
                                               ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gql/transport/websockets_protocol.py", line 423, in _parse_answer
    return self._parse_answer_apollo(json_answer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gql/transport/websockets_protocol.py", line 384, in _parse_answer_apollo
    raise TransportQueryError(

Error Summary

Multiple bot pods in the isolated namespace are crash-looping during startup. The common failure is a GraphQL subscription schema mismatch on FBotThreadsCallsTasks:
Cannot query field 'news_payload_task' ... Did you mean 'news_payload_task_new' / 'news_payload_task_old'?

Observed affected pods include bob, frog, karen, lawyerrat, vix, boss, strategist, productman, admonster, clerkwing, researcher, botticelli, executor.

Root Cause

  • File: flexus_client_kit/ckit_bot_exec.py:436
  • Function: subscribe_and_produce_callbacks
  • Why: the generated subscription still requests news_payload_task from FBotThreadsCallsTasks, but the dataclass/schema now exposes news_payload_task_new and news_payload_task_old instead, so the query is rejected with TransportQueryError. This breaks startup subscriptions for every bot using this client kit build.
  • Git blame: @oleg Klimov in 87119dc / caff82d4 (schema migration and subscription update)

Code Snippet

bot_threads_calls_tasks(...)
    {
        {gql_utils.gql_fields(ckit_bot_query.FBotThreadsCallsTasks)}
    }
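
The snippet above expands the fields of FBotThreadsCallsTasks into the subscription body, so any field name that the server schema no longer defines ends up in the query. Below is a minimal sketch of that failure mode, not the actual client kit code. StaleTasks, gql_fields_sketch, unknown_fields, SERVER_FIELDS, and example_field are all hypothetical names made up for illustration; the real gql_utils.gql_fields may behave differently (for example with nested types), and the server field set is only assumed from the "Did you mean" hint in the error.

```python
import dataclasses
from typing import List, Optional, Set


@dataclasses.dataclass
class StaleTasks:
    # Hypothetical stand-in for a client-side FBotThreadsCallsTasks that
    # still carries the pre-migration field name.
    example_field: str = ""
    news_payload_task: Optional[dict] = None


def gql_fields_sketch(cls) -> str:
    # Illustrative only: one line per dataclass field name, the way a
    # field generator might fill in a subscription body.
    return "\n".join(f.name for f in dataclasses.fields(cls))


def unknown_fields(cls, server_fields: Set[str]) -> List[str]:
    # Field names the client would request that the server does not define.
    return [f.name for f in dataclasses.fields(cls) if f.name not in server_fields]


# Assumed post-migration server fields, inferred from the error hint.
SERVER_FIELDS = {"example_field", "news_payload_task_new", "news_payload_task_old"}

if __name__ == "__main__":
    print(gql_fields_sketch(StaleTasks))
    print(unknown_fields(StaleTasks, SERVER_FIELDS))  # ['news_payload_task']
```

A check along these lines, run against the fields the server reports, would flag news_payload_task before the subscription is opened instead of failing inside the retry loop.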

Affected

  • Pods: bob, frog, karen, lawyerrat, vix, boss, strategist, productman, admonster, clerkwing, researcher, botticelli, executor
  • Namespace: isolated
  • Occurrences: repeated CrashLoopBackOff on startup
