Skip to content

fix: harden generic video extension for ConvoAI compliance#2140

Open
BenWeekes wants to merge 3 commits intomainfrom
fix/generic-video-python
Open

fix: harden generic video extension for ConvoAI compliance#2140
BenWeekes wants to merge 3 commits intomainfrom
fix/generic-video-python

Conversation

@BenWeekes
Copy link
Copy Markdown
Contributor

Summary

Brings generic_video_python up to date with the latest convoai_to_video contract and to the engineering quality bar applied to deepgram_tts.

Protocol compliance:

  • Send actual AudioFrame.get_sample_rate() in WebSocket voice.sampleRate instead of static config value
  • Add area param to both session start and WebSocket init payloads
  • Send session_token in DELETE body for stop-session (required by latest contract)
  • Add vendor params passthrough — unknown keys in params dict forwarded to start/init payloads

Config normalization:

  • Migrate from @dataclass to Pydantic BaseModel with field validators
  • Canonical field names: channel (alias agora_channel_name), agora_avatar_uid (alias agora_video_uid)
  • All 14 config fields now declared in manifest.json
  • Enum validation for quality, video_encoding, area
  • activity_idle_timeout default aligned to 120

Runtime robustness:

  • Replace blocking requests with async httpx
  • Store and cancel _audio_task and listener_task in on_stop()
  • Mask API keys and tokens in all log output
  • Session cache scoped by endpoint/channel/uid hash (no more singleton /tmp file)

Tests:

  • test_config.py — aliases, masking, enum validation, params passthrough
  • test_extension.py — sample rate propagation, mismatch warning, voice_end trigger
  • test_protocol.py — start/stop/init payload shape, WebSocket messages, mock HTTP transport

Related

Test plan

  • Config loads with legacy aliases (agora_channel_name, agora_video_uid)
  • Actual frame sample rate sent, warning logged once on mismatch
  • Stop request includes both session_id and session_token in body
  • area present in start and init payloads
  • API keys masked in all log output
  • Background tasks cancelled cleanly on stop
  • Vendor params forwarded to both payloads

@github-actions
Copy link
Copy Markdown

Code Review - PR 2140 generic_video_python

Good overall uplift. The migration to Pydantic, async httpx, per-instance session cache, and comprehensive test suite are solid improvements. A few issues to address before merge.

SECURITY

Session token logged in plain text (medium)
In _stop_session, the payload from _build_stop_payload is logged directly. That dict contains session_token which is sensitive. The x-api-key header is already masked via _masked_headers but the body is not. Mask the token before logging it.

CORRECTNESS

normalize_params bypasses Pydantic validators (medium)
config.py line 196 uses setattr to overwrite model fields from the params dict. This skips @field_validator declarations entirely. For example params with quality set to an invalid value would leave self.quality invalid after normalize_params with no validation error. Use model_copy(update=updates) instead so validators are re-applied.

channel now required without verified migration path (low-medium)
validate_required now enforces channel but manifest.json only declares the canonical name channel, dropping the alias agora_channel_name. The Pydantic alias works at JSON-deserialization time, but whether it survives the get_property_to_json round-trip through the TEN property system is worth verifying. An existing deployment with agora_channel_name in its manifest could silently receive an empty channel and fail at startup.

Legacy sessions not stopped on cache upgrade (low)
In connect(), when a cached session has no token, the code clears the cache but the server-side session stays alive. Probably unavoidable given the new contract -- a short inline comment explaining the trade-off would help future readers.

MINOR / STYLE

vendor_params double-filters (nit): after normalize_params() runs, known keys are already stripped from self.params. The filter in vendor_params against _PASSTHROUGH_EXCLUDED_KEYS is redundant.

_ensure_dict dead code (nit): Pydantic coerces params to a dict before normalize_params is called so the None and non-dict branches can never fire. Consider inlining dict(self.params).

validate_required vs @model_validator (style): more idiomatic Pydantic v2 would use @model_validator(mode=after). As written validate_required is only called from create_async so tests calling model_validate directly skip the required-field check.

TEST COVERAGE

create_async not tested: all config tests call model_validate directly. The create_async path (get_property_to_json -> normalize_params -> validate_required) has no test. A mock AsyncTenEnv returning canned JSON would cover it.

Timing-based assertion is fragile: test_audio_sender_uses_actual_sample_rate_and_warns_once relies on asyncio.sleep(0.05) to let the task run. This can be flaky under CI load. An asyncio.Event or deterministic queue drain would be more robust.

POSITIVE NOTES

  • Replacing blocking requests with injectable httpx.AsyncClient is a clean testable improvement
  • Per-instance session cache keyed by a content fingerprint eliminates the singleton /tmp collision risk
  • One-shot mismatch warning via _sample_rate_mismatch_warned is good log hygiene
  • conftest.py stubs are thorough and allow tests to run without the full TEN runtime installed
  • _audio_task and listener_task are now properly tracked and cancelled on stop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant