Bug
ModelOutputThunk.usage is always None for the HuggingFace backend unless telemetry tracing or MELLEA_METRICS_ENABLED=true is set.
Root cause
Introduced in #653 (token metrics hooks refactor). In huggingface.py::_post_process_async, token count extraction was gated behind (span is not None or metrics_enabled), so mot.usage is never populated in plain runs.
Fix
Remove the telemetry guard from token count extraction — usage is a standard mot field, not a telemetry concern. Telemetry reporting further down still checks span/metrics_enabled as before.
# Before
if (span is not None or metrics_enabled) and isinstance(hf_output, GenerateDecoderOnlyOutput):
    ...
# After
if isinstance(hf_output, GenerateDecoderOnlyOutput):
    ...
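To illustrate why the guard matters, here is a minimal, self-contained sketch of the two control flows. It does not use the real mellea or transformers classes: `FakeGenerateOutput`, `FakeThunk`, `extract_usage`, the two `post_process_*` functions, and the keys of the usage dict are all hypothetical stand-ins chosen for illustration, and the real `_post_process_async` does more than this.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeGenerateOutput:
    """Hypothetical stand-in for GenerateDecoderOnlyOutput."""
    prompt_tokens: int
    completion_tokens: int


@dataclass
class FakeThunk:
    """Hypothetical stand-in for ModelOutputThunk; only `usage` is modeled."""
    usage: Optional[dict] = None


def extract_usage(out: FakeGenerateOutput) -> dict:
    # Assumed usage shape; the real field layout may differ.
    return {
        "prompt_tokens": out.prompt_tokens,
        "completion_tokens": out.completion_tokens,
        "total_tokens": out.prompt_tokens + out.completion_tokens,
    }


def post_process_before(mot: FakeThunk, hf_output: Any,
                        span: Any = None, metrics_enabled: bool = False) -> FakeThunk:
    # Pre-fix shape: extraction only runs when telemetry is active.
    if (span is not None or metrics_enabled) and isinstance(hf_output, FakeGenerateOutput):
        mot.usage = extract_usage(hf_output)
    return mot


def post_process_after(mot: FakeThunk, hf_output: Any,
                       span: Any = None, metrics_enabled: bool = False) -> FakeThunk:
    # Post-fix shape: extraction depends only on the output type.
    if isinstance(hf_output, FakeGenerateOutput):
        mot.usage = extract_usage(hf_output)
    # Telemetry reporting would still check span / metrics_enabled here.
    return mot


if __name__ == "__main__":
    out = FakeGenerateOutput(prompt_tokens=5, completion_tokens=7)
    print(post_process_before(FakeThunk(), out).usage)
    print(post_process_after(FakeThunk(), out).usage)
```

In a plain run (no span, metrics disabled), the pre-fix version leaves `usage` unset, while the post-fix version fills it whenever the output has the expected type.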
Impact
Without this fix, test/backends/test_huggingface.py::test_async_avalue fails on its assertion that mot1.usage is not None.