refactor: improve fancylogger implementation #792
AngeloDanducci wants to merge 2 commits into generative-computing:main
Conversation
The PR description has been updated. Please fill out the template for your PR to be reviewed.
jakelorocco
left a comment
Finished my initial code review. Still need to test out the logging myself and look a little deeper as well; but figured I'd get these comments to you.
> ``FLOG``
>     When set, log records are forwarded to a local REST endpoint.
Should this be prepended with `MELLEA`? I don't know whether this is a common flag that gets set system-wide and that multiple programs adhere to.
> ``DEBUG``
>     Legacy flag, equivalent to ``MELLEA_LOG_LEVEL=DEBUG``.
I think we should just drop this if we are only maintaining it for backwards compatibility. We can make an announcement to help users migrate.
> return formatter.format(record)
>
> class FancyLogger:
I somewhat feel we should rename this to something more sensible, or instantiate it with a better name, e.g. something like `DefaultMelleaLogger`?
I don't know exactly what this would look like, but I know other projects use multiple different loggers / streams. I think we should keep our options open to that if needed in the future, and naming this class something more descriptive might help with that.
I agree that we should rename this. It was something I had meant to ask the original team when I got to this work.
> Environment variables
> ---------------------
> ``MELLEA_LOG_LEVEL``
>     Minimum log level name (e.g. ``DEBUG``, ``INFO``, ``WARNING``). Defaults to
>     ``INFO``. The legacy ``DEBUG`` variable is still honoured as a fallback.
> ``MELLEA_LOG_JSON``
>     Set to any truthy value (``1``, ``true``, ``yes``) to emit structured JSON on
>     the console instead of the colour-coded human-readable format.
> ``FLOG``
>     When set, log records are forwarded to a local REST endpoint.
We should also add all these env vars to the docs somewhere if they aren't already there.
jakelorocco
left a comment
A few more things:
- Slightly orthogonal: we currently use tqdm at some points, and we use the log level to determine whether we should silence that output. If a good solution to this is outside the scope of your work, can you please open an issue for it?
- We should include examples of the logs somewhere.
- We should include a best practices document / skill for Mellea developers so that we are consistent with what types of fields get added, how things are formatted going forwards, and what type of events get dedicated logs.
- Do we have plans to nest details / contexts? For instance, if I run a simple `m.instruct` call with requirements, I get very simple output:

      {"timestamp": "2026-04-08T11:11:25", "level": "INFO", "message": "SUCCESS", "module": "base", "function": "sample", "line_number": 258, "process_id": 73738, "thread_id": 6179762176}

  It would be more helpful if it tracked which sampling strategy, which instruct call, potentially even which session, etc. Are these potential future improvements outside the scope of this PR?
- Should our logger be catching logger errors? I got one when an item I passed in wasn't already in string form: `TypeError: not all arguments converted during string formatting`
- Can you add a test that ensures our logger works with the hooks/plugin system? This might just mean changing the logger used in the conftest.py hooks to be our `FancyLogger`.
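On the `TypeError` point: stdlib `logging` defers `msg % args` formatting until the record is actually formatted, and that is where a plain (non-format-string) message plus extra positional args blows up. A minimal reproduction of the underlying error, illustrative rather than the PR's code:

```python
import logging

# A record whose msg has no % placeholders but which carries an argument,
# as produced by a call like logger.info("result", some_object).
record = logging.LogRecord(
    name="mellea", level=logging.INFO, pathname=__file__, lineno=1,
    msg="result", args=("extra",), exc_info=None,
)

try:
    record.getMessage()  # performs "result" % ("extra",) under the hood
except TypeError as exc:
    print(exc)  # not all arguments converted during string formatting
```

Note that handlers normally route this through `Handler.handleError` instead of raising, so if the exception propagated to the caller, the custom formatter is likely formatting the message eagerly.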
So I haven't finished my review yet, but as I won't have time to return to it till after lunch tomorrow, I figured I'd share the issues Claude found so far, as well as the comment I made above. I had it review the code as well as how it relates to the issue and the epic; it may not be worth addressing all of these, but they're at least worth looking into:

Code Review

1. The parent epic #442 explicitly calls for …
2. …
3. …
4. Tests import private name: the test …
5. Test classes that reset the …: several test classes manually reset …
6. No handling for formatter …: a …
planetf1
left a comment
There was a problem hiding this comment.
Async behaviour needs checking (nasty to find later), and mellea/stdlib/session.py:48 already uses `contextvars.ContextVar` for exactly the same pattern.
> # ---------------------------------------------------------------------------
> # Thread-local storage for per-request context fields
> # ---------------------------------------------------------------------------
> _context_local: threading.local = threading.local()
**`threading.local` causes context contamination across async coroutines**

In a standard asyncio event loop, all coroutines share one OS thread, and therefore one `threading.local` namespace. This means concurrent requests contaminate each other's log context:

- Coroutine A enters `log_context(trace_id="request-A")`
- Coroutine A hits `await` and yields control
- Coroutine B enters `log_context(trace_id="request-B")`, overwriting the same `_context_local.fields` dict
- Coroutine A resumes and logs, and sees `trace_id="request-B"` (the wrong request)
- Either coroutine's `finally` cleanup corrupts the other's state

Since mellea's backends are async-first (`astream`, async sampling), this is the primary execution path.
The fix is `contextvars.ContextVar`, which gives each `asyncio.Task` its own isolated copy:

    import contextvars

    _log_context: contextvars.ContextVar[dict[str, Any]] = contextvars.ContextVar(
        "log_context_fields", default={}
    )
Then `set_log_context` does `_log_context.set({**_log_context.get(), **fields})`, and `log_context` uses token-based restore:

    token = _log_context.set({**_log_context.get(), **fields})
    try:
        yield
    finally:
        _log_context.reset(token)
This matters especially because `set_log_context`, `log_context`, and `clear_log_context` are now exported as public API from `mellea.core`.
- `majority_voting.py:129-143` uses `asyncio.create_task` per sample, then `asyncio.gather(*tasks)`. Multiple sampling calls run concurrently on the same event loop thread; if any of them set log context (e.g., a `trace_id` per sample), they'd overwrite each other.
- Every backend (ollama, openai, huggingface, watsonx, litellm) uses `asyncio.create_task` for generation, so concurrent `astream` calls from different requests share the same `threading.local`.
- `asyncio.gather` in `async_helpers.py:66` and `core/backend.py:186,208` gathers multiple coroutines in parallel, all on one thread, all sharing `_context_local.fields`.
The exact scenario from the draft (coroutine A sets context, awaits, coroutine B overwrites it) would happen any time two concurrent requests use `log_context`. The `create_task` + `gather` pattern is the norm here, not the exception.

> )
> response.raise_for_status()
> except requests.exceptions.RequestException as _:
>     pass
Interestingly, our AGENTS.md says to never allow silent exceptions. However, in this case it seems appropriate, and consistent with the rest of the logging code, to avoid cascading failures, which I think is the right intent. Just recording this as an observation.
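For context, the swallow-and-continue shape being discussed is roughly the following; the class name, payload, and endpoint are my assumptions for illustration, not the PR's actual code:

```python
import logging

import requests


class RestForwardingHandler(logging.Handler):
    """Best-effort handler that POSTs formatted records to a local endpoint."""

    def __init__(self, url: str) -> None:
        super().__init__()
        self.url = url

    def emit(self, record: logging.LogRecord) -> None:
        try:
            response = requests.post(
                self.url, json={"message": self.format(record)}, timeout=1.0
            )
            response.raise_for_status()
        except requests.exceptions.RequestException:
            # Deliberately silent: a logging sink should never cascade its
            # own failures into the application it is observing.
            pass
```

The narrow `except requests.exceptions.RequestException` (rather than a bare `except`) is what keeps this defensible under the AGENTS.md rule: only transport failures are swallowed, and programming errors still surface.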
Type of PR
Misc

Description
Improve `FancyLogger` implementation with better JSON formatting, configurable log levels, and preparation for trace/context injection #457

Testing