refactor: improve fancylogger implementation #792
AngeloDanducci wants to merge 2 commits into generative-computing:main
Conversation
The PR description has been updated. Please fill out the template for your PR to be reviewed.
jakelorocco
left a comment
Finished my initial code review. Still need to test out the logging myself and look a little deeper as well; but figured I'd get these comments to you.
> ``FLOG``
>     When set, log records are forwarded to a local REST endpoint.
Should this be prepended with `MELLEA`? I don't know whether this is a common flag that gets set system-wide and that multiple programs adhere to.
> ``DEBUG``
>     Legacy flag, equivalent to ``MELLEA_LOG_LEVEL=DEBUG``.
I think we should just drop this if we are only maintaining it for backwards compatibility. We can make an announcement to help users migrate.
> return formatter.format(record)
>
> class FancyLogger:
I somewhat feel we should rename this to something more sensible, or instantiate it with a better name, e.g. something like `DefaultMelleaLogger`?
I don't know exactly what this would look like, but I know other projects use multiple different loggers / streams. I think we should keep our options open to that if needed in the future, and naming this class something more descriptive might help with that.
I agree that we should rename this. It was something I had meant to ask the original team when I got to this work.
> Environment variables
> ---------------------
> ``MELLEA_LOG_LEVEL``
>     Minimum log level name (e.g. ``DEBUG``, ``INFO``, ``WARNING``). Defaults to
>     ``INFO``. The legacy ``DEBUG`` variable is still honoured as a fallback.
> ``MELLEA_LOG_JSON``
>     Set to any truthy value (``1``, ``true``, ``yes``) to emit structured JSON on
>     the console instead of the colour-coded human-readable format.
> ``FLOG``
>     When set, log records are forwarded to a local REST endpoint.
We should also add all these env vars to the docs somewhere if they aren't already there.
jakelorocco
left a comment
A few more things:
- Slightly orthogonal: we currently use tqdm at some points, and we use the log level to determine whether we should silence that output. If a good solution to this is outside the scope of your work, can you please open an issue for it?
- We should include examples of the logs somewhere.
- We should include a best practices document / skill for Mellea developers so that we are consistent with what types of fields get added, how things are formatted going forwards, and what type of events get dedicated logs.
- Do we have plans to nest details / contexts? For instance, if I run a simple `m.instruct` call with requirements, I get very simple output:

      {"timestamp": "2026-04-08T11:11:25", "level": "INFO", "message": "SUCCESS", "module": "base", "function": "sample", "line_number": 258, "process_id": 73738, "thread_id": 6179762176}

  It would be more helpful if it tracked which sampling strategy, which instruct call, potentially even which session, etc. Are these potential future improvements outside the scope of this PR?
- Should our logger be catching logger errors? I got one when an item I passed in wasn't already in string form: `TypeError: not all arguments converted during string formatting`
- Can you add a test that ensures our logger works with the hooks/plugin system? This might just mean changing the logger used in the conftest.py hooks to be our `FancyLogger`.
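On the `TypeError` point: stdlib `logging` defers `msg % args` formatting until the record is actually formatted, and that is where a plain (non-format-string) message plus extra positional args blows up. A minimal reproduction of the underlying error, illustrative rather than the PR's code:

```python
import logging

# A record whose msg has no % placeholders but which carries an argument,
# as produced by a call like logger.info("result", some_object).
record = logging.LogRecord(
    name="mellea", level=logging.INFO, pathname=__file__, lineno=1,
    msg="result", args=("extra",), exc_info=None,
)

try:
    record.getMessage()  # performs "result" % ("extra",) under the hood
except TypeError as exc:
    print(exc)  # not all arguments converted during string formatting
```

Note that handlers normally route this through `Handler.handleError` instead of raising, so if the exception propagated to the caller, the custom formatter is likely formatting the message eagerly.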
So I haven't finished my review yet, but as I won't have time to return to it till after lunch tomorrow, I figured I'd share the issues Claude found so far, as well as the comment I made above. I had it review the code as well as how it relates to the issue and the epic; it may not be worth addressing all of these, but they're at least worth looking into:

Code Review

1. The parent epic #442 explicitly calls for …
2. …
3. …
4. Tests import private name: the test …
5. Test classes that reset the …: several test classes manually reset …
6. No handling for formatter …: a …
planetf1
left a comment
There was a problem hiding this comment.
Async behaviour needs checking (nasty to find later), and mellea/stdlib/session.py:48 already uses `contextvars.ContextVar` for exactly the same pattern.
> # ---------------------------------------------------------------------------
> # Thread-local storage for per-request context fields
> # ---------------------------------------------------------------------------
> _context_local: threading.local = threading.local()
**`threading.local` causes context contamination across async coroutines**

In a standard asyncio event loop, all coroutines share one OS thread, and therefore one `threading.local` namespace. This means concurrent requests contaminate each other's log context:

- Coroutine A enters `log_context(trace_id="request-A")`
- Coroutine A hits `await` and yields control
- Coroutine B enters `log_context(trace_id="request-B")`, overwriting the same `_context_local.fields` dict
- Coroutine A resumes and logs, and sees `trace_id="request-B"` (the wrong request)
- Either coroutine's `finally` cleanup corrupts the other's state

Since mellea's backends are async-first (`astream`, async sampling), this is the primary execution path.
The fix is `contextvars.ContextVar`, which gives each `asyncio.Task` its own isolated copy:

    import contextvars

    _log_context: contextvars.ContextVar[dict[str, Any]] = contextvars.ContextVar(
        "log_context_fields", default={}
    )
Then `set_log_context` does `_log_context.set({**_log_context.get(), **fields})`, and `log_context` uses token-based restore:

    token = _log_context.set({**_log_context.get(), **fields})
    try:
        yield
    finally:
        _log_context.reset(token)
This matters especially because `set_log_context`, `log_context`, and `clear_log_context` are now exported as public API from `mellea.core`.
- `majority_voting.py:129-143` uses `asyncio.create_task` per sample, then `asyncio.gather(*tasks)`. Multiple sampling calls run concurrently on the same event loop thread; if any of them set log context (e.g., a `trace_id` per sample), they'd overwrite each other.
- Every backend (ollama, openai, huggingface, watsonx, litellm) uses `asyncio.create_task` for generation, so concurrent `astream` calls from different requests share the same `threading.local`.
- `asyncio.gather` in `async_helpers.py:66` and `core/backend.py:186,208` gathers multiple coroutines in parallel, all on one thread, all sharing `_context_local.fields`.
The exact scenario from the draft (coroutine A sets context, awaits, coroutine B overwrites it) would happen any time two concurrent requests use `log_context`. The `create_task` + `gather` pattern is the norm here, not the exception.

> )
> response.raise_for_status()
> except requests.exceptions.RequestException as _:
>     pass
Interestingly, our AGENTS.md says to never allow silent exceptions. However, in this case it seems appropriate, and consistent with the rest of the logging code, to avoid cascading failures, which I think is the right intent. Just recording this as an observation.
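For context, the swallow-and-continue shape being discussed is roughly the following; the class name, payload, and endpoint are my assumptions for illustration, not the PR's actual code:

```python
import logging

import requests


class RestForwardingHandler(logging.Handler):
    """Best-effort handler that POSTs formatted records to a local endpoint."""

    def __init__(self, url: str) -> None:
        super().__init__()
        self.url = url

    def emit(self, record: logging.LogRecord) -> None:
        try:
            response = requests.post(
                self.url, json={"message": self.format(record)}, timeout=1.0
            )
            response.raise_for_status()
        except requests.exceptions.RequestException:
            # Deliberately silent: a logging sink should never cascade its
            # own failures into the application it is observing.
            pass
```

The narrow `except requests.exceptions.RequestException` (rather than a bare `except`) is what keeps this defensible under the AGENTS.md rule: only transport failures are swallowed, and programming errors still surface.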
Type of PR
Misc

Description
Improve `FancyLogger` implementation with better JSON formatting, configurable log levels, and preparation for trace/context injection #457

Testing