Skip to content

refactor: improve fancylogger implementation#792

Open
AngeloDanducci wants to merge 2 commits intogenerative-computing:mainfrom
AngeloDanducci:ad-457
Open

refactor: improve fancylogger implementation#792
AngeloDanducci wants to merge 2 commits intogenerative-computing:mainfrom
AngeloDanducci:ad-457

Conversation

@AngeloDanducci
Copy link
Copy Markdown
Contributor

@AngeloDanducci AngeloDanducci commented Apr 8, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code as added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

@AngeloDanducci AngeloDanducci requested review from a team, jakelorocco and nrfulton as code owners April 8, 2026 13:47
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@AngeloDanducci AngeloDanducci changed the title improve fancylogger implementation refactor: improve fancylogger implementation Apr 8, 2026
@github-actions github-actions bot added the enhancement New feature or request label Apr 8, 2026
Copy link
Copy Markdown
Contributor

@jakelorocco jakelorocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished my initial code review. Still need to test out the logging myself and look a little deeper as well; but figured I'd get these comments to you.

Comment on lines +16 to +17
``FLOG``
When set, log records are forwarded to a local REST endpoint.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be prepended with MELLEA? I don't know if this is a common flag that gets set system wide and that multiple programs adhere to.

Comment on lines +18 to +19
``DEBUG``
Legacy flag — equivalent to ``MELLEA_LOG_LEVEL=DEBUG``.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just drop this if we are only maintaining it for backwards compatibility. We can make an announcement to help change user adoption.

return formatter.format(record)


class FancyLogger:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat feel that we should rename this to something more sensible? Or instantiate with a better name? Ie something like DefaultMelleaLogger?

I don't know exactly what this looks like, but I know other projects utilize multiple different loggers / streams. I think we should keep our options open to that if needed in the future, and I think naming this class something more descriptive might help with that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should rename this. It was something I had meant to ask the original team when I got to this work.

Comment on lines +8 to +17
Environment variables
---------------------
``MELLEA_LOG_LEVEL``
Minimum log level name (e.g. ``DEBUG``, ``INFO``, ``WARNING``). Defaults to
``INFO``. The legacy ``DEBUG`` variable is still honoured as a fallback.
``MELLEA_LOG_JSON``
Set to any truthy value (``1``, ``true``, ``yes``) to emit structured JSON on
the console instead of the colour-coded human-readable format.
``FLOG``
When set, log records are forwarded to a local REST endpoint.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add all these env vars to the docs somewhere if they aren't already there.

@AngeloDanducci AngeloDanducci requested a review from ajbozarth April 8, 2026 15:10
Copy link
Copy Markdown
Contributor

@jakelorocco jakelorocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more things:

  • Slightly orthogonal, we currently use tqdm at some points and we utilize log level to determine if we should silence that output. If a good solution to this is outside the scope of your work, can you please open an issue for it?
  • We should include examples of the logs somewhere.
  • We should include a best practices document / skill for Mellea developers so that we are consistent with what types of fields get added, how things are formatted going forwards, and what type of events get dedicated logs.
  • Do we have plans to nest details / contexts? For instance, if I run a simple m.instruct call with requirements, I get very simple output:
{"timestamp": "2026-04-08T11:11:25", "level": "INFO", "message": "SUCCESS", "module": "base", "function": "sample", "line_number": 258, "process_id": 73738, "thread_id": 6179762176}

It would be more helpful if it tracked what sampling strategy, what instruct call, potentially even what session, etc...
Are these future potential improvements outside the scope of this PR?

  • Should our logger be catching logger errors? I got one when an item I passed in wasn't already in string form: TypeError: not all arguments converted during string formatting

  • Can you add a test that ensures our logger works with the hooks/plugin system? This might just mean changing the logger used in the conftest.py hooks to be our FancyLogger.

@ajbozarth
Copy link
Copy Markdown
Contributor

So I haven't finished my review yet, but as I won't have time to return to my review till after lunch tomorrow I figured I'd share the issues Claude found for now as well as the comment I made above, I had it review the code as well as how it related to the issue and the epic, it may not be worth addressing all of these, but at least looking into them:


Code Review

1. threading.local breaks async context propagation

The parent epic #442 explicitly calls for contextvars integration. threading.local is invisible to asyncio coroutines — context won't propagate across asyncio.create_task or thread pool boundaries. Given mellea's async backends and astream, this is a real gap. At minimum document it as a known limitation; ideally use contextvars.ContextVar or file a follow-up issue.

2. _build_log_dict is a private method accessed from outside its class

RESTHandler.emit calls formatter._build_log_dict(record) directly, which crosses the _ naming contract. If the dict representation is needed externally, either make it public (build_log_dict) or add a format_as_dict(record) method to JsonFormatter.

3. include_fields only filters core fields — behavior is undocumented

extra_fields and context fields bypass include_fields filtering, so it's not a true whitelist. The current tests assert this as intentional but the docstring doesn't document it. Users passing include_fields will reasonably expect it to be exhaustive — worth clarifying.

4. Tests import private name _RESERVED_LOG_RECORD_ATTRS

The test test_reserved_set_is_non_empty imports and checks _RESERVED_LOG_RECORD_ATTRS directly. Tests on private symbols make refactoring harder. The check (len(...) > 10) could be replaced with a test on a public constant, or the name made public.

5. Test classes that reset the FancyLogger singleton are not marked

Several test classes manually reset FancyLogger.logger = None and clear handlers. These are unsafe under parallel test execution (e.g. pytest-xdist) and should carry a @pytest.mark.unit marker. The file also has no module-level marker comment per the project's convention.

6. No handling for formatter TypeError on malformed log calls

A TypeError: not all arguments converted during string formatting can surface when a caller passes a non-string value with %-style formatting (e.g. logger.info("val: %s %s", only_one_arg)). This is a known Python logging footgun and was reportedly hit during testing. Worth adding a try/except in JsonFormatter._build_log_dict or RESTHandler.emit to catch and surface it gracefully rather than silently dropping the record.

Copy link
Copy Markdown
Contributor

@planetf1 planetf1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async behaviour needs checking (nasty to find later) - and mellea/stdlib/session.py:48 already uses contextvars.ContextVar for exactly the same pattern

# ---------------------------------------------------------------------------
# Thread-local storage for per-request context fields
# ---------------------------------------------------------------------------
_context_local: threading.local = threading.local()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

threading.local causes context contamination across async coroutines**

In a standard asyncio event loop, all coroutines share one OS thread — and therefore one threading.local namespace. This means concurrent requests contaminate each other's log context:

  1. Coroutine A enters log_context(trace_id="request-A")
  2. Coroutine A hits await and yields control
  3. Coroutine B enters log_context(trace_id="request-B") — overwrites the same _context_local.fields dict
  4. Coroutine A resumes and logs — sees trace_id="request-B" (wrong request)
  5. Either coroutine's finally cleanup corrupts the other's state

Since mellea's backends are async-first (astream, async sampling), this is the primary execution path.

The fix is contextvars.ContextVar, which gives each asyncio.Task its own isolated copy:

import contextvars
_log_context: contextvars.ContextVar[dict[str, Any]] = contextvars.ContextVar(
    "log_context_fields", default={}
)

Then set_log_context does _log_context.set({**_log_context.get(), **fields}), and log_context uses token-based restore:

token = _log_context.set({**_log_context.get(), **fields})
try:
    yield
finally:
    _log_context.reset(token)

This matters especially because set_log_context, log_context, and clear_log_context are now exported public API from mellea.core 

- majority_voting.py:129-143asyncio.create_task per sample, then asyncio.gather(*tasks). Multiple sampling calls run concurrently on the same event loop thread. If any of them   set log context (e.g., a trace_id per sample), they'd overwrite each other.
- Every backend (ollama, openai, huggingface, watsonx, litellm) uses asyncio.create_task for generationso concurrent astream calls from different requests share the same   threading.local.
- asyncio.gather in async_helpers.py:66 and core/backend.py:186,208multiple coroutines gathered in parallel, all on one thread, all sharing _context_local.fields.

The exact scenario from the draft (coroutine A sets context, awaits, coroutine B overwrites it) would happen any time two concurrent requests use log_context. The create_task +   gather pattern is the norm here, not the exception.

)
response.raise_for_status()
except requests.exceptions.RequestException as _:
pass
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly our AGENTS.md says to never allow silent exception. However in this case it seems to be an appropriate exception - and consistent with the rest of the logging code, to avoid cascading failures which I think is the right intent. Just recording as an observation ..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve the existing FancyLogger implementation with better JSON formatting, configurable log levels, and preparation for trace/context injection

4 participants