export conversations to json#36

Open
raedatoui wants to merge 8 commits into daaain:main from raedatoui:main
Conversation

@raedatoui raedatoui commented Oct 29, 2025

Any interest in something like this? I find it useful for other workflows.
Happy to make it an optional flag, etc.

Summary by CodeRabbit

  • New Features

    • Added JSON export for full transcripts, per-session exports, and aggregated project summaries (including metadata and optional combined-transcript links).
    • CLI now accepts a JSON output format and uses the new JSON renderer.
  • Bug Fixes

    • Clearing/export now targets the correct aggregated JSON index and combined-transcript files to avoid accidental deletion or collisions.
  • Tests

    • Updated CLI tests to exercise JSON output and the revised clear-output behavior.

coderabbitai Bot commented Oct 29, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds JSON export support: new JsonRenderer and package, CLI --format includes json and clears JSON outputs correctly, renderer factory returns JsonRenderer for "json", and converter uses format-aware index filename (all-projects-summary.json for JSON).
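The factory dispatch summarized here can be sketched with stand-in classes (hypothetical names and signature; the real `get_renderer` lives in `claude_code_log/renderer.py` and imports the concrete renderer classes):

```python
class HtmlRenderer:
    """Stand-in for the real renderer classes; only carries the shared options."""
    def __init__(self, detail: str, compact: bool):
        self.detail = detail
        self.compact = compact

class MarkdownRenderer(HtmlRenderer):
    pass

class JsonRenderer(HtmlRenderer):
    pass

# "md" and "markdown" are aliases, mirroring the CLI's accepted values.
_RENDERERS = {
    "html": HtmlRenderer,
    "md": MarkdownRenderer,
    "markdown": MarkdownRenderer,
    "json": JsonRenderer,
}

def get_renderer(fmt: str, detail: str = "full", compact: bool = False):
    """Return a renderer for the requested format; detail and compact stay applied."""
    try:
        cls = _RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"Unknown output format: {fmt!r}") from None
    return cls(detail=detail, compact=compact)
```

The walkthrough notes the real factory imports `JsonRenderer` dynamically, so the json package is only loaded when that format is requested; the table form above is just the simplest way to show the dispatch.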

Changes

  • CLI (claude_code_log/cli.py): --format now accepts json; _clear_output_files signature changed to output_format and computes the extension/index name internally; main passes output_format when clearing and updates the status message.
  • JSON renderer package (claude_code_log/json/__init__.py, claude_code_log/json/renderer.py): adds JsonRenderer with _message_to_dict, generate, generate_session, generate_projects_index, and is_outdated; serializes transcript/session/project data to JSON and emits combined/session/index filenames (e.g., combined_transcripts*.json, all-projects-summary.json).
  • Renderer factory (claude_code_log/renderer.py): get_renderer recognizes "json", dynamically imports and returns JsonRenderer; detail and compact remain applied.
  • Converter (claude_code_log/converter.py): adds get_index_filename(output_format) and uses it to compute the aggregate index path (special-casing JSON -> all-projects-summary.json).
  • Tests (test/test_cli.py): tests updated to call _clear_output_files(output_format=...) (was file_ext=).
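A minimal sketch of the converter's index-filename helper, assuming only the behaviour summarized above (the actual implementation in claude_code_log/converter.py may differ):

```python
def get_index_filename(output_format: str) -> str:
    """Return the top-level aggregate index filename for a format.

    Sketch based on the PR summary: JSON gets a dedicated name so the
    aggregate index cannot collide with a per-project
    combined_transcripts.json export; other formats use index.<ext>.
    """
    if output_format == "json":
        return "all-projects-summary.json"
    ext = "md" if output_format in ("md", "markdown") else output_format
    return f"index.{ext}"
```

Sharing one helper between conversion and cleanup is the point of the design: both paths derive the filename the same way, so `--clear-output` deletes exactly what conversion wrote.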

Sequence Diagram

sequenceDiagram
    autonumber
    actor User
    participant CLI as CLI
    participant Factory as RendererFactory
    participant JSON as JsonRenderer
    participant Converter as Converter
    participant FS as FileSystem

    User->>CLI: run CLI with --format json
    CLI->>Factory: get_renderer("json", detail, compact)
    Factory->>JSON: import & instantiate JsonRenderer
    Factory-->>CLI: return JsonRenderer
    CLI->>JSON: generate(messages, title, ...)
    JSON->>JSON: serialize messages -> nested dicts
    JSON-->>CLI: return JSON payload / filename
    CLI->>Converter: request projects index generation
    Converter->>JSON: generate_projects_index(...)
    JSON-->>Converter: return index JSON
    Converter->>FS: write `all-projects-summary.json` and per-project files

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop-hop, JSON blooms anew,

Brackets chatter, tidy view,
Sessions nested, indexes named,
I nibble bugs and stamp them tamed,
carrots crunch — the logs are true!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly and concisely describes the main feature addition, JSON export capability for conversations, which is the primary change across multiple files in the PR.
  • Docstring Coverage ✅ Passed: docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Linked Issues Check ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check ✅ Passed: check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

daaain (Owner) commented Oct 30, 2025

This makes sense, can be really useful and only takes a few lines, so why not!

daaain (Owner) commented Jan 26, 2026

Are you still interested in this @raedatoui?

We have a SQLite cache now with a table of every single message (amongst others) so this could be a very flexible feature.

raedatoui (Author) commented:

> Are you still interested in this @raedatoui?
>
> We have a SQLite cache now with a table of every single message (amongst others) so this could be a very flexible feature.

hey, apologies for not following up earlier. SQLite support is cool, I'll check it out.

cboos (Collaborator) commented Apr 18, 2026

I'd rather go for a structured output of the processed messages, IOW another renderer, in addition to HTML and Markdown. The Markdown output (especially at the lower detail levels, like --detail low) can already be "useful for other workflows", but I can see that a .jsonl rendition of that might also have its applications.

raedatoui (Author) commented:

Good call — agreed that a raw TranscriptEntry dump wasn't pulling its weight alongside HTML/Markdown. Pushed a rewrite that runs the JSON renderer through the same generate_template_messages pipeline as the other renderers, so it now:

  • Honours --detail (full/high/low/minimal) since filtering happens before rendering — on the sidechain sample, nodes drop from 8 → 2 → 0 as detail decreases.
  • Emits the processed tree, not a flat list — each node has type, title, timestamp, session_id, content (structured dataclass dump), plus pair_first / pair_last / pair_duration, is_sidechain, agent_id, token_usage, and nested children.
  • Top-level metadata includes version, detail, compact, and the same sessions nav that the HTML TOC uses, so downstream consumers get session boundaries and summaries without re-deriving them.
  • Sibling combined-file back-link (combined_transcripts{variant}.json) matches Markdown's behaviour, so per-session outputs stay navigable across detail levels.
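For illustration only, a node serialized under the scheme described above might look like the following (all field values here are invented; only the key names come from the list above):

```python
import json

# Hypothetical example of one serialized node. Optional keys such as
# pair_first/pair_last, agent_id, and token_usage are only emitted when
# present; children holds nested nodes of the same shape.
node = {
    "type": "assistant",
    "title": "Assistant",
    "timestamp": "2026-04-24T18:49:00+00:00",
    "session_id": "abc123",
    "content": {"message_type": "text", "text": "Edited cli.py as requested."},
    "pair_first": 4,
    "is_sidechain": False,
    "agent_id": None,
    "token_usage": {"input_tokens": 1200, "output_tokens": 85},
    "children": [],
}
payload = json.dumps(node, ensure_ascii=False, indent=2)
```

A downstream consumer can then walk `children` recursively and use the `pair_*` indices to re-associate tool_use/tool_result pairs without re-parsing the raw JSONL.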

@raedatoui raedatoui marked this pull request as ready for review April 24, 2026 18:49

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@claude_code_log/converter.py`:
- Around line 2680-2682: The cleanup in
claude_code_log/cli.py::_clear_output_files() must match the new JSON index
filename used in converter.py (all-projects-summary.json); update
_clear_output_files to compute the index filename the same way as conversion by
adding or calling a helper like get_index_filename/get_file_extension (or
implement the equivalent logic) so that for JSON it deletes
"all-projects-summary.json" and for other formats it deletes f"index.{ext}"
(refer to converter.py's index_filename logic and ensure _clear_output_files
uses the same symbol/name to derive the filename).

In `@claude_code_log/json/renderer.py`:
- Around line 102-117: The current session_messages comprehension filters only
by msg.sessionId == session_id which crashes on messages lacking sessionId and
omits sidechain/subagent entries rewritten to synthetic IDs; update the filter
used to build session_messages to (1) safely handle missing sessionId (guarding
against AttributeError), and (2) include messages whose sessionId equals
session_id, whose sessionId starts with f"{session_id}#" (synthetic sub-session
IDs created by _integrate_agent_entries), or whose parent/session parent field
points to session_id (e.g., parentSessionId or equivalent), then pass that list
into the existing call to generate(...) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5c45cc45-7f4e-4763-8dab-db3f3c087146

📥 Commits

Reviewing files that changed from the base of the PR and between d4bcd58 and 24247dc.

📒 Files selected for processing (5)
  • claude_code_log/cli.py
  • claude_code_log/converter.py
  • claude_code_log/json/__init__.py
  • claude_code_log/json/renderer.py
  • claude_code_log/renderer.py

@raedatoui raedatoui force-pushed the main branch 2 times, most recently from 24247dc to 82ac060 Compare April 24, 2026 19:02

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
test/test_cli.py (1)

243-311: Consider adding JSON coverage for _clear_output_files.

Since this PR introduces --format json, it would be worth adding a TestClearOutputFiles case that exercises output_format="json", particularly to lock in the special top-level index filename (all-projects-summary.json via get_index_filename) — that branch is unique to JSON and currently uncovered here. Same rationale for extending test_format_option_md with a JSON variant.

🧪 Suggested additional tests
def test_clear_json_all_projects(
    self, cli_projects_setup: ProjectsSetup, sample_jsonl_content: list[dict]
):
    """Clears JSON files and the all-projects-summary.json index."""
    projects_dir = cli_projects_setup.projects_dir

    for i in range(2):
        project_dir = create_project_with_jsonl(
            projects_dir, f"project-{i}", sample_jsonl_content
        )
        (project_dir / "combined_transcripts.json").write_text("{}")

    # JSON uses a distinct top-level index name to avoid colliding with per-project exports
    (projects_dir / "all-projects-summary.json").write_text("{}")

    _clear_output_files(projects_dir, all_projects=True, output_format="json")

    assert not (projects_dir / "all-projects-summary.json").exists()
    for i in range(2):
        assert len(list((projects_dir / f"project-{i}").glob("*.json"))) == 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_cli.py` around lines 243 - 311, Add tests exercising
output_format="json" in TestClearOutputFiles and extend test_format_option_md
with a JSON variant: create per-project JSON files (e.g.,
"combined_transcripts.json") and the special top-level index file returned by
get_index_filename (all-projects-summary.json), call _clear_output_files with
all_projects=True and output_format="json", and assert the index file and all
per-project .json files are removed; reference _clear_output_files and
get_index_filename to locate the cleanup logic and test_format_option_md to add
the JSON case.
claude_code_log/json/renderer.py (1)

34-38: Defensive guard for dataclasses.asdict on non-dataclass content.

dataclasses.asdict raises TypeError if msg.content isn't a dataclass instance. Every current MessageContent subclass is a dataclass, so this works today, but a future non-dataclass content type (e.g., a Pydantic model) would crash the entire JSON export for that transcript. A fallback keeps the exporter robust without changing current output.

🩹 Proposed fix
-        content_dump = dataclasses.asdict(msg.content)
+        if dataclasses.is_dataclass(msg.content):
+            content_dump = dataclasses.asdict(msg.content)
+        else:
+            content_dump = {"message_type": msg.content.message_type}
         # Meta is surfaced at the node level; drop the nested copy for clarity.
         content_dump.pop("meta", None)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude_code_log/json/renderer.py` around lines 34 - 38, Guard the
dataclasses.asdict call so non-dataclass msg.content doesn't raise: before
calling dataclasses.asdict on msg.content (used to produce content_dump), check
dataclasses.is_dataclass(msg.content) and only call asdict in that case;
otherwise, fall back to a safe conversion (prefer msg.content.dict() if present,
else vars(msg.content) or dict(msg.content) if it's already a mapping) to
produce content_dump, then continue to pop "meta" and "message_index" as before;
keep references to msg.content, content_dump, and the pop operations unchanged.
claude_code_log/renderer.py (1)

2971-2998: Optional: update the get_renderer docstring to list "json".

The dispatch addition is correct and idiomatic, but the format parameter doc at line 2972 still only advertises "html", "md", and "markdown". Worth a one-line update so CLI maintainers discover the new value from the signature.

📝 Proposed docstring tweak
     Args:
-        format: The output format ("html", "md", or "markdown").
+        format: The output format ("html", "md"/"markdown", or "json").
         image_export_mode: Image export mode ("placeholder", "embedded", "referenced").
             If None, defaults to "embedded" for HTML and "referenced" for Markdown.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude_code_log/renderer.py` around lines 2971 - 2998, Update the
get_renderer docstring to include "json" in the description of the format
parameter: locate the get_renderer function and its docstring (look for the
"format" parameter description) and add "json" alongside "html", "md", and
"markdown" so the docstring reads that valid format values include "html", "md",
"markdown", and "json"; ensure the wording matches existing style and keep the
one-line change in the param list without altering any code logic (references:
get_renderer, format parameter, HtmlRenderer/MarkdownRenderer/JsonRenderer).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@claude_code_log/json/renderer.py`:
- Around line 32-64: The serialized node omits the current message's uuid (it
pops meta from content_dump so downstream consumers can't resolve parent_uuid),
and it also doesn't emit render_session_id when it differs from session_id;
update _message_to_dict to add the node's own UUID (e.g., set node["uuid"] =
msg.meta.uuid or msg.uuid depending on the TemplateMessage representation) after
building node, keep popping meta from content_dump to avoid duplication, and
also conditionally include node["render_session_id"] = msg.render_session_id
when msg.render_session_id is present and not equal to msg.session_id so forked
branches can be grouped correctly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d37ca592-f4a2-49b9-8e64-1a6b748fabd5

📥 Commits

Reviewing files that changed from the base of the PR and between 24247dc and 82ac060.

📒 Files selected for processing (6)
  • claude_code_log/cli.py
  • claude_code_log/converter.py
  • claude_code_log/json/__init__.py
  • claude_code_log/json/renderer.py
  • claude_code_log/renderer.py
  • test/test_cli.py
✅ Files skipped from review due to trivial changes (1)
  • claude_code_log/json/__init__.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • claude_code_log/converter.py
  • claude_code_log/cli.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
claude_code_log/cli.py (1)

384-386: ⚠️ Potential issue | 🟠 Major

Avoid deleting arbitrary .json files during --clear-output --format json.

The current glob removes every *.json in project dirs, not just generated transcript outputs. That can remove unrelated user files.

🛠️ Proposed fix
-                    output_files = list(project_dir.glob(f"*.{file_ext}"))
+                    if file_ext == "json":
+                        output_files = [
+                            *project_dir.glob("combined_transcripts*.json"),
+                            *project_dir.glob("session-*.json"),
+                        ]
+                    else:
+                        output_files = list(project_dir.glob(f"*.{file_ext}"))
@@
-            output_files = list(input_path.glob(f"*.{file_ext}"))
+            if file_ext == "json":
+                output_files = [
+                    *input_path.glob("combined_transcripts*.json"),
+                    *input_path.glob("session-*.json"),
+                ]
+            else:
+                output_files = list(input_path.glob(f"*.{file_ext}"))

Also applies to: 415-417

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude_code_log/cli.py` around lines 384 - 386, The code currently unlinks
every file matching f"*.{file_ext}" (variables project_dir, file_ext,
output_file.unlink()), which deletes any .json in the directory; change the
deletion to target only generated transcript outputs by narrowing the glob or
filtering before unlinking—e.g., use a specific pattern/suffix/prefix for
generated outputs (like f"*{GENERATED_TRANSCRIPT_SUFFIX}.{file_ext}" or check
output_file.stem.startswith(GENERATED_PREFIX)) or cross-check output_file
against the known list of generated filenames, and then call
output_file.unlink() only for those matched files (apply the same fix for the
similar block around lines 415-417).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@claude_code_log/json/renderer.py`:
- Around line 190-193: In is_outdated(), defensively handle non-object JSON by
checking the parsed data type before calling .get(): after json.load(f) verify
isinstance(data, dict) and only call data.get("version") when it is a dict; if
it’s not a dict (e.g., list or scalar) return True to treat the file as
outdated. Keep the existing except (OSError, json.JSONDecodeError,
UnicodeDecodeError) block unchanged and continue to use get_library_version()
for the version comparison when data is a dict.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aa023496-cad5-4f42-b94b-48d18bfa39ca

📥 Commits

Reviewing files that changed from the base of the PR and between 82ac060 and dd54b72.

📒 Files selected for processing (6)
  • claude_code_log/cli.py
  • claude_code_log/converter.py
  • claude_code_log/json/__init__.py
  • claude_code_log/json/renderer.py
  • claude_code_log/renderer.py
  • test/test_cli.py
✅ Files skipped from review due to trivial changes (1)
  • claude_code_log/json/__init__.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/test_cli.py


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
claude_code_log/json/renderer.py (1)

32-70: Verify dataclasses.asdict round-trip for non-dataclass fields (Enums, datetimes, Paths).

dataclasses.asdict recurses through dataclasses/dict/list/tuple but leaves other objects unchanged. Those are then serialized via json.dumps(..., default=str) at line 96, which calls str() on them. For Enum members this yields "ClassName.MEMBER" rather than the underlying .value, and for datetime/Path it produces the __str__ form — both potentially lossy or awkward for downstream consumers of the JSON contract. Worth confirming which concrete field types can appear inside MessageContent subclasses.

Consider replacing the default=str fallback with a serializer that normalizes Enums to .value and datetimes to ISO 8601 to lock down a stable schema.

#!/bin/bash
# Inspect MessageContent subclasses and their field types to see what non-dataclass types
# can reach the JSON serializer via dataclasses.asdict.
fd -e py . claude_code_log | xargs rg -nP '@dataclass|class\s+\w+Content\b|class\s+MessageContent\b' -C2

# Look for Enum/datetime/Path fields on content-ish dataclasses.
rg -nP '^\s*\w+\s*:\s*(Enum|datetime|date|Path|PurePath|UUID|bytes)\b' --type=py

# Confirm whether any MessageContent subclass is NOT a `@dataclass` (which would
# make asdict(msg.content) raise TypeError at runtime).
ast-grep --pattern $'class $NAME(MessageContent):
  $$$'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude_code_log/json/renderer.py` around lines 32 - 70, The
dataclasses.asdict call in _message_to_dict can leave Enum, datetime, Path,
UUID, and bytes objects intact which later get serialized with json.dumps(...,
default=str) producing unstable representations; update _message_to_dict to
recurse the content_dump (after dataclasses.asdict(msg.content)) and normalize:
convert Enum instances to .value, datetimes/dates to ISO 8601 via .isoformat(),
Path/PurePath to str(), UUID to str(), and bytes to a stable encoding (e.g.,
base64 or hex) so the JSON contract is stable; alternatively, implement a custom
serializer used by the json.dumps call (referencing the json.dumps usage) that
handles these types, but do the normalization in or before _message_to_dict to
ensure children/embedded fields (from dataclasses.asdict) are normalized
consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@claude_code_log/json/renderer.py`:
- Around line 32-70: The dataclasses.asdict call in _message_to_dict can leave
Enum, datetime, Path, UUID, and bytes objects intact which later get serialized
with json.dumps(..., default=str) producing unstable representations; update
_message_to_dict to recurse the content_dump (after
dataclasses.asdict(msg.content)) and normalize: convert Enum instances to
.value, datetimes/dates to ISO 8601 via .isoformat(), Path/PurePath to str(),
UUID to str(), and bytes to a stable encoding (e.g., base64 or hex) so the JSON
contract is stable; alternatively, implement a custom serializer used by the
json.dumps call (referencing the json.dumps usage) that handles these types, but
do the normalization in or before _message_to_dict to ensure children/embedded
fields (from dataclasses.asdict) are normalized consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aaeddd3d-2906-4305-91c0-e978579d65de

📥 Commits

Reviewing files that changed from the base of the PR and between dd54b72 and 953fdff.

📒 Files selected for processing (1)
  • claude_code_log/json/renderer.py


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
claude_code_log/json/renderer.py (1)

83-88: Minor: truthiness check on pair_duration drops legitimate 0 values.

pair_first / pair_last correctly use is not None, but pair_duration uses a truthy check at line 87, which will omit a duration of 0 (e.g., back-to-back messages in the same second). Low impact, but worth aligning for consistency.

♻️ Proposed tweak
-        if msg.pair_duration:
+        if msg.pair_duration is not None:
             node["pair_duration"] = msg.pair_duration
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude_code_log/json/renderer.py` around lines 83 - 88, The check for
pair_duration currently uses truthiness and will drop valid zero values; update
the condition that sets node["pair_duration"] to use an explicit None check
(like the existing checks for msg.pair_first and msg.pair_last) so that
msg.pair_duration == 0 is preserved when serializing in renderer.py (i.e.,
change the truthy check on msg.pair_duration to an is not None style check).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@claude_code_log/json/renderer.py`:
- Around line 83-88: The check for pair_duration currently uses truthiness and
will drop valid zero values; update the condition that sets
node["pair_duration"] to use an explicit None check (like the existing checks
for msg.pair_first and msg.pair_last) so that msg.pair_duration == 0 is
preserved when serializing in renderer.py (i.e., change the truthy check on
msg.pair_duration to an is not None style check).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 06037f0f-43ae-483c-b4ed-38084a55a33a

📥 Commits

Reviewing files that changed from the base of the PR and between 953fdff and 35809d3.

📒 Files selected for processing (2)
  • claude_code_log/cli.py
  • claude_code_log/json/renderer.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • claude_code_log/cli.py

raedatoui and others added 8 commits on April 28, 2026 at 08:49. Commit messages:
If the JSON file contains a list or scalar, data.get() would raise
AttributeError and break the staleness check. Treat any non-dict
payload as outdated so the next run regenerates cleanly.
Two bugs addressed from PR review:

1. Tool inputs/outputs (ToolUseContent, ReadInput, TextContent, etc.)
   are Pydantic BaseModel instances embedded inside dataclasses.
   dataclasses.asdict does not recurse into them, so json.dumps was
   falling back to default=str and serialising them as __repr__ strings
   like "type='tool_use' id='toolu_...' input={...}" instead of nested
   objects. Added a custom default handler that calls model_dump on
   Pydantic models (and normalises Enums/Paths) so the JSON output
   contains the structured content consumers expect.

2. --clear-output --format json was globbing *.json across project
   directories and deleting arbitrary user files. Narrowed to the
   filename patterns this tool actually produces
   (combined_transcripts*.json, session-*.json).
Covers JsonRenderer.generate / generate_session /
generate_projects_index / is_outdated and the CLI --format json
flow, including the safety contract that --clear-output preserves
unrelated user .json files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two type-checker fixes the rebase exposed (both type-inference
tightening that's landed via main since the PR was opened — same
code passed both checkers when originally committed):

- `claude_code_log/json/renderer.py:is_outdated`: pyright (1.1.408)
  narrows `json.load(f)` after `isinstance(data, dict)` to
  `dict[Unknown, Unknown]`, which then flags `data.get("version")`
  as `reportUnknownMemberType`. Cast to `dict[str, Any]` after the
  isinstance check.
- `test/test_json_rendering.py:test_detail_minimal_filters_tool_messages`:
  ty narrows `get_renderer("json", ...).generate(...)` to the base
  `Optional[str]` return type and rejects passing it directly to
  `json.loads`. Hoist the call to a local + assert it's not None,
  matching how the other tests in this file already do it.

Runtime semantics unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR daaain#127 introduced `pair_middle` for the (UserSlash → Slash →
CommandOutput) triple, but the JSON renderer was never updated to
emit it — `_message_to_dict` had `pair_first`/`pair_last`/
`pair_duration` but silently dropped the middle. Downstream tools
walking the JSON would lose the middle linkage and see what looks
like an unrelated message wedged between two paired siblings.

Mirror the existing pair_* emission pattern. New test asserts the
three pair members emit pair_first/pair_middle/pair_last as the
exclusive role properties dictate (first carries middle+last;
middle carries first+last; last carries first only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three Windows-specific failures the GH Actions runner caught (this
repo's `just ci` only runs on Linux):

- `test_shape_and_totals`: `Path("/tmp/alpha")` is a `WindowsPath`
  on Windows whose `str()` yields `\tmp\alpha`. Compare against
  `str(Path(...))` so the expected value matches the platform's own
  `Path` rendering.
- `test_generates_combined_and_session_files` and
  `test_all_projects_index_uses_dedicated_filename`: `read_text()`
  defaults to the system codec (cp1252 on Windows), which can't
  decode the UTF-8 multi-byte sequences the JSON renderer emits
  (`ensure_ascii=False`). Add explicit `encoding="utf-8"` to all
  `read_text` calls in this file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parametrizes over every session-level JSONL in
`test/test_data/real_projects/` (48 fixtures, ~384 invariant checks)
and asserts shape rather than specific values. Real fixtures are the
only way to exercise sidechains, within-session forks, multi-session
resumes, teammates, and the long tail of structural diversity that
hand-written tests can't reach.

Invariants asserted per fixture:

1. JSON parses (would catch _json_default gaps for new content types).
2. Top-level + per-node required keys present and well-typed.
3. `index` is unique tree-wide.
4. All `pair_first` / `pair_middle` / `pair_last` references resolve
   to nodes that exist in the document.
5. Pair role exclusivity: `pair_middle` (the field) only appears on
   the triple's pair_first (which has `pair_last` set, no `pair_first`).
6. Pair cross-references resolve to equivalent partners — relaxed from
   strict symmetry because real fixtures have stitched / forked
   transcripts where duplicate `tool_use_id` values produce multiple
   tool_use nodes pointing at the same tool_result; the back-reference
   only mirrors the latest one. The invariant that does hold is that
   the back-reference resolves to a node sharing the same
   `tool_use_id` (the logical pair equivalence class).
7. `type` strings match an identifier-shaped pattern (catches obvious
   garbage without hardcoding the closed set).
8. Tree is acyclic (no node appears in its own ancestor chain).
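Assuming nodes carry `index` and `children` keys as described, invariants 3 and 8 can be checked with a recursive walk like this sketch (not the test suite's actual code):

```python
def walk(node, seen_ids, path):
    """Assert tree-wide unique `index` (invariant 3) and acyclicity
    (invariant 8). `path` holds the identities of the node's ancestors,
    so revisiting one means the node sits on its own ancestor chain."""
    assert id(node) not in path, "cycle: node appears in its own ancestor chain"
    assert node["index"] not in seen_ids, f"duplicate index {node['index']}"
    seen_ids.add(node["index"])
    for child in node.get("children", []):
        walk(child, seen_ids, path | {id(node)})

# Minimal well-formed tree: passes both invariants.
tree = {"index": 0, "children": [{"index": 1, "children": []},
                                 {"index": 2, "children": []}]}
walk(tree, set(), frozenset())
```

Because `seen_ids` is shared across the whole traversal while `path` is rebuilt per branch, a repeated index anywhere in the document fails, but siblings sharing an ancestor do not.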

Investigated finding worth flagging: bug-like behavior on duplicate
`tool_use_id` is pre-existing renderer behavior (not specific to JSON
output) — `_try_pair_by_index`'s dict-based pair index keeps only the
last write, leaving earlier tool_use nodes with stale `pair_last`
references. Acceptable for now; documented in the relaxed invariant
above. Worth tracking as a separate cleanup if we ever revisit
stitching.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>