
Add :stats and verbose mode to the dev CLI #3

@AJaccP


🧠 Context

The dev CLI (src/apps/dev_cli.py) is the main way to use the bot locally, but it only prints the final answer: there's no quick way to see how much data is loaded or what the retriever actually pulled for a given question. This ticket adds two small developer conveniences:

  • :stats, which shows how many sources and chunks are in the database.
  • a verbose mode, which shows the retrieved chunks and their similarity scores for each question.

Both make it much easier to sanity-check ingestion and retrieval while developing. (Empty-input and Ctrl-D handling already exist and are not in scope.)


🛠 Implementation Plan

  1. Counts via the repository. Add a small read-only method to Repository (src/infrastructure/db/repository.py), right next to has_chunks, that returns the source and chunk counts (a count_sources/count_chunks pair, or one method returning both). Use select(func.count())... (the existing repository tests show the pattern).
  2. :stats command. In the REPL loop (_repl), before treating input as a question, handle a :stats command that prints those counts. Reuse the same counts to improve the startup line: instead of only warning when the DB is empty, print something like N sources, M chunks loaded (keep the empty-DB warning).
  3. Verbose mode. Add a :verbose command that toggles a flag. When it's on, for each question call retrieval_service.get_relevant_chunks(question) and print each retrieved chunk's source URL, similarity score, and a content snippet (the first ~250 characters of chunk.content), then print the answer as usual.
  4. Document the new commands. The README covers the CLI basics (make cli, the ask> prompt, exit/Ctrl-D) but has no command reference. Add a short list of the REPL commands (:stats, :verbose, exit/quit) to the CLI usage section so the new commands are discoverable.
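
Steps 2 and 3 could be sketched roughly as follows. This is a minimal illustration, not the actual dev_cli.py code: CountsRepository, ReplState, format_stats, handle_command, and FakeRepository are hypothetical names, the counts are assumed to come from a count_sources/count_chunks pair as suggested in step 1, and the on/off message wording is made up.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class CountsRepository(Protocol):
    """Hypothetical read-only interface; the real Repository may differ."""

    def count_sources(self) -> int: ...

    def count_chunks(self) -> int: ...


@dataclass
class ReplState:
    """Mutable REPL state; only the verbose flag is needed for this ticket."""

    verbose: bool = False


def format_stats(repo: CountsRepository) -> str:
    """Build the 'N sources, M chunks loaded' line shared by :stats and startup."""
    return f"{repo.count_sources()} sources, {repo.count_chunks()} chunks loaded"


def handle_command(line: str, repo: CountsRepository, state: ReplState) -> Optional[str]:
    """Return the text to print for a known command, or None if `line` is a question."""
    command = line.strip()
    if command == ":stats":
        return format_stats(repo)
    if command == ":verbose":
        state.verbose = not state.verbose
        return f"verbose mode {'on' if state.verbose else 'off'}"
    return None  # not a command: the caller passes the input on to ask()


class FakeRepository:
    """In-memory stand-in so the sketch runs without a database."""

    def count_sources(self) -> int:
        return 3

    def count_chunks(self) -> int:
        return 42
```

Returning None for anything that is not a known command keeps the REPL loop simple: it prints the result when there is one and otherwise treats the input as a question. Whether an unknown :-prefixed input should be rejected instead is left open here.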

Notes

  • Keep DB queries in the Repository, not the CLI: mirror how _check_db already calls Repository.has_chunks.
  • Verbose mode re-runs retrieval separately from ask(), so the question gets embedded/retrieved twice. That's fine for a dev tool and keeps this ticket isolated: the alternative (returning the chunks out of ask()) would mean editing completion_service.py, which clashes with a different ticket. Don't go there.
  • Commands are simple :-prefixed inputs handled in the REPL loop before the input is sent to ask(); they never call the LLM.
  • To try :stats/:verbose by hand you need a populated DB, so run make migrate + make ingest first.
  • No new dependencies.
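
The "retrieve separately, then ask as usual" flow from the notes might look like the sketch below. It assumes synchronous callables and a chunk object exposing source_url, score, and content; those attribute names, RetrievedChunk, format_chunk, and answer_verbose are all hypothetical, and the real retrieval_service.get_relevant_chunks may return a different shape or be async.

```python
from dataclasses import dataclass
from typing import Callable, List

SNIPPET_CHARS = 250  # "first ~250 characters" from the ticket


@dataclass
class RetrievedChunk:
    """Hypothetical shape of one retrieval result; the real objects may differ."""

    source_url: str
    score: float
    content: str


def format_chunk(chunk: RetrievedChunk) -> str:
    """Render one chunk as a score/URL header line plus an indented snippet line."""
    snippet = chunk.content[:SNIPPET_CHARS].replace("\n", " ")
    return f"[{chunk.score:.3f}] {chunk.source_url}\n    {snippet}"


def answer_verbose(
    question: str,
    get_relevant_chunks: Callable[[str], List[RetrievedChunk]],
    ask: Callable[[str], str],
) -> str:
    """Show the retrieved chunks first, then the answer from an unmodified ask()."""
    # Retrieval runs here and again inside ask(); the ticket accepts that cost.
    lines = [format_chunk(chunk) for chunk in get_relevant_chunks(question)]
    lines.append(ask(question))
    return "\n".join(lines)
```

Passing the two callables in, rather than importing the services directly, keeps the display logic testable with fakes and leaves completion_service.py untouched.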

✅ Acceptance Criteria

  • :stats prints the current source and chunk counts.
  • Startup prints the loaded source/chunk counts, and still warns when the DB is empty.
  • :verbose toggles a mode that, for each question, shows the retrieved chunks with their source URLs, similarity scores, and a content snippet (first ~250 chars) before the answer.
  • All DB access goes through Repository; the CLI builds no raw queries. The retrieved-chunk display uses retrieval_service.get_relevant_chunks, and completion_service.py is left untouched.
  • The README documents the :stats and :verbose commands in the CLI usage section.
  • make test and make lint pass.
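
The startup criterion (print the counts, still warn on an empty DB) could be checked with a small pure function like this one. It is only a sketch: startup_lines is a hypothetical helper, the warning wording is invented, and "empty" is assumed to mean zero chunks, in line with the existing has_chunks check.

```python
from typing import List


def startup_lines(source_count: int, chunk_count: int) -> List[str]:
    """Return the lines to print at startup for the given counts."""
    lines = [f"{source_count} sources, {chunk_count} chunks loaded"]
    if chunk_count == 0:
        # Assumed wording; keep whatever the existing empty-DB warning says.
        lines.append("warning: database is empty; run make migrate and make ingest first")
    return lines
```

Keeping this free of I/O means the unit tests can cover both the populated and the empty case without a database.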

Status
In Progress
