fix(speakers): respect Auto-create Speakers setting end-to-end #7574

Merged
mdmohsin7 merged 1 commit into main from fix/respect-auto-create-speakers on Jun 1, 2026

@mdmohsin7
Member

Problem

Users who toggled off Auto-create Speakers still got new speaker/person records created during transcription. The setting was effectively not wired up:

  • App: the autoCreateSpeakersEnabled pref was stored but never sent to the backend. The /v4/listen socket sent only the hardcoded speaker_auto_assign=enabled.
  • Backend: on text-based name detection, the backend always called create_person for an unmatched name. speaker_auto_assign only gated whether the resulting person_id was disclosed to the client (a backward-compat capability flag introduced in #4580, "Fix speaker auto-assignment: backend-owned with cache refresh [#4554]"), not whether the person was created.

So a new speaker was created regardless of the user's choice.
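To make the failure mode concrete, here is a minimal Python model of the pre-fix flow. FakeUserDb and handle_detected_name_before_fix are hypothetical stand-ins, not the actual transcribe.py code; the sketch only illustrates that person creation was unconditional and that speaker_auto_assign gated disclosure of the id, not creation.

```python
import uuid


class FakeUserDb:
    """Hypothetical in-memory stand-in for user_db (not the real module)."""

    def __init__(self):
        self.people = {}  # name -> person dict

    def get_person_by_name(self, uid, name):
        return self.people.get(name)

    def create_person(self, uid, person):
        self.people[person["name"]] = person
        return person


def handle_detected_name_before_fix(db, uid, detected_name, speaker_auto_assign_enabled):
    """Model of the pre-fix behaviour: an unmatched name always creates a person."""
    person = db.get_person_by_name(uid, detected_name)
    if person is None:
        # The user's Auto-create Speakers preference never reaches this point.
        person = db.create_person(uid, {"id": str(uuid.uuid4()), "name": detected_name})
    # speaker_auto_assign only controls whether the id is disclosed to the client.
    disclosed_id = person["id"] if speaker_auto_assign_enabled else ""
    return {"person_id": disclosed_id, "person_name": detected_name}


db = FakeUserDb()
event = handle_detected_name_before_fix(db, "uid-1", "Alice", speaker_auto_assign_enabled=False)
print(event["person_id"] == "", "Alice" in db.people)  # True True: id hidden, person still created
```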

Fix

Introduce a dedicated create_speakers flag, kept separate from the speaker_auto_assign capability flag.

  • App (transcription_service.dart): send create_speakers=<autoCreateSpeakersEnabled> on the listen socket. speaker_auto_assign=enabled stays hardcoded — it's a capability flag, not a user preference, and conflating them would break existing-person assignment/diarization display.
  • Backend (transcribe.py): thread create_speakers through listen_handler → _listen → _stream_handler. When a detected name has no existing match and create_speakers is false, skip create_person and don't populate the speaker maps, but still emit the detected name as a suggestion (empty person_id) so the user can tag it manually. Existing-person matches still assign as before.
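A minimal sketch of the backend gating described above, using a hypothetical in-memory store. handle_detected_name, FakeUserDb and the event dict shape are illustrative names, not the actual _stream_handler code; should_update_speaker_to_person_map, segment_person_assignment_map and the speaker_auto_assign disclosure gate are left out for brevity.

```python
import uuid
from typing import Dict, Optional, Tuple


class FakeUserDb:
    """Hypothetical in-memory stand-in for user_db (not the real module)."""

    def __init__(self, names=()):
        self.people = {n: {"id": str(uuid.uuid4()), "name": n} for n in names}
        self.created = []  # names created through create_person, for inspection

    def get_person_by_name(self, uid, name):
        return self.people.get(name)

    def create_person(self, uid, person):
        self.people[person["name"]] = person
        self.created.append(person["name"])
        return person


def handle_detected_name(
    db: FakeUserDb,
    uid: str,
    speaker_id: int,
    detected_name: str,
    speaker_to_person_map: Dict[int, Tuple[str, str]],
    create_speakers: bool = True,
) -> dict:
    """Build a suggestion event, creating a person only if create_speakers is True."""
    person = db.get_person_by_name(uid, detected_name)
    person_id: Optional[str]
    if person is not None:
        person_id = person["id"]  # existing-person match: assigned regardless of the flag
    elif create_speakers:
        new_person = {"id": str(uuid.uuid4()), "name": detected_name}
        person_id = db.create_person(uid, new_person)["id"]
    else:
        person_id = None  # new path: no person record, no map update

    if person_id:
        speaker_to_person_map[speaker_id] = (person_id, detected_name)
    # The detected name is always surfaced; an empty person_id means "tag manually".
    return {"person_id": person_id or "", "person_name": detected_name}
```

Run against the test-plan scenarios, this model returns person_id "" and makes no create_person call for an unknown name when create_speakers=False, still maps a known name, and creates the person when the argument is omitted (as it effectively is for old clients).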

Defaults to true, so old app clients (which don't send the param) and /v4/web/listen are unchanged.
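The backward-compat default can be illustrated with plain urllib.parse. resolve_create_speakers is a hypothetical helper, not FastAPI's query-parameter machinery (which the backend actually relies on), and the set of accepted spellings is an assumption; it only shows that a missing create_speakers resolves to True while explicit lowercase true/false values are honoured.

```python
from urllib.parse import parse_qs, urlparse

# Accepted spellings are an assumption, chosen to resemble common web-framework bool parsing.
_TRUE = {"true", "1", "yes", "on"}
_FALSE = {"false", "0", "no", "off"}


def resolve_create_speakers(url: str, default: bool = True) -> bool:
    """Read create_speakers from a listen URL, falling back to `default` when absent."""
    values = parse_qs(urlparse(url).query).get("create_speakers")
    if not values:
        return default  # old app builds and /v4/web/listen never send the param
    raw = values[0].strip().lower()
    if raw in _TRUE:
        return True
    if raw in _FALSE:
        return False
    raise ValueError(f"invalid create_speakers value: {values[0]!r}")
```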

Scope

Covers the live /v4/listen path (the reported symptom). Batch/pre-recorded and sync creation paths are not touched in this PR.

Test plan

  • Toggle off Auto-create Speakers, speak a new name → no new person created; name appears as a manual-tag suggestion.
  • Toggle on → new person auto-created (unchanged behavior).
  • Existing person name spoken with toggle off → still assigned.
  • Old app build (no create_speakers param) → unchanged (creates).

The app's 'Auto-create Speakers' toggle (autoCreateSpeakersEnabled) was stored
but never sent to the backend, and the backend created a person on text-based
name detection unconditionally — speaker_auto_assign only gated whether the
person_id was disclosed to the client, not whether the person was created. So
new speakers were always created regardless of the user's choice.

- app: send create_speakers=<pref> on the /v4/listen socket (separate from the
  speaker_auto_assign capability flag, which stays hardcoded enabled).
- backend: thread create_speakers through listen_handler -> _listen ->
  _stream_handler; skip create_person for unmatched names when disabled, still
  surfacing the detected name as a manual-tag suggestion. Defaults to true, so
  old clients and web listen are unchanged.
@greptile-apps
Contributor

greptile-apps Bot commented Jun 1, 2026

Greptile Summary

This PR fixes the end-to-end enforcement of the "Auto-create Speakers" setting by introducing a dedicated create_speakers flag, kept intentionally separate from the existing speaker_auto_assign capability flag so that diarization display and existing-person matching remain unaffected.

  • Backend (transcribe.py): create_speakers: bool = True is threaded through listen_handler → _listen → _stream_handler. When a text-detected name has no existing match and create_speakers=false, user_db.create_person is skipped and person_id=None is set; a SpeakerLabelSuggestionEvent with person_id="" is still emitted so the client can surface the name for manual tagging. Defaults to true for backward compat with old clients and /v4/web/listen.
  • Flutter (transcription_service.dart): &create_speakers=<autoCreateSpeakersEnabled> is appended to the /v4/listen query string. speaker_auto_assign=enabled continues to be hardcoded as a separate capability flag. The rest of the diff is mechanical line-length reformatting.

Confidence Score: 4/5

Safe to merge — the core fix correctly gates user_db.create_person on the new flag, backward compat defaults are in place, and existing-person matching is unaffected.

The logic change is narrow and well-isolated: the only new path is the else branch that sets person_id=None instead of calling create_person. The backward-compat default (True) means old clients and the web endpoint see no change. The one open question is whether the Flutter client handles repeated SpeakerLabelSuggestionEvent emissions with person_id="" gracefully when the same speaker name is detected across multiple segments — speaker_to_person_map is never populated for the unmatched case so there is no short-circuit for future segments.

backend/routers/transcribe.py around the speaker_to_person_map update block (lines 2258–2262) — the map is intentionally skipped when person_id=None, but this means future segments from the same speaker_id can re-trigger text detection and emit duplicate suggestions.

Important Files Changed

  • backend/routers/transcribe.py: Adds a create_speakers: bool = True parameter through listen_handler → _listen → _stream_handler, correctly gating user_db.create_person while still emitting a suggestion event with person_id="" for manual tagging. Minor concern: speaker_to_person_map is not updated when person_id=None, so future segments from the same speaker_id are not short-circuited and may emit repeated suggestion events.
  • app/lib/services/sockets/transcription_service.dart: Appends &create_speakers=<autoCreateSpeakersEnabled> to the WebSocket query string. Dart's bool.toString() produces lowercase "true"/"false", which FastAPI's bool query-param parser accepts. speaker_auto_assign=enabled remains hardcoded as a separate capability flag. The remaining diff is mechanical reformatting with no logic change.
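For comparison, a sketch of the same query string built in Python. build_listen_url and the example host are hypothetical, and any other parameters the real client sends are omitted. Unlike Dart, Python's str(False) yields "False", so the sketch lowercases explicitly to produce the same "true"/"false" values.

```python
from urllib.parse import urlencode


def build_listen_url(base: str, auto_create_speakers_enabled: bool) -> str:
    """Hypothetical client-side builder for the /v4/listen query string."""
    params = {
        # Capability flag: always sent, independent of the user preference.
        "speaker_auto_assign": "enabled",
        # User preference: lowercased to match Dart's bool.toString() output.
        "create_speakers": str(auto_create_speakers_enabled).lower(),
    }
    return f"{base}?{urlencode(params)}"


print(build_listen_url("wss://example.invalid/v4/listen", False))
# wss://example.invalid/v4/listen?speaker_auto_assign=enabled&create_speakers=false
```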

Sequence Diagram

sequenceDiagram
    participant App as Flutter App
    participant WS as /v4/listen WebSocket
    participant LH as listen_handler
    participant SH as _stream_handler
    participant DB as user_db

    App->>WS: "connect(?speaker_auto_assign=enabled&create_speakers=false)"
    WS->>LH: "listen_handler(speaker_auto_assign='enabled', create_speakers=False)"
    LH->>SH: "_stream_handler(speaker_auto_assign_enabled=True, create_speakers=False)"

    Note over SH: Audio segment arrives, text-based detection runs
    SH->>DB: get_person_by_name(uid, detected_name)
    alt Person exists
        DB-->>SH: person record
        SH->>App: "SpeakerLabelSuggestionEvent(person_id=id, person_name=name)"
        SH->>SH: update speaker_to_person_map and segment_person_assignment_map
    else "create_speakers=True (default)"
        DB-->>SH: null
        SH->>DB: "create_person(uid, {id, name, ...})"
        SH->>App: "SpeakerLabelSuggestionEvent(person_id=new_id, person_name=name)"
        SH->>SH: update speaker_to_person_map and segment_person_assignment_map
    else "create_speakers=False (new path)"
        DB-->>SH: null
        SH->>App: "SpeakerLabelSuggestionEvent(person_id='', person_name=name)"
        Note over SH: Maps NOT updated — person_id=None
    end

Reviews (1): Last reviewed commit: "fix(speakers): respect Auto-create Speak..."

Comment on lines +2258 to 2262
if person_id:
    if should_update_speaker_to_person_map(segment.speaker_id):
        speaker_to_person_map[segment.speaker_id] = (person_id, detected_name)
    segment_person_assignment_map[segment.id] = person_id
    suggested_segments.add(segment.id)
Contributor

P2: Repeated suggestions for the same speaker when create_speakers=False

When person_id is None (unmatched name, auto-create disabled), speaker_to_person_map is not updated for that speaker_id. This means every subsequent segment from the same speaker whose text also triggers detect_speaker_from_text will fall through the map lookup and emit another SpeakerLabelSuggestionEvent(person_id="", person_name=detected_name). With create_speakers=True the map entry short-circuits future segments, but here it stays empty, so the client may receive many duplicate unresolved suggestions for the same speaker within a single session. Consider storing a sentinel in speaker_to_person_map (e.g. (None, detected_name)) so subsequent segments from that speaker_id are handled via the map path and only one suggestion is emitted.
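A minimal sketch of the sentinel idea suggested above, under a simplified per-segment model. on_segment, the lookup_person callback and the event dict are hypothetical; the real handler also runs text detection and should_update_speaker_to_person_map, which are not modeled here.

```python
from typing import Callable, Dict, List, Optional, Tuple

SpeakerMap = Dict[int, Tuple[Optional[str], str]]


def on_segment(
    speaker_id: int,
    detected_name: str,
    speaker_to_person_map: SpeakerMap,
    lookup_person: Callable[[str], Optional[str]],
    events: List[dict],
) -> None:
    """Emit at most one suggestion per speaker_id, even when no person is created."""
    if speaker_id in speaker_to_person_map:
        return  # short-circuit: speaker already resolved or already suggested
    person_id = lookup_person(detected_name)  # None when unmatched and create_speakers=False
    # Store (None, name) as a sentinel so later segments take the map path.
    speaker_to_person_map[speaker_id] = (person_id, detected_name)
    events.append({"person_id": person_id or "", "person_name": detected_name})


events: List[dict] = []
speaker_map: SpeakerMap = {}
for _ in range(3):  # three segments from the same unmatched speaker
    on_segment(7, "Alice", speaker_map, lambda name: None, events)
print(len(events))  # 1
```

The trade-off is that every other reader of speaker_to_person_map would then have to treat a None person_id as "unassigned", and a later manual tag from the user would need to overwrite the sentinel.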

mdmohsin7 merged commit 0c00e32 into main on Jun 1, 2026
3 checks passed
mdmohsin7 deleted the fix/respect-auto-create-speakers branch on June 1, 2026 10:40