fix(llm-generator): ensure Persian conversation titles for Farsi input (#29745) #29970

nourzakhama2003 · 2025-12-22T01:34:18Z

Closes #29745 (Bug in title of the conversations).

Summary

Fixes #29745: ensures that conversation titles generated from Persian (Farsi) input are produced in Persian (not Arabic or another language), even when the input contains no Persian-only characters.
What I changed

Code (llm_generator.py)
Add robust Persian detection: a langdetect-based language fallback plus a lightweight Persian-word heuristic.
Lower the LLM temperature to 0.2 for more deterministic title output.
Retry with a stricter Persian-only instruction and, if the result is still not Persian, request a direct Persian translation as a last resort.
Improve logging and readability (exceptions are logged, regexes are precompiled).
Dependency (pyproject.toml)
Add langdetect~=1.0.9.
Tests (test_llm_generator_persian.py)
New unit test verifying that Persian input produces a Persian title. The test includes minimal, explicit test-only stubs to keep collection isolated from the full application.
Why this fixes the bug

Previous detection only looked for Persian-specific characters; short Farsi inputs without those characters were misclassified and could generate Arabic titles. This change:
Detects Persian robustly even on short inputs,
Forces the model to output Persian-only titles when applicable,
Uses a deterministic lower temperature and a controlled retry/translate fallback to avoid non-Persian outputs.
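
The layered detection described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `contains_persian`, the regex names, and the three-word list are assumptions made for the example. Only `detect`, `DetectorFactory.seed`, and `LangDetectException` are real langdetect names, and the import is guarded so the sketch still runs when the library is not installed.

```python
import re

# Letters Persian adds to the Arabic alphabet (pe, che, zhe, gaf)
# plus the Persian forms of kaf and yeh.
_PERSIAN_ONLY_RE = re.compile(r"[\u067e\u0686\u0698\u06af\u06a9\u06cc]")
# Any character in the basic Arabic Unicode block.
_ARABIC_SCRIPT_RE = re.compile(r"[\u0600-\u06ff]")
# Hypothetical mini word list ("ast", "az", "ra"): common Persian function
# words that contain no Persian-only letters, matched as whole tokens.
_PERSIAN_WORDS_RE = re.compile(
    r"(?<!\S)(?:\u0627\u0633\u062a|\u0627\u0632|\u0631\u0627)(?!\S)"
)

try:
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # make langdetect results repeatable
    _langdetect_available = True
except ImportError:
    _langdetect_available = False


def contains_persian(text: str) -> bool:
    """Best-effort Persian check, cheapest test first."""
    if not text or not _ARABIC_SCRIPT_RE.search(text):
        return False  # no Arabic-script characters at all
    if _PERSIAN_ONLY_RE.search(text) or _PERSIAN_WORDS_RE.search(text):
        return True
    if _langdetect_available:
        try:
            return detect(text) == "fa"
        except LangDetectException:
            return False  # input too short or featureless to classify
    return False
```

Ordering the checks from cheapest to most expensive means langdetect is only consulted for Arabic-script input that neither the character check nor the word heuristic could settle.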
How to test locally

From repository root:
PowerShell:
& { $env:PYTHONPATH = "$PWD\api"; python -m pytest test_llm_generator_persian.py -q }
Run broader checks:
python -m ruff check api (or make lint in *nix env)
python -m pytest -q
Notes

The unit test contains a small sys.path insertion and light stubs to keep it isolated and fast; a short comment in the test explains the rationale.
No infra changes are included in this PR.
Checklist

Unit tests added
Linting passed locally
Manual verification of Persian titles in local UI

Checklist

This change requires a documentation update, included: Dify Document
I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.
I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

gemini-code-assist · 2025-12-22T01:34:33Z

Summary of Changes

Hello @nourzakhama2003, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where conversation titles generated from Persian (Farsi) input were sometimes incorrectly produced in other languages, such as Arabic. The changes introduce a more sophisticated approach to detect Persian input and guide the Large Language Model (LLM) to consistently generate titles in Persian, ensuring a better user experience for Farsi speakers.

Highlights

Enhanced Persian Language Detection: Implemented a robust Persian language detection mechanism for user input, combining checks for Persian-specific characters, langdetect library for broader language identification, and a heuristic regex for common Persian words, ensuring accurate identification even for short inputs.
Improved LLM Title Generation for Persian: Modified the LLM prompt to explicitly instruct the model to generate titles in Persian (Farsi) when the input is detected as Persian. This includes specifying the use of Persian characters and avoiding other languages.
Deterministic LLM Output and Retry Logic: Lowered the LLM temperature to 0.2 for more deterministic title outputs. A retry mechanism is in place: if the initial generation for a Persian input does not produce a Persian title, the LLM is re-invoked with a stricter instruction. As a last resort, the model attempts to translate the non-Persian title into Persian.
New Unit Test for Persian Titles: Added a dedicated unit test (test_llm_generator_persian.py) to verify that Persian input correctly leads to Persian conversation titles, using lightweight stubs to isolate the test from the full application.
Dependency Update: The langdetect library (version ~1.0.9) has been added to pyproject.toml to support the new language detection capabilities.
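
The retry and translation fallback summarised in the highlights could look roughly like the sketch below. It is an illustration under stated assumptions rather than the PR's implementation: `generate_title`, the `invoke` callable (prompt and temperature in, text out), the `is_persian` callable, and the prompt wording are all hypothetical stand-ins for the real model call and detection helper.

```python
from typing import Callable


def generate_title(
    query: str,
    invoke: Callable[[str, float], str],
    is_persian: Callable[[str], bool],
    temperature: float = 0.2,
) -> str:
    """Return a title for `query`, steering towards Persian for Persian input."""
    # Attempt 1: the normal title prompt.
    title = invoke(f"Write a short title for: {query}", temperature)
    if not is_persian(query) or is_persian(title):
        return title  # nothing to correct

    # Attempt 2: a stricter, Persian-only instruction.
    strict_prompt = (
        "Write a short title in Persian (Farsi) only, using Persian characters, "
        f"for: {query}"
    )
    retry = invoke(strict_prompt, temperature)
    if is_persian(retry):
        return retry

    # Last resort: translate the most recent non-Persian candidate.
    candidate = retry or title
    return invoke(f"Translate this title into Persian (Farsi): {candidate}", temperature)
```

Passing the model call and the detector in as parameters keeps the control flow testable with simple fakes, at the cost of hiding details such as error handling that a real implementation would need.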

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a robust mechanism to ensure conversation titles for Persian inputs are generated correctly in Persian. The changes include adding multiple layers of language detection (character-specific regex, a word heuristic, and the langdetect library) and implementing a retry and translation fallback logic for the LLM call. The approach is solid and directly addresses the reported issue.

My review includes a few suggestions to further improve the code quality:

Refactoring the _contains_persian helper function for better readability and performance.
Using more specific exceptions instead of the general Exception for LLM calls to make error handling more precise.
Strengthening the assertion in the new unit test to make it more robust.

Overall, this is a great contribution that significantly improves the handling of Persian language inputs.

api/core/llm_generator/llm_generator.py

tests/unit_tests/core/llm_generator/test_llm_generator_persian.py

Copilot

Pull request overview

This PR fixes Persian (Farsi) conversation title generation to ensure titles are consistently produced in Persian rather than Arabic or other languages, even when the input lacks Persian-specific characters. The fix addresses issue #29745 by implementing robust language detection and retry mechanisms.

Key Changes:

Enhanced Persian language detection with multi-layered approach: character detection, word heuristics, and langdetect library fallback
Implemented retry mechanism with stricter prompts and translation fallback to ensure Persian output
Reduced LLM temperature from 1 to 0.2 for more deterministic title generation

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| `api/core/llm_generator/llm_generator.py` | Core implementation with Persian detection function, retry logic, and translation fallback; moved ops_trace_manager import to function scope |
| `api/core/llm_generator/prompts.py` | Added Persian-specific instruction to the conversation title prompt |
| `api/pyproject.toml` | Added langdetect~=1.0.9 dependency |
| `api/uv.lock` | Updated lock file with langdetect dependency |
| `tests/unit_tests/core/llm_generator/test_llm_generator_persian.py` | New isolated test with extensive module stubs to verify Persian title generation |
| `api/tests/unit_tests/core/test_llm_generator_persian.py` | Integration-style tests verifying retry and translation fallback mechanisms |
| `docker/docker-compose.middleware.yaml` | Added proxy configuration and extra_hosts for plugin_daemon service |
| `api/core/app/entities/app_invoke_entities.py` | Added raise_errors=False to model_rebuild calls; removed unused import |

tests/unit_tests/core/llm_generator/test_llm_generator_persian.py

api/core/llm_generator/llm_generator.py

docker/docker-compose.middleware.yaml

api/core/llm_generator/llm_generator.py

api/core/llm_generator/prompts.py

api/core/llm_generator/llm_generator.py

tests/unit_tests/core/llm_generator/test_llm_generator_persian.py

api/tests/unit_tests/core/test_llm_generator_persian.py

nourzakhama2003 · 2025-12-22T18:30:41Z

Summary of key changes
Files updated: llm_generator.py, prompts.py, tests
Main fixes: robust Persian detection (character check + word heuristic + langdetect fallback), retry + translation fallback, lowered LLM temperature, precompiled regex, safer JSON parsing, and narrower exception handling.
Tests added: unit tests for _contains_persian, retry prompt content, translation fallback, and InvokeError handling.
What I resolved from Copilot comments
Moved the langdetect import to module scope and seeded DetectorFactory
Precompiled the regex and exposed _contains_persian for testability
Reintroduced robust JSON parsing with guarded json_repair
Added/updated tests and fixed style/type issues raised by CI
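
As a rough sketch of what "robust JSON parsing with guarded json_repair" can mean: parse strictly first, fall back to `json_repair.loads` only if the package is importable, and validate the result type before using it. The helper name `parse_title_payload` and the `"title"` key in the usage note are hypothetical; the PR's actual parsing code may differ.

```python
import json
from typing import Any

try:
    import json_repair  # optional dependency; the sketch works without it
except ImportError:
    json_repair = None


def parse_title_payload(raw: str) -> dict[str, Any]:
    """Parse model output into a dict, returning {} when it cannot be recovered."""
    parsed: Any = None  # initialised up front so it is never unbound
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Look the function up defensively; getattr on None simply yields None.
        repair = getattr(json_repair, "loads", None)
        if callable(repair):
            try:
                parsed = repair(raw)
            except Exception:
                parsed = None  # repair failed; fall through to the empty dict
    # Only a JSON object is useful here; lists, strings, etc. are rejected.
    return parsed if isinstance(parsed, dict) else {}
```

A caller would then read a field with something like `parse_title_payload(text).get("title", "")`, so malformed model output degrades to an empty title instead of raising.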
Please review the changes in fix/29745-persian-conversation-title. If everything looks good, a quick LGTM/approve and merge would be great. Thanks again.

crazywoola · 2025-12-24T01:50:12Z

cc @laipz8200

nourzakhama2003 added 2 commits December 22, 2025 02:20

style(llm-generator): fix lint issues (log langdetect errors, wrap long lines)

0846542

style(llm-generator): precompile Persian-word heuristic for clarity and performance

a7a074e

nourzakhama2003 requested review from QuantumGhost, Yeuoly, crazywoola and laipz8200 as code owners December 22, 2025 01:34

dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. 💪 enhancement New feature or request labels Dec 22, 2025

gemini-code-assist bot reviewed Dec 22, 2025

nourzakhama2003 added 4 commits December 22, 2025 02:48

chore: refresh PR status after conflict resolution

3a717fe

fix(llm-generator): resolve PR conflict and keep retry + JSON-repair + translation fallback

ce02e9c

fix(llm-generator): make langdetect import dynamic for type checks; guard JSON parsing; use specific invoke error handling; strengthen unit test

ef50d44

chore: include unrelated workspace edits (prompts/uv.lock/middleware)

fd71d90

nourzakhama2003 changed the title ~~Fix/29745 persian conversation title~~ fix(llm-generator): ensure Persian conversation titles for Farsi input (#29745) Dec 22, 2025

nourzakhama2003 force-pushed the fix/29745-persian-conversation-title branch from 71dda58 to fd71d90 Compare December 22, 2025 04:01

nourzakhama2003 and others added 5 commits December 22, 2025 05:16

chore: mark resolved llm_generator.py (no content change)

046c791

Merge branch 'main' into fix/29745-persian-conversation-title

25af2de

[autofix.ci] apply automated fixes

5d6aac9

fix(llm-generator): use last non-Persian candidate for translation fallback; avoid circular import by deferring ops import and adding fallback

5a6ac9e

fix(types): avoid runtime forward-ref resolution in pydantic model_rebuild by using raise_errors=False; remove dummy TraceQueueManager

9661d19

nourzakhama2003 force-pushed the fix/29745-persian-conversation-title branch from 1024fb7 to 9661d19 Compare December 22, 2025 05:01

nourzakhama2003 and others added 3 commits December 22, 2025 06:55

Merge branch 'main' into fix/29745-persian-conversation-title

d126979

Merge branch 'main' into fix/29745-persian-conversation-title

2781f5d

Merge branch 'main' into fix/29745-persian-conversation-title

5dc6fed

crazywoola requested a review from Copilot December 22, 2025 13:39

Copilot started reviewing on behalf of crazywoola December 22, 2025 13:40 View session

Copilot AI reviewed Dec 22, 2025

Fix: Persian conversation titles robust detection, retry & translation fallback; precompile regex; move langdetect import; robust JSON parsing; lower LLM temperature; add tests; resolve Copilot comments (langgenius#29745)

a177097

dosubot bot removed the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 22, 2025

dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Dec 22, 2025

autofix-ci bot and others added 7 commits December 22, 2025 14:00

[autofix.ci] apply automated fixes

36be4ac

Merge branch 'main' into fix/29745-persian-conversation-title

3c9907e

Fix: rename module flags to avoid pyright constant redefinition errors (_langdetect_available, _persian_chars_re); update tests

8dbae53

Fix: safely call json_repair functions via getattr to satisfy type checker and avoid mypy attr errors

188ddc7

Merge branch 'main' into fix/29745-persian-conversation-title

364216a

Fix: ensure json_repair returns are validated and typed to satisfy type checker (avoid object->str loads)

b83f550

Fix: ensure parsed_content is initialized to satisfy type checker (avoid possibly-unbound variable)

effa483

Merge branch 'main' into fix/29745-persian-conversation-title

97f750f
