fix(llm-generator): ensure Persian conversation titles for Farsi input (#29745) #29970
base: main
Summary of Changes (Gemini Code Assist)
This pull request addresses an issue where conversation titles generated from Persian (Farsi) input were sometimes produced in other languages, such as Arabic. The changes introduce a more sophisticated approach to detecting Persian input and guide the Large Language Model (LLM) to consistently generate titles in Persian, improving the experience for Farsi speakers.
Code Review
This pull request introduces a robust mechanism to ensure conversation titles for Persian inputs are generated correctly in Persian. The changes include adding multiple layers of language detection (character-specific regex, a word heuristic, and the langdetect library) and implementing a retry and translation fallback logic for the LLM call. The approach is solid and directly addresses the reported issue.
My review includes a few suggestions to further improve the code quality:
- Refactoring the _contains_persian helper function for better readability and performance.
- Using more specific exceptions instead of the general Exception for LLM calls to make error handling more precise.
- Strengthening the assertion in the new unit test to make it more robust.
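The first suggestion, making the Persian-character check faster and easier to read, usually comes down to compiling the character pattern once at module level instead of scanning on every call. A minimal sketch, assuming a hypothetical helper name contains_persian_chars and an illustrative set of Persian-specific letters; it is not the PR's actual code:

```python
import re

# Hypothetical helper; a sketch of the reviewer's suggestion, not the PR's code.
# Letters used in Persian but absent from the standard Arabic alphabet:
# pe (U+067E), che (U+0686), zhe (U+0698), gaf (U+06AF), plus the Persian
# forms of kaf (keheh, U+06A9) and yeh (farsi yeh, U+06CC).
_PERSIAN_CHARS_RE = re.compile("[\u067E\u0686\u0698\u06AF\u06A9\u06CC]")


def contains_persian_chars(text: str) -> bool:
    """Return True if text contains at least one Persian-specific letter."""
    # The pattern is compiled once at import time and reused on every call.
    return _PERSIAN_CHARS_RE.search(text) is not None
```

For example, "گفتگو" (which contains gaf) matches, while the Arabic "مرحبا" and plain English text do not.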
Overall, this is a great contribution that significantly improves the handling of Persian language inputs.
Review thread on tests/unit_tests/core/llm_generator/test_llm_generator_persian.py (outdated, resolved)
…+ translation fallback
…uard JSON parsing; use specific invoke error handling; strengthen unit test
Branch updated from 71dda58 to fd71d90.
…llback; avoid circular import by deferring ops import and adding fallback
…build by using raise_errors=False; remove dummy TraceQueueManager
Branch updated from 1024fb7 to 9661d19.
Pull request overview
This PR fixes Persian (Farsi) conversation title generation to ensure titles are consistently produced in Persian rather than Arabic or other languages, even when the input lacks Persian-specific characters. The fix addresses issue #29745 by implementing robust language detection and retry mechanisms.
Key Changes:
- Enhanced Persian language detection with multi-layered approach: character detection, word heuristics, and langdetect library fallback
- Implemented retry mechanism with stricter prompts and translation fallback to ensure Persian output
- Reduced LLM temperature from 1 to 0.2 for more deterministic title generation
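The multi-layered detection described above can be sketched as a single function that tries the cheapest, most certain signal first. This is a minimal sketch under stated assumptions: the name looks_persian and the short word list are hypothetical and illustrative, and langdetect is treated as optional. The words are chosen to use only letters shared with Arabic, so the heuristic still fires when the input contains no Persian-specific characters:

```python
import re

# Hypothetical sketch of a three-layer check; names and word list are illustrative.
_PERSIAN_CHARS_RE = re.compile("[\u067E\u0686\u0698\u06AF\u06A9\u06CC]")

# Common Persian words spelled only with letters shared with Arabic,
# so they help exactly when layer 1 finds nothing (not exhaustive).
_PERSIAN_WORDS = frozenset({"است", "را", "از", "هست", "شد", "بود", "دارم", "خوب"})

try:
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # langdetect is randomized unless seeded
    _LANGDETECT_AVAILABLE = True
except ImportError:
    _LANGDETECT_AVAILABLE = False


def looks_persian(text: str) -> bool:
    # Layer 1: any Persian-specific letter is a strong signal.
    if _PERSIAN_CHARS_RE.search(text):
        return True
    # Layer 2: common Persian words catch short inputs without those letters.
    tokens = re.findall(r"\w+", text)
    if any(token in _PERSIAN_WORDS for token in tokens):
        return True
    # Layer 3: statistical detection, only if langdetect is installed.
    if _LANGDETECT_AVAILABLE:
        try:
            return detect(text) == "fa"
        except LangDetectException:
            # Raised for empty or feature-less input.
            return False
    return False
```

Ordering the layers this way keeps the common case cheap and avoids relying on langdetect, which is known to be unreliable on very short strings.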
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| api/core/llm_generator/llm_generator.py | Core implementation with Persian detection function, retry logic, and translation fallback; moved ops_trace_manager import to function scope |
| api/core/llm_generator/prompts.py | Added Persian-specific instruction to the conversation title prompt |
| api/pyproject.toml | Added langdetect~=1.0.9 dependency |
| api/uv.lock | Updated lock file with langdetect dependency |
| tests/unit_tests/core/llm_generator/test_llm_generator_persian.py | New isolated test with extensive module stubs to verify Persian title generation |
| api/tests/unit_tests/core/test_llm_generator_persian.py | Integration-style tests verifying retry and translation fallback mechanisms |
| docker/docker-compose.middleware.yaml | Added proxy configuration and extra_hosts for plugin_daemon service |
| api/core/app/entities/app_invoke_entities.py | Added raise_errors=False to model_rebuild calls; removed unused import |
Review thread on tests/unit_tests/core/llm_generator/test_llm_generator_persian.py (outdated, resolved)
…n fallback; precompile regex; move langdetect import; robust JSON parsing; lower LLM temperature; add tests; resolve Copilot comments (langgenius#29745)
…s (_langdetect_available, _persian_chars_re); update tests
…ecker and avoid mypy attr errors
…pe checker (avoid object->str loads)
…oid possibly-unbound variable)
Summary of key changes
cc @laipz8200
Summary
Fixes #29745 (Bug in title of the conversations): ensure that conversation titles generated from Persian (Farsi) input are produced in Persian, not Arabic or another language, even when the input lacks Persian-only characters.
What I changed
Code
llm_generator.py
- Add more robust Persian detection: a lightweight Persian-word heuristic, with langdetect as a language-detection fallback.
- Lower the LLM temperature to 0.2 for more deterministic title output.
- Retry with a stricter Persian-only instruction and, if the result is still not Persian, request a direct Persian translation as a last resort.
- Improve logging and readability (exceptions are logged; the regex is precompiled).
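The retry and translation fallback can be sketched as a short chain of attempts. This is a minimal sketch, not the PR's code or Dify's model-runtime API: invoke (a callable taking a prompt and a temperature), LLMInvokeError, the prompt wording, and the injected is_persian checker are all hypothetical. It also reflects the reviewer's point about catching a specific invocation error rather than a bare Exception:

```python
from typing import Callable, Optional

# Hypothetical names throughout: a sketch of the retry/translate idea only.


class LLMInvokeError(Exception):
    """Stand-in for a provider-specific invocation error."""


BASE_PROMPT = "Write a short title for this conversation:\n{query}"
STRICT_PROMPT = (
    "The user wrote in Persian (Farsi). Reply ONLY with a short title written "
    "in Persian. Do not use Arabic or English.\n{query}"
)
TRANSLATE_PROMPT = "Translate this title into Persian (Farsi). Reply with the translation only:\n{title}"


def generate_persian_title(
    query: str,
    invoke: Callable[[str, float], str],
    is_persian: Callable[[str], bool],
    temperature: float = 0.2,
) -> Optional[str]:
    """Return a Persian title, or None if every attempt fails."""

    def call(prompt: str) -> Optional[str]:
        # Catch only invocation errors so genuine bugs still surface.
        try:
            return invoke(prompt, temperature).strip()
        except LLMInvokeError:
            return None

    # Attempt 1: normal prompt at low temperature.
    title = call(BASE_PROMPT.format(query=query))
    if title and is_persian(title):
        return title

    # Attempt 2: stricter, Persian-only instruction.
    strict_title = call(STRICT_PROMPT.format(query=query))
    if strict_title and is_persian(strict_title):
        return strict_title

    # Last resort: translate whichever non-Persian title was produced.
    candidate = strict_title or title
    if candidate:
        translated = call(TRANSLATE_PROMPT.format(title=candidate))
        if translated and is_persian(translated):
            return translated
    return None
```

Injecting invoke and is_persian keeps the control flow testable with a fake model, with no network calls.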
Dependency
pyproject.toml: add langdetect~=1.0.9.
Tests
test_llm_generator_persian.py: new unit test verifying that Persian input produces a Persian title. The test includes minimal, explicit test-only stubs to keep collection isolated from the full application.
Why this fixes the bug
The previous detection logic only looked for Persian-specific characters, so short Farsi inputs without those characters were misclassified and could produce Arabic titles. This change:
- detects Persian more reliably, even on short inputs;
- forces the model to output Persian-only titles when the input is Persian;
- uses a lower, more deterministic temperature and a controlled retry/translate fallback to avoid non-Persian output.
How to test locally
From the repository root:
PowerShell:
& { $env:PYTHONPATH = "$PWD\api"; python -m pytest test_llm_generator_persian.py -q }
Run broader checks:
python -m ruff check api (or make lint on Unix-like systems)
python -m pytest -q
Notes
The unit test contains a small sys.path insertion and light stubs to keep it isolated and fast; a short comment in the test explains the rationale.
No infra changes are included in this PR.
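The isolation technique mentioned in the notes (a sys.path insertion plus lightweight stubs) generally works by registering fake modules in sys.modules before the code under test is imported. A minimal sketch; the helper names install_stub_module and add_to_sys_path are hypothetical and not taken from the PR's test:

```python
import sys
import types
from pathlib import Path


def install_stub_module(name: str, **attrs: object) -> types.ModuleType:
    """Register a lightweight fake module so `import name` resolves to it."""
    module = types.ModuleType(name)
    for attr, value in attrs.items():
        setattr(module, attr, value)
    # The import system checks sys.modules first, so later imports get the stub.
    sys.modules[name] = module
    return module


def add_to_sys_path(directory: Path) -> None:
    # Prepend so the package under test wins over any installed copy.
    path = str(directory.resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
```

In a test module, these calls would run before importing the code under test, so that its heavy imports resolve to the stubs. Note that a dotted name such as a.b also needs its parent package stubbed, and stubs left in sys.modules can leak into other tests unless they are removed afterwards (pytest's monkeypatch.setitem on sys.modules handles that cleanup).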
Checklist
- Unit tests added
- Linting passed locally
- Manual verification of Persian titles in local UI
Checklist
- dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods