
Conversation

@nourzakhama2003
Contributor

Important

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: Fixes #<issue number>.
    Closes #29745: Bug in title of the conversations.

Screenshots

[Screenshot: Capture d'écran 2025-12-22 020221]

Summary

Fixes #29745: ensure conversation titles generated from Persian (Farsi) input are produced in Persian (not Arabic or other languages), even when the input lacks Persian-only characters.

What I changed

Code (llm_generator.py)
  • Add robust Persian detection: a language fallback via langdetect plus a lightweight Persian-word heuristic (sketched after this section).
  • Lower the LLM temperature to 0.2 for more deterministic title outputs.
  • Retry with a stricter Persian-only instruction and, if the result is still not Persian, request a direct Persian translation as a last resort.
  • Improve logging and readability (exceptions logged, precompiled regex).
Dependency (pyproject.toml)
  • Add langdetect~=1.0.9.
Tests (test_llm_generator_persian.py)
  • New unit test verifying that Persian input produces a Persian title. The test includes minimal, explicit test-only stubs to keep collection isolated from the full application.
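
For reviewers skimming the diff, here is a minimal sketch of the layered detection described above. The names (_PERSIAN_ONLY_CHARS, _PERSIAN_COMMON_WORDS) and the word list are illustrative assumptions; only the general approach mirrors the actual change in llm_generator.py.

```python
import re

from langdetect import DetectorFactory, detect
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # make langdetect deterministic across runs

# Letters used in Persian but not in standard Arabic (پ چ ژ گ plus the Persian
# forms of kaf and yeh), precompiled once at module scope.
_PERSIAN_ONLY_CHARS = re.compile(r"[\u067e\u0686\u0698\u06af\u06a9\u06cc]")

# Tiny heuristic for short inputs written only with letters shared with Arabic;
# this word list is illustrative, not the one used in the PR.
_PERSIAN_COMMON_WORDS = re.compile(r"\b(است|هستم|را|شما|درباره)\b")


def _contains_persian(text: str) -> bool:
    """Return True when the text is likely Persian (Farsi)."""
    if _PERSIAN_ONLY_CHARS.search(text) or _PERSIAN_COMMON_WORDS.search(text):
        return True
    try:
        return detect(text) == "fa"  # broader fallback for ambiguous short input
    except LangDetectException:
        return False
```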
Why this fixes the bug

Previous detection only looked for Persian-specific characters, so short Farsi inputs without those characters were misclassified and could produce Arabic titles. This change:
  • detects Persian robustly even on short inputs,
  • forces the model to output Persian-only titles when applicable,
  • uses a lower, more deterministic temperature and a controlled retry/translate fallback (sketched below) to avoid non-Persian outputs.
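
A rough sketch of that retry/translate control flow follows. The prompt constants and the invoke_llm callable are placeholders introduced for illustration (the real method in llm_generator.py builds prompt messages and invokes the model differently); _contains_persian is the detector sketched earlier.

```python
CONVERSATION_TITLE_PROMPT = "Write a short title for this conversation:\n{query}"
STRICT_PERSIAN_TITLE_PROMPT = (
    "Write a short title in Persian (Farsi) only, using Persian script:\n{query}"
)
TRANSLATE_TITLE_TO_PERSIAN_PROMPT = "Translate this title into Persian (Farsi):\n{title}"


def _generate_title(invoke_llm, query: str) -> str:
    """invoke_llm(prompt, temperature) -> str stands in for the real model call."""
    persian_input = _contains_persian(query)

    # Temperature 0.2 keeps the title wording stable across runs.
    title = invoke_llm(CONVERSATION_TITLE_PROMPT.format(query=query), temperature=0.2)

    if persian_input and not _contains_persian(title):
        # Retry once with a stricter "reply only in Persian" instruction.
        title = invoke_llm(STRICT_PERSIAN_TITLE_PROMPT.format(query=query), temperature=0.2)

    if persian_input and not _contains_persian(title):
        # Last resort: ask the model to translate the title it already produced.
        title = invoke_llm(TRANSLATE_TITLE_TO_PERSIAN_PROMPT.format(title=title), temperature=0.2)

    return title
```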
How to test locally

From repository root:
PowerShell:
PowerShell:
  & { $env:PYTHONPATH = "$PWD\api"; python -m pytest test_llm_generator_persian.py -q }
Run broader checks:
  python -m ruff check api  (or make lint in a *nix environment)
  python -m pytest -q
Notes

The unit test contains a small sys.path insertion and light stubs to keep it isolated and fast; a short comment in the test explains the rationale (a sketch of the pattern follows these notes).
No infra changes are included in this PR.
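
As a concrete (but hypothetical) illustration of that isolation pattern: the path depth, the imported function name, and the asserted sentence below are assumptions rather than a copy of the real test file.

```python
import sys
from pathlib import Path

# Test-only path tweak so the module under test imports without installing the
# whole application; the parents[...] depth depends on where the test file lives.
API_ROOT = Path(__file__).resolve().parents[3] / "api"
sys.path.insert(0, str(API_ROOT))


def test_farsi_input_is_detected_as_persian():
    from core.llm_generator.llm_generator import _contains_persian

    # A clearly Persian sentence (it contains the Persian-only letters گ and ی).
    assert _contains_persian("این یک گفتگو است")
```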
Checklist

  • Unit tests added
  • Linting passed locally
  • Manual verification of Persian titles in local UI

Before After
... ...

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. 💪 enhancement New feature or request labels Dec 22, 2025
@gemini-code-assist
Contributor

Summary of Changes

Hello @nourzakhama2003, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where conversation titles generated from Persian (Farsi) input were sometimes incorrectly produced in other languages, such as Arabic. The changes introduce a more sophisticated approach to detect Persian input and guide the Large Language Model (LLM) to consistently generate titles in Persian, ensuring a better user experience for Farsi speakers.

Highlights

  • Enhanced Persian Language Detection: Implemented a robust Persian language detection mechanism for user input, combining checks for Persian-specific characters, the langdetect library for broader language identification, and a heuristic regex for common Persian words, ensuring accurate identification even for short inputs.
  • Improved LLM Title Generation for Persian: Modified the LLM prompt to explicitly instruct the model to generate titles in Persian (Farsi) when the input is detected as Persian. This includes specifying the use of Persian characters and avoiding other languages.
  • Deterministic LLM Output and Retry Logic: Lowered the LLM temperature to 0.2 for more deterministic title outputs. A retry mechanism is in place: if the initial generation for a Persian input does not produce a Persian title, the LLM is re-invoked with a stricter instruction. As a last resort, the model attempts to translate the non-Persian title into Persian.
  • New Unit Test for Persian Titles: Added a dedicated unit test (test_llm_generator_persian.py) to verify that Persian input correctly leads to Persian conversation titles, using lightweight stubs to isolate the test from the full application.
  • Dependency Update: The langdetect library (version ~1.0.9) has been added to pyproject.toml to support the new language detection capabilities.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a robust mechanism to ensure conversation titles for Persian inputs are generated correctly in Persian. The changes include adding multiple layers of language detection (character-specific regex, a word heuristic, and the langdetect library) and implementing a retry and translation fallback logic for the LLM call. The approach is solid and directly addresses the reported issue.

My review includes a few suggestions to further improve the code quality:

  • Refactoring the _contains_persian helper function for better readability and performance.
  • Using more specific exceptions instead of the general Exception for LLM calls to make error handling more precise.
  • Strengthening the assertion in the new unit test to make it more robust.

Overall, this is a great contribution that significantly improves the handling of Persian language inputs.

@nourzakhama2003 nourzakhama2003 changed the title Fix/29745 persian conversation title fix(llm-generator): ensure Persian conversation titles for Farsi input (#29745) Dec 22, 2025
@nourzakhama2003 nourzakhama2003 force-pushed the fix/29745-persian-conversation-title branch from 71dda58 to fd71d90 Compare December 22, 2025 04:01
@nourzakhama2003 nourzakhama2003 force-pushed the fix/29745-persian-conversation-title branch from 1024fb7 to 9661d19 Compare December 22, 2025 05:01
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes Persian (Farsi) conversation title generation to ensure titles are consistently produced in Persian rather than Arabic or other languages, even when the input lacks Persian-specific characters. The fix addresses issue #29745 by implementing robust language detection and retry mechanisms.

Key Changes:

  • Enhanced Persian language detection with multi-layered approach: character detection, word heuristics, and langdetect library fallback
  • Implemented retry mechanism with stricter prompts and translation fallback to ensure Persian output
  • Reduced LLM temperature from 1 to 0.2 for more deterministic title generation

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 16 comments.

Summary per file:
  • api/core/llm_generator/llm_generator.py: Core implementation with Persian detection function, retry logic, and translation fallback; moved ops_trace_manager import to function scope
  • api/core/llm_generator/prompts.py: Added Persian-specific instruction to the conversation title prompt (a sketch of such an instruction follows this list)
  • api/pyproject.toml: Added langdetect~=1.0.9 dependency
  • api/uv.lock: Updated lock file with langdetect dependency
  • tests/unit_tests/core/llm_generator/test_llm_generator_persian.py: New isolated test with extensive module stubs to verify Persian title generation
  • api/tests/unit_tests/core/test_llm_generator_persian.py: Integration-style tests verifying retry and translation fallback mechanisms
  • docker/docker-compose.middleware.yaml: Added proxy configuration and extra_hosts for plugin_daemon service
  • api/core/app/entities/app_invoke_entities.py: Added raise_errors=False to model_rebuild calls; removed unused import
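
The exact wording added to prompts.py is not visible on this page; the snippet below only sketches the shape such a Persian-specific instruction could take and is not a quote from the diff.

```python
# Hypothetical addition to the conversation-title prompt; wording is illustrative.
PERSIAN_TITLE_INSTRUCTION = (
    "If the user's input is written in Persian (Farsi), the title MUST also be "
    "written in Persian, using Persian script (e.g. پ، چ، ژ، گ), and MUST NOT "
    "be in Arabic or any other language."
)
```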


…n fallback; precompile regex; move langdetect import; robust JSON parsing; lower LLM temperature; add tests; resolve Copilot comments (langgenius#29745)
@dosubot dosubot bot removed the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 22, 2025
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Dec 22, 2025
@nourzakhama2003
Contributor Author

Summary of key changes
Files updated: llm_generator.py, prompts.py, tests
Main fixes: robust Persian detection (character check + heuristic + langdetect fallback), retry + translation fallback, lowered LLM temperature, precompiled regex, safer JSON parsing, and narrower exception handling.
Tests added: unit tests for _contains_persian, retry prompt content, translation fallback, and InvokeError handling.
What I resolved from Copilot comments
  • Moved the langdetect import to module scope and seeded DetectorFactory
  • Precompiled the regex and exposed _contains_persian for testability
  • Reintroduced robust JSON parsing with a guarded json_repair fallback (sketched below)
  • Added/updated tests and fixed style/type issues raised by CI
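
A sketch of the guarded json_repair fallback mentioned above; the function name and the empty-dict fallback are assumptions, and only the "try strict json first, repair on the slow path" shape is the point.

```python
import json
import logging

logger = logging.getLogger(__name__)


def _parse_title_payload(raw: str) -> dict:
    """Parse the LLM's JSON answer, repairing it only when strict parsing fails."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        try:
            from json_repair import repair_json  # imported lazily on the slow path

            return json.loads(repair_json(raw))
        except Exception:
            logger.exception("Failed to parse LLM title payload")
            return {}
```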
Please review the changes in fix/29745-persian-conversation-title. If everything looks good, a quick LGTM / approve and merge would be great. Thanks again!

@crazywoola
Member

cc @laipz8200

