InMemoryMemoryService search doesn't work with non-Latin text (Japanese, CJK, etc.) #5501

@yodeee9

Description

I was building a demo with InMemoryMemoryService and noticed that search_memory returns nothing when the conversation is in Japanese.

After digging in, the cause is _extract_words_lower in in_memory_memory_service.py:

return set([word.lower() for word in re.findall(r'[A-Za-z]+', text)])

This regex only matches ASCII letters, so any Japanese/Chinese/Korean/Cyrillic text gets completely dropped. For example:

_extract_words_lower("私の名前は太郎です")  # returns set()
_extract_words_lower("太郎 works at ABC")   # returns {'works', 'at', 'abc'} — 太郎 is lost

This means if you store a session in Japanese and then search for it, you'll never get a match.
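Here is a minimal sketch of why the search misses. It assumes the service decides a match by keyword overlap between the query and the stored text; `has_keyword_overlap` is a hypothetical stand-in for that step, not the actual ADK code. Only the regex is taken from the snippet above.

```python
import re


def extract_words_lower(text: str) -> set[str]:
    # Same regex as quoted above: only runs of ASCII letters count as words.
    return {word.lower() for word in re.findall(r'[A-Za-z]+', text)}


def has_keyword_overlap(query: str, stored_text: str) -> bool:
    # Hypothetical matching step (an assumption for this sketch):
    # a hit requires at least one word shared by query and stored text.
    return bool(extract_words_lower(query) & extract_words_lower(stored_text))


# Both sides reduce to an empty set, so there is nothing to intersect.
japanese_hit = has_keyword_overlap("太郎", "私の名前は太郎です")

# ASCII words still work, which is why the problem is easy to overlook.
ascii_hit = has_keyword_overlap("abc", "太郎 works at ABC")

print(japanese_hit, ascii_hit)
```

Under that assumption, any query written entirely in a non-Latin script produces an empty word set and can never match.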

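A simple fix would be changing [A-Za-z]+ to \w+, which matches word characters (letters, digits, and underscore) across all scripts. In Python 3, str patterns are Unicode-aware by default, so passing re.UNICODE is optional.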
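A rough sketch of the proposed change, to show what it does and does not cover. The function name `extract_words_unicode` is made up for illustration; this is not a patch against the real file.

```python
import re


def extract_words_unicode(text: str) -> set[str]:
    # \w+ on a str pattern is Unicode-aware by default in Python 3,
    # so letters from any script (plus digits and underscore) are kept.
    return {word.lower() for word in re.findall(r'\w+', text)}


# Mixed text: the Japanese name is now kept alongside the ASCII words.
mixed = extract_words_unicode("太郎 works at ABC")

# Space-separated scripts such as Cyrillic split into words as expected.
cyrillic = extract_words_unicode("Привет мир")

# Text written without spaces comes back as a single long token.
unsegmented = extract_words_unicode("私の名前は太郎です")

print(mixed, cyrillic, unsegmented)
```

One caveat: Japanese and Chinese are usually written without spaces, so a whole sentence becomes one token. A query that repeats the stored sentence exactly would match, but a query for just 太郎 against 私の名前は太郎です would still miss unless the matching also does some kind of segmentation or substring comparison.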

Versions: ADK 1.31.1, Python 3.11

Labels

request clarification: [Status] The maintainer need clarification or more information from the author
services: [Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc
