fix(artifacts): Preserve .text on GcsArtifactService load (#3157) #4541
wpn10 wants to merge 3 commits into google:main
Store _adk_is_text metadata flag on GCS blobs for text artifacts and use it on load to reconstruct as Part(text=...) instead of Part.from_bytes(). Switch to get_blob() to fetch blob metadata.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Summary of Changes
Hello @wpn10, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a bug in the GcsArtifactService where text artifacts would lose their '.text' attribute upon loading, forcing access through '.inline_data'. The solution introduces a custom metadata flag on GCS blobs to explicitly mark text artifacts, enabling the service to correctly reconstruct them with their original text content. This ensures data integrity and consistent behavior for text-based artifacts stored in Google Cloud Storage.
Response from ADK Triaging Agent
Hello @wpn10, thank you for creating this PR! It looks like the Contributor License Agreement (CLA) has not been signed. Could you please sign it to allow us to proceed with the review? You can find more details in the "cla/google" check at the bottom of this PR. Thanks!
Code Review
This pull request correctly addresses the issue of preserving the .text attribute for text artifacts in GcsArtifactService by using blob metadata. The changes to use get_blob and add the _adk_is_text flag are well-implemented. I've added a couple of suggestions for improvement regarding an edge case with empty artifacts and enhancing the new test case.
if not artifact_bytes:
  return None
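The empty-artifact edge case mentioned in the review summary comes down to Python truthiness: an empty bytes or str value is falsy, so a check like `if not artifact_bytes` cannot tell "nothing was stored" apart from "an empty artifact was stored". A minimal sketch of the difference, using hypothetical helper names that are not part of ADK:

```python
from typing import Optional


def load_with_truthiness_check(artifact_bytes: Optional[bytes]) -> Optional[bytes]:
  """Hypothetical loader that mirrors the `if not artifact_bytes` check."""
  if not artifact_bytes:  # True for None and also for b""
    return None
  return artifact_bytes


def load_with_none_check(artifact_bytes: Optional[bytes]) -> Optional[bytes]:
  """Hypothetical loader that only treats a missing payload as absent."""
  if artifact_bytes is None:
    return None
  return artifact_bytes
```

With the first helper an empty payload is reported as missing; with the second it is returned as `b""`. Whether an empty artifact should be loadable at all is a design decision for the service, so this is only meant to show where the two checks diverge.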
async def test_save_load_text_artifact(service_type, artifact_service_factory):
  """Tests that text artifacts retain .text after round-trip save/load."""
  artifact_service = artifact_service_factory(service_type)
  artifact = types.Part.from_text(text='{"key": "value"}')

  await artifact_service.save_artifact(
      app_name="app0",
      user_id="user0",
      session_id="123",
      filename="data.json",
      artifact=artifact,
  )
  loaded = await artifact_service.load_artifact(
      app_name="app0",
      user_id="user0",
      session_id="123",
      filename="data.json",
  )
  assert loaded is not None
  assert loaded.text == '{"key": "value"}'
  assert loaded.inline_data is None
To improve test coverage, consider parametrizing this test to include various text contents, especially an empty string. This would help catch edge cases like handling empty artifacts.
You can add another @pytest.mark.parametrize decorator above this function, like so:
@pytest.mark.parametrize(
    "text_content",
    ['{"key": "value"}', "some other text", ""],
)
Then, update the test function to accept and use the text_content parameter.
async def test_save_load_text_artifact(service_type, artifact_service_factory, text_content):
  """Tests that text artifacts retain .text after round-trip save/load."""
  artifact_service = artifact_service_factory(service_type)
  artifact = types.Part.from_text(text=text_content)
  await artifact_service.save_artifact(
      app_name="app0",
      user_id="user0",
      session_id="123",
      filename="data.json",
      artifact=artifact,
  )
  loaded = await artifact_service.load_artifact(
      app_name="app0",
      user_id="user0",
      session_id="123",
      filename="data.json",
  )
  assert loaded is not None
  assert loaded.text == text_content
  assert loaded.inline_data is None
/gemini review
Code Review
This pull request effectively addresses the issue of preserving the .text attribute for text artifacts stored in GCS. The solution of adding a metadata flag _adk_is_text is well-implemented, ensuring that text artifacts are correctly reconstructed upon loading. The switch to bucket.get_blob() is appropriate for fetching the necessary metadata. The accompanying tests are thorough, covering both standard and edge cases like empty text artifacts, which validates the fix. I have one minor suggestion to improve maintainability by using a constant for the metadata key.
   )
-  elif artifact.text:
+  elif artifact.text is not None:
+    blob.metadata = {**(blob.metadata or {}), "_adk_is_text": "true"}
To improve maintainability and avoid magic strings, consider defining "_adk_is_text" as a module-level constant, as it's used in both _save_artifact and _load_artifact.
For example, at the top of the file:
_IS_TEXT_METADATA_KEY = "_adk_is_text"
You can then use this constant here and in _load_artifact.
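As a rough illustration of the suggestion, the constant could back a pair of small helpers shared by the save and load paths. The helper names `mark_as_text` and `is_text` are hypothetical and do not appear in the PR; only the key name and the "true" value come from the diff above.

```python
from typing import Mapping, Optional

# Module-level constant, as suggested in the review comment.
_IS_TEXT_METADATA_KEY = "_adk_is_text"


def mark_as_text(metadata: Optional[Mapping[str, str]]) -> dict[str, str]:
  """Returns a copy of the blob metadata with the text flag set."""
  return {**(metadata or {}), _IS_TEXT_METADATA_KEY: "true"}


def is_text(metadata: Optional[Mapping[str, str]]) -> bool:
  """Checks the flag; blobs with no metadata count as non-text."""
  return (metadata or {}).get(_IS_TEXT_METADATA_KEY) == "true"
```

Keeping the key in one place means a rename touches a single line, and the save and load sides cannot drift apart through a typo in the string.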
Hi @wpn10, thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.
Hi @DeanChensj, can you please review this? LGTM.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
Testing Plan
Added test_save_load_text_artifact, parametrized across all 3 artifact service backends (InMemory, GCS, File). Verifies .text survives round-trip and .inline_data is None.
Unit Tests:
pytest tests/unittests/artifacts/ -v
47 passed in 2.36s
Manual End-to-End (E2E) Tests:
Not applicable — this is an internal service fix with no UI component. The bug is fully reproducible and verifiable through unit tests.
Additional context
Problem: GcsArtifactService._load_artifact() always uses Part.from_bytes() to reconstruct artifacts. Text artifacts saved via Part.from_text() therefore lose their .text attribute on load: it returns None, and the data is only accessible through .inline_data.
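The problem can be reproduced in miniature without GCS. The sketch below uses stand-in dataclasses (`FakePart`, `FakeInlineData`) rather than the real `google.genai.types.Part`, and hypothetical `save_text` / `load_as_bytes` helpers, so it only models the shape of the bug described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeInlineData:
  """Stand-in for the inline-bytes payload of a part."""
  data: bytes
  mime_type: str


@dataclass
class FakePart:
  """Stand-in for Part: carries either text or inline_data."""
  text: Optional[str] = None
  inline_data: Optional[FakeInlineData] = None


def save_text(part: FakePart) -> tuple[bytes, str]:
  """Serializes a text part the way a byte-oriented store would."""
  return part.text.encode("utf-8"), "text/plain"


def load_as_bytes(data: bytes, mime_type: str) -> FakePart:
  """Always rebuilds from bytes, which is the behavior described above."""
  return FakePart(inline_data=FakeInlineData(data=data, mime_type=mime_type))


original = FakePart(text='{"key": "value"}')
loaded = load_as_bytes(*save_text(original))
```

After the round trip, `loaded.text` is `None` and the original content is only reachable as bytes through `loaded.inline_data.data`.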
Solution: Store an _adk_is_text: "true" flag in the GCS blob's custom metadata when saving text artifacts. On load, check for that flag and reconstruct as Part(text=...) instead of Part.from_bytes(...). Also switch from bucket.blob() to bucket.get_blob() so blob metadata is populated (same pattern already used in _get_artifact_version_sync). Backward compatible: old blobs without the flag continue loading as before.
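A minimal sketch of the flag-based round trip follows. It uses an in-memory `FakeBucket` in place of the real google-cloud-storage client, and `FakePart`, `save_part`, and `load_part` are hypothetical names for illustration; the only details taken from the PR are the metadata key, the "true" value, the `is not None` check, and the fallback to bytes for unflagged blobs.

```python
from dataclasses import dataclass
from typing import Optional

IS_TEXT_KEY = "_adk_is_text"


@dataclass
class FakeStoredBlob:
  """Stand-in for a stored object: payload plus custom metadata."""
  data: bytes
  metadata: Optional[dict[str, str]] = None


@dataclass
class FakePart:
  """Stand-in for Part: carries either text or raw bytes."""
  text: Optional[str] = None
  inline_bytes: Optional[bytes] = None


class FakeBucket:
  """In-memory stand-in; get_blob returns None for missing objects."""

  def __init__(self) -> None:
    self.blobs: dict[str, FakeStoredBlob] = {}

  def get_blob(self, name: str) -> Optional[FakeStoredBlob]:
    return self.blobs.get(name)


def save_part(bucket: FakeBucket, name: str, part: FakePart) -> None:
  if part.text is not None:  # `is not None` keeps empty strings as text
    bucket.blobs[name] = FakeStoredBlob(
        data=part.text.encode("utf-8"), metadata={IS_TEXT_KEY: "true"}
    )
  else:
    bucket.blobs[name] = FakeStoredBlob(data=part.inline_bytes or b"")


def load_part(bucket: FakeBucket, name: str) -> Optional[FakePart]:
  blob = bucket.get_blob(name)
  if blob is None:
    return None
  if (blob.metadata or {}).get(IS_TEXT_KEY) == "true":
    return FakePart(text=blob.data.decode("utf-8"))
  # Blobs without the flag (including older ones) load as bytes.
  return FakePart(inline_bytes=blob.data)
```

Saving `FakePart(text="hi")` and loading it back yields a part whose `text` is `"hi"`, while a blob written without metadata still loads as bytes, which is the backward-compatible path the description refers to.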