Python driver: Double quotes at the end of string values are wrongfully removed #2418

@gej5sgm

Describe the bug

The ResultVisitor.visitStringValue() method in apache-age-python uses .strip('"') to remove the surrounding quotes from agtype string tokens. Python's str.strip() removes every matching character from both ends, not just one. This corrupts data when a string property value ends with an escaped double quote (\"), because the " that belongs to the data is stripped along with the delimiter quote. (A leading \" is not affected, because the backslash in front of it stops the stripping.) The same bug exists in visitPair().

For example, a property value foo "bar" is serialized by AGE as the agtype token "foo \"bar\"". Calling .strip('"') produces foo \"bar\ instead of the correct foo \"bar\".
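The difference can be reproduced without a database. The following is a minimal sketch that assumes the agtype token looks exactly as described above; the variable names are only illustrative.

```python
# Agtype token for the property value: foo "bar"
token = '"foo \\"bar\\""'

stripped = token.strip('"')  # removes every leading and trailing '"'
sliced = token[1:-1]         # removes exactly one character at each end

print(stripped)  # foo \"bar\
print(sliced)    # foo \"bar\"
```

Only the trailing end loses data here: the closing delimiter and the escaped quote in front of it are adjacent, so strip() removes both.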

The fix is to replace .strip('"') with [1:-1], which removes exactly the first and last character.
It still needs to be checked whether a token can ever arrive without leading and trailing double quotes; in that case the slice would wrongly delete content.
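One way to sidestep that open question is to slice only when both delimiters are actually present. This is a sketch, not the driver's code: strip_delimiters is a hypothetical helper name, and whether the grammar's STRING token can ever lack quotes is not confirmed here.

```python
def strip_delimiters(token: str) -> str:
    """Remove one surrounding pair of double quotes, if present."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    # Not delimited as expected: return the token unchanged
    # rather than silently dropping characters.
    return token


print(strip_delimiters('"foo \\"bar\\""'))  # foo \"bar\"
print(strip_delimiters('unquoted'))         # unquoted
```

If the grammar guarantees that every STRING token is quoted, the plain [1:-1] slice is sufficient and the guard is just defensive.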

How are you accessing AGE (Command line, driver, etc.)?

  • apache-age-python driver (via psycopg + AGE Python client)

What data setup do we need to do?

SELECT * FROM cypher('test_graph', $$
  CREATE (a:TestNode {name: 'This value ends with a "quote"'})
$$) AS (a agtype);

What is the necessary configuration info needed?

  • PostgreSQL with Apache AGE extension
  • apache-age-python package (tested with latest PyPI version)

What is the command that caused the error?

import age

# conn is assumed to be an open psycopg connection to a database with the AGE extension installed.
age.setUpAge(conn, "test_graph")

with conn.cursor() as cursor:
    cursor.execute("""
        SELECT * FROM cypher('test_graph', $$
            MATCH (a:TestNode) RETURN a.name
        $$) AS (name agtype);
    """)
    for row in cursor:
        result = age.parseAgeValue(row[0])
        print(repr(result))
        # Expected: 'This value ends with a "quote"'
        # Actual:   'This value ends with a "quote\\'

The root cause in builder.py:

# Current (broken):
def visitStringValue(self, ctx:AgtypeParser.StringValueContext):
    return ctx.STRING().getText().strip('"')

Expected behavior

String property values containing double quotes should survive a round-trip (write → read) without data loss. visitStringValue() and visitPair() should remove exactly the first and last delimiter characters, not strip all matching characters from both ends.

Environment (please complete the following information):

  • AGE: 1.6.0
  • apache-age-python: latest (PyPI)
  • Python: 3.12
  • PostgreSQL: 16

Additional context

The visitPair() method has the same issue when parsing map keys from agtype objects:

# Also broken:
def visitPair(self, ctx:AgtypeParser.PairContext):
    self.visitChildren(ctx)
    return (ctx.STRING().getText().strip('"'), ctx.agValue())
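The effect on map keys can be shown with stand-in objects. In this sketch FakeToken and FakePairContext are hypothetical stubs that only mimic the ctx.STRING().getText() and ctx.agValue() calls used above; they are not part of the driver or of ANTLR.

```python
class FakeToken:
    """Hypothetical stand-in for the node returned by ctx.STRING()."""
    def __init__(self, text):
        self._text = text

    def getText(self):
        return self._text


class FakePairContext:
    """Hypothetical stand-in for AgtypeParser.PairContext."""
    def __init__(self, key_token, value):
        self._key = FakeToken(key_token)
        self._value = value

    def STRING(self):
        return self._key

    def agValue(self):
        return self._value


def visit_pair_current(ctx):
    # Mirrors the current behaviour: strips all quotes at both ends.
    return (ctx.STRING().getText().strip('"'), ctx.agValue())


def visit_pair_sliced(ctx):
    # Proposed behaviour: drop exactly the two delimiter characters.
    return (ctx.STRING().getText()[1:-1], ctx.agValue())


ctx = FakePairContext('"say \\"hi\\""', 1)
print(visit_pair_current(ctx))
print(visit_pair_sliced(ctx))
```

With the key token "say \"hi\"", the current version loses the final quote of the key while the sliced version keeps it.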

Labels: bug
