Skip to content

Nested varchar in ARRAY<ROW<...>> is deserialized as int in DictCursor results #689

@kimhanse

Description

@kimhanse

Summary

PyAthena appears to deserialize nested varchar values inside complex types (array(row(...))) as Python int when the string is numeric-looking.

In Athena, the type is clearly varchar, but PyAthena returns int in the parsed Python object.

Environment

  • PyAthena: 3.29.3
  • Python: 3.12
  • Cursor: pyathena.cursor.DictCursor
  • Workgroup: Athena (Presto/Trino engine in AWS Athena)

Repro

#!/usr/bin/env python3
from pyathena import connect
from pyathena.cursor import DictCursor
connection = connect(
    work_group="integrations_workgroup",
    cursor_class=DictCursor,
)
cursor = connection.cursor()
cursor.execute(
    """
    WITH t AS (
        SELECT *
        FROM (
            VALUES (
                ARRAY [
                    CAST(ROW('1234') AS ROW(string varchar)),
                    CAST(ROW('value') AS ROW(string varchar))
                ]
            )
        ) AS v(field)
    )
    SELECT
        field,
        json_format(CAST(field AS JSON)) AS field_json
    FROM t
    """
)
result = cursor.fetchall()
print(f"Result: {result}")
print(f"Expected string here: {result[0]['field'][0]['string']}")

Result is

Result: [{'field': [{'string': 1234}, {'string': 'value'}], 'field_json': '[{"string":"1234"},{"string":"value"}]'}]
Expected string here: 1234

The types of the result are lost and it seem that pyathena assumes type based on the values.

Not sure if there is a way to get the expected types from Athena, in my real case they are stored in the table definition, the WITH...SELECT above is just written up as an example for this report.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions