-
Notifications
You must be signed in to change notification settings - Fork 107
Open
Description
Summary
PyAthena appears to deserialize nested varchar values inside complex types (array(row(...))) as Python int when the string is numeric-looking.
In Athena, the type is clearly varchar, but PyAthena returns int in the parsed Python object.
Environment
- PyAthena:
3.29.3 - Python:
3.12 - Cursor:
pyathena.cursor.DictCursor - Workgroup: Athena (Presto/Trino engine in AWS Athena)
Repro
#!/usr/bin/env python3
from pyathena import connect
from pyathena.cursor import DictCursor
connection = connect(
work_group="integrations_workgroup",
cursor_class=DictCursor,
)
cursor = connection.cursor()
cursor.execute(
"""
WITH t AS (
SELECT *
FROM (
VALUES (
ARRAY [
CAST(ROW('1234') AS ROW(string varchar)),
CAST(ROW('value') AS ROW(string varchar))
]
)
) AS v(field)
)
SELECT
field,
json_format(CAST(field AS JSON)) AS field_json
FROM t
"""
)
result = cursor.fetchall()
print(f"Result: {result}")
print(f"Expected string here: {result[0]['field'][0]['string']}")Result is
Result: [{'field': [{'string': 1234}, {'string': 'value'}], 'field_json': '[{"string":"1234"},{"string":"value"}]'}]
Expected string here: 1234
The types of the result are lost and it seem that pyathena assumes type based on the values.
Not sure if there is a way to get the expected types from Athena, in my real case they are stored in the table definition, the WITH...SELECT above is just written up as an example for this report.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels