fix: Handle array of strings columns in Athena materialization#6324
Open
alan-gauthier-jt wants to merge 1 commit intofeast-dev:masterfrom
Open
fix: Handle array of strings columns in Athena materialization#6324alan-gauthier-jt wants to merge 1 commit intofeast-dev:masterfrom
alan-gauthier-jt wants to merge 1 commit intofeast-dev:masterfrom
Conversation
86036b7 to
c58ca8a
Compare
Signed-off-by: Alan Gauthier <alan.gauthier@jobteaser.com>
c58ca8a to
cf2f0c4
Compare
7 tasks
ntkathole
reviewed
Apr 25, 2026
| # Per-type default values substituted for None elements inside list columns. | ||
| # Only STRING_LIST uses ""; numeric/bytes types drop None entirely because | ||
| # there is no meaningful in-band sentinel (protobuf rejects wrong scalar types). | ||
| _LIST_TYPE_NONE_REPLACEMENT: Dict[ValueType, Any] = { |
Member
There was a problem hiding this comment.
@alan-gauthier-jt The approach used in https://github.com/feast-dev/feast/pull/6327/changes seems safer and preserve list length
ntkathole
reviewed
Apr 25, 2026
| none_replacement = _LIST_TYPE_NONE_REPLACEMENT.get(feast_value_type, _DROP_NONE) | ||
| if none_replacement is _DROP_NONE: | ||
| return [x for x in value if x is not None] | ||
| return [x if x is not None else none_replacement for x in value] |
Member
There was a problem hiding this comment.
if feast_value_type in _LIST_TYPE_NONE_REPLACEMENT instead?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it
Fixes two related bugs that cause
TypeErrorandValueErrorwhen materializingfeature views with
Array(String)columns using the Athena offline store.Arrow/Athena deserializes
Array(String)columns asnumpy.ndarray(object dtype)instead of plain Python lists. This breaks two code paths in
type_map.py:_convert_scalar_values_to_proto:pd.isnull(ndarray)returns an array of bools,and
not <array>raisesValueError: The truth value of an empty array is ambiguous.→ Already guarded by
_is_array_likein newer Feast versions; no change needed here._convert_list_values_to_proto(generic list path):proto_type(val=ndarray)passesthe raw numpy array to the protobuf constructor, which only accepts Python lists →
TypeError: bad argument type for built-in operation. Additionally, Arrow nullablecolumns can yield
Noneelements inside the ndarray, whichStringListalso rejects._validate_collection_item_types:Noneelements inside an ndarray failed thetype(item) in valid_typescheck before reaching the sanitization step.Changes
feast/type_map.pyAdd
_to_proto_safe_list(value)helper that:.tolist()on anynumpy.ndarrayto produce a plain Python listNoneelements with""(empty string), which protobufStringListacceptsUse
_to_proto_safe_listin the generic list conversion (end of_convert_list_values_to_proto) instead of passingvaluedirectly toproto_type.Skip
Noneelements in_validate_collection_item_types—Noneentries are validin nullable Arrow columns and are sanitized by
_to_proto_safe_listdownstream; raisinga
TypeErroron them before that point was incorrect.Testing
Added
TestArrowArrayStringListMaterializationinsdk/python/tests/unit/test_type_map.pycovering:test_to_proto_safe_list_ndarraytest_to_proto_safe_list_empty_ndarrayValueError)test_to_proto_safe_list_ndarray_with_noneNoneelements replaced with""test_to_proto_safe_list_plain_listtest_to_proto_safe_list_plain_list_with_noneNonein plain list also replacedtest_to_proto_safe_list_scalar_passthroughtest_string_list_from_ndarraypython_values_to_proto_valuestest_string_list_from_empty_ndarrayValueErrortest_string_list_from_ndarray_with_none_elementsNonein ndarray no longer raisesTypeErrortest_string_list_null_row_produces_empty_protoNonerows produce emptyProtoValuetest_mixed_batch_simulating_athena_chunkWhich issues this PR fixes
Fixes #6325
Does this PR introduce a user-facing change?
Yes — materialization of
Array(String)(and other array-typed) feature columns fromAthena no longer fails with
TypeErrororValueErrorwhen a batch contains emptyarrays,
Nonerows, orNoneelements inside arrays.