fix: preserve 'NA' string values in CSV parsing instead of converting to None#1058
devin-ai-integration[bot] wants to merge 3 commits into
… to None
pandas.read_csv() by default treats strings like 'NA', 'N/A', 'NaN' as NaN values (via keep_default_na=True). This caused legitimate string values like 'NA' (North America) in Salesforce fields to be incorrectly converted to None/NULL during bulk API CSV parsing.
Adding keep_default_na=False prevents pandas from interpreting these strings as missing values. Empty CSV fields are still correctly converted to None via explicit empty string replacement.
Co-Authored-By: lucas.leadbetter@airbyte.io <5595530+lleadbet@users.noreply.github.com>
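To make the behavior described in this commit message concrete, here is a minimal sketch comparing the two parsing modes. The CSV content and column names are made up for illustration and are not taken from the PR or from Salesforce; only `pd.read_csv`, `dtype=object`, and `keep_default_na` come from the description above.

```python
import io

import pandas as pd

# Hypothetical sample data: 'NA' is a real value here, and row 3 has an empty field.
CSV_TEXT = "Id,Region\n1,NA\n2,EMEA\n3,\n"

# Default behaviour (keep_default_na=True): 'NA' and the empty field both become NaN.
default_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=object)

# With keep_default_na=False and no na_values, no string is parsed as NaN:
# 'NA' stays 'NA' and the empty field stays an empty string.
literal_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=object, keep_default_na=False)

default_regions = default_df["Region"].tolist()
literal_regions = literal_df["Region"].tolist()
```

In the default mode the 'NA' value and the empty field are indistinguishable, which is why the fix has to handle empty strings separately once default NA handling is switched off.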
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.
Testing This CDK Version
You can test this version of the CDK using the following:
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1782155104-fix-na-string-preservation#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1782155104-fix-na-string-preservation
PR Slash Commands
Airbyte Maintainers can execute the following slash commands on your PR:
Reviewed the fix — it is correct and well-targeted. One suggestion on the tests: The two added tests only exercise I verified the fix against pandas 2.2.3 and it preserves all of them:
Suggestion: extend The |
Covers NA, N/A, NULL, None, nan, null, #N/A, <NA> per reviewer feedback. Co-Authored-By: lucas.leadbetter@airbyte.io <5595530+lleadbet@users.noreply.github.com>
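A check along the lines of this commit could look like the sketch below. It is not the test added in the PR; the helper name `roundtrip_tokens` and the single-column CSV layout are assumptions made for illustration. The token list is the one named in the commit message.

```python
import csv
import io

import pandas as pd

# Tokens listed in the commit message above.
NA_LIKE_TOKENS = ["NA", "N/A", "NULL", "None", "nan", "null", "#N/A", "<NA>"]


def roundtrip_tokens(tokens):
    """Write each token as one CSV row, then read it back without NA coercion.

    Hypothetical helper, not part of the CDK.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["value"])
    for token in tokens:
        writer.writerow([token])
    buffer.seek(0)
    frame = pd.read_csv(buffer, dtype=object, keep_default_na=False)
    return frame["value"].tolist()


preserved = roundtrip_tokens(NA_LIKE_TOKENS)
```

If every token survives the round trip, `preserved` equals `NA_LIKE_TOKENS`; a mismatch would point to a token that pandas still treats as missing.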
Anatolii Yatsuk (@tolik0) ☑️ Resolved in 6e4989f. Extended Agreed on the |
Co-Authored-By: lucas.leadbetter@airbyte.io <5595530+lleadbet@users.noreply.github.com>
/prerelease
Pull request overview
This PR updates the declarative ResponseToFileExtractor CSV parsing behavior to avoid Pandas’ default conversion of common “NA-like” string tokens (e.g., "NA", "N/A", "null") into NaN/None, which can silently corrupt legitimate source values (e.g., Salesforce picklists).
Changes:
- Pass keep_default_na=False to pd.read_csv(...) so NA-like string tokens are preserved as literal strings.
- Explicitly map empty-string cells ("" / empty fields) to None to preserve prior behavior for missing values.
- Add unit tests covering both preservation of NA-like tokens and conversion of empty fields to None.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py | Disables Pandas default NA token coercion and normalizes empty strings to None during chunk conversion. |
| unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py | Adds regression tests ensuring NA-like strings are preserved and empty fields still become None. |
Summary
ResponseToFileExtractor._read_with_chunks uses pd.read_csv(), which by default interprets strings like 'NA', 'N/A', 'NaN', 'null' as NaN values (keep_default_na=True). This causes legitimate field values, e.g. 'NA' meaning "North America" in a Salesforce picklist, to be silently converted to None/NULL in synced data.
Fix: pass keep_default_na=False so pandas treats all cell values as literal strings (since we already set dtype=object). Empty CSV fields (which pandas would previously auto-tag as NaN) are now explicitly converted to None via .replace({"": None}) to maintain backward-compatible behavior.
Link to Devin session: https://app.devin.ai/sessions/59e21b90c6c34df7bcae4bddc1947b44
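The combination described in this summary (chunked reading, keep_default_na=False, then mapping empty strings to None) can be sketched roughly as follows. This is a simplified stand-in, not the actual body of ResponseToFileExtractor._read_with_chunks: the function name `read_records`, the in-memory input, and the sample rows are assumptions for illustration.

```python
import io
from typing import Any, Dict, Iterator

import pandas as pd


def read_records(csv_text: str, chunk_size: int = 2) -> Iterator[Dict[str, Any]]:
    """Yield one dict per CSV row, keeping NA-like strings and nulling empty fields.

    Hypothetical helper modelled on the PR description, not the CDK implementation.
    """
    chunks = pd.read_csv(
        io.StringIO(csv_text),
        chunksize=chunk_size,
        dtype=object,           # read every cell as a string
        keep_default_na=False,  # do not coerce 'NA', 'null', ... to NaN
    )
    for chunk in chunks:
        # Empty fields arrive as "" once default NA handling is off; map them to None.
        cleaned = chunk.replace({"": None})
        yield from cleaned.to_dict(orient="records")


# Made-up sample: 'NA' and 'null' are real values, two fields are empty.
records = list(read_records("Id,Region,Owner\n1,NA,alice\n2,,bob\n3,null,\n"))
```

Doing the empty-string replacement per chunk keeps memory use bounded for large bulk responses while still giving downstream consumers None for missing values.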