fix: preserve 'NA' string values in CSV parsing instead of converting to None by devin-ai-integration[bot] · Pull Request #1058 · airbytehq/airbyte-python-cdk

devin-ai-integration · 2026-06-22T19:05:31Z

Summary

ResponseToFileExtractor._read_with_chunks uses pd.read_csv() which by default interprets strings like 'NA', 'N/A', 'NaN', 'null' as NaN values (keep_default_na=True). This causes legitimate field values — e.g. 'NA' meaning "North America" in a Salesforce picklist — to be silently converted to None/NULL in synced data.

Fix: pass keep_default_na=False so pandas treats all cell values as literal strings (since we already set dtype=object). Empty CSV fields (which pandas would previously auto-tag as NaN) are now explicitly converted to None via .replace({"": None}) to maintain backward-compatible behavior.

- pd.read_csv(data, chunksize=chunk_size, iterator=True, dialect="unix", dtype=object)
+ pd.read_csv(data, chunksize=chunk_size, iterator=True, dialect="unix", dtype=object, keep_default_na=False)

- chunk.replace({nan: None})
+ chunk.replace({nan: None, "": None})

Link to Devin session: https://app.devin.ai/sessions/59e21b90c6c34df7bcae4bddc1947b44

… to None pandas.read_csv() by default treats strings like 'NA', 'N/A', 'NaN' as NaN values (via keep_default_na=True). This caused legitimate string values like 'NA' (North America) in Salesforce fields to be incorrectly converted to None/NULL during bulk API CSV parsing. Adding keep_default_na=False prevents pandas from interpreting these strings as missing values. Empty CSV fields are still correctly converted to None via explicit empty string replacement. Co-Authored-By: lucas.leadbetter@airbyte.io <5595530+lleadbet@users.noreply.github.com>

devin-ai-integration · 2026-06-22T19:05:33Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment, CI, and merge conflict monitoring

github-actions · 2026-06-22T19:05:50Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1782155104-fix-na-string-preservation#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1782155104-fix-na-string-preservation

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poetry-lock - Updates poetry.lock file
/test - Runs connector tests with the updated CDK
/prerelease - Triggers a prerelease publish with default arguments
/poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
/poe <command> - Runs any poe command in the CDK environment

📚 Show Repo Guidance

Helpful Resources

CDK API Reference

📝 Edit this welcome message.

github-actions · 2026-06-22T19:15:03Z

PyTest Results (Fast)

4 121 tests +2 4 110 ✅ +2 8m 14s ⏱️ -2s
1 suites ±0 11 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit edc81bf. ± Comparison against base commit 977f051.

♻️ This comment has been updated with latest results.

github-actions · 2026-06-22T19:19:00Z

PyTest Results (Full)

4 124 tests +2 4 112 ✅ +2 12m 0s ⏱️ +24s
1 suites ±0 12 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit edc81bf. ± Comparison against base commit 977f051.

♻️ This comment has been updated with latest results.

Anatolii Yatsuk (tolik0) · 2026-06-23T10:40:28Z

Reviewed the fix — it is correct and well-targeted. One suggestion on the tests:

The two added tests only exercise "NA", but keep_default_na=True was silently corrupting all 19 of pandas' default NA tokens — N/A, NULL, None, nan, null, #N/A, <NA>, etc. Those other tokens are arguably the under-reported part of this bug (this customer only happened to notice "NA").

I verified the fix against pandas 2.2.3 and it preserves all of them:

Cell value	Before fix	After fix
`NA`, `N/A`, `NULL`, `None`, `nan`	→ `None` (corrupted)	preserved as string ✓
empty unquoted (`,`)	`None`	`None` ✓
empty quoted (`""`)	`None`	`None` ✓
whitespace (`" "`)	`" "`	`" "` ✓

Suggestion: extend test_na_string_values_are_preserved to include a couple of the other tokens (e.g. N/A, NULL, nan) so the test documents the full intent and guards against a regression on the broader set, not just "NA". Cheap insurance, non-blocking.

The {nan: None} entry in the .replace(...) is now effectively dead (with keep_default_na=False + dtype=object pandas never emits NaN), but it is harmless defensive code — fine to leave as-is.

Covers NA, N/A, NULL, None, nan, null, #N/A, <NA> per reviewer feedback. Co-Authored-By: lucas.leadbetter@airbyte.io <5595530+lleadbet@users.noreply.github.com>

devin-ai-integration · 2026-06-23T10:41:28Z

Anatolii Yatsuk (@tolik0) ☑️ Resolved in 6e4989f. Extended test_na_string_values_are_preserved to cover 8 of pandas' default NA tokens: NA, N/A, NULL, None, nan, null, #N/A, <NA>.

Agreed on the {nan: None} in .replace() being effectively dead code now — leaving it as harmless defensive code.

Devin session

Co-Authored-By: lucas.leadbetter@airbyte.io <5595530+lleadbet@users.noreply.github.com>

Anatolii Yatsuk (tolik0) · 2026-06-23T10:43:12Z

/prerelease

Prerelease Job Info

This job triggers the publish workflow with default arguments to create a prerelease.

Prerelease job started... Check job output.

✅ Prerelease workflow triggered successfully.

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/28020507330

Copilot

Pull request overview

This PR updates the declarative ResponseToFileExtractor CSV parsing behavior to avoid Pandas’ default conversion of common “NA-like” string tokens (e.g., "NA", "N/A", "null") into NaN/None, which can silently corrupt legitimate source values (e.g., Salesforce picklists).

Changes:

Pass keep_default_na=False to pd.read_csv(...) so NA-like string tokens are preserved as literal strings.
Explicitly map empty-string cells ("" / empty fields) to None to preserve prior behavior for missing values.
Add unit tests covering both preservation of NA-like tokens and conversion of empty fields to None.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py`	Disables Pandas default NA token coercion and normalizes empty strings to `None` during chunk conversion.
`unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py`	Adds regression tests ensuring NA-like strings are preserved and empty fields still become `None`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

test: extend NA preservation test to cover all pandas default NA tokens

6e4989f

Covers NA, N/A, NULL, None, nan, null, #N/A, <NA> per reviewer feedback. Co-Authored-By: lucas.leadbetter@airbyte.io <5595530+lleadbet@users.noreply.github.com>

style: fix ruff formatting

edc81bf

Co-Authored-By: lucas.leadbetter@airbyte.io <5595530+lleadbet@users.noreply.github.com>

Anatolii Yatsuk (tolik0) marked this pull request as ready for review June 23, 2026 10:42

Copilot AI review requested due to automatic review settings June 23, 2026 10:42

Anatolii Yatsuk (tolik0) self-assigned this Jun 23, 2026

Anatolii Yatsuk (tolik0) requested a review from Daryna Ishchenko (darynaishchenko) June 23, 2026 10:43

Copilot started reviewing on behalf of Anatolii Yatsuk (tolik0) June 23, 2026 10:43 View session

octavia-bot Bot deployed to DockerHub June 23, 2026 10:44 View deployment

octavia-bot Bot deployed to PyPi June 23, 2026 10:44 View deployment

Copilot AI reviewed Jun 23, 2026

View reviewed changes

Daryna Ishchenko (darynaishchenko) approved these changes Jun 23, 2026

View reviewed changes

devin-ai-integration Bot mentioned this pull request Jun 23, 2026

chore(source-salesforce): test CDK pre-release for NA string preservation fix airbytehq/airbyte#80738

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: preserve 'NA' string values in CSV parsing instead of converting to None#1058

fix: preserve 'NA' string values in CSV parsing instead of converting to None#1058
devin-ai-integration[bot] wants to merge 3 commits into
mainfrom
devin/1782155104-fix-na-string-preservation

devin-ai-integration Bot commented Jun 22, 2026 •

edited by lleadbet

Loading

Uh oh!

devin-ai-integration Bot commented Jun 22, 2026

Uh oh!

github-actions Bot commented Jun 22, 2026

Testing This CDK Version

PR Slash Commands

Helpful Resources

Uh oh!

github-actions Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

Anatolii Yatsuk (tolik0) commented Jun 23, 2026

Uh oh!

devin-ai-integration Bot commented Jun 23, 2026

Uh oh!

Anatolii Yatsuk (tolik0) commented Jun 23, 2026 •

edited by github-actions Bot

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

devin-ai-integration Bot commented Jun 22, 2026 • edited by lleadbet Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

devin-ai-integration Bot commented Jun 22, 2026

🤖 Devin AI Engineer

Uh oh!

github-actions Bot commented Jun 22, 2026

👋 Greetings, Airbyte Team Member!

Testing This CDK Version

PR Slash Commands

Helpful Resources

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Fast)

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Full)

Uh oh!

Anatolii Yatsuk (tolik0) commented Jun 23, 2026

Uh oh!

devin-ai-integration Bot commented Jun 23, 2026

Uh oh!

Anatolii Yatsuk (tolik0) commented Jun 23, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

devin-ai-integration Bot commented Jun 22, 2026 •

edited by lleadbet

Loading

github-actions Bot commented Jun 22, 2026 •

edited

Loading

github-actions Bot commented Jun 22, 2026 •

edited

Loading

Anatolii Yatsuk (tolik0) commented Jun 23, 2026 •

edited by github-actions Bot

Loading