fix(file-based): populate failure_type, internal_message, and exception on CSV encoding error#1053

Draft
devin-ai-integration[bot] wants to merge 1 commit into main from devin/1781539607-fix-csv-unicode-error-handling

@devin-ai-integration

Contributor

Summary

The UnicodeError handler in csv_parser.py:read_data() raised AirbyteTracedException without the failure_type, internal_message, or exception parameters. As a result:

  1. Sentry titled the event AirbyteTracedException: None (because internal_message was None)
  2. The error was classified as system_error instead of config_error
  3. The original UnicodeError stack trace was lost

- except UnicodeError:
+ except UnicodeError as e:
      raise AirbyteTracedException(
-         message=f"... Expected encoding: {config_format.encoding}",
+         message=f"File contains bytes that cannot be decoded with the configured {config_format.encoding} encoding.",
+         internal_message=str(e),
+         failure_type=FailureType.config_error,
+         exception=e,
      )

This follows the same pattern already used elsewhere in the file (lines 178, 186, 212).
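To make the before/after behavior concrete, here is a minimal, self-contained sketch. `TracedError` and this `FailureType` enum are hypothetical stand-ins, not the CDK's real classes. The sketch assumes the exception class passes internal_message to `Exception.__init__` and defaults failure_type to system_error, which would produce the symptoms listed in the summary. The old message is shortened to the part visible in the diff.

```python
import enum
from typing import Callable, Optional


class FailureType(enum.Enum):
    # Stand-in enum with only the two values discussed in this PR.
    system_error = "system_error"
    config_error = "config_error"


class TracedError(Exception):
    """Hypothetical stand-in for AirbyteTracedException (assumed behavior)."""

    def __init__(
        self,
        message: Optional[str] = None,
        internal_message: Optional[str] = None,
        failure_type: FailureType = FailureType.system_error,
        exception: Optional[BaseException] = None,
    ) -> None:
        # Assumption: the internal message becomes the exception's str().
        super().__init__(internal_message)
        self.message = message
        self.internal_message = internal_message
        self.failure_type = failure_type
        self._exception = exception


def decode_before(raw: bytes, encoding: str) -> str:
    """Old handler shape: only the user-facing message is set."""
    try:
        return raw.decode(encoding)
    except UnicodeError:
        raise TracedError(message=f"Expected encoding: {encoding}")


def decode_after(raw: bytes, encoding: str) -> str:
    """New handler shape: all four fields are populated."""
    try:
        return raw.decode(encoding)
    except UnicodeError as e:
        raise TracedError(
            message=f"File contains bytes that cannot be decoded with the configured {encoding} encoding.",
            internal_message=str(e),
            failure_type=FailureType.config_error,
            exception=e,
        )


def capture(fn: Callable[[bytes, str], str], raw: bytes, encoding: str) -> TracedError:
    """Call fn and return the TracedError it raises."""
    try:
        fn(raw, encoding)
    except TracedError as err:
        return err
    raise AssertionError("expected a TracedError")


# b"caf\xe9" is Latin-1 encoded text, which is not valid UTF-8.
before = capture(decode_before, b"caf\xe9", "utf-8")
after = capture(decode_after, b"caf\xe9", "utf-8")
print(repr(str(before)), before.failure_type.name)
print(repr(str(after)), after.failure_type.name)
```

In this sketch the old handler yields an exception whose string form is "None" and whose failure type falls back to the default, while the new handler carries the decoder's own error text and the original exception object.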

Resolves https://github.com/airbytehq/oncall/issues/12517

Breaking Change Evaluation

Not a breaking change. This is a bug fix that improves error classification and messaging — no schema, spec, stream, or state changes.

Test Coverage

Updated CsvReaderTest.test_read_data_with_encoding_error to verify all new fields: failure_type == FailureType.config_error, internal_message matches the original error string, and _exception preserves the original UnicodeError instance.
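The three assertions described above might look roughly like the following unittest sketch. It does not reproduce the actual test in the CDK: `StubTracedError` and `read_data_stub` are hypothetical stand-ins so the example runs without the CDK installed, and failure_type is a plain string here rather than the real FailureType enum.

```python
import unittest


class StubTracedError(Exception):
    """Hypothetical stand-in for AirbyteTracedException."""

    def __init__(self, message, internal_message, failure_type, exception):
        super().__init__(internal_message)
        self.message = message
        self.internal_message = internal_message
        self.failure_type = failure_type
        self._exception = exception


def read_data_stub(raw: bytes, encoding: str) -> str:
    """Hypothetical stand-in for the decoding step inside read_data()."""
    try:
        return raw.decode(encoding)
    except UnicodeError as e:
        raise StubTracedError(
            message=f"File contains bytes that cannot be decoded with the configured {encoding} encoding.",
            internal_message=str(e),
            failure_type="config_error",  # the real code uses FailureType.config_error
            exception=e,
        )


class EncodingErrorTest(unittest.TestCase):
    def test_read_data_with_encoding_error(self):
        # 0xff can never appear in valid UTF-8, so decoding must fail.
        with self.assertRaises(StubTracedError) as ctx:
            read_data_stub(b"\xff\xfe", "utf-8")
        err = ctx.exception
        # 1. The failure is attributed to configuration.
        self.assertEqual(err.failure_type, "config_error")
        # 2. The original exception instance is preserved.
        self.assertIsInstance(err._exception, UnicodeError)
        # 3. The internal message matches the original error string.
        self.assertEqual(err.internal_message, str(err._exception))
```

Comparing internal_message against `str(err._exception)` rather than a hard-coded string keeps the check independent of the exact wording Python uses for decode errors.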

Link to Devin session: https://app.devin.ai/sessions/becb680c363a4cecb6664b5fc80b9739

…on on CSV encoding error

The UnicodeError handler in csv_parser.py raised AirbyteTracedException
without failure_type, internal_message, or exception parameters. This
caused Sentry to title the event 'AirbyteTracedException: None' and the
error to be classified as system_error instead of config_error.

- Capture the UnicodeError as 'e'
- Set failure_type=FailureType.config_error
- Set internal_message=str(e) for Sentry grouping
- Set exception=e to preserve the original stack trace
- Improve user-facing message per error message guidelines

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration

Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@github-actions


👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1781539607-fix-csv-unicode-error-handling#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1781539607-fix-csv-unicode-error-handling

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment
@github-actions


PyTest Results (Fast)

4 118 tests  ±0   4 107 ✅ ±0   8m 18s ⏱️ -3s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit e1a2043. ± Comparison against base commit 626c753.

@github-actions


PyTest Results (Full)

4 121 tests  ±0   4 109 ✅ ±0   11m 32s ⏱️ -35s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit e1a2043. ± Comparison against base commit 626c753.
