Skip to content

fix(file-based): classify CSV encoding errors as config_error to prevent unnecessary retries#1054

Draft
devin-ai-integration[bot] wants to merge 1 commit into
mainfrom
devin/1781541430-fix-csv-encoding-error-failure-type
Draft

fix(file-based): classify CSV encoding errors as config_error to prevent unnecessary retries#1054
devin-ai-integration[bot] wants to merge 1 commit into
mainfrom
devin/1781541430-fix-csv-encoding-error-failure-type

Conversation

@devin-ai-integration

Copy link
Copy Markdown
Contributor

Summary

The AirbyteTracedException raised for UnicodeError in csv_parser.py:76-79 was missing failure_type, defaulting to FailureType.system_error (retriable). This caused up to 5 unnecessary retry attempts for encoding mismatch errors — a deterministic config problem that will never succeed without a user config change.

- except UnicodeError:
-     raise AirbyteTracedException(
-         message=f"... Expected encoding: {config_format.encoding}",
-     )
+ except UnicodeError as e:
+     raise AirbyteTracedException(
+         message=f"... Expected encoding: {config_format.encoding}.",
+         internal_message=f"UnicodeError reading file {file.uri} with encoding {config_format.encoding}: {e}",
+         failure_type=FailureType.config_error,
+         exception=e,
+     ) from e

This matches the pattern already used by adjacent validators in the same file (e.g., _validate_trailing_headers at lines 178-189).

Breaking Change Evaluation

Not a breaking change. This only changes the failure_type classification of an existing error from system_error to config_error, reducing wasted retries.

Declarative-First Evaluation

N/A — this is a CDK-level Python fix, not a connector manifest change.

Test Coverage

Updated the existing test_read_data_with_encoding_error test to verify:

  • failure_type == FailureType.config_error
  • internal_message contains file URI and encoding
  • Exception chaining via __cause__

Resolves https://github.com/airbytehq/oncall/issues/11649:

Link to Devin session: https://app.devin.ai/sessions/349ff681df434e669c27803cade26a6e

…ent unnecessary retries

The AirbyteTracedException raised for UnicodeError in csv_parser.py
was missing failure_type, defaulting to system_error (retriable).
This caused up to 5 unnecessary retry attempts for encoding mismatch
errors that will never succeed without a config change.

- Add failure_type=FailureType.config_error
- Add internal_message with file URI, encoding, and original error
- Pass exception parameter and chain with 'from e'
- Update test to verify all new exception attributes

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration

Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@github-actions

Copy link
Copy Markdown

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1781541430-fix-csv-encoding-error-failure-type#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1781541430-fix-csv-encoding-error-failure-type

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

@github-actions

Copy link
Copy Markdown

PyTest Results (Fast)

4 118 tests  ±0   4 107 ✅ ±0   8m 22s ⏱️ -1s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit d1d0a09. ± Comparison against base commit 626c753.

@github-actions

Copy link
Copy Markdown

PyTest Results (Full)

4 121 tests  ±0   4 109 ✅ ±0   9m 28s ⏱️ - 2m 39s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit d1d0a09. ± Comparison against base commit 626c753.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants