fix(file-based): support multi-sheet Excel workbooks via sheet_name config by devin-ai-integration[bot] · Pull Request #1051 · airbytehq/airbyte-python-cdk

devin-ai-integration · 2026-06-12T11:47:43Z

Summary

The file-based CDK Excel parser previously relied on the pandas default sheet_name=0, so only the first worksheet was ever read from multi-sheet workbooks. This caused silent data loss for connectors like source-microsoft-sharepoint when users had Excel files with multiple sheets.

This PR adds an optional sheet_name field to ExcelFormat:

class ExcelFormat(BaseModel):
    sheet_name: str = Field(default="0", ...)

Accepted values: "0" (default, first sheet — preserves existing behavior), a sheet name string, a zero-indexed position, or "*" to read all sheets.

The new _parse_excel_file → _resolve_sheet_name path converts the string config into a pandas-compatible sheet_name argument (int, str, or None). When sheet_name=None (all sheets), pandas returns a dict[str, DataFrame] instead of a single DataFrame; the parser iterates over all returned DataFrames for both record emission and schema inference (merging columns across sheets).

Both Calamine-first and OpenPyXL fallback paths now accept and forward the sheet_name parameter.

Resolves https://github.com/airbytehq/oncall/issues/12598:

https://github.com/airbytehq/oncall/issues/12598

Supersedes #1024 (stale draft from a prior session).

Declarative-First Evaluation

The affected connectors (source-microsoft-sharepoint, etc.) are Python file-based CDK sources, not manifest-only connectors. The fix operates at the Excel parsing layer before any records exist, so declarative features (RecordFilter, AddFields, RemoveFields, paginators, routers, error handlers, $ref overrides) cannot address this.

Breaking Change Evaluation

Not a breaking change. The new sheet_name field is optional and defaults to "0", preserving the previous first-sheet-only behavior. No streams, fields, primary keys, cursors, existing config fields, or state formats are removed or changed.

Test Coverage

New tests added to test_excel_parser.py:

test_parse_records_selects_configured_sheet — parametrized for default first sheet, named sheet, and all sheets
test_infer_schema_with_sheet_selection — parametrized for single-sheet and merged multi-sheet schema inference
test_resolve_sheet_name — parametrized unit test for the config → pandas argument conversion

All tests use real in-memory multi-sheet Excel workbooks (via xlsxwriter) instead of mocks.

Link to Devin session: https://app.devin.ai/sessions/6a77303e47e040da803aeee819febb5b

…onfig Add an optional sheet_name field to ExcelFormat config that controls which worksheet(s) the parser reads: - '0' (default): first sheet only (preserves existing behavior) - '<name>': a specific sheet by name - '*': all sheets in the workbook Both Calamine and OpenPyXL fallback paths now pass sheet_name through to pandas. Schema inference merges columns across selected sheets. Co-Authored-By: bot_apk <apk@cognition.ai>

devin-ai-integration · 2026-06-12T11:47:44Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment, CI, and merge conflict monitoring

github-actions · 2026-06-12T11:47:54Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1781264638-fix-excel-multisheet#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1781264638-fix-excel-multisheet

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poetry-lock - Updates poetry.lock file
/test - Runs connector tests with the updated CDK
/prerelease - Triggers a prerelease publish with default arguments
/poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
/poe <command> - Runs any poe command in the CDK environment

📚 Show Repo Guidance

Helpful Resources

CDK API Reference

📝 Edit this welcome message.

github-actions · 2026-06-12T11:55:56Z

PyTest Results (Fast)

4 118 tests +9 4 106 ✅ +9 7m 27s ⏱️ +8s
1 suites ±0 12 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit 57ca4e1. ± Comparison against base commit fd95ecf.

♻️ This comment has been updated with latest results.

Co-Authored-By: bot_apk <apk@cognition.ai>

github-actions · 2026-06-12T12:10:43Z

PyTest Results (Full)

4 121 tests +9 4 109 ✅ +10 12m 8s ⏱️ + 2m 43s
1 suites ±0 12 💤 ± 0
1 files ±0 0 ❌ - 1

Results for commit 57ca4e1. ± Comparison against base commit fd95ecf.

devin-ai-integration Bot added the hyd-fix label Jun 12, 2026

fix: update expected spec snapshot with sheet_name field

57ca4e1

Co-Authored-By: bot_apk <apk@cognition.ai>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(file-based): support multi-sheet Excel workbooks via sheet_name config#1051

fix(file-based): support multi-sheet Excel workbooks via sheet_name config#1051
devin-ai-integration[bot] wants to merge 2 commits into
mainfrom
devin/1781264638-fix-excel-multisheet

devin-ai-integration Bot commented Jun 12, 2026

Uh oh!

devin-ai-integration Bot commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Testing This CDK Version

PR Slash Commands

Helpful Resources

Uh oh!

github-actions Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

Conversation

devin-ai-integration Bot commented Jun 12, 2026

Summary

Declarative-First Evaluation

Breaking Change Evaluation

Test Coverage

Uh oh!

devin-ai-integration Bot commented Jun 12, 2026

🤖 Devin AI Engineer

Uh oh!

github-actions Bot commented Jun 12, 2026

👋 Greetings, Airbyte Team Member!

Testing This CDK Version

PR Slash Commands

Helpful Resources

Uh oh!

github-actions Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Fast)

Uh oh!

github-actions Bot commented Jun 12, 2026

PyTest Results (Full)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

github-actions Bot commented Jun 12, 2026 •

edited

Loading