-
Notifications
You must be signed in to change notification settings - Fork 30
feat(file-based): option to use only first found file for discover #841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@daryna/file-based/update-discover#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch daryna/file-based/update-discoverHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
📝 WalkthroughWalkthroughAdded boolean Changes
Sequence Diagram(s)sequenceDiagram
participant Config as FileBasedStreamConfig
participant Stream as DefaultFileBasedStream
participant FS as FileFetcher
participant Infer as SchemaInferencer
rect rgb(245,250,255)
Note over Config,Stream: Schema discovery (flag-aware)
end
Stream->>Config: read discovery flags
alt use_first_found_file_for_schema_discovery == true
Stream->>FS: get_files() (iterator)
FS-->>Stream: files iterator
Stream->>Stream: take first file (islice(...,1))
Stream->>Infer: infer_schema(first_file)
Infer-->>Stream: schema
else
Stream->>FS: get_files() (iterator)
FS-->>Stream: files iterator
Stream->>Infer: infer_schema(files up to recent_n_files...)
Infer-->>Stream: schema
end
Stream->>Stream: consolidate & return schema
sequenceDiagram
participant Source as FileBasedSource
participant Streams as StreamFactory
participant Validator as FileBasedStreamConfig (root_validator)
rect rgb(250,245,245)
Note over Source,Validator: Validation and connection check changes
end
Source->>Streams: construct streams
Streams->>Validator: validate stream configs (root_validator)
alt validation error
Validator-->>Source: raise ConfigValidationError (ERROR_VALIDATION_STREAM_DISCOVERY_OPTIONS)
Source->>Source: check_connection catches ConfigValidationError and returns formatted error
else other error
Streams-->>Source: other exception
Source->>Source: wrap as config error and return generic failure
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Want a short changelog entry or a one-line config-doc snippet for this flag, wdyt? Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/file_based/config/file_based_stream_config.py(1 hunks)airbyte_cdk/sources/file_based/stream/default_file_based_stream.py(1 hunks)unit_tests/sources/file_based/stream/test_default_file_based_stream.py(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, ignore all `__init__.py` files when providing a recommended reviewing order.
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
🧬 Code graph analysis (2)
unit_tests/sources/file_based/stream/test_default_file_based_stream.py (5)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (3)
config(46-47)config(51-61)get_matching_files(79-99)airbyte_cdk/sources/file_based/discovery_policy/abstract_discovery_policy.py (2)
get_max_n_files_for_schema_inference(21-21)n_concurrent_requests(18-18)airbyte_cdk/sources/file_based/discovery_policy/default_discovery_policy.py (2)
get_max_n_files_for_schema_inference(24-33)n_concurrent_requests(21-22)airbyte_cdk/sources/file_based/stream/default_file_based_stream.py (2)
infer_schema(338-342)get_json_schema(246-269)airbyte_cdk/sources/file_based/remote_file.py (1)
RemoteFile(11-18)
airbyte_cdk/sources/file_based/stream/default_file_based_stream.py (3)
airbyte_cdk/sources/streams/core.py (1)
logger(128-129)airbyte_cdk/sources/file_based/stream/abstract_file_based_stream.py (1)
get_files(92-96)airbyte_cdk/sources/file_based/stream/concurrent/adapters.py (1)
get_files(165-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (python)
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
Outdated
Show resolved
Hide resolved
PyTest Results (Fast)3 813 tests +2 3 801 ✅ +1 6m 41s ⏱️ -31s Results for commit 31b6be0. ± Comparison against base commit f0443aa. This pull request removes 6 and adds 8 tests. Note that renamed tests count towards both.This pull request skips 1 test.♻️ This comment has been updated with latest results. |
PyTest Results (Full)3 816 tests +2 3 804 ✅ +2 10m 52s ⏱️ -12s Results for commit 31b6be0. ± Comparison against base commit f0443aa. This pull request removes 6 and adds 8 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
maxi297
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple comments just to make sure we cover everything here
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
unit_tests/sources/file_based/scenarios/csv_scenarios.py (2)
488-493: Consider enhancing the description for clarity.The description "Helps to avoid long discovery step" could be more explicit about the exact behavior. What do you think about making it clearer that schema inference will be limited to exactly the first file found? For example:
- "description": "When enabled, the source will use the first found file for schema discovery. Helps to avoid long discovery step", + "description": "When enabled, the source will use only the first found file for schema discovery, rather than sampling multiple files. This can significantly speed up the discovery step when working with many files.",This would help users understand both what happens (single file only) and why it matters (performance optimization), wdyt?
488-493: Consider adding a CSV scenario that exercises this flag.While the new field is correctly added to the expected spec, I noticed that none of the CSV scenarios in this file actually set
use_first_found_file_for_schema_discovery: Truein their configuration.Would it be valuable to add a CSV-specific test scenario that demonstrates this flag in action? For example, a scenario similar to
multi_csv_scenariobut with this flag enabled, expecting the schema to only include columns from the first file. This would:
- Provide clear documentation of the feature's behavior with CSV files
- Ensure the flag works correctly with CSV-specific configurations
- Complement the unit test in
test_default_file_based_stream.pymentioned in the AI summaryWdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/file_based/scenarios/csv_scenarios.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, ignore all `__init__.py` files when providing a recommended reviewing order.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: destination-motherduck
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Analyze (python)
|
/autofix
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
airbyte_cdk/sources/file_based/config/file_based_stream_config.py (2)
77-81: Minor grammar improvement in the description.The description on line 79 could be slightly improved. Consider:
- "Helps to avoid a long discovery step" or
- "Helps to avoid long discovery steps"
This was mentioned in a previous review comment, so just wanted to follow up. Wdyt?
Apply this diff:
use_first_found_file_for_schema_discovery: bool = Field( title="Use First Found File For Schema Discover", - description="When enabled, the source will use the first found file for schema discovery. Helps to avoid long discovery step", + description="When enabled, the source will use the first found file for schema discovery. Helps to avoid a long discovery step.", default=False, )
111-117: Consider splitting the validation condition for clarity.The boolean logic on lines 111-114 is correct but somewhat complex. Would you consider splitting it into two separate conditions for readability? For example:
if recent_n_files_to_read_for_schema_discovery and use_first_found_file_for_schema_discovery: raise ConfigValidationError( FileBasedSourceError.ERROR_VALIDATION_STREAM_DISCOVERY_OPTIONS ) if [schemaless, input_schema, use_first_found_file_for_schema_discovery].count(True) > 1: raise ConfigValidationError( FileBasedSourceError.ERROR_VALIDATION_STREAM_DISCOVERY_OPTIONS )This would make the two distinct validation rules more explicit. But the current implementation is functionally correct, so this is just a readability suggestion. Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/file_based/config/file_based_stream_config.py(3 hunks)airbyte_cdk/sources/file_based/exceptions.py(1 hunks)airbyte_cdk/sources/file_based/file_based_source.py(1 hunks)unit_tests/sources/file_based/scenarios/csv_scenarios.py(3 hunks)unit_tests/sources/file_based/test_file_based_scenarios.py(3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Applied to files:
airbyte_cdk/sources/file_based/file_based_source.pyairbyte_cdk/sources/file_based/config/file_based_stream_config.py
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
🧬 Code graph analysis (3)
airbyte_cdk/sources/file_based/file_based_source.py (3)
airbyte_cdk/sources/file_based/exceptions.py (2)
ConfigValidationError(75-76)FileBasedSourceError(12-39)airbyte_cdk/utils/traced_exception.py (1)
AirbyteTracedException(25-145)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
FailureType(546-549)
airbyte_cdk/sources/file_based/config/file_based_stream_config.py (1)
airbyte_cdk/sources/file_based/exceptions.py (2)
ConfigValidationError(75-76)FileBasedSourceError(12-39)
unit_tests/sources/file_based/scenarios/csv_scenarios.py (3)
unit_tests/sources/file_based/scenarios/scenario_builder.py (7)
set_expected_check_error(188-192)set_expected_discover_error(194-198)set_expected_read_error(200-204)build(35-41)build(225-258)TestScenario(44-115)TestScenarioBuilder(118-273)airbyte_cdk/sources/file_based/exceptions.py (2)
FileBasedSourceError(12-39)ConfigValidationError(75-76)unit_tests/sources/file_based/in_memory_files_source.py (1)
InMemoryFilesSource(50-99)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
[error] 93-93: mypy: Function is missing a type annotation [no-untyped-def]
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
🔇 Additional comments (8)
airbyte_cdk/sources/file_based/exceptions.py (1)
26-26: LGTM!The error message clearly describes the mutual exclusivity constraint for schema discovery options. It aligns well with the validation logic being added in the root validator.
unit_tests/sources/file_based/test_file_based_scenarios.py (2)
84-87: LGTM!The new scenario imports follow the existing naming conventions and clearly indicate they test validation failures for conflicting discovery options.
212-213: LGTM!The new scenarios are correctly wired into the appropriate test lists and will provide coverage for the validation error paths when conflicting discovery options are used.
Also applies to: 319-320
airbyte_cdk/sources/file_based/file_based_source.py (1)
159-172: LGTM!The refined exception handling correctly distinguishes configuration validation errors from other errors, preserving the specific error message from the root validator. The generic exception handler provides a safety net while maintaining consistent error reporting.
airbyte_cdk/sources/file_based/config/file_based_stream_config.py (1)
8-8: LGTM!The
root_validatorimport is correctly added frompydantic.v1, consistent with the existing imports.unit_tests/sources/file_based/scenarios/csv_scenarios.py (3)
488-493: LGTM!The expected spec correctly includes the new
use_first_found_file_for_schema_discoveryfield with appropriate type, default, title, and description. The placement afterrecent_n_files_to_read_for_schema_discoveryis logical.
2123-2131: LGTM!The updated error expectations correctly reflect that the
schemaless+input_schemaconflict is now caught by the root validator and returnsERROR_VALIDATION_STREAM_DISCOVERY_OPTIONS. Both single-stream and multi-stream scenarios are consistently updated.Also applies to: 2228-2236
2239-2452: LGTM!The new test scenarios comprehensively cover the validation error paths:
- Conflict between
recent_n_files_to_read_for_schema_discoveryanduse_first_found_file_for_schema_discovery- Conflict between
schemalessanduse_first_found_file_for_schema_discoveryBoth scenarios appropriately test multi-stream configurations and expect the correct
ERROR_VALIDATION_STREAM_DISCOVERY_OPTIONSerror across check, discover, and read operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
unit_tests/sources/file_based/scenarios/csv_scenarios.py (2)
2239-2344: Verify this scenario is wired into the test runner.Past review comments flag this scenario variable as unused. Please ensure it's properly exported and included in the test execution, either in
unit_tests/sources/file_based/test_file_based_scenarios.pyor another appropriate test file, wdyt?
2347-2452: Verify this scenario is wired into the test runner.Similar to the scenario at line 2239, past review comments indicate this scenario variable is unused. Please ensure it's properly exported and included in test execution, wdyt?
🧹 Nitpick comments (3)
unit_tests/sources/file_based/scenarios/csv_scenarios.py (1)
488-493: Consider polishing the description for clarity, wdyt?The description "Helps to avoid long discovery step" is missing an article. Would "Helps to avoid a long discovery step" or "Helps to avoid long discovery steps" read more naturally?
airbyte_cdk/sources/file_based/config/file_based_stream_config.py (2)
77-81: Minor typo and grammar polish needed, wdyt?Two small issues:
- The title has "Use First Found File For Schema Discover" - should be "Schema Discovery" (missing the 'y')
- The description "Helps to avoid long discovery step" is missing an article - perhaps "Helps to avoid a long discovery step" would be clearer?
Apply this diff:
use_first_found_file_for_schema_discovery: bool = Field( - title="Use First Found File For Schema Discover", + title="Use First Found File For Schema Discovery", - description="When enabled, the source will use the first found file for schema discovery. Helps to avoid long discovery step", + description="When enabled, the source will use the first found file for schema discovery. Helps to avoid a long discovery step.", default=False, )
96-96: Typo in the comment, wdyt?Line 96 has "compatability" which should be "compatibility".
Apply this diff:
- Validates schema discovery options compatability. + Validates schema discovery options compatibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/file_based/config/file_based_stream_config.py(3 hunks)unit_tests/sources/file_based/scenarios/csv_scenarios.py(3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Applied to files:
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
🧬 Code graph analysis (2)
unit_tests/sources/file_based/scenarios/csv_scenarios.py (3)
unit_tests/sources/file_based/scenarios/scenario_builder.py (7)
set_expected_check_error(188-192)set_expected_discover_error(194-198)set_expected_read_error(200-204)build(35-41)build(225-258)TestScenario(44-115)TestScenarioBuilder(118-273)airbyte_cdk/sources/file_based/exceptions.py (2)
FileBasedSourceError(12-39)ConfigValidationError(75-76)unit_tests/sources/file_based/in_memory_files_source.py (1)
InMemoryFilesSource(50-99)
airbyte_cdk/sources/file_based/config/file_based_stream_config.py (1)
airbyte_cdk/sources/file_based/exceptions.py (2)
ConfigValidationError(75-76)FileBasedSourceError(12-39)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
[error] 93-93: mypy: Missing type parameters for generic type "dict" [type-arg]
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-pokeapi
- GitHub Check: Pytest (Fast)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
unit_tests/sources/file_based/scenarios/csv_scenarios.py (1)
2123-2131: LGTM!The error type change from
CONFIG_VALIDATION_ERRORtoERROR_VALIDATION_STREAM_DISCOVERY_OPTIONScorrectly reflects the more specific validation for mutually exclusive discovery options.airbyte_cdk/sources/file_based/config/file_based_stream_config.py (2)
8-8: LGTM!The
root_validatorimport is correctly added and necessary for the new cross-field validation logic.
102-119: LGTM on the validation logic!The mutual exclusivity validation correctly enforces that the new
use_first_found_file_for_schema_discoverycannot be combined with other discovery options, while maintaining backward compatibility for existingrecent_n_files_to_read_for_schema_discoverycombinations. The comment clearly explains the rationale for this approach.
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
Outdated
Show resolved
Hide resolved
|
/autofix
|
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
Dismissed
Show dismissed
Hide dismissed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/file_based/config/file_based_stream_config.py(3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Applied to files:
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
airbyte_cdk/sources/file_based/config/file_based_stream_config.py
🧬 Code graph analysis (1)
airbyte_cdk/sources/file_based/config/file_based_stream_config.py (1)
airbyte_cdk/sources/file_based/exceptions.py (2)
ConfigValidationError(75-76)FileBasedSourceError(12-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-pokeapi
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/file_based/config/file_based_stream_config.py (1)
6-6: LGTM!The new imports (
Dictfrom typing androot_validatorfrom pydantic) are necessary for the validation logic below and are correctly added.Also applies to: 8-8
maxi297
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
https://github.com/airbytehq/oncall/issues/9301
Summary by CodeRabbit
New Features
Bug Fixes / Validation
Tests