Skip to content

Fix 1.13 ingestion import regression in patch_mixin#29156

Open
ayush-shah wants to merge 7 commits into
1.13from
codex/fix-113-entity-adapters
Open

Fix 1.13 ingestion import regression in patch_mixin#29156
ayush-shah wants to merge 7 commits into
1.13from
codex/fix-113-entity-adapters

Conversation

@ayush-shah

@ayush-shah ayush-shah commented Jun 18, 2026

Copy link
Copy Markdown
Member

What

Fix the 1.13 Python ingestion CLI startup failure by removing the patch_mixin.py dependency on metadata.sampler.entity_adapters, which does not exist on the 1.13 branch.

Instead of backporting the entity_adapters cleanup from main, this keeps the fix local to patch_mixin.py and handles the only classifiable entities supported on 1.13:

  • Table via .columns
  • Container via .dataModel.columns

It also preserves both existing SDK call shapes for patch_column_tags:

  • patch_column_tags(table=...)
  • patch_column_tags(entity=...)

Why

AUT sample data ingestion fails before the workflow YAML is read because importing metadata.ingestion.ometa.mixins.patch_mixin raises:

ModuleNotFoundError: No module named 'metadata.sampler.entity_adapters'

Root cause: #28837 was backported to 1.13 with an import dependency on an adapter module that exists on main from #27716, but that module was never present on 1.13.

Why this shape

The earlier adapter-module backport was more than this branch needs. On 1.13, ClassifiableEntityType is only Table | Container, so direct type handling in patch_mixin.py is the smallest fix that removes the bad import while preserving column tag patching and the optimistic-lock retry behavior from #28837.

The Table/Container rules are centralized in a local helper instead of a new module. patch_column_tags now uses the fetched server entity as the patch source and applies tag changes only to a deep-copied destination, so caller-owned entities are not mutated as a side effect.

Regression coverage checked

I verified the touched logic with a focused stdlib-only smoke harness that imports the actual patch_mixin.py file with stubbed external dependencies and exercises:

  • Table column tag patching
  • Container.dataModel.columns column tag patching
  • Container without caller-side dataModel, using fetched server state
  • Container without server-side dataModel
  • unsupported entity handling
  • missing entity handling
  • both table= and entity= keyword compatibility
  • no caller input mutation for Table or Container
  • fetched source entity is not mutated; tags are applied only to the copied destination

Validation

  • python3 -m py_compile ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py
  • focused patch_mixin.py smoke harness: patch_mixin smoke ok
  • uvx ruff check ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py
  • git diff --check

Full local package import was blocked because this checkout does not include generated metadata schema modules, and uv run cannot resolve the project extras as currently declared.

Greptile Summary

This PR fixes a ModuleNotFoundError crash on the 1.13 branch caused by a backport of #28837 that imported metadata.sampler.entity_adapters, a module that only exists on main. The fix removes that import and replaces it with direct isinstance checks on Table and Container — the only two ClassifiableEntityType members present on 1.13.

  • Removes the broken from metadata.sampler.entity_adapters import ... and replaces the adapter pattern with a _column_tag_patch_info helper that returns the right field list and column reference for each supported type.
  • Makes patch_column_tags backward-compatible by accepting both table= (new preferred name) and entity= (old name), and changes source from the caller-supplied entity to the freshly-fetched server instance, avoiding stale-state diffs.
  • Updates test_ometa_glossary.py to match the 1.13 GlossaryTerm.relatedTerms schema change from EntityReferenceList to List[TermRelation].

Confidence Score: 5/5

Safe to merge — the change removes a broken import that caused a startup crash and replaces it with well-tested inline logic scoped to the two entity types present on 1.13.

The fix is narrowly scoped: one bad import removed, one small helper added, backward-compatible signature kept. The optimistic-lock retry logic from the original backport is preserved unchanged. New unit tests cover all primary paths, and the integration test update tracks a schema change already present on this branch. No correctness regressions were found.

No files require special attention.

Important Files Changed

Filename Overview
ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py Core fix: removes broken entity_adapters import, replaces adapter pattern with _column_tag_patch_info helper, and makes patch_column_tags backward-compatible; logic is clean and correctly deep-copies the fetched instance before mutating column tags.
ingestion/tests/unit/metadata/ingestion/ometa/test_patch_mixin.py New unit test file covering Table patching, Container patching, legacy entity= alias, no-entity guard, and unsupported-type guard; mutation-isolation assertions present for Table.
ingestion/tests/integration/ometa/test_ometa_glossary.py Updates relatedTerms construction and assertions to match the 1.13 schema change from EntityReferenceList to List[TermRelation]; change is minimal and correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["patch_column_tags(table= / entity=)"] --> B{table is None?}
    B -- yes --> Z1[return None + warn]
    B -- no --> C["_column_tag_patch_info(table)"]
    C --> D{isinstance Table?}
    D -- yes --> E["fields=['tags','columns'], columns=table.columns"]
    D -- no --> F{isinstance Container?}
    F -- yes --> G["fields=['tags','dataModel'], columns=table.dataModel.columns"]
    F -- no --> Z2[return None + warn unsupported]
    E & G --> H["_fetch_entity_if_exists(entity_type, table.id, fields)"]
    H --> I{instance found?}
    I -- no --> Z3[return None]
    I -- yes --> J["_prepare_destination_for_column_tags(instance, column_tags, operation)"]
    J --> K["_column_tag_patch_info(instance) → columns ref"]
    K --> L{columns is None?}
    L -- yes --> Z4[return None + warn]
    L -- no --> M["instance.model_copy(deep=True) → destination"]
    M --> N["_column_tag_patch_info(destination) → destination_columns ref"]
    N --> O["update_column_tags(destination_columns, ...) for each tag"]
    O --> P[return destination]
    P --> Q["patch(entity_type, source=instance, destination=destination, if_match=etag)"]
    Q --> R{412 Precondition Failed?}
    R -- yes, retries left --> H
    R -- yes, retries exhausted --> Z5[return None + warn]
    R -- no --> S[return patched_entity]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["patch_column_tags(table= / entity=)"] --> B{table is None?}
    B -- yes --> Z1[return None + warn]
    B -- no --> C["_column_tag_patch_info(table)"]
    C --> D{isinstance Table?}
    D -- yes --> E["fields=['tags','columns'], columns=table.columns"]
    D -- no --> F{isinstance Container?}
    F -- yes --> G["fields=['tags','dataModel'], columns=table.dataModel.columns"]
    F -- no --> Z2[return None + warn unsupported]
    E & G --> H["_fetch_entity_if_exists(entity_type, table.id, fields)"]
    H --> I{instance found?}
    I -- no --> Z3[return None]
    I -- yes --> J["_prepare_destination_for_column_tags(instance, column_tags, operation)"]
    J --> K["_column_tag_patch_info(instance) → columns ref"]
    K --> L{columns is None?}
    L -- yes --> Z4[return None + warn]
    L -- no --> M["instance.model_copy(deep=True) → destination"]
    M --> N["_column_tag_patch_info(destination) → destination_columns ref"]
    N --> O["update_column_tags(destination_columns, ...) for each tag"]
    O --> P[return destination]
    P --> Q["patch(entity_type, source=instance, destination=destination, if_match=etag)"]
    Q --> R{412 Precondition Failed?}
    R -- yes, retries left --> H
    R -- yes, retries exhausted --> Z5[return None + warn]
    R -- no --> S[return patched_entity]
Loading

Reviews (7): Last reviewed commit: "test(ingestion): align glossary related ..." | Re-trigger Greptile

@ayush-shah ayush-shah requested a review from a team as a code owner June 18, 2026 05:29
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

  • No GitHub issue is linked. Link an issue in the Development section of the PR (or add Fixes #12345 to the description). For a same-org cross-repo issue, add Fixes open-metadata/<repo>#123 to the description.

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

@github-actions github-actions Bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Jun 18, 2026
Comment thread ingestion/src/metadata/sampler/entity_adapters.py Outdated
@github-actions

Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions

Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Comment thread ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py Outdated
Comment thread ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py Outdated
@ayush-shah ayush-shah changed the title Fix ingestion entity adapter import on 1.13 Fix 1.13 ingestion import regression in patch_mixin Jun 18, 2026
@github-actions

Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions

Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions

Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (9 flaky)

✅ 3900 passed · ❌ 0 failed · 🟡 9 flaky · ⏭️ 80 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 291 0 0 4
🟡 Shard 2 734 0 1 7
🟡 Shard 3 750 0 2 2
🟡 Shard 4 736 0 3 18
✅ Shard 5 688 0 0 41
🟡 Shard 6 701 0 3 8
🟡 9 flaky test(s) (passed on retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Flow/IngestionBot.spec.ts › Ingestion bot should be able to access domain specific domain (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Set & Update all CP types on apiCollection (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Spreadsheet (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Inactive Announcement create & delete (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Tags.spec.ts › Adds one tag and removes another in the same save preserves appliedBy on the kept tag (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@github-actions

Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@sonarqubecloud

Copy link
Copy Markdown

@gitar-bot

gitar-bot Bot commented Jun 18, 2026

Copy link
Copy Markdown
Code Review ✅ Approved 3 resolved / 3 findings

Removes the missing entity_adapters import in patch_mixin.py to resolve 1.13 ingestion startup failures, replacing it with direct type dispatch for Table and Container entities. This also fixes the input mutation bug and ensures column tag patches are performed against the most recent server state.

✅ 3 resolved
Edge Case: ContainerAdapter.set_columns silently drops columns when dataModel is None

📄 ingestion/src/metadata/sampler/entity_adapters.py:51-53
In ContainerAdapter.set_columns, the assignment is guarded by if entity.dataModel: and otherwise does nothing. In the patch_column_tags flow (_prepare_destination_for_column_tags), the server-fetched instance may have dataModel.columns (so get_columns(instance) returns a non-None list and the early-return is skipped), while the caller-supplied source entity may have dataModel=None. In that case set_columns(entity, columns) is a no-op, destination = entity.model_copy(...) still has dataModel=None, get_columns(destination) returns None, no column tags are applied, and the resulting PATCH is effectively empty — the column tag update is silently lost with no warning.

This is an edge case dependent on the caller passing a Container without a populated dataModel, but unlike the get_columns-returns-None path (which logs a warning and returns None), this path fails silently. Consider either initializing entity.dataModel before assigning columns or logging a warning when dataModel is missing so the skip is observable.

Quality: patch_column_tags mutates the caller's input entity

📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:506-516
In _prepare_destination_for_column_tags, the source object passed by the caller is mutated in place: table.columns = instance.columns (line 507) and table.dataModel.columns = instance.dataModel.columns (line 514). This overwrites the caller's table/Container columns with the freshly fetched server state as a side effect of calling patch_column_tags. While this is needed so the patch source matches server state for optimistic-locking diff computation, callers that reuse the same entity object after the call will observe their column list silently replaced (and without the applied tag changes, since those are only applied to the deep-copied destination). Consider operating on a copy of table rather than mutating the argument, e.g. build destination from instance directly and use a copy as the patch source.

Quality: Duplicated Table/Container dispatch and warning logic

📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:506-520 📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:551-560
The entity-type dispatch (isinstance Table / Container / else with the identical "Unsupported entity type for column tag patching" warning) is now duplicated in both patch_column_tags (lines 551-560) and _prepare_destination_for_column_tags (lines 506-522). This was previously centralized in adapter_for/EntityAdapter. The duplication means future support for a new entity type must be added in two places and kept in sync. Consider consolidating the type handling into a single helper that returns both the fetch fields and the columns accessor.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@github-actions

Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs skip-pr-checks Bypass PR metadata validation check

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants