Skip to content

Rebuild full search document for tag deletes#38477

Open
mgwozdz-unicon wants to merge 4 commits intoopenedx:masterfrom
mgwozdz-unicon:tag-delete-updates-meilisearch
Open

Rebuild full search document for tag deletes#38477
mgwozdz-unicon wants to merge 4 commits intoopenedx:masterfrom
mgwozdz-unicon:tag-delete-updates-meilisearch

Conversation

@mgwozdz-unicon
Copy link
Copy Markdown

Description

This PR in conjunction with the openedx-core PR (openedx/openedx-core#571) is the backend component for openedx/modular-learning#260. The openedx-core PR (openedx/openedx-core#571) ensures that when a tag and its descendants are deleted that CONTENT_OBJECT_ASSOCIATIONS_CHANGED events take place. This PR ensures that once those events happen, the entire search document is replaced for the relevant content. Without this PR, those events were only leading to a partial update of the document, which was sufficient to update renamed tags but not sufficient to remove deleted tags from the search index.

Goal:
When a Course Author deletes a tag that is associated to Library content, they should be able to return to the Library page and not see those tags in the tag search filter.

Supporting information

Github Issue: openedx/modular-learning#260

Testing instructions

Note: This cannot be tested without the code from the following PRs in place:

  1. Go to Taxonomies and select a taxonomy
  2. Add new tags to this taxonomy that you plan to delete. (I have been adding a subtag (e.g. Wind1) with 2 sibbling children (W1.1 and W1.2) for my testing.)
  3. Go to a Library and create 2 new pieces of content.
  4. For one piece of content, associate the parent subtag (e.g. Wind1) and click Save.
  5. For the other piece of content, associate a child (e.g. W1.2) and click Save.
  6. Click on the tag search filter button and note that Wind1 and W1.2 appear in the tag search filter.
  7. Return to the taxonomy and delete the subtag Wind1.
  8. It should disappear from the taxonomy and a toast should display indicating what was deleted.
  9. Go back to the Library and refresh the page in the browser.
  10. The pieces of content you previously associated those tags to should now display that 0 tags are associated to them.
  11. Click on the tag search filter button and note that Wind1 and W1.2 no longer appear in the tag search filter.

Deadline

This is for our Verawood code cut stretch goal.

Other information

While this code requires other PRs in place to be able to test, I believe it can be safely merged without them.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Apr 29, 2026
@openedx-webhooks
Copy link
Copy Markdown

Thanks for the pull request, @mgwozdz-unicon!

This repository is currently maintained by @openedx/wg-maintenance-openedx-platform.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details
Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@github-project-automation github-project-automation Bot moved this to Needs Triage in Contributions Apr 29, 2026
@mphilbrick211 mphilbrick211 moved this from Needs Triage to Waiting on Author in Contributions Apr 29, 2026
@mgwozdz-unicon mgwozdz-unicon force-pushed the tag-delete-updates-meilisearch branch from 53c46c2 to 03d7d0f Compare April 29, 2026 17:19
Copy link
Copy Markdown
Contributor

@ormsbee ormsbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions on things that I'm confused about.

Returns a tuple of (doc, should_delete_index_doc). The second value is True when the
object no longer exists in a form that should remain indexed.
"""
if isinstance(key, LibraryCollectionLocator):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit (optional)]: The match statement with class matching works well for this kind of code:

match key:
    case LibraryCollectionLocator():
        ...
    case LibraryContainerLocator():
        ...
    case _:
        raise TypeError(...)

try:
doc, should_delete_doc = _build_full_content_object_index_doc(key)
except ItemNotFoundError:
should_delete_doc = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this code. What raises the ItemNotFoundError, and why does it bubble up to here?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception was coming from modulestore().get_item() in the course-block path. I’m moving that handling into _build_full_content_object_index_doc() so missing objects are normalized there for all key types, and the caller just deletes the stale index doc when should_delete_doc is true.

_wait_for_meili_tasks(tasks)


def _replace_index_docs(docs) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right in understanding that this takes a list of docs, but it's only ever called with a single doc in that list in upsert_content_object_tags_index_doc()? Is it intended to be called with a list of docs elsewhere?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'm updating it to take a single doc to match the actual usage.

Comment on lines +1075 to +1076
if not doc.get(Fields.display_name):
return None, True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we mention in a comment what this is checking for? ("searchable_doc_for_container always returns a document, even if the container has been deleted, but in that case some fields will be missing.")

Nit: I also think we should be careful here to only return None, True if Fields.display_name is actually absent, and not just empty "".

The docstring for searchable_doc_for_container seems wrong; it says Field.type will be missing, but looking at the code that doesn't seem to be the case. Out of scope for this PR but feel free to correct if you want.

Comment thread openedx/core/djangoapps/content/search/api.py Outdated
Helper function that fully replaces a document in the search index.

We use this when nested fields need to be removed from indexed documents, because
Meilisearch partial updates do not reliably clear nested facet values.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This seems like a bug in Meilisearch. Were you able to find a bug report on their GitHub, or some explanation in their documentation as to why this is necessary?

@bradenmacdonald
Copy link
Copy Markdown
Contributor

bradenmacdonald commented Apr 29, 2026

@mgwozdz-unicon OK, I played around with your openedx-core PR and the authoring PR but not using this PR and I definitely see the Meilisearch bug you're fixing here.

Just setting Field.tags: {} does not seem to clear the values of the sub-tags properly in the Meilisearch document, even though it shows the tags field being totally empty in the Meilisearch UI:

Screenshot 2026-04-29 at 4 16 12 PM

However, I found a much simpler fix for this problem that doesn't require reindexing the whole docs or refactoring this code so significantly. I think the following is sufficient:

--- a/openedx/core/djangoapps/content/search/documents.py
+++ b/openedx/core/djangoapps/content/search/documents.py
@@ -383,7 +383,10 @@ def searchable_doc_tags(object_id: OpaqueKey) -> dict:
     if not all_tags:
         # Clear out tags in the index when unselecting all tags for the block, otherwise
         # it would remain the last value if a cleared Fields.tags field is not included
-        return {Fields.tags: {}}
+        return {Fields.tags: {
+            Fields.tags_taxonomy: [],
+            Fields.tags_level0: [],
+        }}
     result = {
         Fields.tags_taxonomy: [],
         Fields.tags_level0: [],

What do you think?

(Note: it may be best to add level1, level2, etc. as well. Also the code could be re-organized to avoid duplication with the result = ... below, since we're now essentially returning result in either case.)

@mgwozdz-unicon mgwozdz-unicon force-pushed the tag-delete-updates-meilisearch branch 2 times, most recently from 05e6f0c to a55c105 Compare April 30, 2026 00:36
@ormsbee
Copy link
Copy Markdown
Contributor

ormsbee commented Apr 30, 2026

@mgwozdz-unicon: Just a note that openedx-core 1.0.0 has been released.

@mgwozdz-unicon mgwozdz-unicon force-pushed the tag-delete-updates-meilisearch branch from 857dd5f to e8d81d3 Compare April 30, 2026 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

open-source-contribution PR author is not from Axim or 2U

Projects

Status: Waiting on Author

Development

Successfully merging this pull request may close these issues.

5 participants