Rebuild full search document for tag deletes #38477
mgwozdz-unicon wants to merge 4 commits into openedx:master
53c46c2 to 03d7d0f
ormsbee
left a comment
Just a couple of questions on things that I'm confused about.
    Returns a tuple of (doc, should_delete_index_doc). The second value is True when the
    object no longer exists in a form that should remain indexed.
    """
    if isinstance(key, LibraryCollectionLocator):
[Nit (optional)]: The match statement with class matching works well for this kind of code:
    match key:
        case LibraryCollectionLocator():
            ...
        case LibraryContainerLocator():
            ...
        case _:
            raise TypeError(...)

    try:
        doc, should_delete_doc = _build_full_content_object_index_doc(key)
    except ItemNotFoundError:
        should_delete_doc = True
I'm a bit confused by this code. What raises the ItemNotFoundError, and why does it bubble up to here?
The exception was coming from modulestore().get_item() in the course-block path. I’m moving that handling into _build_full_content_object_index_doc() so missing objects are normalized there for all key types, and the caller just deletes the stale index doc when should_delete_doc is true.
    _wait_for_meili_tasks(tasks)


def _replace_index_docs(docs) -> None:
Am I right in understanding that this takes a list of docs, but it's only ever called with a single doc in that list in upsert_content_object_tags_index_doc()? Is it intended to be called with a list of docs elsewhere?
Yes. I'm updating it to take a single doc to match the actual usage.
    if not doc.get(Fields.display_name):
        return None, True
Can we mention in a comment what this is checking for? ("searchable_doc_for_container always returns a document, even if the container has been deleted, but in that case some fields will be missing.")
Nit: I also think we should be careful here to only return None, True if Fields.display_name is actually absent, and not just empty "".
The docstring for searchable_doc_for_container seems wrong; it says Field.type will be missing, but looking at the code that doesn't seem to be the case. Out of scope for this PR but feel free to correct if you want.
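To illustrate the absent-versus-empty distinction raised in the nit above, here is a small sketch. The `DISPLAY_NAME` constant and the sample documents are assumptions made for the example; the real code uses `Fields.display_name` and documents produced by `searchable_doc_for_container`.

```python
DISPLAY_NAME = "display_name"  # hypothetical stand-in for Fields.display_name


def should_delete_index_doc(doc: dict) -> bool:
    """True only when the display name key is absent, not when it is merely empty."""
    return DISPLAY_NAME not in doc


deleted_container = {"id": "lct:1"}                    # key missing entirely
unnamed_container = {"id": "lct:2", DISPLAY_NAME: ""}  # key present but empty

# The truthiness check treats both documents the same way...
print(not deleted_container.get(DISPLAY_NAME), not unnamed_container.get(DISPLAY_NAME))
# ...while the membership check separates them.
print(should_delete_index_doc(deleted_container), should_delete_index_doc(unnamed_container))
```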
    Helper function that fully replaces a document in the search index.

    We use this when nested fields need to be removed from indexed documents, because
    Meilisearch partial updates do not reliably clear nested facet values.
Nit: This seems like a bug in Meilisearch. Were you able to find a bug report on their GitHub, or some explanation in their documentation as to why this is necessary?
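For context on the two write modes being compared, here is a toy model. `FakeIndex` is a hypothetical in-memory class, not Meilisearch, and it only mimics the documented difference between "add or replace" (the stored document is overwritten) and "add or update" (top-level fields are merged and omitted fields are kept). It does not reproduce the nested-facet behavior described in the docstring above.

```python
class FakeIndex:
    """Toy in-memory index; NOT Meilisearch, just a model of two write modes."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def replace_document(self, doc: dict) -> None:
        # "Add or replace": the stored document is overwritten wholesale.
        self.docs[doc["id"]] = dict(doc)

    def update_document(self, doc: dict) -> None:
        # "Add or update": top-level fields are merged; omitted fields are kept.
        self.docs.setdefault(doc["id"], {}).update(doc)


index = FakeIndex()
index.replace_document(
    {"id": "block-1", "display_name": "Intro", "tags": {"level0": ["Difficulty"]}}
)

# A partial update that omits "tags" leaves the stale value in place.
index.update_document({"id": "block-1", "display_name": "Intro"})
stale = index.docs["block-1"].get("tags")

# A full replacement with a freshly built document drops it.
index.replace_document({"id": "block-1", "display_name": "Intro"})
fresh = index.docs["block-1"].get("tags")
```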
@mgwozdz-unicon OK, I played around with your openedx-core PR and the authoring PR but not using this PR and I definitely see the Meilisearch bug you're fixing here. Just setting
However, I found a much simpler fix for this problem that doesn't require reindexing the whole docs or refactoring this code so significantly. I think the following is sufficient:

--- a/openedx/core/djangoapps/content/search/documents.py
+++ b/openedx/core/djangoapps/content/search/documents.py
@@ -383,7 +383,10 @@ def searchable_doc_tags(object_id: OpaqueKey) -> dict:
     if not all_tags:
         # Clear out tags in the index when unselecting all tags for the block, otherwise
         # it would remain the last value if a cleared Fields.tags field is not included
-        return {Fields.tags: {}}
+        return {Fields.tags: {
+            Fields.tags_taxonomy: [],
+            Fields.tags_level0: [],
+        }}
     result = {
         Fields.tags_taxonomy: [],
         Fields.tags_level0: [],

What do you think? (Note: it may be best to add
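The suggested change can be sketched as a standalone function. This is a simplified illustration, not the body of `searchable_doc_tags`: the `Fields` class below is a hypothetical subset with assumed string values, `tags_partial_doc` is an invented name, and the `"taxonomy > tag"` format for the level-0 entries is an assumption.

```python
class Fields:
    """Hypothetical subset of the real Fields constants; the string values are assumed."""

    tags = "tags"
    tags_taxonomy = "taxonomy"
    tags_level0 = "level0"


def tags_partial_doc(all_tags: list[tuple[str, str]]) -> dict:
    """Build the tags portion of a search document from (taxonomy, tag) pairs."""
    if not all_tags:
        # Send explicit empty lists so each nested facet is overwritten,
        # rather than sending an empty parent object.
        return {Fields.tags: {Fields.tags_taxonomy: [], Fields.tags_level0: []}}
    return {
        Fields.tags: {
            Fields.tags_taxonomy: sorted({taxonomy for taxonomy, _ in all_tags}),
            Fields.tags_level0: sorted(
                f"{taxonomy} > {tag}" for taxonomy, tag in all_tags
            ),
        }
    }
```

The design choice here is that the "no tags" case returns the same nested keys as the populated case, only with empty values, so a partial update has something concrete to overwrite each facet with.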
05e6f0c to a55c105
@mgwozdz-unicon: Just a note that openedx-core 1.0.0 has been released.
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
857dd5f to e8d81d3

Description
This PR, in conjunction with the openedx-core PR (openedx/openedx-core#571), is the backend component for openedx/modular-learning#260. The openedx-core PR ensures that CONTENT_OBJECT_ASSOCIATIONS_CHANGED events are emitted when a tag and its descendants are deleted. This PR ensures that, once those events fire, the entire search document is replaced for the affected content. Without this PR, those events led only to a partial update of the document, which was sufficient to update renamed tags but not to remove deleted tags from the search index.
Goal:
When a Course Author deletes a tag that is associated with Library content, they should be able to return to the Library page and no longer see that tag in the tag search filter.
Supporting information
GitHub Issue: openedx/modular-learning#260
Testing instructions
Note: This cannot be tested without the code from the following PRs in place:
Deadline
This is for our Verawood code cut stretch goal.
Other information
While this code requires other PRs in place to be able to test, I believe it can be safely merged without them.