docs(search): add note about re-indexing when enabling Tika #2285
base: main
````diff
@@ -104,6 +104,8 @@ Additionally, the following optional settings can be set:
 * `SEARCH_EXTRACTOR_TIKA_CLEAN_STOP_WORDS=true` (default: `true`): ignore stop words like `I`, `you`, `the` during content extraction.
 
+> **Note:** Enabling Tika does not automatically re-extract content from already indexed files. You need to delete the existing search index and trigger a full re-index. See [Manually Trigger Re-Indexing a Space](#manually-trigger-re-indexing-a-space) for details.
+
 ## Manually Trigger Re-Indexing a Space
 
 The service includes a command-line interface to trigger re-indexing a space:
 
@@ -118,6 +120,13 @@ It can also be used to re-index all spaces:
 opencloud search index --all-spaces
 ```
 
+> **Note:** The re-index command skips files whose modification time has not changed since they were last indexed. If you changed the extractor type (e.g., from `basic` to `tika`), you need to delete the existing search index first to force a full content re-extraction:
+>
+> ```shell
+> rm -rf $OC_BASE_DATA_PATH/search # default: /var/lib/opencloud/search
````
Contributor
@micbar Maybe it is a bug? I would expect re-indexing to work without deleting the index.
Member
@aduffeck Can you clarify? You know the implementation.
Member
That's the current behavior, yes. I consider that a bug. In my opinion, a re-index should unconditionally rebuild the index for the space (or for all spaces). It might also be helpful to have a command or flag that just "syncs" the index, i.e. picks up changes that haven't been indexed yet (the current behavior), but that shouldn't be the default behavior of the index command.
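The thread above distinguishes two behaviors: the current one, where files whose modification time is unchanged are skipped (a "sync"), and the proposed default, where the index is rebuilt unconditionally. The sketch below models that per-file decision in shell. It is purely illustrative: `needs_reindex` and the `sync`/`rebuild` mode names are hypothetical and do not exist in the `opencloud` CLI, and the actual implementation may track more than a single timestamp per file.

```shell
#!/usr/bin/env bash
# Hypothetical model of the two re-index modes discussed above; not opencloud code.
#
# needs_reindex MODE FILE_MTIME INDEXED_MTIME
#   MODE           "sync"    = current behavior, skip files whose mtime is unchanged
#                  "rebuild" = proposed default, re-extract every file
#   FILE_MTIME     current modification time of the file (e.g. epoch seconds)
#   INDEXED_MTIME  mtime recorded at the last indexing run ("" if never indexed)
# Exit status: 0 = (re-)index the file, 1 = skip it, 2 = unknown mode.
needs_reindex() {
  local mode=${1-} file_mtime=${2-} indexed_mtime=${3-}
  case "$mode" in
    rebuild)
      # Timestamps are ignored, so a switch from `basic` to `tika` reaches every file.
      return 0
      ;;
    sync)
      # Only files changed since the last run (or never indexed) are picked up.
      [ "$file_mtime" != "$indexed_mtime" ]
      ;;
    *)
      echo "needs_reindex: unknown mode '$mode'" >&2
      return 2
      ;;
  esac
}

# Example: an unchanged file is skipped by "sync" but processed by "rebuild".
for mode in sync rebuild; do
  if needs_reindex "$mode" 1700000000 1700000000; then
    echo "$mode: re-index"
  else
    echo "$mode: skip"
  fi
done
```

Running the loop prints `sync: skip` followed by `rebuild: re-index`, which is why changing the extractor alone has no effect on unchanged files under the current behavior.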
````diff
+> opencloud search index --all-spaces
+> ```
+
 ## Metrics
 
 The search service exposes the following prometheus metrics at `<debug_endpoint>/metrics` (as configured using the `SEARCH_DEBUG_ADDR` env var):
````
@ScharfViktor Is that true? I think we need to verify this first.
Yes, that is true. I can reproduce it.
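The workaround in the diff runs `rm -rf $OC_BASE_DATA_PATH/search` with the variable unquoted; if `OC_BASE_DATA_PATH` is unset or empty, the path expands to `/search`, and a path containing spaces would be split into several arguments. A guarded helper avoids both problems. This is a sketch only: the function name `reset_search_index` is hypothetical, while the `<base>/search` location and the `/var/lib/opencloud` default come from the comment in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of opencloud): delete the search index
# directory below a base data path before running a full re-index.
# reset_search_index BASE_DATA_PATH
reset_search_index() {
  local base=${1-}
  # Refuse to run without an explicit base path, so we never fall back to "/search".
  if [ -z "$base" ]; then
    echo "usage: reset_search_index BASE_DATA_PATH" >&2
    return 2
  fi
  # Strip one trailing slash so "/var/lib/opencloud/" and "/var/lib/opencloud" behave alike.
  local index_dir="${base%/}/search"
  if [ ! -d "$index_dir" ]; then
    echo "no search index at $index_dir, nothing to delete" >&2
    return 0
  fi
  # Quote the path and end option parsing with "--" so unusual paths cannot become flags.
  rm -rf -- "$index_dir"
}
```

Usage, assuming the default data path mentioned in the diff: `reset_search_index "${OC_BASE_DATA_PATH:-/var/lib/opencloud}" && opencloud search index --all-spaces`. The `&&` ensures the re-index only starts if the deletion step succeeded.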