9 changes: 9 additions & 0 deletions services/search/README.md
@@ -104,6 +104,8 @@ Additionally, the following optional settings can be set:

* `SEARCH_EXTRACTOR_TIKA_CLEAN_STOP_WORDS=true` (default: `true`): ignore stop words like `I`, `you`, `the` during content extraction.

> **Note:** Enabling Tika does not automatically re-extract content from already indexed files. You need to delete the existing search index and trigger a full re-index. See [Manually Trigger Re-Indexing a Space](#manually-trigger-re-indexing-a-space) for details.

## Manually Trigger Re-Indexing a Space

The service includes a command-line interface to trigger re-indexing a space:
@@ -118,6 +120,13 @@ It can also be used to re-index all spaces:
opencloud search index --all-spaces
```

> **Note:** The re-index command skips files whose modification time has not changed since they were last indexed. If you changed the extractor type (e.g., from `basic` to `tika`), you need to delete the existing search index first to force a full content re-extraction:

Member

@ScharfViktor Is that true? I think we need to verify this first.

Contributor

Yes, that is true. I can reproduce it.

>
> ```shell
> rm -rf "${OC_BASE_DATA_PATH:-/var/lib/opencloud}/search" # default: /var/lib/opencloud/search

Contributor

@micbar maybe this is a bug? I would expect a re-index to work without deleting /search.

Member

@aduffeck can you clarify? You know the implementation

Member

That's the current behavior, yes. I consider that a bug.

A re-index should unconditionally rebuild the index for the space/all spaces in my opinion. Maybe it would be helpful to also have a command or flag for just "syncing" the index, i.e. picking up changes that haven't been indexed yet (the current behavior), but that shouldn't be the default behavior of the index command.

> opencloud search index --all-spaces
> ```
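
The two steps in the note can be wrapped in a small script. This is a minimal sketch, not part of the `opencloud` CLI: the function names are hypothetical helpers, and it assumes the index lives in `$OC_BASE_DATA_PATH/search`, with `/var/lib/opencloud` as the base path when the variable is unset or empty (taken from the default shown in the note). It does not cover whether the service needs to be stopped first.

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical helpers,
# not commands provided by the opencloud CLI.

# Print the search index directory. Falls back to the default base path
# from the note when OC_BASE_DATA_PATH is unset or empty, so the result
# is never the bare path "/search".
search_index_dir() {
    printf '%s/search\n' "${OC_BASE_DATA_PATH:-/var/lib/opencloud}"
}

# Delete the existing index so the next run cannot skip unchanged files.
remove_search_index() {
    dir=$(search_index_dir)
    # Only act on an existing directory; anything else is left untouched.
    if [ -d "$dir" ]; then
        rm -rf -- "$dir"
    fi
}

# Delete the index, then rebuild it for all spaces.
force_full_reindex() {
    remove_search_index && opencloud search index --all-spaces
}
```

After sourcing the file, calling `force_full_reindex` runs both steps. The re-index only starts if removing the old index succeeded.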

## Metrics

The search service exposes the following prometheus metrics at `<debug_endpoint>/metrics` (as configured using the `SEARCH_DEBUG_ADDR` env var):