perf: bound trigram search index size with LEFT() truncation #2897
Open
dkindlund wants to merge 1 commit into teableio:develop
Teable auto-creates GIN trigram indexes (idx_trgm_*) on every field for search. On large tables with many fields, this causes massive index bloat (e.g., 3.6 GB of indexes on 117 MB of data) and severe write amplification on every INSERT/UPDATE.

This commit wraps index expressions with LEFT(expression, N) to bound index size to the first N characters per field value. N is configurable via SEARCH_INDEX_TRUNCATE_LENGTH env var (default: 1000). Setting it to 0 disables truncation (preserving current behavior).

For short fields (< N chars), LEFT() is a no-op. For large JSON/HTML fields, it dramatically reduces index size while preserving search functionality (PostgreSQL uses the truncated index for candidate selection, then applies the full-column WHERE clause for filtering).

Existing indexes are automatically rebuilt on the next index reconciliation cycle when getAbnormalIndex() detects the definition mismatch. If the env var changes between reboots, the same mechanism triggers a rebuild with the new threshold.

Production data that motivated this change:

- Articles table: 117 MB data, 5.8 GB total (3.6 GB indexes)
- 70 trigram indexes, 13 largest with zero search scans (1.9 GB)
- html_content index: 731 MB, 0 scans
- Formula field backfill (34K rows): 90+ min, 3 container crashes

Follow-on enhancement: per-field configurable truncate length via field metadata, rather than a single global threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Wraps GIN trigram index expressions with LEFT(expression, N) to bound index size. This addresses severe index bloat and write amplification on tables with many fields.

- Configurable via the SEARCH_INDEX_TRUNCATE_LENGTH env var (default: 1000, set to 0 to disable)
- Wraps the output of FieldFormatter.getIndexExpression() with LEFT((expr)::text, N)
- Existing indexes are rebuilt via getAbnormalIndex() detection on next reconciliation cycle
- Search queries are unchanged (getSearchableExpression())

Motivation
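Production data from a 34K-row table with 70+ fields:

- Articles table: 117 MB data, 5.8 GB total (3.6 GB indexes)
- 70 trigram indexes; the 13 largest (1.9 GB) have zero search scans
- Largest index: html_content at 731 MB (0 search scans)

This caused:

- Severe write amplification on every INSERT/UPDATE
- Formula field backfill (34K rows): 90+ min, 3 container crashes
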
With LEFT(expr, 1000), the html_content index shrinks from ~731 MB to ~20-30 MB (only indexing the first 1K chars of each 20KB+ HTML blob). Short fields like title and status are unaffected since their values are already under 1000 characters.

Changes
| File | Change |
| --- | --- |
| threshold.config.ts | searchIndexTruncateLength config from env var |
| search-index-builder.postgres.ts | truncateLength to constructor and getIndexExpression() |
| db.provider.interface.ts | searchIndex() signature |
| postgres.provider.ts | truncateLength to IndexBuilderPostgres constructor |
| sqlite.provider.ts | |
| table-index.service.ts | getSearchIndexBuilder() helper, pass config to all call sites |
| search-index-builder.postgres.spec.ts | |

How It Works
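Everything starts from the configured threshold. As a rough sketch of how SEARCH_INDEX_TRUNCATE_LENGTH could be read, the helper below parses it with the documented default of 1000 and treats 0 as "truncation disabled". The function name, the Env type, and the fallback for invalid values are assumptions for illustration; the actual code in threshold.config.ts may differ.

```typescript
// Hypothetical helper, not the PR's actual threshold.config.ts.
type Env = Record<string, string | undefined>;

const DEFAULT_TRUNCATE_LENGTH = 1000; // default stated in the PR description

function parseTruncateLength(env: Env): number {
  const raw = env.SEARCH_INDEX_TRUNCATE_LENGTH;
  // Unset or blank: use the default.
  if (raw === undefined || raw.trim() === '') {
    return DEFAULT_TRUNCATE_LENGTH;
  }
  const parsed = Number(raw);
  // Assumption: non-integer or negative input falls back to the default
  // instead of failing startup. The PR does not specify this case.
  if (!Number.isInteger(parsed) || parsed < 0) {
    return DEFAULT_TRUNCATE_LENGTH;
  }
  return parsed; // 0 means "do not wrap with LEFT()"
}
```

In a Node service this would presumably be called once at startup with process.env, and the result passed down to the index builder.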
Index creation:
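A minimal sketch of the index expression wrapping, assuming the LEFT((expr)::text, N) shape from the Summary. The names wrapIndexExpression and createTrigramIndexSql are hypothetical stand-ins for the PR's getIndexExpression() and createSingleIndexSql(), and the exact DDL Teable emits (index naming, quoting, schema prefix) is not shown in this description.

```typescript
// Hypothetical stand-in for the PR's getIndexExpression() change.
function wrapIndexExpression(expr: string, truncateLength: number): string {
  // 0 disables truncation and leaves the expression untouched.
  if (truncateLength <= 0) {
    return expr;
  }
  return `LEFT((${expr})::text, ${truncateLength})`;
}

// Hypothetical DDL builder; identifiers are expected to be quoted by the caller.
function createTrigramIndexSql(
  indexName: string,
  tableName: string,
  expr: string,
  truncateLength: number
): string {
  const indexed = wrapIndexExpression(expr, truncateLength);
  // Expression indexes need the expression in parentheses, followed by the
  // operator class (gin_trgm_ops comes from the pg_trgm extension).
  return `CREATE INDEX IF NOT EXISTS "${indexName}" ON ${tableName} USING gin ((${indexed}) gin_trgm_ops)`;
}

const exampleSql = createTrigramIndexSql('idx_trgm_demo', '"articles"', '"html_content"', 1000);
// CREATE INDEX IF NOT EXISTS "idx_trgm_demo" ON "articles" USING gin ((LEFT(("html_content")::text, 1000)) gin_trgm_ops)
```

Because the truncate length is part of the index definition, changing it yields a different definition string, which is what lets getAbnormalIndex() notice the mismatch.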
Search queries (unchanged):
PostgreSQL handles this correctly: the truncated index is used for candidate row selection, then the full-column WHERE clause filters to exact matches.
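For reviewers who want to check this behavior, here is a sketch of an unchanged full-column search predicate plus an EXPLAIN wrapper to see which index the planner picks. The function names and query shape are assumptions, not Teable's actual getSearchableExpression() output. One thing worth verifying: as far as I understand the planner, PostgreSQL generally only considers an expression index when the query contains a matching expression, so it is worth confirming on a real table that the LEFT() index is actually chosen for a predicate written against the full column.

```typescript
interface SqlQuery {
  text: string;
  values: string[];
}

// Escape LIKE wildcards so the user's term is matched literally.
// Backslash is PostgreSQL's default LIKE escape character.
function escapeLike(term: string): string {
  return term.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Hypothetical search query against the full (untruncated) column.
function buildSearchQuery(tableName: string, columnExpr: string, term: string): SqlQuery {
  return {
    text: `SELECT * FROM ${tableName} WHERE (${columnExpr})::text ILIKE $1`,
    values: [`%${escapeLike(term)}%`],
  };
}

// Prefix with EXPLAIN to inspect the chosen plan (look for a Bitmap Index Scan
// on the idx_trgm_* index versus a Seq Scan).
function explain(query: SqlQuery): SqlQuery {
  return { text: `EXPLAIN ${query.text}`, values: query.values };
}

const plannedQuery = explain(buildSearchQuery('"articles"', '"html_content"', '50%_off'));
```

Running plannedQuery through any PostgreSQL client before and after the rebuild should show whether search still hits a trigram index.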
Deployment
- Set SEARCH_INDEX_TRUNCATE_LENGTH=1000 (or use default)
- getAbnormalIndex() detects all existing indexes as abnormal (definition mismatch)
- repairIndex() drops and recreates all indexes with the LEFT() expression

To revert: set SEARCH_INDEX_TRUNCATE_LENGTH=0 (triggers rebuild without LEFT())

Follow-on Enhancement
Per-field configurable truncate length via field metadata, allowing admins to set different thresholds for different fields (e.g., title gets full indexing, html_content gets 500 characters). This PR uses a global threshold as the initial implementation.

Test Plan
- getIndexExpression() with/without truncation
- createSingleIndexSql() output verification
- getAbnormalIndex() detecting old-format indexes

🤖 Generated with Claude Code