CAMEL-21540: Add PGVector component for PostgreSQL vector database #22207
🌟 Thank you for your contribution to the Apache Camel project! 🌟 🐫 Apache Camel Committers, please review the following items:
🧪 CI tested the following changed modules:
Build reactor — dependencies compiled but only changed modules were tested (26 modules)
apupier left a comment
What is the reason for a separate Postgres component specific to vectors? Why not include these capabilities directly in the existing Postgres component?
Claude Code on behalf of Guillaume Nodet
Good question! There are several reasons for keeping pgvector as a separate component under camel-ai/ rather than adding it to the existing PostgreSQL components:
- Consistency with other vector DB components. All vector database components live under `camel-ai/`: Milvus, Qdrant, Pinecone, Weaviate, Neo4j. pgvector serves the same purpose (vector similarity search for AI/RAG pipelines); it just happens to use PostgreSQL as its storage engine. Users looking for vector database components would naturally look in the AI group.
- Different use case and audience. The existing PG components (`camel-pgevent` for LISTEN/NOTIFY, `camel-pg-replication-slot` for WAL replication) are PostgreSQL-specific integration utilities. pgvector targets a completely different use case: embeddings storage, similarity search, and RAG pipelines. These serve different personas.
- Additional dependencies. pgvector brings in `com.pgvector:pgvector` (for vector type support) and LangChain4j data type transformers, which are AI-oriented dependencies that would be out of place in a general PostgreSQL component.
- Separate lifecycle. Keeping it as its own component allows it to evolve independently without affecting the existing PostgreSQL components.
This is the same approach taken by other frameworks — for example, LangChain and LlamaIndex treat pgvector as a vector store alongside Pinecone/Qdrant/etc., not as a PostgreSQL utility.
- New camel-pgvector component under components/camel-ai/
- Supports CREATE_TABLE, DROP_TABLE, UPSERT, DELETE, SIMILARITY_SEARCH actions
- Uses PostgreSQL pgvector extension via JDBC with com.pgvector library
- Supports cosine, euclidean, and inner product distance types
- LangChain4j data type transformers: pgvector:embeddings and pgvector:rag
- Integration tests with testcontainers pgvector image
- LangChain4j embeddings integration test with AllMiniLmL6V2 model
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
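For readers unfamiliar with pgvector, the three distance types listed above correspond to three SQL operators in the extension: `<=>` (cosine distance), `<->` (L2 distance) and `<#>` (negative inner product). The following is only a sketch of how such a mapping could look; the class, the enum, and the `id` / `embedding` column names are assumptions made for illustration and are not taken from the component's source.

```java
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical helper, not code from this PR. */
final class PgVectorDistanceSketch {

    /** The three distance types, paired with pgvector's SQL operators. */
    enum Distance {
        COSINE("<=>"),         // cosine distance
        EUCLIDEAN("<->"),      // L2 (Euclidean) distance
        INNER_PRODUCT("<#>");  // negative inner product

        final String operator;

        Distance(String operator) {
            this.operator = operator;
        }
    }

    // Table names are concatenated into SQL, so only plain identifiers are accepted.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /**
     * Builds a parameterized nearest-neighbour query.
     * The first placeholder is the query vector, the second the result limit.
     * Column names (id, embedding) are assumptions.
     */
    static String similaritySearchSql(String table, Distance distance) {
        if (!IDENTIFIER.matcher(table).matches()) {
            throw new IllegalArgumentException("Unsafe table name: " + table);
        }
        return "SELECT id, embedding " + distance.operator + " ? AS distance"
                + " FROM " + table
                + " ORDER BY distance"
                + " LIMIT ?";
    }

    public static void main(String[] args) {
        for (Distance d : Distance.values()) {
            System.out.println(d + ": " + similaritySearchSql("embeddings", d));
        }
    }
}
```

Because `<#>` returns the negative inner product, ordering ascending by the computed value puts the closest matches first for all three operators, which is why a single `ORDER BY distance` is enough in this sketch.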
- Add BOM, catalog, DSL, and documentation generated files
- Remove Spring Boot starter reference (no starter yet)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…in and add pgvector entries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove pgvector from alphabetical nav listing (grouped under AI only)
- Fix trailing spaces in javadoc blank comment lines
- Update EventEndpointBuilderFactory to CamelEventEndpointBuilderFactory
- Update rest-openapi description
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add CREATE_INDEX action to create HNSW indexes for faster approximate nearest neighbor search, using the configured distance type
- Add CamelPgVectorFilter header to apply SQL WHERE clause filtering on similarity search results (e.g., filter by metadata or text content)
- Add integration tests for both features
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…index action Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
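As a rough illustration of what the two features above amount to at the SQL level: pgvector selects the HNSW distance function through an operator class (`vector_cosine_ops`, `vector_l2_ops`, `vector_ip_ops`), and a filter is simply a `WHERE` clause placed before the ordering. The sketch below assumes a hypothetical index naming scheme and hypothetical `id`, `text_content` and `embedding` columns; it is not the component's implementation.

```java
import java.util.regex.Pattern;

/** Illustrative only: hypothetical SQL builders, not code from this PR. */
final class PgVectorIndexSketch {

    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Maps a distance type name to the matching pgvector operator class. */
    static String opClass(String distanceType) {
        switch (distanceType) {
            case "COSINE":
                return "vector_cosine_ops";
            case "EUCLIDEAN":
                return "vector_l2_ops";
            case "INNER_PRODUCT":
                return "vector_ip_ops";
            default:
                throw new IllegalArgumentException("Unknown distance type: " + distanceType);
        }
    }

    /** Builds an HNSW index statement; the index name pattern is an assumption. */
    static String createIndexSql(String table, String distanceType) {
        requireIdentifier(table);
        return "CREATE INDEX IF NOT EXISTS " + table + "_embedding_hnsw_idx"
                + " ON " + table
                + " USING hnsw (embedding " + opClass(distanceType) + ")";
    }

    /**
     * Builds a cosine similarity search with an optional raw WHERE fragment.
     * The fragment is inserted verbatim, so it must come from a trusted source.
     */
    static String searchSql(String table, String filter) {
        requireIdentifier(table);
        String where = (filter == null || filter.trim().isEmpty()) ? "" : " WHERE " + filter;
        return "SELECT id, text_content FROM " + table + where
                + " ORDER BY embedding <=> ? LIMIT ?";
    }

    private static void requireIdentifier(String name) {
        if (!IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Unsafe identifier: " + name);
        }
    }

    public static void main(String[] args) {
        System.out.println(createIndexSql("docs", "COSINE"));
        System.out.println(searchSql("docs", "text_content LIKE '%camel%'"));
    }
}
```

One design point worth keeping in mind with a header that carries a raw SQL fragment: unlike the query vector and limit, it cannot be bound as a JDBC parameter, so routes should not populate it from untrusted input.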
Replace ** glob with {*,*/*} for dsl source pattern to prevent
scandir of target/ directories created during parallel builds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The dsl.adoc lives at dsl/src/main/docs/ (depth 0), which is not
matched by {*,*/*}. Add explicit pattern for it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
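The effect of the two glob commits above can be checked with a small experiment. The sketch below uses Java's `PathMatcher`, which is a stand-in for whatever glob library the docs build actually uses (its `**` semantics may differ), and the full pattern strings and module paths are hypothetical since the commits only quote the `{*,*/*}` fragment. It shows that the bounded pattern accepts docs one or two directories deep, rejects anything deeper such as paths under `target/`, and misses the depth-0 `dsl/src/main/docs/` location unless that is listed explicitly.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

/** Illustrative only: patterns and paths are hypothetical; assumes '/' separators. */
final class DslGlobSketch {

    /** Returns true if the relative path matches the given glob. */
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        String bounded = "dsl/{*,*/*}/src/main/docs/*.adoc"; // one or two levels below dsl/
        String depthZero = "dsl/src/main/docs/*.adoc";       // explicit extra pattern

        String[] paths = {
            "dsl/camel-endpointdsl/src/main/docs/a.adoc",                  // depth 1
            "dsl/group/module/src/main/docs/b.adoc",                       // depth 2
            "dsl/module/target/classes/copy/src/main/docs/c.adoc",         // too deep
            "dsl/src/main/docs/dsl.adoc"                                   // depth 0
        };
        for (String p : paths) {
            System.out.println(p
                    + " bounded=" + matches(bounded, p)
                    + " depthZero=" + matches(depthZero, p));
        }
    }
}
```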
When a new dependency is added to parent/pom.xml, the diff contains
structural XML elements like <groupId>, <artifactId>, <version> which
were incorrectly extracted as "changed properties" by detectChangedProperties.
This caused the script to search for modules using ${artifactId} or
${groupId} as property references, which either matched nothing useful
or caused spurious failures.
Fix: filter out known structural POM element names (groupId, artifactId,
version, scope, type, etc.) so only actual property names like
"pgvector-version" or "openai-java-version" are detected.
Fixes the CI script bug seen in PR #22207 where adding a new component
to parent/pom.xml caused the dependency detection to fail.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
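To make the fix above concrete, here is a minimal re-creation of the idea in Java. The real CI script is not shown in this conversation and is probably not written in Java; only the name `detectChangedProperties` and the first five structural element names come from the commit message. The regex, the extra entries in the filter set, and the sample diff (including its version value) are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical re-creation of the filtering idea. */
final class ChangedPropertiesSketch {

    // Structural POM elements that are never property names.
    // The first five are named in the commit message; the rest are assumed extras.
    private static final Set<String> STRUCTURAL = Set.of(
            "groupId", "artifactId", "version", "scope", "type",
            "classifier", "optional");

    // Matches an added or removed diff line holding a single <name>value</name> element.
    private static final Pattern ELEMENT =
            Pattern.compile("^[+-]\\s*<([A-Za-z0-9_.-]+)>[^<]*</\\1>\\s*$");

    /** Returns element names from changed lines, skipping structural POM elements. */
    static Set<String> detectChangedProperties(List<String> diffLines) {
        Set<String> properties = new LinkedHashSet<>();
        for (String line : diffLines) {
            if (line.startsWith("+++") || line.startsWith("---")) {
                continue; // file headers, not content changes
            }
            Matcher m = ELEMENT.matcher(line);
            if (m.matches() && !STRUCTURAL.contains(m.group(1))) {
                properties.add(m.group(1));
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // Made-up diff fragment; the version number is a placeholder.
        List<String> diff = List.of(
                "+++ b/parent/pom.xml",
                "+        <pgvector-version>1.0.0</pgvector-version>",
                "+            <dependency>",
                "+                <groupId>com.pgvector</groupId>",
                "+                <artifactId>pgvector</artifactId>",
                "+                <version>${pgvector-version}</version>",
                "+            </dependency>");
        System.out.println(detectChangedProperties(diff));
    }
}
```

Without the `STRUCTURAL` check, the same input would also yield `groupId`, `artifactId` and `version`, which is the spurious detection the commit describes.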
Summary
Implements CAMEL-21540: Vector Database capabilities for PostgreSQL.
- New `camel-pgvector` component under `components/camel-ai/` that provides vector similarity search capabilities using the PostgreSQL pgvector extension
- Supported actions: `CREATE_TABLE`, `DROP_TABLE`, `UPSERT`, `DELETE`, `SIMILARITY_SEARCH`
- Uses the `com.pgvector:pgvector` library for pgvector type support
- LangChain4j data type transformers (`pgvector:embeddings` and `pgvector:rag`) for RAG pipeline integration

Test plan
- `PgVectorComponentIT` (7 tests) - CRUD operations and similarity search
- `LangChain4jEmbeddingsComponentPgVectorTargetIT` (4 tests) - end-to-end LangChain4j embedding + pgvector integration