CAMEL-21540: Add PGVector component for PostgreSQL vector database #22207

Open
gnodet wants to merge 11 commits into main from hungry-quark

Conversation

@gnodet
Contributor

@gnodet gnodet commented Mar 23, 2026

Summary

Implements CAMEL-21540: Vector Database capabilities for PostgreSQL.

  • New camel-pgvector component under components/camel-ai/ that provides vector similarity search capabilities using the PostgreSQL pgvector extension
  • Supports actions: CREATE_TABLE, DROP_TABLE, UPSERT, DELETE, SIMILARITY_SEARCH
  • Uses JDBC with the com.pgvector:pgvector library for pgvector type support
  • Configurable distance types: cosine (default), euclidean, inner product
  • LangChain4j data type transformers (pgvector:embeddings and pgvector:rag) for RAG pipeline integration
  • Integration tests using testcontainers pgvector image
  • LangChain4j embeddings integration test with AllMiniLmL6V2 embedding model
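For readers unfamiliar with pgvector, the three distance types listed above correspond to three SQL operators defined by the extension: `<=>` (cosine distance), `<->` (Euclidean distance) and `<#>` (negative inner product). The following is a minimal sketch of that mapping, not the component's actual code: the class name, the enum, and the `id`, `text_content` and `embedding` column names are all assumptions made for illustration.

```java
/** Illustrative sketch only; not taken from the camel-pgvector sources. */
final class PgVectorDistanceSketch {

    /** Hypothetical enum; the component's real type names may differ. */
    enum Distance {
        COSINE("<=>"),        // pgvector cosine distance operator
        EUCLIDEAN("<->"),     // pgvector L2 (Euclidean) distance operator
        INNER_PRODUCT("<#>"); // pgvector negative inner product operator

        final String operator;

        Distance(String operator) {
            this.operator = operator;
        }
    }

    /**
     * Builds a parameterized nearest-neighbour query. The first '?' is the
     * query vector, the second is the row limit. Column names are assumed,
     * and the table name must come from trusted configuration because it is
     * concatenated into the statement.
     */
    static String similaritySearchSql(String table, Distance distance) {
        return "SELECT id, text_content FROM " + table
                + " ORDER BY embedding " + distance.operator + " ? LIMIT ?";
    }

    public static void main(String[] args) {
        System.out.println(similaritySearchSql("docs", Distance.COSINE));
    }
}
```

Because all three operators return a value where smaller means closer, the same ascending `ORDER BY ... LIMIT` shape can serve every distance type in this sketch.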

Test plan

  • PgVectorComponentIT (7 tests) - CRUD operations and similarity search
  • LangChain4jEmbeddingsComponentPgVectorTargetIT (4 tests) - end-to-end LangChain4j embedding + pgvector integration
  • Code formatted and imports sorted
  • Generated files committed

@github-actions
Contributor

🌟 Thank you for your contribution to the Apache Camel project! 🌟
🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

  • First-time contributors require MANUAL approval for the GitHub Actions to run
  • You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot although they are normally detected and executed by CI.
  • You can label PRs using build-all, build-dependents, skip-tests and test-dependents to fine-tune the checks executed by this PR.
  • Build and test logs are available in the summary page. Only Apache Camel committers have access to the summary.

⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

@github-actions
Contributor

github-actions bot commented Mar 24, 2026

🧪 CI tested the following changed modules:

  • bom/camel-bom
  • catalog/camel-allcomponents
  • catalog/camel-catalog
  • components/camel-ai
  • components/camel-ai/camel-langchain4j-embeddings
  • components/camel-ai/camel-pgvector
  • core/camel-main
  • docs
  • dsl/camel-componentdsl
  • dsl/camel-endpointdsl
  • dsl/camel-kamelet-main
  • parent
  • tooling/maven/camel-package-maven-plugin

ℹ️ Dependent modules were not tested because the total number of affected modules exceeded the threshold (50). Use the test-dependents label to force testing all dependents.

Build reactor — dependencies compiled but only changed modules were tested (26 modules)
  • Camel :: AI :: LangChain4j :: Embedding
  • Camel :: AI :: LangChain4j :: Embedding [jar]
  • Camel :: AI :: PGVector
  • Camel :: AI :: PGVector [jar]
  • Camel :: AI :: Parent
  • Camel :: AI :: Parent [pom]
  • Camel :: All Components Sync point
  • Camel :: All Components Sync point [pom]
  • Camel :: BOM
  • Camel :: BOM [pom]
  • Camel :: Catalog :: Camel Catalog
  • Camel :: Catalog :: Camel Catalog [jar]
  • Camel :: Component DSL
  • Camel :: Component DSL [jar]
  • Camel :: Docs
  • Camel :: Docs [pom]
  • Camel :: Endpoint DSL
  • Camel :: Endpoint DSL [jar]
  • Camel :: Kamelet Main
  • Camel :: Kamelet Main [jar]
  • Camel :: Main
  • Camel :: Main [jar]
  • Camel :: Maven Plugins :: Camel Maven Package
  • Camel :: Maven Plugins :: Camel Maven Package [maven-plugin]
  • Camel :: Parent
  • Camel :: Parent [pom]

@gnodet gnodet marked this pull request as ready for review March 24, 2026 05:53
Contributor

@apupier apupier left a comment

What is the reason for a separate Postgres component specific to vectors? Why not include these capabilities directly in the existing Postgres component?

Contributor Author

@gnodet gnodet left a comment

Claude Code on behalf of Guillaume Nodet

Good question! There are several reasons for keeping pgvector as a separate component under camel-ai/ rather than adding it to the existing PostgreSQL components:

  1. Consistency with other vector DB components. All vector database components live under camel-ai/: Milvus, Qdrant, Pinecone, Weaviate, Neo4j. pgvector serves the same purpose — vector similarity search for AI/RAG pipelines — it just happens to use PostgreSQL as its storage engine. Users looking for vector database components would naturally look in the AI group.

  2. Different use case and audience. The existing PG components (camel-pgevent for LISTEN/NOTIFY, camel-pg-replication-slot for WAL replication) are PostgreSQL-specific integration utilities. pgvector targets a completely different use case: embeddings storage, similarity search, and RAG pipelines. These serve different personas.

  3. Additional dependencies. pgvector brings in com.pgvector:pgvector (for vector type support) and LangChain4j data type transformers — AI-oriented dependencies that would be out of place in a general PostgreSQL component.

  4. Separate lifecycle. Keeping it as its own component allows it to evolve independently without affecting the existing PostgreSQL components.

This is the same approach taken by other frameworks — for example, LangChain and LlamaIndex treat pgvector as a vector store alongside Pinecone/Qdrant/etc., not as a PostgreSQL utility.

gnodet and others added 11 commits March 24, 2026 23:22
- New camel-pgvector component under components/camel-ai/
- Supports CREATE_TABLE, DROP_TABLE, UPSERT, DELETE, SIMILARITY_SEARCH actions
- Uses PostgreSQL pgvector extension via JDBC with com.pgvector library
- Supports cosine, euclidean, and inner product distance types
- LangChain4j data type transformers: pgvector:embeddings and pgvector:rag
- Integration tests with testcontainers pgvector image
- LangChain4j embeddings integration test with AllMiniLmL6V2 model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add BOM, catalog, DSL, and documentation generated files
- Remove Spring Boot starter reference (no starter yet)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…in and add pgvector entries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove pgvector from alphabetical nav listing (grouped under AI only)
- Fix trailing spaces in javadoc blank comment lines
- Update EventEndpointBuilderFactory to CamelEventEndpointBuilderFactory
- Update rest-openapi description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add CREATE_INDEX action to create HNSW indexes for faster approximate
  nearest neighbor search, using the configured distance type
- Add CamelPgVectorFilter header to apply SQL WHERE clause filtering
  on similarity search results (e.g., filter by metadata or text content)
- Add integration tests for both features

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
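
As a rough illustration of the two features in the commit above, the sketch below shows the kind of SQL an HNSW index creation and a filtered similarity search could translate to. It is not the component's implementation. The pgvector operator classes (`vector_cosine_ops`, `vector_l2_ops`, `vector_ip_ops`) and the `USING hnsw` syntax come from the pgvector extension; the class name, method names, distance type strings, index naming scheme and column names are assumptions.

```java
/** Illustrative sketch only; not taken from the camel-pgvector sources. */
final class PgVectorIndexSketch {

    /** Maps an assumed distance type name to the matching pgvector operator class. */
    static String opClassFor(String distanceType) {
        switch (distanceType) {
            case "COSINE":
                return "vector_cosine_ops";
            case "EUCLIDEAN":
                return "vector_l2_ops";
            case "INNER_PRODUCT":
                return "vector_ip_ops";
            default:
                throw new IllegalArgumentException("Unknown distance type: " + distanceType);
        }
    }

    /** Builds a CREATE INDEX statement for an HNSW index; the index name is a made-up convention. */
    static String createHnswIndexSql(String table, String column, String distanceType) {
        return "CREATE INDEX IF NOT EXISTS " + table + "_" + column + "_hnsw_idx ON " + table
                + " USING hnsw (" + column + " " + opClassFor(distanceType) + ")";
    }

    /**
     * Appends an optional raw WHERE clause (as a filter header might carry)
     * to a cosine similarity query. The filter is concatenated verbatim, so
     * it must never be built from untrusted input.
     */
    static String filteredSearchSql(String table, String filter) {
        String where = (filter == null || filter.isBlank()) ? "" : " WHERE " + filter;
        return "SELECT id, text_content FROM " + table + where
                + " ORDER BY embedding <=> ? LIMIT ?";
    }

    public static void main(String[] args) {
        System.out.println(createHnswIndexSql("docs", "embedding", "COSINE"));
        System.out.println(filteredSearchSql("docs", "text_content LIKE '%camel%'"));
    }
}
```

The operator class of the index has to match the operator used in the query's `ORDER BY` for PostgreSQL to use the index, which is presumably why the commit ties index creation to the configured distance type.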
…index action

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ** glob with {*,*/*} for dsl source pattern to prevent
scandir of target/ directories created during parallel builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The dsl.adoc lives at dsl/src/main/docs/ (depth 0), which is not
matched by {*,*/*}. Add explicit pattern for it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
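
The depth behaviour described in the two commits above can be checked with a small sketch. It uses Java's `PathMatcher` glob syntax purely as a stand-in: the tool that evaluates the real patterns is not named in this PR and may treat globs slightly differently. The full pattern strings and the module paths below are hypothetical; only the `**` and `{*,*/*}` fragments and the `dsl/src/main/docs/` location come from the commit messages.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

/** Illustrative sketch only; patterns and paths are assumptions. */
final class GlobDepthSketch {

    /** Matches one or two directory levels under dsl/, never zero and never more. */
    static final String BOUNDED = "dsl/{*,*/*}/src/main/docs/*.adoc";

    /** Matches any number of directory levels (at least one) under dsl/. */
    static final String UNBOUNDED = "dsl/**/src/main/docs/*.adoc";

    /** Returns true when the relative path matches the glob under Java's glob rules. */
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Path.of(path));
    }

    public static void main(String[] args) {
        String[] paths = {
            "dsl/src/main/docs/dsl.adoc",                  // depth 0
            "dsl/module-a/src/main/docs/a.adoc",           // depth 1
            "dsl/group/module-b/src/main/docs/b.adoc",     // depth 2
            "dsl/module-a/target/x/y/src/main/docs/c.adoc" // deeper, e.g. build output
        };
        for (String p : paths) {
            System.out.println(p + " bounded=" + matches(BOUNDED, p)
                    + " unbounded=" + matches(UNBOUNDED, p));
        }
    }
}
```

Under these rules the bounded pattern skips both the deep `target/` path and the depth 0 file, which is consistent with the follow-up commit adding an explicit pattern for `dsl.adoc`.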
gnodet added a commit that referenced this pull request Mar 25, 2026
When a new dependency is added to parent/pom.xml, the diff contains
structural XML elements like <groupId>, <artifactId>, <version> which
were incorrectly extracted as "changed properties" by detectChangedProperties.

This caused the script to search for modules using ${artifactId} or
${groupId} as property references, which either matched nothing useful
or caused spurious failures.

Fix: filter out known structural POM element names (groupId, artifactId,
version, scope, type, etc.) so only actual property names like
"pgvector-version" or "openai-java-version" are detected.

Fixes the CI script bug seen in PR #22207 where adding a new component
to parent/pom.xml caused the dependency detection to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
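
The fix described in the referenced commit amounts to a deny-list applied to the element names found in the POM diff. The sketch below restates that idea in Java; the actual CI script is not shown in this PR and is likely written in another language, so everything except the name `detectChangedProperties` and the five structural element names quoted in the commit message is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the project's CI script. */
final class ChangedPropertiesSketch {

    // Only the names quoted in the commit message; the real list is longer ("etc.").
    private static final Set<String> STRUCTURAL_ELEMENTS =
            Set.of("groupId", "artifactId", "version", "scope", "type");

    // An added or removed diff line holding a single <name>value</name> element.
    private static final Pattern CHANGED_ELEMENT =
            Pattern.compile("^[+-]\\s*<([A-Za-z0-9_.-]+)>[^<]*</\\1>\\s*$");

    /** Returns element names from changed diff lines, minus structural POM elements. */
    static Set<String> detectChangedProperties(String diff) {
        Set<String> properties = new LinkedHashSet<>();
        for (String line : diff.split("\\R")) {
            if (line.startsWith("+++") || line.startsWith("---")) {
                continue; // file header lines of a unified diff
            }
            Matcher m = CHANGED_ELEMENT.matcher(line);
            if (m.matches() && !STRUCTURAL_ELEMENTS.contains(m.group(1))) {
                properties.add(m.group(1));
            }
        }
        return properties;
    }
}
```

With a diff that adds both a `<pgvector-version>` property and a `<dependency>` block, only `pgvector-version` survives the filter, so the later search for `${...}` references no longer looks for `${groupId}` or `${artifactId}`.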
3 participants