feat: AI embeddings for feedback records #38

Merged
xernobyl merged 17 commits into main from feat/embeddings on Feb 27, 2026

Conversation

@xernobyl (Contributor) commented Feb 23, 2026

What does this PR do?

Adds embeddings for feedback records: when a feedback record is created or updated and has non-empty value_text, the system enqueues a job to generate an embedding via a configurable provider (OpenAI or Google Gemini) and stores it in a dedicated embeddings table (pgvector). This keeps embedding data out of the main feedback-records read path and supports multiple models per record.

Highlights:

  • Event-driven: EmbeddingProvider subscribes to feedback_record.created and feedback_record.updated (when value_text is in changed fields). Jobs are enqueued to a dedicated River queue (embeddings) so embedding work does not starve webhook delivery.
  • Pluggable providers: Single event provider and worker; embedding API is behind an EmbeddingClient interface. Implementations: OpenAI (text-embedding-3-small, configurable dimensions) and Google Gemini (gemini-embedding-001, via google.golang.org/genai). API key is optional (e.g. for local AI).
  • Schema: New embeddings table: id, feedback_record_id, embedding (vector), model, created_at, updated_at; unique on (feedback_record_id, model); ON DELETE CASCADE from feedback_records.
  • Backfill: cmd/backfill-embeddings enqueues jobs for existing records that have value_text but no embedding for the configured model. Requires EMBEDDING_PROVIDER and EMBEDDING_MODEL (no defaults).
  • Config: EMBEDDING_PROVIDER (openai | google), EMBEDDING_MODEL, EMBEDDING_PROVIDER_API_KEY (optional), EMBEDDING_DIMENSIONS (default 1536), EMBEDDING_MAX_CONCURRENT, EMBEDDING_MAX_ATTEMPTS. Supported providers are kept in a map for easy extension.
  • Observability: Embedding metrics (enqueued, outcomes, duration, errors) and structured logging; worker retries on transient errors and skips retry on not-found.
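A minimal sketch of how the event filter and the provider map described above might fit together. Everything here is illustrative rather than taken from the PR: the `Embed` method signature, `clientFactory`, `newEmbeddingClient`, `shouldEnqueue`, and the offline `stubClient` are hypothetical names, and the real clients would call the OpenAI or Gemini APIs instead of returning zero vectors.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// EmbeddingClient is a hypothetical shape for the provider-agnostic
// interface; the method set in the PR may differ.
type EmbeddingClient interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

// clientFactory builds a client for one provider. apiKey may be empty,
// matching the "API key is optional" note (e.g. local AI).
type clientFactory func(model, apiKey string, dimensions int) (EmbeddingClient, error)

// stubClient stands in for a real provider client so the sketch runs offline.
type stubClient struct{ dims int }

func (c stubClient) Embed(_ context.Context, _ string) ([]float32, error) {
	return make([]float32, c.dims), nil
}

func newStub(_, _ string, dimensions int) (EmbeddingClient, error) {
	if dimensions <= 0 {
		return nil, errors.New("dimensions must be positive")
	}
	return stubClient{dims: dimensions}, nil
}

// supportedProviders mirrors the "kept in a map for easy extension" idea:
// adding a provider means adding one entry.
var supportedProviders = map[string]clientFactory{
	"openai": newStub, // assumption: a real OpenAI client factory goes here
	"google": newStub, // assumption: a real Gemini client factory goes here
}

func newEmbeddingClient(provider, model, apiKey string, dims int) (EmbeddingClient, error) {
	factory, ok := supportedProviders[provider]
	if !ok {
		names := make([]string, 0, len(supportedProviders))
		for name := range supportedProviders {
			names = append(names, name)
		}
		sort.Strings(names)
		return nil, fmt.Errorf("unsupported embedding provider %q (supported: %s)",
			provider, strings.Join(names, ", "))
	}
	return factory(model, apiKey, dims)
}

// shouldEnqueue encodes the rule from the description: a record needs
// non-empty value_text, and an update only counts when value_text changed.
func shouldEnqueue(eventType, valueText string, changedFields []string) bool {
	if valueText == "" {
		return false
	}
	switch eventType {
	case "feedback_record.created":
		return true
	case "feedback_record.updated":
		for _, field := range changedFields {
			if field == "value_text" {
				return true
			}
		}
	}
	return false
}

func main() {
	client, err := newEmbeddingClient("openai", "text-embedding-3-small", "", 1536)
	if err != nil {
		panic(err)
	}
	vec, err := client.Embed(context.Background(), "great product")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vec), shouldEnqueue("feedback_record.updated", "great product", []string{"value_text"}))
}
```

Keeping the filter a pure function makes the created/updated rule easy to unit test without a queue or database.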

Fixes #(issue)

No API contract changes: embeddings are not exposed on feedback-record list/get. They are stored for future use (e.g. semantic search).

How should this be tested?

  • Unit: go test ./internal/... ./cmd/... (includes embedding_provider_test.go, worker tests).
  • Integration: make tests (requires Postgres; tests/ use feedback records service with embedding model; DB must have embeddings table from migration).
  • Manual:
    1. make init-db (and make river-migrate if using River UI) so that the embeddings table exists.
    2. Set in .env: EMBEDDING_PROVIDER=openai, EMBEDDING_MODEL=text-embedding-3-small, EMBEDDING_PROVIDER_API_KEY=sk-... (or use google and a Gemini API key; or leave key empty for a no-key provider).
    3. make run; create a feedback record with value_text; confirm embedding job is enqueued and processed (logs: "embedding: job enqueued", "embedding: stored") and a row appears in embeddings.
    4. Backfill: EMBEDDING_PROVIDER=openai EMBEDDING_MODEL=text-embedding-3-small DATABASE_URL=... go run ./cmd/backfill-embeddings (both env vars required).
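The environment variables used in the steps above could be validated along these lines. This is a sketch under stated assumptions, not the PR's actual config code: `EmbeddingConfig` and `loadEmbeddingConfig` are hypothetical names, and the only behavior taken from the description is that provider and model are required with no defaults, the API key is optional, and dimensions default to 1536.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// EmbeddingConfig is a hypothetical container for the documented variables.
type EmbeddingConfig struct {
	Provider   string
	Model      string
	APIKey     string // optional, e.g. empty for a no-key provider
	Dimensions int
}

const defaultDimensions = 1536 // documented default for EMBEDDING_DIMENSIONS

// loadEmbeddingConfig reads settings through getenv so that tests can
// supply a map-backed lookup instead of the process environment.
func loadEmbeddingConfig(getenv func(string) string) (EmbeddingConfig, error) {
	cfg := EmbeddingConfig{
		Provider:   getenv("EMBEDDING_PROVIDER"),
		Model:      getenv("EMBEDDING_MODEL"),
		APIKey:     getenv("EMBEDDING_PROVIDER_API_KEY"),
		Dimensions: defaultDimensions,
	}
	// Both values are required; there are deliberately no defaults.
	if cfg.Provider == "" || cfg.Model == "" {
		return EmbeddingConfig{}, errors.New("EMBEDDING_PROVIDER and EMBEDDING_MODEL are both required")
	}
	if cfg.Provider != "openai" && cfg.Provider != "google" {
		return EmbeddingConfig{}, fmt.Errorf("EMBEDDING_PROVIDER must be openai or google, got %q", cfg.Provider)
	}
	if raw := getenv("EMBEDDING_DIMENSIONS"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n <= 0 {
			return EmbeddingConfig{}, fmt.Errorf("EMBEDDING_DIMENSIONS must be a positive integer, got %q", raw)
		}
		cfg.Dimensions = n
	}
	return cfg, nil
}

func main() {
	cfg, err := loadEmbeddingConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("provider=%s model=%s dimensions=%d\n", cfg.Provider, cfg.Model, cfg.Dimensions)
}
```

Failing fast on a missing provider or model matches the backfill command's stated requirement that both variables be set.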

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Repository Guidelines
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand bits
  • Ran make build
  • Ran make tests (integration tests in tests/)
  • Ran make fmt and make lint; no new warnings
  • Removed debug prints / temporary logging
  • Merged the latest changes from main onto my branch with git pull origin main
  • If database schema changed: added migration in migrations/ with goose annotations and ran make migrate-validate

Appreciated

  • If API changed: added or updated OpenAPI spec and ran contract tests (make tests or API contract workflow)
  • If API behavior changed: added request/response examples or Swagger UI screenshots to this PR
  • Updated docs in docs/ if changes were necessary
  • Ran make tests-coverage for meaningful logic changes

github-actions bot commented Feb 23, 2026

✱ Stainless preview builds

This PR will update the hub SDKs with the following commit message.

feat: Embeddings

Edit this comment to update it. It will appear in the SDK's changelogs.

hub-openapi

Your SDK built successfully.
generate ✅ (prev: generate ⚠️)

hub-typescript

Your SDK built successfully.
generate ✅ (prev: generate ⚠️) → build ✅ → lint ✅ → test ✅

npm install https://pkg.stainless.com/s/hub-typescript/d42485dcb0ddb449d9cd52a5e219faef023e9549/dist.tar.gz

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-02-23 17:06:50 UTC

@xernobyl changed the title from "feat: Embeddings" to "feat: OpenAI embeddings for feedback records" on Feb 23, 2026
@xernobyl marked this pull request as ready for review February 23, 2026 17:13

@mattinannt (Member) left a comment

@xernobyl Thank you for the PR :-)

Please always add the ticket link to the PR description so that the ticket and the PR are linked.

So far I have only done a brief first pass, based on an initial AI-assisted review. We may also need to discuss how to support multiple AI/embedding providers, as this might be needed much sooner than we think.

@xernobyl requested a review from mattinannt February 24, 2026 11:25

@BhagyaAmarasinghe (Contributor) left a comment

Thanks for the PR. I have commented on some issues I noticed.
Could you also check the two points below:

  1. In feedback_records.go, CreateFeedbackRecordRequest.SubmissionID is a non-pointer string with validate:"required". Every existing API consumer that doesn't send submission_id will now get a 400 validation error. This is a breaking change that should be called out in a changelog or migration guide, or the field should be made optional with a server-generated default.

  2. Migration 003 adds a NOT NULL column with no default. If the table has any existing rows, this ALTER TABLE will fail, because PostgreSQL cannot add a NOT NULL column without a DEFAULT value to a table with existing data. The migration needs a strategy: add the column as nullable, backfill it (e.g. set submission_id = id::text), then add the NOT NULL constraint.
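The three-step strategy from point 2 could be sketched as follows. This is an illustration, not the contents of migration 003: the table name feedback_records, the TEXT column type, and the helpers `applySteps`, `dryRun`, and `recorder` are assumptions made for the example. The `execer` interface is satisfied by the standard library's `*sql.DB` and `*sql.Tx`, so the same function could run against a real connection, typically inside a single transaction.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB / *sql.Tx that the sketch needs.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// submissionIDSteps follows the suggested order. Table name and column
// type are assumptions; the backfill expression comes from the review.
var submissionIDSteps = []string{
	// 1. Add the column as nullable so existing rows stay valid.
	`ALTER TABLE feedback_records ADD COLUMN submission_id TEXT`,
	// 2. Backfill existing rows with a value derived from the primary key.
	`UPDATE feedback_records SET submission_id = id::text WHERE submission_id IS NULL`,
	// 3. Only now enforce the constraint.
	`ALTER TABLE feedback_records ALTER COLUMN submission_id SET NOT NULL`,
}

// applySteps runs the statements in order and stops at the first failure.
func applySteps(ctx context.Context, db execer, steps []string) error {
	for i, stmt := range steps {
		if _, err := db.ExecContext(ctx, stmt); err != nil {
			return fmt.Errorf("step %d failed: %w", i+1, err)
		}
	}
	return nil
}

// recorder is a fake execer that records statements instead of running them.
type recorder struct{ ran []string }

func (r *recorder) ExecContext(_ context.Context, query string, _ ...any) (sql.Result, error) {
	r.ran = append(r.ran, query)
	return nil, nil
}

// dryRun returns the statements in the order they would be executed.
func dryRun(steps []string) ([]string, error) {
	rec := &recorder{}
	err := applySteps(context.Background(), rec, steps)
	return rec.ran, err
}

func main() {
	ran, err := dryRun(submissionIDSteps)
	if err != nil {
		panic(err)
	}
	for _, stmt := range ran {
		fmt.Println(stmt)
	}
}
```

The ordering is the important part: the constraint is added only after every existing row has a value, so the final ALTER cannot fail on NULLs.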

@xernobyl (Contributor, Author) commented

@BhagyaAmarasinghe the submission_id issues can be fixed in a separate PR.

@mattinannt (Member) left a comment

@xernobyl Thank you for updating the PR. I have a few concerns regarding the embedding model provider and indexes. Let's discuss.

@xernobyl changed the title from "feat: OpenAI embeddings for feedback records" to "feat: AI embeddings for feedback records" on Feb 25, 2026
@xernobyl requested a review from mattinannt February 26, 2026 15:26
@xernobyl enabled auto-merge February 27, 2026 08:54
@xernobyl added this pull request to the merge queue Feb 27, 2026
Merged via the queue into main with commit 4927834 Feb 27, 2026
8 checks passed
@xernobyl deleted the feat/embeddings branch February 27, 2026 09:51