
Conversation

@dan-rubinstein (Member) commented Nov 13, 2025

Issue - #137288

This change adds a configurable max_batch_size to the GoogleVertexAI embedding service settings, letting users control how many chunks are sent to the downstream service per request during chunked inference (ingest into a semantic_text field). Users can override the existing batch size of 250: either reduce it to avoid hitting the 20k-token limit across all chunks sent in a single request, or increase it (up to the 250 maximum) for better inference performance when their chunks are not large enough to hit that limit.
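For illustration, a request along these lines could set the new option when creating a Google Vertex AI embedding endpoint. This is only a sketch: the endpoint name, model, project, and location values are made up, and everything other than max_batch_size is just the usual Google Vertex AI service settings shown for context.

```
PUT _inference/text_embedding/google_vertex_ai_embeddings
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": "<service-account-json>",
    "model_id": "text-embedding-005",
    "location": "us-central1",
    "project_id": "my-gcp-project",
    "max_batch_size": 100
  }
}
```

Lowering max_batch_size below the 250 default spreads the same chunks across more requests, trading some throughput for staying under the per-request token limit.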

@dan-rubinstein added the >bug, :ml Machine learning, Team:ML (Meta label for the ML team), and v9.3.0 labels Nov 13, 2025
@elasticsearchmachine (Collaborator)

Hi @dan-rubinstein, I've created a changelog YAML for you.

@dan-rubinstein marked this pull request as ready for review November 14, 2025 15:12
@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)
