Tokenize endpoint empty array crash fix by przepeck · Pull Request #4208 · openvinotoolkit/model_server

przepeck · 2026-05-14T07:16:32Z

🛠 Summary

CVS-186166
Adds checks for empty nested arrays, which were crashing OVMS.

🧪 Checklist

Unit tests added.
The documentation updated.
Change follows security best practices.

Copilot

Pull request overview

Fixes a crash in the tokenize request parser when the text field contains an empty nested array (e.g. {"text":[[]]}), by explicitly rejecting empty inner arrays and adding regression tests across tokenize endpoint test suites.

Changes:

Add an explicit guard in TokenizeParser::parseInput() to reject empty inner arrays before indexing array[0].
Add negative HTTP tests to ensure text with empty nested arrays returns an error (and does not crash) for embeddings, rerank, and LLM tokenize endpoints.
Remove a previously existing GTEST_SKIP() in the LLM tokenize “add_special_tokens” test for VLM models.
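A minimal C++ sketch of the guard described above. It assumes a simplified, hypothetical JSON model: `JsonValue`, `ValidationResult`, `validateText` and the error strings are illustrative names, not the OVMS API, and the real `TokenizeParser::parseInput()` works on a parsed JSON document. The sketch rejects an empty outer array and any empty inner array before an element is inspected:

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parsed JSON value; the real parser
// operates on a JSON DOM, so these type and field names are illustrative only.
struct JsonValue {
    enum class Kind { String, Number, Array };
    Kind kind = Kind::String;
    std::string str;               // payload when kind == String
    std::vector<JsonValue> items;  // elements when kind == Array (incomplete type allowed since C++17)
};

struct ValidationResult {
    bool ok;
    std::string error;  // empty when ok is true
};

// Hypothetical validator for the "text" field: confirm an array is non-empty
// before reading its first element.
ValidationResult validateText(const JsonValue& text) {
    if (text.kind == JsonValue::Kind::String) {
        return {true, ""};
    }
    if (text.kind != JsonValue::Kind::Array || text.items.empty()) {
        return {false, "text must be a string or a non-empty array"};
    }
    // Safe: the outer array was checked for emptiness above.
    if (text.items[0].kind != JsonValue::Kind::Array) {
        return {true, ""};  // flat array; per-element checks omitted in this sketch
    }
    for (const JsonValue& inner : text.items) {
        if (inner.kind != JsonValue::Kind::Array) {
            return {false, "text must not mix nested arrays with other values"};
        }
        // The guard: any code that goes on to inspect inner.items[0] (e.g. to
        // detect the element type) must first reject an empty inner array.
        // For {"text": [[]]} that read would index an empty vector (undefined behavior).
        if (inner.items.empty()) {
            return {false, "nested arrays in text must not be empty"};
        }
    }
    return {true, ""};
}
```

With this sketch, calling `validateText` on an array holding one empty array (the `{"text": [[]]}` case) returns `ok == false` with an error message instead of reading past the end of the vector, which matches the tests' expectation that such input yields an error status rather than a crash.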

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`src/tokenize/tokenize_parser.cpp`	Adds empty-inner-array validation to prevent out-of-bounds access when inspecting nested arrays.
`src/test/reranknode_test.cpp`	Adds HTTP regression tests covering empty nested arrays for the rerank tokenize endpoint.
`src/test/llm/tokenize_endpoint_test.cpp`	Adds HTTP regression tests for empty nested arrays across all LLM tokenize test parameters; also removes a VLM-specific skip for add_special_tokens.
`src/test/embeddingsnode_test.cpp`	Adds HTTP regression tests covering empty nested arrays for the embeddings tokenize endpoint.

 TEST_P(LLMTokenizeTests, tokenizeStringWithAddSpecialTokens) {
    auto params = GetParam();
-    if (params.modelName == "vlm_cb_regular" || params.modelName == "vlm_legacy_regular") {
-        GTEST_SKIP() << "Skipping test for " << params.modelName;
-    }

    std::string requestBody = R"(


+TEST_F(EmbeddingsTokenizeHttpTest, tokenizeEmptyNestedArray) {
+    std::string requestBody = R"(
+        {
+            "model": "embeddings_ov",
+            "text": [[]]
+        }
+    )";
+    Status status = handler->dispatchToProcessor(endpointTokenize, requestBody, &response, comp, responseComponents, writer, multiPartParser);
+    ASSERT_EQ(status, ovms::StatusCode::MEDIAPIPE_EXECUTION_ERROR) << status.string();
+}


+TEST_F(RerankTokenizeHttpTest, tokenizeEmptyNestedArray) {
+    std::string requestBody = R"(
+        {
+            "model": "rerank_ov",
+            "text": [[]]
+        }
+    )";
+    Status status = handler->dispatchToProcessor(endpointTokenize, requestBody, &response, comp, responseComponents, writer, multiPartParser);
+    ASSERT_EQ(status, ovms::StatusCode::MEDIAPIPE_EXECUTION_ERROR) << status.string();
+}


+TEST_P(LLMTokenizeTests, tokenizeEmptyNestedArray) {
+    auto params = GetParam();
+
+    std::string requestBody = R"(
+        {
+            "model": ")" + params.modelName +
+                              R"(",
+            "text": [[]]
+        }
+    )";
+    Status status = handler->dispatchToProcessor(endpointTokenize, requestBody, &response, comp, responseComponents, writer, multiPartParser);
+    ASSERT_EQ(status, ovms::StatusCode::MEDIAPIPE_EXECUTION_ERROR) << status.string();
+}
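The parameterized LLM test splices `params.modelName` between two raw string literals so that each test parameter targets its own model. A compact, single-line sketch of the same technique, using a hypothetical helper `buildTokenizeBody` that is not part of the test file:

```cpp
#include <string>

// Hypothetical helper mirroring the splice in the LLM tokenizeEmptyNestedArray test:
// the first raw literal ends right after the opening quote of the "model" value,
// the model name is appended, and the second raw literal closes that string and
// adds the empty nested "text" array.
std::string buildTokenizeBody(const std::string& modelName) {
    return R"({"model": ")" + modelName + R"(", "text": [[]]})";
}
```

The model name is inserted without JSON escaping, which is acceptable for fixed test parameters but would not be for untrusted input.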


dkalinowski · 2026-05-14T11:43:46Z

 TEST_P(LLMTokenizeTests, tokenizeStringWithAddSpecialTokens) {
    auto params = GetParam();
-    if (params.modelName == "vlm_cb_regular" || params.modelName == "vlm_legacy_regular") {
-        GTEST_SKIP() << "Skipping test for " << params.modelName;


Does it work for all paths, or only the special-tokens path? Do the other tests have it unskipped as well?

przepeck added 3 commits May 14, 2026 08:55

fix (a2ee7e0)
test (8012f74)
style (f846d1f)

przepeck added this to the 2026.2_rc milestone May 14, 2026

przepeck requested review from atobiszei and michalkulakowski May 14, 2026 07:16

przepeck added 2 commits May 14, 2026 09:22

test (840f902)
style (4b4f5bf)

przepeck marked this pull request as ready for review May 14, 2026 07:48

Copilot AI review requested due to automatic review settings May 14, 2026 07:48

Copilot started reviewing on behalf of przepeck May 14, 2026 07:50

Copilot AI reviewed May 14, 2026


copilot review (e75b888)

michalkulakowski approved these changes May 14, 2026


extract function (1b1259e)

dkalinowski reviewed May 14, 2026


dkalinowski approved these changes May 14, 2026


przepeck wants to merge 7 commits into main from przepeck/tokenize_psirt


4 participants
