default task detection based on model config by dtrawins · Pull Request #4317 · openvinotoolkit/model_server

dtrawins · 2026-06-22T23:19:18Z

🛠 Summary

CVS-188024
The task parameter is determined automatically from the model's config.json or model_index.json and, in special cases, from a model name pattern.
This simplifies model deployment from the HF hub.
With --source_model, graph.pbtxt is always created or updated.
With --model_path, parameters from graph.pbtxt are used if the file exists and no overriding parameters are passed. When graph.pbtxt is missing, the task is determined automatically.
If the task cannot be determined, for example for unknown architectures, no default task is set and the user has to provide it.
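
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ModelSource` and this free-standing `shouldInferTask` function are hypothetical names, and the case where `--task` is passed explicitly is left out.

```cpp
// Hypothetical names for illustration only; the PR keeps this logic in CLIParser.
enum class ModelSource { SourceModel, ModelPath };

// Returns true when the task should be inferred from the model config.
bool shouldInferTask(ModelSource source, bool hasTaskOverrides, bool graphExists) {
    if (source == ModelSource::SourceModel) {
        // --source_model: graph.pbtxt is always created or updated.
        return true;
    }
    // --model_path: reuse graph.pbtxt unless overriding parameters were
    // passed or the file is missing.
    return hasTaskOverrides || !graphExists;
}
```

With `--model_path`, an existing graph.pbtxt and no overrides, the function returns false, so the stored parameters are kept.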

🧪 Checklist

Unit tests added.
The documentation updated.
Change follows security best practices.

dtrawins · 2026-06-23T10:04:03Z

+                // Check if task-specific parameters are provided or if graph.pbtxt is missing
+                bool hasUnmatchedOptions = ::ovms::hasTaskSpecificParameters(result->unmatched());
+                bool graphExists = ::ovms::graphPbtxtExists(*modelPath);
+



dtrawins · 2026-06-23T10:04:19Z

+                const std::optional<std::string> modelPath = result->count("model_path") ? std::make_optional(result->operator[]("model_path").as<std::string>()) : std::nullopt;
+                const std::optional<std::string> sourceModel = result->count("source_model") ? std::make_optional(result->operator[]("source_model").as<std::string>()) : std::nullopt;
+                const std::optional<std::string> modelRepositoryPath = result->count("model_repository_path") ? std::make_optional(result->operator[]("model_repository_path").as<std::string>()) : std::nullopt;
+



dtrawins · 2026-06-23T10:04:31Z

+                    bool graphExists = ::ovms::graphPbtxtExists(*modelPath);
+                    shouldInferTask = hasUnmatchedOptions || !graphExists;
+                }
+



dkalinowski · 2026-06-24T10:16:35Z

@@ -0,0 +1,6 @@
+{
+  "architectures": ["XLMRobertaForSequenceClassification"],


I see it's possible to have more than one architecture (it's a list). How would OVMS behave? Can you add a test for that behavior?

In theory this is possible, but it rarely if ever happens. Added tests to cover it. The first architecture will be used.
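
A rough sketch of how a multi-entry `architectures` list could be resolved. It assumes "first" means the first listed architecture that has a mapping; the PR may instead look only at index 0. The table `kArchitectureToTask` and the function `resolveTask` are hypothetical names, and only the `XLMRobertaModel` entry is taken from the diff.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Illustrative subset of an architecture-to-task table. Only the
// XLMRobertaModel entry appears in the PR diff; the other two are
// assumptions based on the fixture descriptions.
const std::map<std::string, std::string> kArchitectureToTask = {
    {"XLMRobertaModel", "embeddings"},
    {"XLMRobertaForSequenceClassification", "rerank"},
    {"LlamaForCausalLM", "text_generation"},
};

// Returns the task of the first listed architecture that has a mapping,
// or std::nullopt when none of them is known (or the list is empty).
std::optional<std::string> resolveTask(const std::vector<std::string>& architectures) {
    for (const auto& architecture : architectures) {
        const auto it = kArchitectureToTask.find(architecture);
        if (it != kArchitectureToTask.end()) {
            return it->second;
        }
    }
    return std::nullopt;
}
```

Returning `std::nullopt` leaves it to the caller to decide whether an unknown architecture is an error or simply means the user has to pass the task.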

dkalinowski · 2026-06-24T10:17:36Z

@@ -0,0 +1,5 @@
+{
+  "architectures": ["UNet2DConditionModel"],


Please add these files to the data field of the unit test build target. If you don't, the unit tests will not re-run when one of these files changes.

Example: https://github.com/openvinotoolkit/model_server/blob/main/src/BUILD#L2337-L2364

dkalinowski · 2026-06-24T10:20:59Z

+    const std::string modelPath = resolveTestModelPath("llama");
+    const std::filesystem::path configJson = std::filesystem::path(modelPath) / "config.json";
+    if (!std::filesystem::exists(configJson)) {
+        GTEST_SKIP() << "Test prerequisite missing: " << configJson.string();


Why skip? Shouldn't it fail with an error?

dkalinowski · 2026-06-24T10:21:54Z

+cc_binary(
+    name = "num_streams_repro",
+    srcs = [
+        "num_streams_repro.cpp",


dkalinowski · 2026-06-24T10:22:38Z

+    {"XLMRobertaModel", "embeddings"},
+};
+
+std::string getEnvOrDefault(const char* envName, const std::string& defaultValue = "") {


These functions should be static.

Why? Aren't they already inside a namespace block?

Why don't we use the utils from tils/env_guard.hpp?
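
For context on the linkage question: in C++, functions in an unnamed namespace have internal linkage just like `static` ones, while a named namespace does not change linkage. The diff fragment does not show which kind of namespace is used, so the following is only a sketch of the unnamed-namespace variant. The body of `getEnvOrDefault` is an assumption; only its signature comes from the diff.

```cpp
#include <cstdlib>
#include <string>

namespace {  // unnamed namespace: everything inside has internal linkage

// Returns the value of the environment variable envName, or defaultValue
// when the variable is not set. Assumption: an empty but set variable is
// returned as an empty string.
std::string getEnvOrDefault(const char* envName, const std::string& defaultValue = "") {
    const char* value = std::getenv(envName);
    return value != nullptr ? std::string(value) : defaultValue;
}

}  // namespace
```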

dkalinowski · 2026-06-24T10:23:38Z

+        }
+    }
+    if (!resolvedTask.has_value()) {
+        throw std::logic_error("config.json architectures do not map to a supported default task");


Please add a test for that.

dkalinowski · 2026-06-24T10:24:52Z


        result = std::make_unique<cxxopts::ParseResult>(options->parse(argc, argv));

+        const bool isConfigManagementFlow =


could be a CLIParser method

Copilot

Pull request overview

This PR adds automatic inference of the default --task value for generative flows by inspecting HuggingFace-style config.json / model_index.json, reducing the need to pass --task explicitly when starting from --model_path or --source_model.

Changes:

Add task inference logic in CLIParser based on model architectures / Diffusers pipeline _class_name, including support for local and remote (HF endpoint) config retrieval.
Introduce test fixtures (model config JSONs) and new unit tests validating task inference and config parsing behavior.
Centralize HF-related env var names in a shared header and reduce LLM calculator log verbosity (DEBUG → TRACE).

Reviewed changes

Copilot reviewed 40 out of 40 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/cli_parser.hpp	Adds inferred task state and declares task inference helper(s).
src/cli_parser.cpp	Implements task inference from config/model_index and integrates it into CLI parse flow.
src/pull_module/hf_pull_model_module.cpp	Switches HF env var usage to named constants.
src/pull_module/hf_env_vars.hpp	New header providing HF env var and default endpoint constants.
src/pull_module/BUILD	Adds a Bazel target for the new `hf_env_vars` header and wires it into deps.
src/llm/http_llm_calculator.cc	Lowers Open/Close logging from DEBUG to TRACE.
src/test/task_determine_test.cpp	New parameterized unit test for `determineDefaultTaskParameter()` across model configs.
src/test/ovmsconfig_test.cpp	Updates/extends config parsing death + positive tests for inferred task behavior.
src/test/models_config_json/xlm_roberta/config.json	Test model HF config fixture for rerank detection.
src/test/models_config_json/whisper/config.json	Test model HF config fixture for speech2text detection.
src/test/models_config_json/trinity/config.json	Test model HF config fixture for text_generation detection.
src/test/models_config_json/t5_encoder/config.json	Test model HF config fixture for embeddings detection.
src/test/models_config_json/stable_diffusion/config.json	Test model HF config fixture for image_generation detection.
src/test/models_config_json/speecht5/config.json	Test model HF config fixture for text2speech detection.
src/test/models_config_json/seamlessm4t/config.json	Test model HF config fixture for speech2text detection.
src/test/models_config_json/sdxl/model_index.json	Test Diffusers `model_index.json` fixture for image_generation detection.
src/test/models_config_json/qwen3/config.json	Test model HF config fixture for text_generation detection.
src/test/models_config_json/Qwen3-Reranker-0.6B/config.json	Test model HF config fixture for “questionable architecture” rerank disambiguation.
src/test/models_config_json/Qwen3-Embedding-0.6B/config.json	Test model HF config fixture for “questionable architecture” embeddings disambiguation.
src/test/models_config_json/Qwen3-8B/config.json	Test model HF config fixture for “questionable architecture” ambiguity handling.
src/test/models_config_json/qwen3_multi_arch/config.json	Test model HF config fixture for multi-architecture task resolution.
src/test/models_config_json/qwen3_asr/config.json	Test model HF config fixture for speech2text detection.
src/test/models_config_json/qwen3_6/config.json	Test model HF config fixture for heuristic text_generation detection.
src/test/models_config_json/qwen2_rerank/config.json	Test model HF config fixture for rerank detection.
src/test/models_config_json/qwen2_embedding/config.json	Test model HF config fixture for embeddings detection.
src/test/models_config_json/phi3/config.json	Test model HF config fixture for text_generation detection.
src/test/models_config_json/parlertts/config.json	Test model HF config fixture for text2speech detection.
src/test/models_config_json/NullArch/config.json	Test model HF config fixture for null-architectures negative path.
src/test/models_config_json/no_architectures/config.json	Test model HF config fixture for missing-architectures negative path.
src/test/models_config_json/llama/config.json	Test model HF config fixture for text_generation detection.
src/test/models_config_json/lfm/config.json	Test model HF config fixture for text_generation detection.
src/test/models_config_json/Kokoro/config.json	Test model HF config fixture for null-architectures → text2speech special-case.
src/test/models_config_json/invalid_architecture/config.json	Test model HF config fixture for unsupported-architecture negative path.
src/test/models_config_json/gemma4/config.json	Test model HF config fixture for text_generation detection.
src/test/models_config_json/flux/config.json	Test model HF config fixture for image_generation detection.
src/test/models_config_json/flux_pipeline/model_index.json	Test Diffusers `model_index.json` fixture for image_generation detection.
src/test/models_config_json/cross_encoder/config.json	Test model HF config fixture for rerank detection.
src/test/models_config_json/bge/config.json	Test model HF config fixture for embeddings detection.
src/test/models_config_json/bge_reranker/config.json	Test model HF config fixture for rerank detection.
src/BUILD	Adds RapidJSON + pull-module deps to `cli_parser` target; adds new test and test data glob.

+#include <fstream>
 #include <filesystem>
 #include <iostream>
+#include <optional>
 #include <stdexcept>
 #include <string>
+#include <map>
 #include <utility>


dtrawins · 2026-06-24T22:32:14Z

+std::string getTaskForQuestionableArchitecture(const std::string& architecture, const std::string& normalizedModelIdentifier) {
+    const auto architectureRules = questionableArchitectureTaskKeywords.find(architecture);
+    if (architectureRules == questionableArchitectureTaskKeywords.end()) {
+        return "";
+    }
+    const auto& [defaultTask, patternRules] = architectureRules->second;
+    for (const auto& [task, keyword] : patternRules) {
+        if (normalizedModelIdentifier.find(keyword) != std::string::npos) {
+            return task;
+        }
+    }
+    return defaultTask;
+}


Currently the only questionable architecture is Qwen3ForCausalLM, which is most frequently used for text generation (Qwen3-4B, Qwen3-8B, etc.). There is no unique name pattern specific to text generation, so text generation is the default, while the rerank and embed patterns should help identify those tasks properly. For other potentially questionable architectures, it would be possible to set an empty default task.
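
To make the lookup above easier to follow, here is a self-contained sketch of a table it could operate on. The container layout is inferred from the structured bindings in the diff; the keyword values `"rerank"` and `"embed"`, the type aliases, and the `normalizeModelIdentifier` helper are assumptions rather than the PR's actual definitions.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (task, keyword) pairs checked in order against the normalized identifier.
using PatternRules = std::vector<std::pair<std::string, std::string>>;
// (default task, pattern rules) for one architecture.
using ArchitectureRules = std::pair<std::string, PatternRules>;

// Assumed contents; the keywords are illustrative.
const std::map<std::string, ArchitectureRules> questionableArchitectureTaskKeywords = {
    {"Qwen3ForCausalLM",
        ArchitectureRules{"text_generation",
            PatternRules{{"rerank", "rerank"}, {"embeddings", "embed"}}}},
};

// Hypothetical normalization: lower-case the model identifier.
std::string normalizeModelIdentifier(std::string id) {
    std::transform(id.begin(), id.end(), id.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return id;
}

// Same logic as the function in the diff: a keyword match wins, otherwise
// the architecture's default task; unknown architectures yield "".
std::string getTaskForQuestionableArchitecture(const std::string& architecture, const std::string& normalizedModelIdentifier) {
    const auto architectureRules = questionableArchitectureTaskKeywords.find(architecture);
    if (architectureRules == questionableArchitectureTaskKeywords.end()) {
        return "";
    }
    const auto& [defaultTask, patternRules] = architectureRules->second;
    for (const auto& [task, keyword] : patternRules) {
        if (normalizedModelIdentifier.find(keyword) != std::string::npos) {
            return task;
        }
    }
    return defaultTask;
}
```

Under these assumptions, an identifier such as `Qwen/Qwen3-Reranker-0.6B` resolves to rerank, `Qwen/Qwen3-Embedding-0.6B` to embeddings, and `Qwen/Qwen3-8B` falls back to text_generation.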

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

mzegla

I'm not a fan of putting the entire auto-detection logic directly in the cli_parser file, especially if we expect it to grow.
From the architecture point of view I would encourage the following:

Keep the CLI parser clean: at the parser level, just parse input.
Have a separate file with a class AutoTaskDetector or something similar.
Instead of multiple mappings, we could have object representations, e.g. "Qwen3ForCausalLM" maps to a Qwen3ForCausalLM class that internally has the logic to check further (since it's ambiguous and we can't directly tell whether it's for generation, embedding, reranking, etc.). Missing or null architectures could have their own hierarchy.

I know it would expand the code significantly, but such an approach would look more scalable and easier to manage to me. As there are tons of models and multiple selection criteria, having multiple mappings and a top-level chain of ifs directly in the CLI parser does not look like a good design to me.
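
One possible shape of the design proposed above, as a minimal sketch. Apart from the `AutoTaskDetector` name suggested in the comment, every class, method and keyword here is hypothetical and not part of the PR.

```cpp
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical interface: one handler per architecture name.
class ArchitectureHandler {
public:
    virtual ~ArchitectureHandler() = default;
    // Returns the task for the model, or std::nullopt if it cannot be decided.
    virtual std::optional<std::string> detectTask(const std::string& normalizedModelId) const = 0;
};

// Unambiguous architecture: always maps to a single task.
class FixedTaskHandler : public ArchitectureHandler {
public:
    explicit FixedTaskHandler(std::string task) : task(std::move(task)) {}
    std::optional<std::string> detectTask(const std::string&) const override { return task; }

private:
    std::string task;
};

// Ambiguous architecture: inspects the model identifier before falling back.
class Qwen3ForCausalLMHandler : public ArchitectureHandler {
public:
    std::optional<std::string> detectTask(const std::string& normalizedModelId) const override {
        if (normalizedModelId.find("rerank") != std::string::npos) return std::string("rerank");
        if (normalizedModelId.find("embed") != std::string::npos) return std::string("embeddings");
        return std::string("text_generation");
    }
};

// Owns the handlers and dispatches on the architecture name.
class AutoTaskDetector {
public:
    AutoTaskDetector() {
        handlers["XLMRobertaModel"] = std::make_unique<FixedTaskHandler>("embeddings");
        handlers["Qwen3ForCausalLM"] = std::make_unique<Qwen3ForCausalLMHandler>();
    }
    std::optional<std::string> detect(const std::string& architecture, const std::string& normalizedModelId) const {
        const auto it = handlers.find(architecture);
        if (it == handlers.end()) return std::nullopt;  // unknown architecture
        return it->second->detectTask(normalizedModelId);
    }

private:
    std::map<std::string, std::unique_ptr<ArchitectureHandler>> handlers;
};
```

With this layout, the CLI parser would only call `detect()`, and adding another ambiguous architecture would mean adding one handler class instead of extending a chain of ifs.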

mzegla · 2026-06-25T09:14:34Z

@@ -0,0 +1,22 @@
+//****************************************************************************
+// Copyright 2025 Intel Corporation


atobiszei · 2026-06-26T09:00:32Z

        "test/tensor_conversion_test.cpp",
        "test/tensorinfo_test.cpp",
        "test/tensorutils_test.cpp",
+        "test/task_determine_test.cpp",


Add this as a separate test library target and list the new config files in its data field.

atobiszei · 2026-06-26T09:02:54Z

+#include <rapidjson/document.h>
+#include <rapidjson/error/en.h>
+#include <rapidjson/istreamwrapper.h>


Include src/port/rapidjson*hpp instead to avoid spilling pragmas/ifdefs.

atobiszei · 2026-06-26T09:04:38Z

+            SPDLOG_DEBUG("Using local absolute model path for graph export: {}", hfSettings.exportSettings.modelPath);
+        }
+        const std::string taskValue = getEffectiveTaskParameter();
+        if (inferredTaskParameter.has_value()) {
+            SPDLOG_INFO("Identified default task '{}' from model config", inferredTaskParameter.value());


We avoided spdlog in cli_parser on purpose: logging might not be set up yet. Use cout. We also used cout for model pulling.

atobiszei · 2026-06-26T09:06:02Z

+#endif
+
 #include "capi_frontend/server_settings.hpp"
+#include "logging.hpp"


spdlog was not used in cli_parser on purpose: during CLI parsing spdlog might not be initialized yet.

atobiszei · 2026-06-26T09:09:19Z

        "//src/graph_export:t2s_graph_cli_parser",
        "//src/graph_export:s2t_graph_cli_parser",
        "//src/graph_export:image_generation_graph_cli_parser",
+        "//src/pull_module:curl_downloader",


Why do we need to add those includes here? We are spilling dependencies.

atobiszei · 2026-06-26T09:09:47Z

    srcs = ["cli_parser.cpp"],
    deps = [
        "@com_github_jarro2783_cxxopts//:cxxopts",
+        "@com_github_tencent_rapidjson//:rapidjson",


Depend on src/port/rapidjson*

atobiszei · 2026-06-26T09:20:42Z

+    std::string responseBody;
+    const std::string hfEndpoint = ensureTrailingSlash(getEnvOrDefault(HF_ENDPOINT_ENV_VAR, DEFAULT_HF_ENDPOINT));
+    const std::string configUrl = hfEndpoint + *sourceModel + "/resolve/main/" + MODEL_CONFIG_FILENAME;
+    const auto status = fetchUrlToString(configUrl, getEnvOrDefault(HF_TOKEN_ENV_VAR), responseBody);


What happens if someone launches OVMS with local model without access to network?
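
For reference, the URL composition in the snippet under review can be sketched as below. The constant values are assumptions (the PR defines its own in hf_env_vars.hpp), `buildConfigUrl` is a hypothetical helper, and the body of `ensureTrailingSlash` is a guess at what the PR's helper does. The sketch only shows which URL would be requested; it does not address the offline case raised here.

```cpp
#include <string>

// Assumed values; the PR's actual constants may differ.
const std::string kDefaultHfEndpoint = "https://huggingface.co/";
const std::string kModelConfigFilename = "config.json";

// Appends '/' to a non-empty URL that does not already end with one.
std::string ensureTrailingSlash(std::string url) {
    if (!url.empty() && url.back() != '/') {
        url.push_back('/');
    }
    return url;
}

// Builds <endpoint>/<sourceModel>/resolve/main/config.json.
std::string buildConfigUrl(const std::string& endpoint, const std::string& sourceModel) {
    return ensureTrailingSlash(endpoint) + sourceModel + "/resolve/main/" + kModelConfigFilename;
}
```

Since any request to this URL needs network access, a run that uses only a local model would have to avoid this code path or handle the failed request gracefully.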

atobiszei · 2026-06-26T09:25:55Z

 }

+std::string CLIParser::determineDefaultTaskParameter(const std::optional<std::string>& modelPath, const std::optional<std::string>& sourceModel, const std::optional<std::string>& modelRepositoryPath) {
+    if (modelPath.has_value() && !modelPath->empty()) {


I think most of the logic for determining the task doesn't depend much on cli_parser and could be separated into a new file, which wouldn't be rebuilt with each cli_parser.cpp change.

dtrawins added 2 commits June 22, 2026 17:31

initial version

981bdf6

handling default task

ba4eac9

dtrawins commented Jun 23, 2026


dtrawins requested review from dkalinowski and mzegla June 23, 2026 10:09

dtrawins added 9 commits June 23, 2026 14:43

test task detection

d3de201

style

98ea635

style

fac11b3

missing file

f6fa1d9

more tests

29292b4

style

055352d

win build

5673c47

fix tests

60984c7

fix

8520129

dkalinowski reviewed Jun 24, 2026


dtrawins added 5 commits June 24, 2026 14:20

add questionable architectures

fe7ef4b

image generation fixes and tests

ca3d558

add missing files

faa924a

more tets

72d016b

Merge remote-tracking branch 'origin/main' into default-task

954f297

dtrawins marked this pull request as ready for review June 24, 2026 14:51

Copilot AI review requested due to automatic review settings June 24, 2026 14:51

Copilot started reviewing on behalf of dtrawins June 24, 2026 14:52

Copilot AI reviewed Jun 24, 2026


Apply suggestions from code review

97304c9

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

dtrawins requested a review from rasapala June 25, 2026 08:59

mzegla reviewed Jun 25, 2026


atobiszei reviewed Jun 26, 2026


