default task detection based on model config#4317
Conversation
| // Check if task-specific parameters are provided or if graph.pbtxt is missing | ||
| bool hasUnmatchedOptions = ::ovms::hasTaskSpecificParameters(result->unmatched()); | ||
| bool graphExists = ::ovms::graphPbtxtExists(*modelPath); | ||
|
|
| const std::optional<std::string> modelPath = result->count("model_path") ? std::make_optional(result->operator[]("model_path").as<std::string>()) : std::nullopt; | ||
| const std::optional<std::string> sourceModel = result->count("source_model") ? std::make_optional(result->operator[]("source_model").as<std::string>()) : std::nullopt; | ||
| const std::optional<std::string> modelRepositoryPath = result->count("model_repository_path") ? std::make_optional(result->operator[]("model_repository_path").as<std::string>()) : std::nullopt; | ||
|
|
| bool graphExists = ::ovms::graphPbtxtExists(*modelPath); | ||
| shouldInferTask = hasUnmatchedOptions || !graphExists; | ||
| } | ||
|
|
| @@ -0,0 +1,6 @@ | |||
| { | |||
| "architectures": ["XLMRobertaForSequenceClassification"], | |||
There was a problem hiding this comment.
I see its possible to have more than 1 architecture (its a list). How would OVMS behave? Can you add test for that behavior?
There was a problem hiding this comment.
In theory this is possible but rarely if ever happens. Added tests to cover it. First architecture will be used.
| @@ -0,0 +1,5 @@ | |||
| { | |||
| "architectures": ["UNet2DConditionModel"], | |||
There was a problem hiding this comment.
Please add these files to data field in unit test build target. If you dont do that, the unit tests will not re-run once this file change.
Example: https://github.com/openvinotoolkit/model_server/blob/main/src/BUILD#L2337-L2364
| const std::string modelPath = resolveTestModelPath("llama"); | ||
| const std::filesystem::path configJson = std::filesystem::path(modelPath) / "config.json"; | ||
| if (!std::filesystem::exists(configJson)) { | ||
| GTEST_SKIP() << "Test prerequisite missing: " << configJson.string(); |
There was a problem hiding this comment.
Why skipping? Shouldnt it error?
| cc_binary( | ||
| name = "num_streams_repro", | ||
| srcs = [ | ||
| "num_streams_repro.cpp", |
| {"XLMRobertaModel", "embeddings"}, | ||
| }; | ||
|
|
||
| std::string getEnvOrDefault(const char* envName, const std::string& defaultValue = "") { |
There was a problem hiding this comment.
these functions should be static
There was a problem hiding this comment.
why? They are already inside a namespace block?
There was a problem hiding this comment.
Why we do not use utils from tils/env_guard.hpp?
| } | ||
| } | ||
| if (!resolvedTask.has_value()) { | ||
| throw std::logic_error("config.json architectures do not map to a supported default task"); |
There was a problem hiding this comment.
please add test for that
|
|
||
| result = std::make_unique<cxxopts::ParseResult>(options->parse(argc, argv)); | ||
|
|
||
| const bool isConfigManagementFlow = |
There was a problem hiding this comment.
could be a CLIParser method
There was a problem hiding this comment.
Pull request overview
This PR adds automatic inference of the default --task value for generative flows by inspecting HuggingFace-style config.json / model_index.json, reducing the need to pass --task explicitly when starting from --model_path or --source_model.
Changes:
- Add task inference logic in
CLIParserbased on modelarchitectures/ Diffusers pipeline_class_name, including support for local and remote (HF endpoint) config retrieval. - Introduce test fixtures (model config JSONs) and new unit tests validating task inference and config parsing behavior.
- Centralize HF-related env var names in a shared header and reduce LLM calculator log verbosity (DEBUG → TRACE).
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/cli_parser.hpp | Adds inferred task state and declares task inference helper(s). |
| src/cli_parser.cpp | Implements task inference from config/model_index and integrates it into CLI parse flow. |
| src/pull_module/hf_pull_model_module.cpp | Switches HF env var usage to named constants. |
| src/pull_module/hf_env_vars.hpp | New header providing HF env var and default endpoint constants. |
| src/pull_module/BUILD | Adds a Bazel target for the new hf_env_vars header and wires it into deps. |
| src/llm/http_llm_calculator.cc | Lowers Open/Close logging from DEBUG to TRACE. |
| src/test/task_determine_test.cpp | New parameterized unit test for determineDefaultTaskParameter() across model configs. |
| src/test/ovmsconfig_test.cpp | Updates/extends config parsing death + positive tests for inferred task behavior. |
| src/test/models_config_json/xlm_roberta/config.json | Test model HF config fixture for rerank detection. |
| src/test/models_config_json/whisper/config.json | Test model HF config fixture for speech2text detection. |
| src/test/models_config_json/trinity/config.json | Test model HF config fixture for text_generation detection. |
| src/test/models_config_json/t5_encoder/config.json | Test model HF config fixture for embeddings detection. |
| src/test/models_config_json/stable_diffusion/config.json | Test model HF config fixture for image_generation detection. |
| src/test/models_config_json/speecht5/config.json | Test model HF config fixture for text2speech detection. |
| src/test/models_config_json/seamlessm4t/config.json | Test model HF config fixture for speech2text detection. |
| src/test/models_config_json/sdxl/model_index.json | Test Diffusers model_index.json fixture for image_generation detection. |
| src/test/models_config_json/qwen3/config.json | Test model HF config fixture for text_generation detection. |
| src/test/models_config_json/Qwen3-Reranker-0.6B/config.json | Test model HF config fixture for “questionable architecture” rerank disambiguation. |
| src/test/models_config_json/Qwen3-Embedding-0.6B/config.json | Test model HF config fixture for “questionable architecture” embeddings disambiguation. |
| src/test/models_config_json/Qwen3-8B/config.json | Test model HF config fixture for “questionable architecture” ambiguity handling. |
| src/test/models_config_json/qwen3_multi_arch/config.json | Test model HF config fixture for multi-architecture task resolution. |
| src/test/models_config_json/qwen3_asr/config.json | Test model HF config fixture for speech2text detection. |
| src/test/models_config_json/qwen3_6/config.json | Test model HF config fixture for heuristic text_generation detection. |
| src/test/models_config_json/qwen2_rerank/config.json | Test model HF config fixture for rerank detection. |
| src/test/models_config_json/qwen2_embedding/config.json | Test model HF config fixture for embeddings detection. |
| src/test/models_config_json/phi3/config.json | Test model HF config fixture for text_generation detection. |
| src/test/models_config_json/parlertts/config.json | Test model HF config fixture for text2speech detection. |
| src/test/models_config_json/NullArch/config.json | Test model HF config fixture for null-architectures negative path. |
| src/test/models_config_json/no_architectures/config.json | Test model HF config fixture for missing-architectures negative path. |
| src/test/models_config_json/llama/config.json | Test model HF config fixture for text_generation detection. |
| src/test/models_config_json/lfm/config.json | Test model HF config fixture for text_generation detection. |
| src/test/models_config_json/Kokoro/config.json | Test model HF config fixture for null-architectures → text2speech special-case. |
| src/test/models_config_json/invalid_architecture/config.json | Test model HF config fixture for unsupported-architecture negative path. |
| src/test/models_config_json/gemma4/config.json | Test model HF config fixture for text_generation detection. |
| src/test/models_config_json/flux/config.json | Test model HF config fixture for image_generation detection. |
| src/test/models_config_json/flux_pipeline/model_index.json | Test Diffusers model_index.json fixture for image_generation detection. |
| src/test/models_config_json/cross_encoder/config.json | Test model HF config fixture for rerank detection. |
| src/test/models_config_json/bge/config.json | Test model HF config fixture for embeddings detection. |
| src/test/models_config_json/bge_reranker/config.json | Test model HF config fixture for rerank detection. |
| src/BUILD | Adds RapidJSON + pull-module deps to cli_parser target; adds new test and test data glob. |
| #include <fstream> | ||
| #include <filesystem> | ||
| #include <iostream> | ||
| #include <optional> | ||
| #include <stdexcept> | ||
| #include <string> | ||
| #include <map> | ||
| #include <utility> |
| std::string getTaskForQuestionableArchitecture(const std::string& architecture, const std::string& normalizedModelIdentifier) { | ||
| const auto architectureRules = questionableArchitectureTaskKeywords.find(architecture); | ||
| if (architectureRules == questionableArchitectureTaskKeywords.end()) { | ||
| return ""; | ||
| } | ||
| const auto& [defaultTask, patternRules] = architectureRules->second; | ||
| for (const auto& [task, keyword] : patternRules) { | ||
| if (normalizedModelIdentifier.find(keyword) != std::string::npos) { | ||
| return task; | ||
| } | ||
| } | ||
| return defaultTask; | ||
| } |
There was a problem hiding this comment.
currently the only questionable architecture is Qwen3ForCausalLM which is used most frequently for text generation like Qwen3-4B, Qwen3-8B etc. There is no unique pattern specific for text generation. The pattern for rerank and embed should help in proper task identification. For potential other questionable architectures it would be possible to set empty default task.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
mzegla
left a comment
There was a problem hiding this comment.
I'm not a fan of putting entire auto detection logic directly to cli_parser file, especially if we expect it to grow.
From the architecture point of view I would encourage the following:
- Keep CLI parser clean - on the parser level just parse input.
- Have a separate file with a class
AutoTaskDetectoror something similar - Instead of multiple mapping we could have object representations like "Qwen3ForCausalLM" maps to Qwen3ForCausalLM class which internally has logic to check further (since its ambiguous and we can't directly tell if it's for generation, embedding, reranker etc.). Missing or null architecture could have their own hirarchy.
I know it would expand the code significantly, but such approach would look more scalable and easier to manage to me. As there are tons of models and multiple selection criteria, having multiple mapping and top-level chain of ifs directly in CLI parser does not look like a good design to me.
| @@ -0,0 +1,22 @@ | |||
| //**************************************************************************** | |||
| // Copyright 2025 Intel Corporation | |||
| "test/tensor_conversion_test.cpp", | ||
| "test/tensorinfo_test.cpp", | ||
| "test/tensorutils_test.cpp", | ||
| "test/task_determine_test.cpp", |
There was a problem hiding this comment.
Add as separate test library target and make those new config files as data field there.
| #include <rapidjson/document.h> | ||
| #include <rapidjson/error/en.h> | ||
| #include <rapidjson/istreamwrapper.h> |
There was a problem hiding this comment.
include src/port/rapidjson*hpp instead to avoid spilling pragmas/ifdefs.
| SPDLOG_DEBUG("Using local absolute model path for graph export: {}", hfSettings.exportSettings.modelPath); | ||
| } | ||
| const std::string taskValue = getEffectiveTaskParameter(); | ||
| if (inferredTaskParameter.has_value()) { | ||
| SPDLOG_INFO("Identified default task '{}' from model config", inferredTaskParameter.value()); |
There was a problem hiding this comment.
We did not use spdlog in cli_parser on purpose. Logging could not be setup yet. Use cout. We also did use cout for model pulling.
| #endif | ||
|
|
||
| #include "capi_frontend/server_settings.hpp" | ||
| #include "logging.hpp" |
There was a problem hiding this comment.
spdlog was not used in cli_parser on purpose - during cli parsing spdlog could be not initialized yet.
| "//src/graph_export:t2s_graph_cli_parser", | ||
| "//src/graph_export:s2t_graph_cli_parser", | ||
| "//src/graph_export:image_generation_graph_cli_parser", | ||
| "//src/pull_module:curl_downloader", |
There was a problem hiding this comment.
Why we do need to add those includes here? We are spilling dependencies.
| srcs = ["cli_parser.cpp"], | ||
| deps = [ | ||
| "@com_github_jarro2783_cxxopts//:cxxopts", | ||
| "@com_github_tencent_rapidjson//:rapidjson", |
There was a problem hiding this comment.
Depend on src/port/rapidjson*
| std::string responseBody; | ||
| const std::string hfEndpoint = ensureTrailingSlash(getEnvOrDefault(HF_ENDPOINT_ENV_VAR, DEFAULT_HF_ENDPOINT)); | ||
| const std::string configUrl = hfEndpoint + *sourceModel + "/resolve/main/" + MODEL_CONFIG_FILENAME; | ||
| const auto status = fetchUrlToString(configUrl, getEnvOrDefault(HF_TOKEN_ENV_VAR), responseBody); |
There was a problem hiding this comment.
What happens if someone launches OVMS with local model without access to network?
| } | ||
|
|
||
| std::string CLIParser::determineDefaultTaskParameter(const std::optional<std::string>& modelPath, const std::optional<std::string>& sourceModel, const std::optional<std::string>& modelRepositoryPath) { | ||
| if (modelPath.has_value() && !modelPath->empty()) { |
There was a problem hiding this comment.
I think most logic for determining the task doesn't depend a lot on cli_parser and could be separated into new file, which wouldn't be rebuild with each cli_parser.cpp change.
🛠 Summary
CVS-188024
task parameter will be determined automatically based on model config.json, model_index.json and in special cases based on model name pattern.
It simplifies model deployment from HF hub.
While using --source_model - it will always create or update graph.pbtxt
While using --model_path - it will using params from graph.pbtxt if exist and no overwrite params are passed. When graph.pbtxt is missing, it will determine task automatically
In case task can't be determined like for unknows architectures - no default task will be set to be provided by the user.
🧪 Checklist
``