notebooks/llm-chatbot/README.md (2 additions & 0 deletions)
@@ -81,6 +81,8 @@ For more details, please refer to [model_card](https://huggingface.co/Qwen/Qwen2
* **GLM-Z1-32B-0414** - GLM-Z1-32B-0414 is a reasoning model with deep thinking capabilities. It was developed from GLM-4-32B-0414 through cold start, extended reinforcement learning, and further training on tasks including mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. You can find more info in the [model card](https://huggingface.co/THUDM/GLM-Z1-9B-0414).
* **Qwen3-1.7B/4B/8B/14B** - Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Building upon extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers key improvements over the previously released Qwen2.5. You can find more info in the [model card](https://huggingface.co/Qwen/Qwen3-8B).
* **AFM-4.5B** - AFM-4.5B is a 4.5 billion parameter instruction-tuned model developed by Arcee.ai, designed for enterprise-grade performance across diverse deployment environments from cloud to edge. The base model was trained on a dataset of 8 trillion tokens, comprising 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with an enhanced focus on mathematical reasoning and code generation. Following pretraining, the model underwent supervised fine-tuning on high-quality instruction datasets. The instruction-tuned model was further refined through reinforcement learning on verifiable rewards as well as for human preference. You can find more info in the [model card](https://huggingface.co/arcee-ai/AFM-4.5B).
* **gpt-oss-20b** - gpt-oss-20b is a 20 billion parameter open-weight model designed for powerful reasoning, agentic tasks, and versatile developer use cases. You can find more info in the [model card](https://huggingface.co/openai/gpt-oss-20b).
> **Note**: The gpt-oss-20b model is not supported by the OpenVINO GPU plugin.
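Because of that limitation, it helps to pin the inference device explicitly when creating the pipeline. The following is a minimal sketch using the OpenVINO GenAI `LLMPipeline` API on CPU; the model directory name is a placeholder, and it assumes the model has already been exported to OpenVINO IR as described in the notebook.

```python
# Minimal sketch (assumptions: the model was already converted to OpenVINO IR
# into a local folder, here named "gpt-oss-20b-int4-ov", and openvino-genai is installed).
import openvino_genai as ov_genai

model_dir = "gpt-oss-20b-int4-ov"  # hypothetical path to the converted model

# gpt-oss-20b is not supported by the GPU plugin, so run the pipeline on CPU.
pipe = ov_genai.LLMPipeline(model_dir, "CPU")

# Generate a short completion to verify the pipeline works.
print(pipe.generate("What is OpenVINO?", max_new_tokens=128))
```

For the other models in the list, the same pipeline can target "GPU" where available; only the device string changes.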
The image below illustrates examples of user instructions and the corresponding model answers.
"* **AFM-4.5B** - AFM-4.5B is a 4.5 billion parameter instruction-tuned model developed by Arcee.ai, designed for enterprise-grade performance across diverse deployment environments from cloud to edge. The base model was trained on a dataset of 8 trillion tokens, comprising 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with enhanced focus on mathematical reasoning and code generation. Following pretraining, the model underwent supervised fine-tuning on high-quality instruction datasets. The instruction-tuned model was further refined through reinforcement learning on verifiable rewards as well as for human preference. You can find more info in [model card](https://huggingface.co/arcee-ai/AFM-4.5B).\n",
449
+
"* **gpt-oss-20b** - gpt-oss-20b is a 20 billion parameter open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. You can find more info in [model card](https://huggingface.co/openai/gpt-oss-20b).\n",
450
+
">**Note**: gpt-oss-20b model is not supported with OpenVINO GPU plugin.\n",
"* **AFM-4.5B** - AFM-4.5B is a 4.5 billion parameter instruction-tuned model developed by Arcee.ai, designed for enterprise-grade performance across diverse deployment environments from cloud to edge. The base model was trained on a dataset of 8 trillion tokens, comprising 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with enhanced focus on mathematical reasoning and code generation. Following pretraining, the model underwent supervised fine-tuning on high-quality instruction datasets. The instruction-tuned model was further refined through reinforcement learning on verifiable rewards as well as for human preference. You can find more info in [model card](https://huggingface.co/arcee-ai/AFM-4.5B).\n",
344
+
"* **gpt-oss-20b** - gpt-oss-20b is a 20 billion parameter open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. You can find more info in [model card](https://huggingface.co/openai/gpt-oss-20b).\n",
345
+
">**Note**: gpt-oss-20b model is not supported with OpenVINO GPU plugin.\n",
0 commit comments