 You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format.
 
 <variables>
-DOCUMENT_TEXT: OCR-extracted text content from the document page that provides textual information for classification
-DOCUMENT_IMAGE: Visual representation of the document page that provides layout, formatting, and visual structure information
-CLASS_NAMES_AND_DESCRIPTIONS: List of valid document types with their descriptions that the document must be classified into
+<document-ocr-data>: OCR-extracted text content from the document page that provides textual information for classification
+<document-image>: Visual representation of the document page that provides layout, formatting, and visual structure information
+<document-types>: List of valid document types with their descriptions that the document must be classified into
 </variables>
 
 task_prompt: >-
 <reasoning-guidelines>
@@ -836,6 +836,10 @@ classification:
 - Provide specific evidence from both visual and textual analysis
 </reasoning-guidelines>
 
+<document-types>
+{CLASS_NAMES_AND_DESCRIPTIONS}
+</document-types>
+
 <output-format>
 Return your classification as valid JSON following this exact structure:
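After this change, the class list reaches the model through the `{CLASS_NAMES_AND_DESCRIPTIONS}` placeholder inside the new `<document-types>` block. The following is a minimal Python sketch of filling that placeholder and checking the model's reply. Everything in it is an assumption for illustration: the class list, the `name: description` line format, the helper names `render_task_prompt` and `parse_classification`, and the `{"class": ...}` reply shape. The real JSON structure is cut off in this excerpt.

```python
import json

# Hypothetical class list; in practice it would come from the classification config.
CLASSES = [
    ("invoice", "A bill requesting payment for goods or services"),
    ("letter", "A formal letter or correspondence"),
]

# Fragment of the task prompt after this change (YAML indentation omitted).
# The JSON example under <output-format> is hypothetical.
TEMPLATE = """<document-types>
{CLASS_NAMES_AND_DESCRIPTIONS}
</document-types>

<output-format>
Return your classification as valid JSON following this exact structure:
{"class": "..."}
</output-format>"""


def render_task_prompt(template: str, classes) -> str:
    """Fill the class-list placeholder (hypothetical helper)."""
    # One "name: description" line per class (an assumed formatting choice).
    listing = "\n".join(f"{name}: {desc}" for name, desc in classes)
    # str.replace instead of str.format: the JSON example contains literal
    # braces that str.format would misread as replacement fields.
    return template.replace("{CLASS_NAMES_AND_DESCRIPTIONS}", listing)


def parse_classification(raw: str, classes) -> str:
    """Parse the model's JSON reply and check the label (hypothetical helper)."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    label = data["class"]  # assumed key; the real schema is set in <output-format>
    valid = {name for name, _ in classes}
    if label not in valid:
        raise ValueError(f"unknown class: {label!r}")
    return label


if __name__ == "__main__":
    print(render_task_prompt(TEMPLATE, CLASSES))
    print(parse_classification('{"class": "invoice"}', CLASSES))
```

Plain `str.replace` is used here because the JSON example that presumably follows "this exact structure:" contains literal braces. `str.format` would treat `{"class": ...}` as a replacement field and raise `KeyError`.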