@@ -92,12 +92,37 @@ Represents a page in the document:
9292
9393This node represents a paragraph, a heading, or any other text within the document.
9494
95- - ` category ` : The type ` "doc" ` .
95+ - ` category ` : The classification of the text within the document, one of the values listed under [Category](#category).
9696- ` content ` : A string representing the textual content.
9797- ` marks ` : List of [marks](#marks) applied to the text, such as bold, italic, etc.
9898- ` attributes ` : Optional metadata, such as the bounding box that locates this portion of text on the page.
9999
100-
100+ ### Category
101+ Below are the various categories of text that may be found within a document:
102+
103+ **Category Type**
104+ - ` page-header ` : Represents the header of the page.
105+ - ` footer ` : Represents the footer of the page.
106+ - ` heading ` : Any heading within the document.
107+ - ` figure ` : Represents a figure or an image.
108+ - ` other ` : Any other unclassified text.
109+ - ` appendix ` : Text within an appendix.
110+ - ` keywords ` : List of keywords.
111+ - ` acknowledgments ` : Section acknowledging contributors.
112+ - ` caption ` : Caption associated with a figure or table.
113+ - ` toc ` : Table of contents.
114+ - ` abstract ` : The abstract of the document.
115+ - ` footnote ` : Text at the bottom of the page providing additional information.
116+ - ` body ` : Main body text of the document.
117+ - ` itemize-item ` : Item in a list or bullet point.
118+ - ` title ` : The title of the document.
119+ - ` reference ` : References or citations within the document.
120+ - ` affiliation ` : Author's institutional affiliation.
121+ - ` general-terms ` : General terms section.
122+ - ` formula ` : Mathematical formula or equation.
123+ - ` categories ` : Categories or topics listed in the document.
124+ - ` table ` : Represents a table.
125+ - ` authors ` : List of authors.
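
To make the field list and the category values above concrete, here is a minimal sketch of a text node with category validation. The `TextNode` class and the `TEXT_CATEGORIES` name are hypothetical and chosen only for illustration; the library's actual classes and validation behavior may differ. Only the category strings themselves are taken from the list above.

```python
from dataclasses import dataclass, field

# Category values copied from the list above.
TEXT_CATEGORIES = frozenset({
    "page-header", "footer", "heading", "figure", "other", "appendix",
    "keywords", "acknowledgments", "caption", "toc", "abstract",
    "footnote", "body", "itemize-item", "title", "reference",
    "affiliation", "general-terms", "formula", "categories", "table",
    "authors",
})


@dataclass
class TextNode:
    """Hypothetical stand-in for the text node described above."""
    category: str
    content: str
    marks: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject categories that are not in the documented list.
        if self.category not in TEXT_CATEGORIES:
            raise ValueError(f"unknown text category: {self.category!r}")


node = TextNode(category="heading", content="Introduction")
```

Validating at construction time is one possible design choice; it surfaces a misspelled category early instead of when the document is later consumed.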
101126
102127### Marks
103128
@@ -119,8 +144,9 @@ Attributes are optional fields that can store additional information for each no
119144
120145- ` DocumentAttributes ` : General attributes for the document (currently reserved for the future).
121146- ` PageAttributes ` : Specific page related attributes, such as the page number.
122- - ` TextAttributes ` : Text related attributes, such as bounding boxes.
147+ - ` TextAttributes ` : Text-related attributes, such as bounding boxes or level.
123148- ` BoundingBox ` : A box that specifies the position of a text on the page.
149+ - ` Level ` : The level of the text within the document, for example the level of a heading.
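
As a rough illustration of how a level attribute could be used alongside a bounding box, the sketch below builds an indented outline from heading nodes. Everything here is an assumption made for the example: the field names on `BoundingBox` and `TextAttributes`, the use of plain dictionaries for nodes, and the convention that level 1 is the top-most heading are not confirmed by the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoundingBox:
    # Hypothetical field names; the real attribute may use different ones.
    min_x: float
    min_y: float
    max_x: float
    max_y: float


@dataclass
class TextAttributes:
    # Both fields are optional, mirroring "attributes are optional fields".
    bounding_box: Optional[BoundingBox] = None
    level: Optional[int] = None


def heading_outline(nodes):
    """Return heading contents indented by their (assumed 1-based) level."""
    lines = []
    for node in nodes:
        if node["category"] != "heading":
            continue
        level = node["attributes"].level or 1  # missing level: treat as top
        lines.append("  " * (level - 1) + node["content"])
    return lines


nodes = [
    {"category": "heading", "content": "Introduction",
     "attributes": TextAttributes(level=1)},
    {"category": "body", "content": "Some text.",
     "attributes": TextAttributes(bounding_box=BoundingBox(0, 0, 100, 20))},
    {"category": "heading", "content": "Background",
     "attributes": TextAttributes(level=2)},
]
outline = heading_outline(nodes)
```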
124150
125151
126152## Getting started