OneOffTech · avvertix · Jun 6, 2026
diff --git a/docs/introduction.md b/docs/introduction.md
@@ -44,6 +44,10 @@ Once installed, `parxy` provides the following commands:
 | `parxy docker` | Generate a Docker Compose configuration for self-hosted services |
 | `parxy pdf:merge` | Merge multiple PDF files into one, with support for selecting specific page ranges |
 | `parxy pdf:split` | Split a PDF file into individual pages |
+| `parxy pdf:outline` | Print or export a PDF's outline (bookmarks / table of contents) |
+| `parxy pdf:tags` | Extract the tag (structure) tree of a tagged, accessible PDF |
+| `parxy pdf:tags-check` | Check whether a PDF is a tagged (accessible) PDF |
+| `parxy pdf:xmp` | Read and extract the XMP metadata of a PDF |
 
 ```bash
 # Parse a PDF to markdown

diff --git a/docs/reference/cli.md b/docs/reference/cli.md
@@ -223,6 +223,28 @@ parxy pdf:merge [OPTIONS] INPUTS...
 |--------|-------|------|---------|-------------|
 | `--output` | `-o` | `text` | - | Output file path for the merged PDF. If not specified, you will be prompted. |
 
+## `parxy pdf:outline`
+
+Print or export the outline (bookmarks / table of contents) of a PDF
+
+```
+parxy pdf:outline [OPTIONS] INPUT_FILE
+```
+
+**Arguments:**
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `INPUT_FILE` | Yes | PDF file to inspect |
+
+**Options:**
+
+| Option | Short | Type | Default | Description |
+|--------|-------|------|---------|-------------|
+| `--output` | `-o` | `text` | - | Write the outline as JSON to this file instead of printing a tree. |
+| `--json` | - | `flag` | `false` | Print the outline as JSON to stdout. |
+| `--flat` | - | `flag` | `false` | Print a flat, indented list instead of a tree. |
+
 ## `parxy pdf:split`
 
 Split a PDF file into individual pages
@@ -245,6 +267,134 @@ parxy pdf:split [OPTIONS] INPUT_FILE
 | `--prefix` | `-p` | `text` | - | Prefix for output filenames. If not specified, uses the input filename. |
 | `--pages` | - | `text` | - | Page range to extract (1-based). Examples: "1" (single page), "1:3" (pages 1-3), ":3" (up to page 3), "3:" (from page 3). If not specified, all pages are extracted. |
 | `--combine` | - | `flag` | `false` | Combine extracted pages into a single PDF instead of one file per page. |
+| `--every` | `-e` | `integer` | - | Split into chunks of N pages each. Cannot be used with --combine. |
+
+## `parxy pdf:split-by-text`
+
+Split a PDF into chunks whenever a page matches a text condition
+
+```
+parxy pdf:split-by-text [OPTIONS] INPUT_FILE
+```
+
+**Arguments:**
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `INPUT_FILE` | Yes | PDF file to split |
+
+**Options:**
+
+| Option | Short | Type | Default | Description |
+|--------|-------|------|---------|-------------|
+| `--text` | `-t` | `text` | - | Text to match. Can be repeated for multiple patterns (OR logic). |
+| `--mode` | `-m` | `text` | `contains` | Matching mode: "contains" (default) or "starts-with". |
+| `--ignore-case` | `-i` | `flag` | `false` | Case-insensitive matching. |
+| `--regex` | - | `flag` | `false` | Treat --text values as regular expressions. |
+| `--discard-preamble` | - | `flag` | `false` | Discard pages that appear before the first matching page. |
+| `--output` | `-o` | `text` | - | Output directory for chunk files (default: {stem}_split next to input). |
+| `--prefix` | `-p` | `text` | - | Prefix for output filenames. Defaults to the input filename stem. |
+
+## `parxy pdf:tag-skeleton`
+
+Copy a tagged PDF keeping its tags but removing visible content
+
+```
+parxy pdf:tag-skeleton [OPTIONS] INPUT_FILE
+```
+
+**Arguments:**
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `INPUT_FILE` | Yes | Tagged PDF file to strip |
+
+**Options:**
+
+| Option | Short | Type | Default | Description |
+|--------|-------|------|---------|-------------|
+| `--output` | `-o` | `text` | - | Output path for the tags-only PDF (default: {stem}_tags.pdf next to input). |
+
+## `parxy pdf:tag-template`
+
+Create an empty tagged PDF skeleton for accessibility work
+
+```
+parxy pdf:tag-template [OPTIONS]
+```
+
+**Options:**
+
+| Option | Short | Type | Default | Description |
+|--------|-------|------|---------|-------------|
+| `--output` | `-o` | `text` | - | Output file path for the template PDF. If not specified, you will be prompted. |
+| `--pages` | - | `integer` | `1` | Number of blank pages to create (default: 1). |
+| `--lang` | - | `text` | `en-US` | Document language tag set on the catalog (default: en-US). |
+| `--title` | - | `text` | - | Optional document title stored in the PDF metadata. |
+
+## `parxy pdf:tags`
+
+Extract the tag (structure) tree of a tagged PDF
+
+```
+parxy pdf:tags [OPTIONS] INPUT_FILE
+```
+
+**Arguments:**
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `INPUT_FILE` | Yes | PDF file to inspect |
+
+**Options:**
+
+| Option | Short | Type | Default | Description |
+|--------|-------|------|---------|-------------|
+| `--output` | `-o` | `text` | - | Write the extracted tags as JSON to this file instead of printing a tree. |
+| `--json` | - | `flag` | `false` | Print the extracted tags as JSON to stdout. |
+| `--text` | - | `flag` | `false` | Include the text content of each element. Rebuilds the tree per page; accessibility attributes (alt text, page refs) are not shown in this mode. |
+
+## `parxy pdf:tags-check`
+
+Check whether a PDF is a tagged (accessible) PDF
+
+```
+parxy pdf:tags-check [OPTIONS] INPUT_FILE
+```
+
+**Arguments:**
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `INPUT_FILE` | Yes | PDF file to inspect |
+
+**Options:**
+
+| Option | Short | Type | Default | Description |
+|--------|-------|------|---------|-------------|
+| `--json` | - | `flag` | `false` | Output the detection result as JSON. |
+
+## `parxy pdf:xmp`
+
+Read and extract the XMP metadata of a PDF
+
+```
+parxy pdf:xmp [OPTIONS] INPUT_FILE
+```
+
+**Arguments:**
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `INPUT_FILE` | Yes | PDF file to inspect |
+
+**Options:**
+
+| Option | Short | Type | Default | Description |
+|--------|-------|------|---------|-------------|
+| `--output` | `-o` | `text` | - | Write the metadata to this file. A .xml extension writes the raw XMP packet; any other extension writes parsed JSON. |
+| `--json` | - | `flag` | `false` | Print the parsed metadata as JSON to stdout. |
+| `--raw` | - | `flag` | `false` | Print the raw XMP XML packet to stdout. |
 
 ## `parxy tui`
 

diff --git a/docs/tutorials/using_cli.md b/docs/tutorials/using_cli.md
@@ -14,6 +14,9 @@ The Parxy CLI lets you:
 | `parxy markdown` | Convert documents to Markdown files, with support for multiple drivers and folder processing                |
 | `parxy pdf:merge`| Merge multiple PDF files into one, with support for page ranges                                            |
 | `parxy pdf:split`| Split a PDF into individual pages, with optional page range and single-file extraction                      |
+| `parxy pdf:outline`| Print or export a PDF's outline (bookmarks / table of contents)                                          |
+| `parxy pdf:tags` | Inspect and extract the tag (structure) tree of a tagged, accessible PDF                                    |
+| `parxy pdf:xmp`  | Read and extract XMP metadata from a PDF                                                                    |
 | `parxy drivers`  | List available document processing drivers                                                                  |
 | `parxy env`      | Generate a default `.env` configuration file                                                                |
 | `parxy docker`   | Create a Docker Compose configuration for running Parxy-related services                                    |
@@ -303,6 +306,88 @@ Page range formats (1-based): `3` · `2:5` · `:5` · `3:`
 For more detailed examples and use cases, see the [Merge and split PDFs](../howto/merge_and_split_pdfs.md) guide.
 
 
+## Inspecting PDFs
+
+Beyond text extraction, Parxy can inspect a PDF's structure and metadata: its outline (bookmarks), its accessibility tag tree, and its XMP metadata. The `pdf:outline`, `pdf:tags`, and `pdf:xmp` commands print a human-readable view by default and can emit JSON with `--json` (to stdout) or `--output` (to a file).
+
+### Outline (bookmarks)
+
+The `pdf:outline` command prints the table of contents as a tree:
+
+```bash
+parxy pdf:outline document.pdf
+```
+
+Use `--flat` for an indented list instead of a tree, or export the structure:
+
+```bash
+# Flat listing
+parxy pdf:outline document.pdf --flat
+
+# Export as JSON (flat entries + nested tree)
+parxy pdf:outline document.pdf -o outline.json
+```
+
+The command exits with code `2` when the PDF has no bookmarks, so scripts can detect a missing outline without parsing the output.
+
+### Tags (accessibility structure)
+
+A *tagged* PDF carries a logical structure tree (`/StructTreeRoot`) that makes it accessible. Start by checking whether a PDF is tagged:
+
+```bash
+parxy pdf:tags-check document.pdf
+```
+
+This reports whether the content is marked, whether a structure tree is present, the document language, and the number of structure elements. It exits with `0` for a tagged PDF and `2` otherwise.
+
+Extract the tag tree itself with `pdf:tags`:
+
+```bash
+# Print the structure tree (with page references and alt text)
+parxy pdf:tags document.pdf
+
+# Include the visible text of each element (rebuilt per page)
+parxy pdf:tags document.pdf --text
+
+# Export the full nested structure as JSON
+parxy pdf:tags document.pdf -o tags.json
+```
+
+The default view walks the document-wide structure tree and shows accessibility attributes (alt text, titles, page references) but not body text, which lives in the page content streams. The `--text` view reconstructs the structure per page including each element's visible text, but without the accessibility attributes.
+
+Two companion commands help with accessibility work:
+
+```bash
+# Copy a tagged PDF keeping its tags but removing visible content
+parxy pdf:tag-skeleton document.pdf -o tags-only.pdf
+
+# Create an empty tagged PDF skeleton from scratch
+parxy pdf:tag-template -o template.pdf --pages 3 --lang en-US
+```
+
+### XMP metadata
+
+The `pdf:xmp` command reads the XMP metadata packet (an RDF/XML block holding properties such as `dc:title`, `dc:creator`, and `pdf:Producer`) and prints the parsed properties alongside the classic `/Info` dictionary:
+
+```bash
+parxy pdf:xmp document.pdf
+```
+
+You can view the original packet or export the metadata:
+
+```bash
+# Print the raw XMP XML packet
+parxy pdf:xmp document.pdf --raw
+
+# Export parsed metadata as JSON
+parxy pdf:xmp document.pdf --json
+
+# Save the raw XMP packet (a .xml path writes the raw packet,
+# any other extension writes parsed JSON)
+parxy pdf:xmp document.pdf -o metadata.xml
+```
+
+
 ## Managing Drivers
 
 To view the list of supported document parsing drivers:
@@ -368,6 +453,9 @@ With the CLI, you can use Parxy as a **standalone document parsing tool** — id
 | `parxy markdown` | Generate Markdown files; accepts JSON results and supports `--page-separators` |
 | `parxy pdf:merge`| Merge multiple PDF files with page range support             |
 | `parxy pdf:split`| Split PDF into individual pages; supports `--pages` and `--combine` |
+| `parxy pdf:outline`| Print or export a PDF's outline (bookmarks)               |
+| `parxy pdf:tags` | Inspect and extract a tagged PDF's structure tree; supports `--text` |
+| `parxy pdf:xmp`  | Read and extract XMP metadata; supports `--raw` and JSON export |
 | `parxy drivers`  | List supported drivers                                       |
 | `parxy env`      | Create default configuration file                            |
 | `parxy docker`   | Generate Docker Compose setup                                |
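
The tutorial text in this change documents exit codes for `pdf:tags-check` (`0` for a tagged PDF, `2` otherwise) and `pdf:outline` (`2` when there are no bookmarks). A minimal sketch of how a script might branch on them follows. It assumes that `pdf:outline` exits with `0` when an outline exists and that any other exit code means a failure such as an unreadable file; neither is stated in the docs above. `inspect_pdf` is a hypothetical helper name, not part of Parxy.

```shell
#!/usr/bin/env bash
# Sketch: classify a PDF using the documented exit codes of
# `parxy pdf:tags-check` (0 = tagged, 2 = not tagged) and
# `parxy pdf:outline` (2 = no bookmarks).
# Assumptions: pdf:outline exits with 0 when an outline exists, and any
# other exit code signals a failure. `inspect_pdf` is a hypothetical helper.

inspect_pdf() {
  local file="$1" status

  # Only the exit code matters here, so the output is discarded.
  parxy pdf:tags-check "$file" --json > /dev/null 2>&1
  status=$?
  case "$status" in
    0) echo "tagged: yes" ;;
    2) echo "tagged: no" ;;
    *) echo "tagged: error (exit $status)" ;;
  esac

  parxy pdf:outline "$file" --json > /dev/null 2>&1
  status=$?
  case "$status" in
    0) echo "outline: yes" ;;
    2) echo "outline: no" ;;
    *) echo "outline: error (exit $status)" ;;
  esac
}
```

Calling `inspect_pdf document.pdf` prints one line for the tag check and one for the outline check. Treating unknown exit codes as a separate "error" case keeps a missing outline distinct from a command that failed for another reason.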