diff --git a/README.md b/README.md
index 92ccce1d..fb4f1657 100644
--- a/README.md
+++ b/README.md
@@ -18,12 +18,13 @@
## Features
- **Text** — Multi-turn chat, streaming, system prompts, JSON output
-- **Image** — Text-to-image with aspect ratio and batch controls
+- **Image** — Text-to-image with aspect ratio, batch controls, and fine-grained sizing
- **Video** — Async video generation with progress tracking
-- **Speech** — TTS with 30+ voices, speed control, streaming playback
+- **Speech** — TTS with 30+ voices, speed control, streaming playback, SRT subtitles
- **Music** — Text-to-music with lyrics, instrumental mode, auto lyrics, and cover generation from reference audio
- **Vision** — Image understanding and description
- **Search** — Web search powered by MiniMax
+- **File** — Upload, list, and delete files from MiniMax storage
- **Dual Region** — Seamless Global (`api.minimax.io`) and CN (`api.minimaxi.com`) support
@@ -60,12 +61,16 @@ mmx music generate --prompt "Upbeat pop" --lyrics "[verse] La da dee, sunny day"
mmx search "MiniMax AI latest news"
mmx vision photo.jpg
mmx quota
+
+# File management
+mmx file upload --file doc.pdf
+mmx file list
+mmx file delete --file-id 123456789
```
## Commands
### `mmx text`
-
```bash
mmx text chat --message "Write a poem"
mmx text chat --model MiniMax-M2.7-highspeed --message "Hello" --stream
@@ -80,10 +85,13 @@ cat messages.json | mmx text chat --messages-file - --output json
mmx image "A cat in a spacesuit"
mmx image generate --prompt "A cat" --n 3 --aspect-ratio 16:9
mmx image generate --prompt "Logo" --out-dir ./out/
+# Fine-grained sizing
+mmx image generate --prompt "A photo" --width 1024 --height 768
+# With prompt optimizer
+mmx image generate --prompt "cat spacesuit" --prompt-optimizer
```
### `mmx video`
-
```bash
mmx video generate --prompt "Ocean waves at sunset" --download sunset.mp4
mmx video generate --prompt "A robot painting" --async
@@ -98,6 +106,9 @@ mmx speech synthesize --text "Hello!" --out hello.mp3
mmx speech synthesize --text "Stream me" --stream | mpv -
mmx speech synthesize --text "Hi" --voice English_magnetic_voiced_man --speed 1.2
echo "Breaking news" | mmx speech synthesize --text-file - --out news.mp3
+# With SRT subtitles (if voice model supports it)
+mmx speech synthesize --text "Hello world" --subtitles --out hello.mp3
+# saves hello.mp3 + hello.srt
mmx speech voices
```
@@ -116,7 +127,6 @@ mmx music cover --prompt "Indie folk" --audio https://example.com/song.mp3 --out
```
### `mmx vision`
-
```bash
mmx vision photo.jpg
mmx vision describe --image https://example.com/img.jpg --prompt "What breed?"
@@ -130,6 +140,21 @@ mmx search "MiniMax AI"
mmx search query --q "latest news" --output json
```
+### `mmx file`
+
+```bash
+# Upload a file (returns file ID for use with vision, speech, etc.)
+mmx file upload --file doc.pdf
+mmx file upload --file image.png --purpose vision
+
+# List all uploaded files
+mmx file list
+mmx file list --output json
+
+# Delete a file by ID
+mmx file delete --file-id 123456789
+```
+
### `mmx auth`
```bash
@@ -153,6 +178,7 @@ is auto-detected by probing both Global and CN.
```bash
mmx quota
+mmx quotas # show detailed quota breakdown per modality
mmx config show
mmx config set --key region --value cn
mmx config set --key default-text-model --value MiniMax-M2.7-highspeed
@@ -162,8 +188,8 @@ mmx config export-schema | jq .
### `mmx update`
```bash
-mmx update
-mmx update latest
+mmx update # update to latest stable
+mmx update latest # update to latest pre-release
```
## Thanks to
diff --git a/skill/SKILL.md b/skill/SKILL.md
index 79b7a60b..c6abe417 100644
--- a/skill/SKILL.md
+++ b/skill/SKILL.md
@@ -5,28 +5,19 @@ description: Use mmx to generate text, images, video, speech, and music via the
# MiniMax CLI — Agent Skill Guide
-Use `mmx` to generate text, images, video, speech, music, and perform web search via the MiniMax AI platform.
+**Repo:** https://github.com/MiniMax-AI/cli
+**NPM:** https://www.npmjs.com/package/mmx-cli
+**Requires:** Node.js 18+, MiniMax Token Plan (Global or CN)
-## Prerequisites
+Use `mmx` to generate text, images, video, speech, and music, run web searches, and manage file storage via the MiniMax AI platform.
+
+## Quick Install
```bash
-# Install
npm install -g mmx-cli
-
-# Auth (OAuth persists to ~/.mmx/credentials.json, API key persists to ~/.mmx/config.json)
-mmx auth login --api-key sk-xxxxx
-
-# Verify active auth source
-mmx auth status
-
-# Or pass per-call
-mmx text chat --api-key sk-xxxxx --message "Hello"
+npx skills add MiniMax-AI/cli -y -g # add as OpenClaw skill
```
-Region is auto-detected. Override with `--region global` or `--region cn`.
-
----
-
## Agent Flags
Always use these flags in non-interactive (agent/CI) contexts:
@@ -42,6 +33,34 @@ Always use these flags in non-interactive (agent/CI) contexts:
---
+## Authentication
+
+```bash
+# Interactive — choose OAuth or paste an API key
+mmx auth login
+
+# Non-interactive: pass the API key directly
+mmx auth login --api-key sk-xxxxx
+
+# Skip the menu — auto-select OAuth for the given region
+mmx auth login --recommend --region=global # → api.minimax.io
+mmx auth login --recommend --region=cn # → api.minimaxi.com
+
+# Verify, refresh, or log out of the current auth
+mmx auth status
+mmx auth refresh
+mmx auth logout
+```
+
+**Auth notes:**
+- Credentials are stored in `~/.mmx/config.json` — separate from OpenClaw's own MiniMax OAuth config
+- OAuth and API-key are mutually exclusive; logging in with one clears the other
+- `--api-key` flag can be passed per-command to override stored auth
+- With an API key, region is auto-detected by probing both Global and CN endpoints
+- `mmx auth status` is the canonical way to verify active authentication
+
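For agent or CI scripts, the notes above suggest checking authentication before doing any work. A minimal sketch, assuming `mmx auth status` exits non-zero when no credentials are active (that exit-code behavior is an assumption, and `require_mmx_auth` is a hypothetical helper name, not part of `mmx`):

```bash
# Hypothetical helper: fail fast when mmx has no usable credentials.
# Assumes `mmx auth status` returns a non-zero exit code when unauthenticated.
require_mmx_auth() {
  if ! mmx auth status >/dev/null 2>&1; then
    echo "mmx is not authenticated; run 'mmx auth login' first" >&2
    return 1
  fi
}

# Usage (commented out so sourcing this file has no side effects):
# require_mmx_auth && mmx text chat --message "Hello" --quiet
```

Keeping the check in one function means every script fails with the same message instead of a command-specific error halfway through a pipeline.
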
+---
+
## Commands
### text chat
@@ -54,12 +73,12 @@ mmx text chat --message [flags]
| Flag | Type | Description |
|---|---|---|
-| `--message ` | string, **required**, repeatable | Message text. Prefix with `role:` to set role (e.g. `"system:You are helpful"`, `"user:Hello"`) |
+| `--message ` | string, **required**, repeatable | Message text. Prefix with `role:` to set role (`"system:"`, `"user:"`, `"assistant:"`) |
| `--messages-file ` | string | JSON file with messages array. Use `-` for stdin |
| `--system ` | string | System prompt |
| `--model ` | string | Model ID (default: `MiniMax-M2.7`) |
| `--max-tokens ` | number | Max tokens (default: 4096) |
-| `--temperature ` | number | Sampling temperature (0.0, 1.0] |
+| `--temperature ` | number | Sampling temperature (0.0, 1.0] |
| `--top-p ` | number | Nucleus sampling threshold |
| `--stream` | boolean | Stream tokens (default: on in TTY) |
| `--tool ` | string, repeatable | Tool definition JSON or file path |
@@ -78,7 +97,7 @@ mmx text chat \
cat conversation.json | mmx text chat --messages-file - --output json
```
-**stdout**: response text (text mode) or full response object (json mode).
+**stdout:** response text (text mode) or full response object (json mode).
---
@@ -90,6 +109,8 @@ Generate images. Model: `image-01`.
mmx image generate --prompt [flags]
```
+Single-token shorthand: `mmx image "prompt here"`
+
| Flag | Type | Description |
|---|---|---|
| `--prompt ` | string, **required** | Image description |
@@ -111,13 +132,19 @@ mmx image generate --prompt "A cat in a spacesuit" --output json --quiet
mmx image generate --prompt "Logo" --n 3 --out-dir ./gen/ --quiet
# stdout: saved file paths (one per line)
+
+# Fine-grained sizing
+mmx image generate --prompt "A photo" --width 1024 --height 768
+
+# With prompt optimizer
+mmx image generate --prompt "cat spacesuit" --prompt-optimizer
```
---
### video generate
-Generate video. Default model: `MiniMax-Hailuo-2.3`. This is an async task — by default it polls until completion.
+Generate video. Default model: `MiniMax-Hailuo-2.3`. Async task — polls until completion by default.
```bash
mmx video generate --prompt [flags]
@@ -135,13 +162,12 @@ mmx video generate --prompt [flags]
| `--poll-interval ` | number | Polling interval (default: 5) |
```bash
-# Non-blocking: get task ID
+# Non-blocking: get task ID immediately
mmx video generate --prompt "A robot." --async --quiet
# stdout: {"taskId":"..."}
-# Blocking: wait and get file path
+# Blocking: wait for completion, save to file
mmx video generate --prompt "Ocean waves." --download ocean.mp4 --quiet
-# stdout: ocean.mp4
```
### video task get
@@ -164,12 +190,14 @@ mmx video download --file-id [--out ]
### speech synthesize
-Text-to-speech. Default model: `speech-2.8-hd`. Max 10k chars.
+Text-to-speech. Default model: `speech-2.8-hd`. Max 10k chars. 30+ voices available.
```bash
mmx speech synthesize --text [flags]
```
+Single-token shorthand: `mmx speech "Hello!"`
+
| Flag | Type | Description |
|---|---|---|
| `--text ` | string | Text to synthesize |
@@ -184,19 +212,27 @@ mmx speech synthesize --text [flags]
| `--bitrate ` | number | Bitrate (default: 128000) |
| `--channels ` | number | Audio channels (default: 1) |
| `--language ` | string | Language boost |
-| `--subtitles` | boolean | Download and save subtitles as `.srt` file (alongside `--out` audio file). API must support subtitles for the selected model.
+| `--subtitles` | boolean | Include subtitle timing data |
| `--pronunciation ` | string, repeatable | Custom pronunciation |
| `--sound-effect ` | string | Add sound effect |
| `--out ` | string | Save audio to file |
| `--stream` | boolean | Stream raw audio to stdout |
```bash
+# Save to file
mmx speech synthesize --text "Hello world" --out hello.mp3 --quiet
-# stdout: hello.mp3
-mmx speech synthesize --text "Hello" --subtitles --out hello.mp3
-# saves hello.mp3 + hello.srt (SRT subtitle file)
+# With SRT subtitles (if voice model supports it)
+mmx speech synthesize --text "Hello world" --subtitles --out hello.mp3
+# saves hello.mp3 + hello.srt
+
+# List all available voices
+mmx speech voices
+# Stream to audio player (pipe raw audio to mpv)
+mmx speech synthesize --text "Stream me" --stream | mpv -
+
+# From stdin
echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3
```
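
The subtitle example above writes the `.srt` file next to the audio file. A small sketch for locating it in a script, assuming the subtitle file always reuses the `--out` base name as in the `hello.mp3` and `hello.srt` pair (`srt_path_for` is a hypothetical helper, not part of `mmx`):

```bash
# Hypothetical helper: derive the subtitle path from the --out audio path.
# Assumes the .srt file sits next to the audio file with the same base name.
srt_path_for() {
  local audio_path="$1"
  case "${audio_path##*/}" in
    # File name has an extension: swap it for .srt
    ?*.*) printf '%s.srt\n' "${audio_path%.*}" ;;
    # No extension: just append .srt
    *)    printf '%s.srt\n' "$audio_path" ;;
  esac
}

# Usage (commented out so sourcing this file has no side effects):
# mmx speech synthesize --text "Hello world" --subtitles --out hello.mp3 --quiet
# cat "$(srt_path_for hello.mp3)"
```

The `case` on the file name alone avoids mangling paths whose directory contains a dot but whose file name does not.
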
@@ -204,30 +240,30 @@ echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3
### music generate
-Generate music. Responds well to rich, structured descriptions.
-
-**Model:** `music-2.6-free` — unlimited for API key users, RPM = 3.
+Generate music. Model: `music-2.6-free` — unlimited for API key users, RPM = 3. Responds well to rich, structured descriptions.
```bash
mmx music generate --prompt [--lyrics ] [flags]
```
+Single-token shorthand: `mmx music "Upbeat pop"`
+
| Flag | Type | Description |
|---|---|---|
| `--prompt ` | string | Music style description (can be detailed) |
-| `--lyrics ` | string | Song lyrics with structure tags. Required unless `--instrumental` or `--lyrics-optimizer` is used. |
+| `--lyrics ` | string | Song lyrics with structure tags. Required unless `--instrumental` or `--lyrics-optimizer` is used |
| `--lyrics-file ` | string | Read lyrics from file. Use `-` for stdin |
-| `--lyrics-optimizer` | boolean | Auto-generate lyrics from prompt. Cannot be used with `--lyrics` or `--instrumental`. |
-| `--instrumental` | boolean | Generate instrumental music (no vocals). Cannot be used with `--lyrics`. |
+| `--lyrics-optimizer` | boolean | Auto-generate lyrics from prompt. Cannot be used with `--lyrics` or `--instrumental` |
+| `--instrumental` | boolean | Generate instrumental music (no vocals) |
| `--vocals ` | string | Vocal style, e.g. `"warm male baritone"`, `"bright female soprano"`, `"duet with harmonies"` |
| `--genre ` | string | Music genre, e.g. folk, pop, jazz |
| `--mood ` | string | Mood or emotion, e.g. warm, melancholic, uplifting |
| `--instruments ` | string | Instruments to feature, e.g. `"acoustic guitar, piano"` |
| `--tempo ` | string | Tempo description, e.g. fast, slow, moderate |
| `--bpm ` | number | Exact tempo in beats per minute |
-| `--key ` | string | Musical key, e.g. C major, A minor, G sharp |
+| `--key ` | string | Musical key, e.g. C major, A minor |
| `--avoid ` | string | Elements to avoid in the generated music |
-| `--use-case ` | string | Use case context, e.g. `"background music for video"`, `"theme song"` |
+| `--use-case ` | string | Use case context, e.g. `"background music for video"` |
| `--structure ` | string | Song structure, e.g. `"verse-chorus-verse-bridge-chorus"` |
| `--references ` | string | Reference tracks or artists, e.g. `"similar to Ed Sheeran"` |
| `--extra ` | string | Additional fine-grained requirements |
@@ -238,10 +274,8 @@ mmx music generate --prompt [--lyrics ] [flags]
| `--out ` | string | Save audio to file |
| `--stream` | boolean | Stream raw audio to stdout |
-At least one of `--prompt` or `--lyrics` is required.
-
```bash
-# With lyrics
+# With explicit lyrics
mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet
# Auto-generate lyrics from prompt
@@ -250,7 +284,7 @@ mmx music generate --prompt "Upbeat pop about summer" --lyrics-optimizer --out s
# Instrumental
mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental --out bgm.mp3 --quiet
-# Detailed prompt with vocal characteristics
+# Detailed prompt with vocal characteristics
mmx music generate --prompt "Warm morning folk" \
--vocals "male and female duet, harmonies in chorus" \
--instruments "acoustic guitar, piano" \
@@ -263,9 +297,7 @@ mmx music generate --prompt "Warm morning folk" \
### music cover
-Generate a cover version of a song based on reference audio.
-
-**Model:** `music-cover-free` — unlimited for API key users, RPM = 3.
+Generate a cover version of a song based on reference audio. Model: `music-cover-free` — unlimited for API key users, RPM = 3.
```bash
mmx music cover --prompt (--audio | --audio-file ) [flags]
@@ -274,10 +306,10 @@ mmx music cover --prompt (--audio | --audio-file ) [flags]
| Flag | Type | Description |
|---|---|---|
| `--prompt ` | string, **required** | Target cover style, e.g. `"Indie folk, acoustic guitar, warm male vocal"` |
-| `--audio ` | string | URL of reference audio (mp3, wav, flac, etc. — 6s to 6min, max 50MB) |
+| `--audio ` | string | URL of reference audio (mp3, wav, flac — 6s to 6min, max 50MB) |
| `--audio-file ` | string | Local reference audio file (auto base64-encoded) |
-| `--lyrics ` | string | Cover lyrics. If omitted, extracted from reference audio via ASR. |
-| `--lyrics-file ` | string | Read lyrics from file. Use `-` for stdin |
+| `--lyrics ` | string | Cover lyrics. If omitted, extracted from reference audio via ASR |
+| `--lyrics-file ` | string | Read lyrics from file |
| `--seed ` | number | Random seed 0–1000000 for reproducible results |
| `--format ` | string | Audio format: `mp3`, `wav`, `pcm` (default: `mp3`) |
| `--sample-rate ` | number | Sample rate (default: 44100) |
@@ -289,14 +321,16 @@ mmx music cover --prompt (--audio | --audio-file ) [flags]
```bash
# Cover from URL
mmx music cover --prompt "Indie folk, acoustic guitar, warm male vocal" \
- --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --out cover.mp3 --quiet
+ --audio https://filecdn.minimax.chat/public/example.mp3 --out cover.mp3 --quiet
# Cover from local file with custom lyrics
mmx music cover --prompt "Jazz, piano, slow" \
--audio-file original.mp3 --lyrics-file lyrics.txt --out jazz_cover.mp3 --quiet
# Reproducible result with seed
-mmx music cover --prompt "Pop, upbeat" --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --seed 42 --out cover.mp3
+mmx music cover --prompt "Pop, upbeat" \
+ --audio https://filecdn.minimax.chat/public/example.mp3 \
+ --seed 42 --out cover.mp3
```
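
Both music models are listed with RPM = 3, so batch jobs need to space out their requests. A minimal sketch, assuming that one request every 20 seconds (60 / 3) stays within that limit; `run_paced` and `cover_one` are hypothetical helpers, and `urls.txt` is a hypothetical input file:

```bash
# Hypothetical helper: run a command once per input line, pausing between runs.
# Usage: run_paced <seconds-between-runs> <command> [args...]
# Each line read from stdin is appended as the final argument.
run_paced() {
  local interval="$1"
  shift
  local item first=1
  while IFS= read -r item; do
    if [ "$first" -eq 0 ]; then
      sleep "$interval"
    fi
    first=0
    # Redirect stdin so the command cannot consume the remaining input lines.
    "$@" "$item" </dev/null
  done
}

# Usage (commented out so sourcing this file has no side effects):
# cover_one() {
#   mmx music cover --prompt "Indie folk" --audio "$1" \
#     --out "$(basename "$1" .mp3)-cover.mp3" --quiet
# }
# run_paced 20 cover_one < urls.txt
```
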
---
@@ -309,6 +343,8 @@ Image understanding via VLM. Provide either `--image` or `--file-id`, not both.
mmx vision describe (--image | --file-id ) [flags]
```
+Single-token shorthand: `mmx vision photo.jpg`
+
| Flag | Type | Description |
|---|---|---|
| `--image ` | string | Local path or URL (auto base64-encoded) |
@@ -319,8 +355,6 @@ mmx vision describe (--image | --file-id ) [flags]
mmx vision describe --image photo.jpg --prompt "What breed?" --output json
```
-**stdout**: description text (text mode) or full response (json mode).
-
---
### search query
@@ -331,6 +365,8 @@ Web search via MiniMax.
mmx search query --q
```
+Single-token shorthand: `mmx search "query here"`
+
| Flag | Type | Description |
|---|---|---|
| `--q ` | string, **required** | Search query |
@@ -341,6 +377,58 @@ mmx search query --q "MiniMax AI" --output json --quiet
---
+### file upload
+
+Upload a file to MiniMax storage. Use the returned `file_id` with vision, speech, or other file-dependent commands.
+
+```bash
+mmx file upload --file [--purpose ] [flags]
+```
+
+| Flag | Type | Description |
+|---|---|---|
+| `--file ` | string, **required** | Local path to the file |
+| `--purpose ` | string | File purpose: `retrieval` (default) or `vision` |
+
+```bash
+mmx file upload --file doc.pdf
+mmx file upload --file image.png --purpose vision
+# stdout: { file_id, filename, purpose, bytes, created_at }
+```
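
The returned `file_id` can be captured and passed to a later command. A sketch under stated assumptions: that `file upload` accepts `--output json` the way `file list` does, and that the JSON carries a top-level `file_id` field as the stdout summary above suggests. `extract_file_id` and `upload_and_describe` are hypothetical helpers, and `sed` stands in for a JSON parser such as `jq` only to keep the example dependency-free:

```bash
# Hypothetical helper: pull the file_id value out of a JSON object on stdin.
# Handles numeric and quoted string values; this is not a general JSON parser.
extract_file_id() {
  sed -n 's/.*"file_id"[[:space:]]*:[[:space:]]*"\{0,1\}\([^",}[:space:]]*\).*/\1/p' | head -n 1
}

# Upload an image, then describe it by file ID instead of by path.
# Assumes `mmx file upload --output json` prints an object containing file_id.
upload_and_describe() {
  local image_path="$1" prompt="$2" file_id
  file_id=$(mmx file upload --file "$image_path" --purpose vision --output json | extract_file_id)
  if [ -z "$file_id" ]; then
    echo "upload did not return a file_id" >&2
    return 1
  fi
  mmx vision describe --file-id "$file_id" --prompt "$prompt" --quiet
}

# Usage (commented out so sourcing this file has no side effects):
# upload_and_describe image.png "What breed?"
```
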
+
+### file list
+
+List all uploaded files in MiniMax storage.
+
+```bash
+mmx file list [--output json]
+```
+
+```bash
+# Formatted table (default)
+mmx file list
+# ID FILENAME PURPOSE SIZE_KB CREATED
+# 123456789 doc.pdf retrieval 2048.0 2026-05-27 10:30
+
+# JSON output
+mmx file list --output json
+```
+
+### file delete
+
+Delete an uploaded file by ID.
+
+```bash
+mmx file delete --file-id
+```
+
+```bash
+mmx file delete --file-id 123456789
+# stdout: deleted
+```
+
+---
+
### quota show
Display Token Plan usage and remaining quotas.
@@ -351,6 +439,20 @@ mmx quota show [--output json]
---
+## Single-Token Shorthand Commands
+
+MiniMax CLI supports quick single-token shortcuts for common operations:
+
+```bash
+mmx image "A cat in a spacesuit" # image generate
+mmx speech "Hello!" --out hello.mp3 # speech synthesize
+mmx music "Upbeat pop" --out song.mp3 # music generate
+mmx vision photo.jpg # vision describe
+mmx search "MiniMax AI latest news" # search query
+```
+
+---
+
## Tool Schema Export
Export all commands as Anthropic/OpenAI-compatible JSON tool schemas:
@@ -363,8 +465,6 @@ mmx config export-schema
mmx config export-schema --command "video generate"
```
-Use this to dynamically register mmx commands as tools in your agent framework.
-
---
## Exit Codes
@@ -381,47 +481,23 @@ Use this to dynamically register mmx commands as tools in your agent framework.
---
-## Piping Patterns
+## Configuration
```bash
-# stdout is always clean data — safe to pipe
-mmx text chat --message "Hi" --output json | jq '.content'
-
-# stderr has progress/spinners — discard if needed
-mmx video generate --prompt "Waves" 2>/dev/null
-
-# Chain: generate image → describe it
-URL=$(mmx image generate --prompt "A sunset" --quiet)
-mmx vision describe --image "$URL" --quiet
-
-# Async video workflow
-TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId')
-mmx video task get --task-id "$TASK" --output json
-mmx video download --task-id "$TASK" --out robot.mp4
+mmx config show # show current config
+mmx config set --key region --value cn # set platform region
+mmx config set --key default-text-model --value MiniMax-M2.7-highspeed # set default model
+mmx update # update CLI to latest
+mmx quota # show token plan usage
```
----
-
-## Configuration Precedence
-
-CLI flags → environment variables → `~/.mmx/config.json` → defaults.
-
-```bash
-# Persistent config
-mmx config set --key region --value cn
-mmx config show
-
-# Environment
-export MINIMAX_API_KEY=sk-xxxxx
-export MINIMAX_REGION=cn
-```
+**Config precedence:** CLI flags → environment variables → `~/.mmx/config.json` → defaults.
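
The precedence chain can be read as "first non-empty value wins". An illustrative sketch only, not the CLI's actual implementation; `first_set` is a hypothetical helper, and its argument order mirrors the chain above:

```bash
# Illustrative only: print the first non-empty argument.
# Call with values in precedence order: flag, environment, config file, default.
first_set() {
  local value
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  # Nothing was set at any level.
  return 1
}

# Example: no flag given, "cn" from the environment, "global" from the config file.
# first_set "" "cn" "global"   # prints: cn
```
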
### Default Model Configuration
Set per-modality defaults so you don't need `--model` every time:
```bash
-# Set defaults
mmx config set --key default-text-model --value MiniMax-M2.7-highspeed
mmx config set --key default-speech-model --value speech-2.8-hd
mmx config set --key default-video-model --value MiniMax-Hailuo-2.3
@@ -437,4 +513,25 @@ mmx music generate --prompt "Upbeat pop" --instrumental
mmx text chat --model MiniMax-M2.7 --message "Hello"
```
-**Resolution priority**: `--model` flag > config default > hardcoded fallback.
+**Resolution priority:** `--model` flag > config default > hardcoded fallback.
+
+---
+
+## Piping Patterns
+
+```bash
+# stdout is always clean data — safe to pipe
+mmx text chat --message "Hi" --output json | jq '.content'
+
+# stderr has progress/spinners — discard if needed
+mmx video generate --prompt "Waves" 2>/dev/null
+
+# Chain: generate image → describe it
+URL=$(mmx image generate --prompt "A sunset" --quiet)
+mmx vision describe --image "$URL" --quiet
+
+# Async video workflow
+TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId')
+mmx video task get --task-id "$TASK" --output json
+mmx video download --task-id "$TASK" --out robot.mp4
+```