From 13aacc4f8abf0f920e287f9da8ef99fcde48f1c8 Mon Sep 17 00:00:00 2001 From: DuckBot Date: Wed, 27 May 2026 22:50:16 -0400 Subject: [PATCH] =?UTF-8?q?docs:=20rebuild=20README=20and=20skill=20from?= =?UTF-8?q?=20source=20=E2=80=94=20add=20file=20commands,=20subtitles,=20f?= =?UTF-8?q?ine-grained=20sizing,=20prompt-optimizer?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- README.md | 40 ++++++-- skill/SKILL.md | 265 +++++++++++++++++++++++++++++++++---------------- 2 files changed, 214 insertions(+), 91 deletions(-) diff --git a/README.md b/README.md index 92ccce1d..fb4f1657 100644 --- a/README.md +++ b/README.md @@ -18,12 +18,13 @@ ## Features - **Text** — Multi-turn chat, streaming, system prompts, JSON output -- **Image** — Text-to-image with aspect ratio and batch controls +- **Image** — Text-to-image with aspect ratio, batch controls, and fine-grained sizing - **Video** — Async video generation with progress tracking -- **Speech** — TTS with 30+ voices, speed control, streaming playback +- **Speech** — TTS with 30+ voices, speed control, streaming playback, SRT subtitles - **Music** — Text-to-music with lyrics, instrumental mode, auto lyrics, and cover generation from reference audio - **Vision** — Image understanding and description - **Search** — Web search powered by MiniMax +- **File** — Upload, list, and delete files from MiniMax storage - **Dual Region** — Seamless Global (`api.minimax.io`) and CN (`api.minimaxi.com`) support MiniMax @@ -60,12 +61,16 @@ mmx music generate --prompt "Upbeat pop" --lyrics "[verse] La da dee, sunny day" mmx search "MiniMax AI latest news" mmx vision photo.jpg mmx quota + +# File management +mmx file upload --file doc.pdf +mmx file list +mmx file delete --file-id 123456789 ``` ## Commands ### `mmx text` - ```bash mmx text chat --message "Write a poem" mmx text chat --model MiniMax-M2.7-highspeed --message "Hello" --stream @@ -80,10 +85,13 @@ cat messages.json | mmx text chat 
--messages-file - --output json mmx image "A cat in a spacesuit" mmx image generate --prompt "A cat" --n 3 --aspect-ratio 16:9 mmx image generate --prompt "Logo" --out-dir ./out/ +# Fine-grained sizing +mmx image generate --prompt "A photo" --width 1024 --height 768 +# With prompt optimizer +mmx image generate --prompt "cat spacesuit" --prompt-optimizer ``` ### `mmx video` - ```bash mmx video generate --prompt "Ocean waves at sunset" --download sunset.mp4 mmx video generate --prompt "A robot painting" --async @@ -98,6 +106,9 @@ mmx speech synthesize --text "Hello!" --out hello.mp3 mmx speech synthesize --text "Stream me" --stream | mpv - mmx speech synthesize --text "Hi" --voice English_magnetic_voiced_man --speed 1.2 echo "Breaking news" | mmx speech synthesize --text-file - --out news.mp3 +# With SRT subtitles (if voice model supports it) +mmx speech synthesize --text "Hello world" --subtitles --out hello.mp3 +# saves hello.mp3 + hello.srt mmx speech voices ``` @@ -116,7 +127,6 @@ mmx music cover --prompt "Indie folk" --audio https://example.com/song.mp3 --out ``` ### `mmx vision` - ```bash mmx vision photo.jpg mmx vision describe --image https://example.com/img.jpg --prompt "What breed?" @@ -130,6 +140,21 @@ mmx search "MiniMax AI" mmx search query --q "latest news" --output json ``` +### `mmx file` + +```bash +# Upload a file (returns file ID for use with vision, speech, etc.) +mmx file upload --file doc.pdf +mmx file upload --file image.png --purpose vision + +# List all uploaded files +mmx file list +mmx file list --output json + +# Delete a file by ID +mmx file delete --file-id 123456789 +``` + ### `mmx auth` ```bash @@ -153,6 +178,7 @@ is auto-detected by probing both Global and CN. ```bash mmx quota +mmx quotas # show detailed quota breakdown per modality mmx config show mmx config set --key region --value cn mmx config set --key default-text-model --value MiniMax-M2.7-highspeed @@ -162,8 +188,8 @@ mmx config export-schema | jq . 
### `mmx update` ```bash -mmx update -mmx update latest +mmx update # update to latest stable +mmx update latest # update to latest pre-release ``` ## Thanks to diff --git a/skill/SKILL.md b/skill/SKILL.md index 79b7a60b..c6abe417 100644 --- a/skill/SKILL.md +++ b/skill/SKILL.md @@ -5,28 +5,19 @@ description: Use mmx to generate text, images, video, speech, and music via the # MiniMax CLI — Agent Skill Guide -Use `mmx` to generate text, images, video, speech, music, and perform web search via the MiniMax AI platform. +**Repo:** https://github.com/MiniMax-AI/cli +**NPM:** https://www.npmjs.com/package/mmx-cli +**Requires:** Node.js 18+, MiniMax Token Plan (Global or CN) -## Prerequisites +Use `mmx` to generate text, images, video, speech, and music, run web searches, and manage file storage via the MiniMax AI platform. + +## Quick Install ```bash -# Install npm install -g mmx-cli - -# Auth (OAuth persists to ~/.mmx/credentials.json, API key persists to ~/.mmx/config.json) -mmx auth login --api-key sk-xxxxx - -# Verify active auth source -mmx auth status - -# Or pass per-call -mmx text chat --api-key sk-xxxxx --message "Hello" +npx skills add MiniMax-AI/cli -y -g # add as OpenClaw skill ``` -Region is auto-detected. Override with `--region global` or `--region cn`. 
- ---- - ## Agent Flags Always use these flags in non-interactive (agent/CI) contexts: @@ -42,6 +33,34 @@ Always use these flags in non-interactive (agent/CI) contexts: --- +## Authentication + +```bash +# Interactive — choose OAuth or paste an API key +mmx auth login + +# Non-interactive — paste API key directly +mmx auth login --api-key *** + +# Skip the menu — auto-select OAuth for the given region +mmx auth login --recommend --region=global # → api.minimax.io +mmx auth login --recommend --region=cn # → api.minimaxi.com + +# Verify current auth status +mmx auth status +mmx auth refresh +mmx auth logout +``` + +**Auth notes:** +- Credentials are stored in `~/.mmx/config.json` — separate from OpenClaw's own MiniMax OAuth config +- OAuth and API-key are mutually exclusive; logging in with one clears the other +- `--api-key` flag can be passed per-command to override stored auth +- With an API key, region is auto-detected by probing both Global and CN endpoints +- `mmx auth status` is the canonical way to verify active authentication + +--- + ## Commands ### text chat @@ -54,12 +73,12 @@ mmx text chat --message [flags] | Flag | Type | Description | |---|---|---| -| `--message ` | string, **required**, repeatable | Message text. Prefix with `role:` to set role (e.g. `"system:You are helpful"`, `"user:Hello"`) | +| `--message ` | string, **required**, repeatable | Message text. Prefix with `role:` to set role (`"system:"`, `"user:"`, `"assistant:"`) | | `--messages-file ` | string | JSON file with messages array. 
Use `-` for stdin | | `--system ` | string | System prompt | | `--model ` | string | Model ID (default: `MiniMax-M2.7`) | | `--max-tokens ` | number | Max tokens (default: 4096) | -| `--temperature ` | number | Sampling temperature (0.0, 1.0] | +| `--temperature ` | number | Sampling temperature (0.0–1.0] | | `--top-p ` | number | Nucleus sampling threshold | | `--stream` | boolean | Stream tokens (default: on in TTY) | | `--tool ` | string, repeatable | Tool definition JSON or file path | @@ -78,7 +97,7 @@ mmx text chat \ cat conversation.json | mmx text chat --messages-file - --output json ``` -**stdout**: response text (text mode) or full response object (json mode). +**stdout:** response text (text mode) or full response object (json mode). --- @@ -90,6 +109,8 @@ Generate images. Model: `image-01`. mmx image generate --prompt [flags] ``` +Single-token shorthand: `mmx image "prompt here"` + | Flag | Type | Description | |---|---|---| | `--prompt ` | string, **required** | Image description | @@ -111,13 +132,19 @@ mmx image generate --prompt "A cat in a spacesuit" --output json --quiet mmx image generate --prompt "Logo" --n 3 --out-dir ./gen/ --quiet # stdout: saved file paths (one per line) + +# Fine-grained sizing +mmx image generate --prompt "A photo" --width 1024 --height 768 + +# With prompt optimizer +mmx image generate --prompt "cat spacesuit" --prompt-optimizer ``` --- ### video generate -Generate video. Default model: `MiniMax-Hailuo-2.3`. This is an async task — by default it polls until completion. +Generate video. Default model: `MiniMax-Hailuo-2.3`. Async task — polls until completion by default. ```bash mmx video generate --prompt [flags] @@ -135,13 +162,12 @@ mmx video generate --prompt [flags] | `--poll-interval ` | number | Polling interval (default: 5) | ```bash -# Non-blocking: get task ID +# Non-blocking: get task ID immediately mmx video generate --prompt "A robot." 
--async --quiet # stdout: {"taskId":"..."} -# Blocking: wait and get file path +# Blocking: wait for completion, save to file mmx video generate --prompt "Ocean waves." --download ocean.mp4 --quiet -# stdout: ocean.mp4 ``` ### video task get @@ -164,12 +190,14 @@ mmx video download --file-id [--out ] ### speech synthesize -Text-to-speech. Default model: `speech-2.8-hd`. Max 10k chars. +Text-to-speech. Default model: `speech-2.8-hd`. Max 10k chars. 30+ voices available. ```bash mmx speech synthesize --text [flags] ``` +Single-token shorthand: `mmx speech "Hello!"` + | Flag | Type | Description | |---|---|---| | `--text ` | string | Text to synthesize | @@ -184,19 +212,27 @@ mmx speech synthesize --text [flags] | `--bitrate ` | number | Bitrate (default: 128000) | | `--channels ` | number | Audio channels (default: 1) | | `--language ` | string | Language boost | -| `--subtitles` | boolean | Download and save subtitles as `.srt` file (alongside `--out` audio file). API must support subtitles for the selected model. +| `--subtitles` | boolean | Include subtitle timing data | | `--pronunciation ` | string, repeatable | Custom pronunciation | | `--sound-effect ` | string | Add sound effect | | `--out ` | string | Save audio to file | | `--stream` | boolean | Stream raw audio to stdout | ```bash +# Save to file mmx speech synthesize --text "Hello world" --out hello.mp3 --quiet -# stdout: hello.mp3 -mmx speech synthesize --text "Hello" --subtitles --out hello.mp3 -# saves hello.mp3 + hello.srt (SRT subtitle file) +# With SRT subtitles (if voice model supports it) +mmx speech synthesize --text "Hello world" --subtitles --out hello.mp3 +# saves hello.mp3 + hello.srt + +# List all available voices +mmx speech voices +# Stream to audio player (pipe raw audio to mpv) +mmx speech synthesize --text "Stream me" --stream | mpv - + +# From stdin echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3 ``` @@ -204,30 +240,30 @@ echo "Breaking news." 
| mmx speech synthesize --text-file - --out news.mp3 ### music generate -Generate music. Responds well to rich, structured descriptions. - -**Model:** `music-2.6-free` — unlimited for API key users, RPM = 3. +Generate music. Model: `music-2.6-free` — unlimited for API key users, RPM = 3. Responds well to rich, structured descriptions. ```bash mmx music generate --prompt [--lyrics ] [flags] ``` +Single-token shorthand: `mmx music "Upbeat pop"` + | Flag | Type | Description | |---|---|---| | `--prompt ` | string | Music style description (can be detailed) | -| `--lyrics ` | string | Song lyrics with structure tags. Required unless `--instrumental` or `--lyrics-optimizer` is used. | +| `--lyrics ` | string | Song lyrics with structure tags. Required unless `--instrumental` or `--lyrics-optimizer` is used | | `--lyrics-file ` | string | Read lyrics from file. Use `-` for stdin | -| `--lyrics-optimizer` | boolean | Auto-generate lyrics from prompt. Cannot be used with `--lyrics` or `--instrumental`. | -| `--instrumental` | boolean | Generate instrumental music (no vocals). Cannot be used with `--lyrics`. | +| `--lyrics-optimizer` | boolean | Auto-generate lyrics from prompt. Cannot be used with `--lyrics` or `--instrumental` | +| `--instrumental` | boolean | Generate instrumental music (no vocals) | | `--vocals ` | string | Vocal style, e.g. `"warm male baritone"`, `"bright female soprano"`, `"duet with harmonies"` | | `--genre ` | string | Music genre, e.g. folk, pop, jazz | | `--mood ` | string | Mood or emotion, e.g. warm, melancholic, uplifting | | `--instruments ` | string | Instruments to feature, e.g. `"acoustic guitar, piano"` | | `--tempo ` | string | Tempo description, e.g. fast, slow, moderate | | `--bpm ` | number | Exact tempo in beats per minute | -| `--key ` | string | Musical key, e.g. C major, A minor, G sharp | +| `--key ` | string | Musical key, e.g. 
C major, A minor | | `--avoid ` | string | Elements to avoid in the generated music | -| `--use-case ` | string | Use case context, e.g. `"background music for video"`, `"theme song"` | +| `--use-case ` | string | Use case context, e.g. `"background music for video"` | | `--structure ` | string | Song structure, e.g. `"verse-chorus-verse-bridge-chorus"` | | `--references ` | string | Reference tracks or artists, e.g. `"similar to Ed Sheeran"` | | `--extra ` | string | Additional fine-grained requirements | @@ -238,10 +274,8 @@ mmx music generate --prompt [--lyrics ] [flags] | `--out ` | string | Save audio to file | | `--stream` | boolean | Stream raw audio to stdout | -At least one of `--prompt` or `--lyrics` is required. - ```bash -# With lyrics +# With explicit lyrics mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet # Auto-generate lyrics from prompt @@ -250,7 +284,7 @@ mmx music generate --prompt "Upbeat pop about summer" --lyrics-optimizer --out s # Instrumental mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental --out bgm.mp3 --quiet -# Detailed prompt with vocal characteristics +# Detailed with vocal characteristics mmx music generate --prompt "Warm morning folk" \ --vocals "male and female duet, harmonies in chorus" \ --instruments "acoustic guitar, piano" \ @@ -263,9 +297,7 @@ mmx music generate --prompt "Warm morning folk" \ ### music cover -Generate a cover version of a song based on reference audio. - -**Model:** `music-cover-free` — unlimited for API key users, RPM = 3. +Generate a cover version of a song based on reference audio. Model: `music-cover-free` — unlimited for API key users, RPM = 3. ```bash mmx music cover --prompt (--audio | --audio-file ) [flags] @@ -274,10 +306,10 @@ mmx music cover --prompt (--audio | --audio-file ) [flags] | Flag | Type | Description | |---|---|---| | `--prompt ` | string, **required** | Target cover style, e.g. 
`"Indie folk, acoustic guitar, warm male vocal"` | -| `--audio ` | string | URL of reference audio (mp3, wav, flac, etc. — 6s to 6min, max 50MB) | +| `--audio ` | string | URL of reference audio (mp3, wav, flac — 6s to 6min, max 50MB) | | `--audio-file ` | string | Local reference audio file (auto base64-encoded) | -| `--lyrics ` | string | Cover lyrics. If omitted, extracted from reference audio via ASR. | -| `--lyrics-file ` | string | Read lyrics from file. Use `-` for stdin | +| `--lyrics ` | string | Cover lyrics. If omitted, extracted from reference audio via ASR | +| `--lyrics-file ` | string | Read lyrics from file | | `--seed ` | number | Random seed 0–1000000 for reproducible results | | `--format ` | string | Audio format: `mp3`, `wav`, `pcm` (default: `mp3`) | | `--sample-rate ` | number | Sample rate (default: 44100) | @@ -289,14 +321,16 @@ mmx music cover --prompt (--audio | --audio-file ) [flags] ```bash # Cover from URL mmx music cover --prompt "Indie folk, acoustic guitar, warm male vocal" \ - --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --out cover.mp3 --quiet + --audio https://filecdn.minimax.chat/public/example.mp3 --out cover.mp3 --quiet # Cover from local file with custom lyrics mmx music cover --prompt "Jazz, piano, slow" \ --audio-file original.mp3 --lyrics-file lyrics.txt --out jazz_cover.mp3 --quiet # Reproducible result with seed -mmx music cover --prompt "Pop, upbeat" --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --seed 42 --out cover.mp3 +mmx music cover --prompt "Pop, upbeat" \ + --audio https://filecdn.minimax.chat/public/example.mp3 \ + --seed 42 --out cover.mp3 ``` --- @@ -309,6 +343,8 @@ Image understanding via VLM. Provide either `--image` or `--file-id`, not both. 
mmx vision describe (--image | --file-id ) [flags] ``` +Single-token shorthand: `mmx vision photo.jpg` + | Flag | Type | Description | |---|---|---| | `--image ` | string | Local path or URL (auto base64-encoded) | @@ -319,8 +355,6 @@ mmx vision describe (--image | --file-id ) [flags] mmx vision describe --image photo.jpg --prompt "What breed?" --output json ``` -**stdout**: description text (text mode) or full response (json mode). - --- ### search query @@ -331,6 +365,8 @@ Web search via MiniMax. mmx search query --q ``` +Single-token shorthand: `mmx search "query here"` + | Flag | Type | Description | |---|---|---| | `--q ` | string, **required** | Search query | @@ -341,6 +377,58 @@ mmx search query --q "MiniMax AI" --output json --quiet --- +### file upload + +Upload a file to MiniMax storage. Use the returned `file_id` with vision, speech, or other file-dependent commands. + +```bash +mmx file upload --file [--purpose ] [flags] +``` + +| Flag | Type | Description | +|---|---|---| +| `--file ` | string, **required** | Local path to the file | +| `--purpose ` | string | File purpose: `retrieval` (default) or `vision` | + +```bash +mmx file upload --file doc.pdf +mmx file upload --file image.png --purpose vision +# stdout: { file_id, filename, purpose, bytes, created_at } +``` + +### file list + +List all uploaded files in MiniMax storage. + +```bash +mmx file list [--output json] +``` + +```bash +# Formatted table (default) +mmx file list +# ID FILENAME PURPOSE SIZE_KB CREATED +# 123456789 doc.pdf retrieval 2048.0 2026-05-27 10:30 + +# JSON output +mmx file list --output json +``` + +### file delete + +Delete an uploaded file by ID. + +```bash +mmx file delete --file-id +``` + +```bash +mmx file delete --file-id 123456789 +# stdout: deleted +``` + +--- + ### quota show Display Token Plan usage and remaining quotas. 
@@ -351,6 +439,20 @@ mmx quota show [--output json] --- +## Single-Token Shorthand Commands + +MiniMax CLI supports quick single-token shortcuts for common operations: + +```bash +mmx image "A cat in a spacesuit" # image generate +mmx speech "Hello!" --out hello.mp3 # speech synthesize +mmx music "Upbeat pop" --out song.mp3 # music generate +mmx vision photo.jpg # vision describe +mmx search "MiniMax AI latest news" # search query +``` + +--- + ## Tool Schema Export Export all commands as Anthropic/OpenAI-compatible JSON tool schemas: @@ -363,8 +465,6 @@ mmx config export-schema mmx config export-schema --command "video generate" ``` -Use this to dynamically register mmx commands as tools in your agent framework. - --- ## Exit Codes @@ -381,47 +481,23 @@ Use this to dynamically register mmx commands as tools in your agent framework. --- -## Piping Patterns +## Configuration ```bash -# stdout is always clean data — safe to pipe -mmx text chat --message "Hi" --output json | jq '.content' - -# stderr has progress/spinners — discard if needed -mmx video generate --prompt "Waves" 2>/dev/null - -# Chain: generate image → describe it -URL=$(mmx image generate --prompt "A sunset" --quiet) -mmx vision describe --image "$URL" --quiet - -# Async video workflow -TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId') -mmx video task get --task-id "$TASK" --output json -mmx video download --task-id "$TASK" --out robot.mp4 +mmx config show # show current config +mmx config set --key region --value cn # set platform region +mmx config set --key default-text-model --value MiniMax-M2.7-highspeed # set default model +mmx update # update CLI to latest +mmx quota # show token plan usage ``` ---- - -## Configuration Precedence - -CLI flags → environment variables → `~/.mmx/config.json` → defaults. 
- -```bash -# Persistent config -mmx config set --key region --value cn -mmx config show - -# Environment -export MINIMAX_API_KEY=sk-xxxxx -export MINIMAX_REGION=cn -``` +**Config precedence:** CLI flags → environment variables → `~/.mmx/config.json` → defaults. ### Default Model Configuration Set per-modality defaults so you don't need `--model` every time: ```bash -# Set defaults mmx config set --key default-text-model --value MiniMax-M2.7-highspeed mmx config set --key default-speech-model --value speech-2.8-hd mmx config set --key default-video-model --value MiniMax-Hailuo-2.3 @@ -437,4 +513,25 @@ mmx music generate --prompt "Upbeat pop" --instrumental mmx text chat --model MiniMax-M2.7 --message "Hello" ``` -**Resolution priority**: `--model` flag > config default > hardcoded fallback. +**Resolution priority:** `--model` flag > config default > hardcoded fallback. + +--- + +## Piping Patterns + +```bash +# stdout is always clean data — safe to pipe +mmx text chat --message "Hi" --output json | jq '.content' + +# stderr has progress/spinners — discard if needed +mmx video generate --prompt "Waves" 2>/dev/null + +# Chain: generate image → describe it +URL=$(mmx image generate --prompt "A sunset" --quiet) +mmx vision describe --image "$URL" --quiet + +# Async video workflow +TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId') +mmx video task get --task-id "$TASK" --output json +mmx video download --task-id "$TASK" --out robot.mp4 +```
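
Reviewer note: the resolution priority stated in SKILL.md (`--model` flag > config default > hardcoded fallback) can be pictured as a first-non-empty lookup. The sketch below is illustrative only; `resolve_model` is a hypothetical helper, not part of `mmx`, and it assumes an unset layer is represented by an empty string.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of mmx): print the first non-empty value,
# mirroring "--model flag > config default > hardcoded fallback".
resolve_model() {
  local flag_value="$1"      # value given via --model, or empty if not passed
  local config_default="$2"  # e.g. default-text-model from config, or empty
  local fallback="$3"        # hardcoded fallback, e.g. MiniMax-M2.7
  local candidate
  for candidate in "$flag_value" "$config_default" "$fallback"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1  # every layer was empty
}

# No --model flag given, so the config default wins over the fallback.
resolve_model "" "MiniMax-M2.7-highspeed" "MiniMax-M2.7"
# prints: MiniMax-M2.7-highspeed
```

An explicit flag value in the first position overrides both later layers, which matches the `mmx text chat --model MiniMax-M2.7 --message "Hello"` example in the patch.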