Universal Edit JSON authoring conventions. Read this before composing any Edit JSON. The conventions agents most often get wrong are listed here once. The same file ships with both the Shotstack CLI skill and the Shotstack MCP server.
Don't invent property names or enum values. The Shotstack schema is published — fetch one of these before composing JSON from scratch:
- https://shotstack.io/docs/api/api.edit.json — single-file OpenAPI Schema. Machine-validatable; load it once and validate locally instead of round-tripping the API.
- https://shotstack.io/docs/api/ — interactive HTML reference. Fastest for human scanning.
- https://shotstack.io/docs/guide/llms-full.txt — single-file LLM-friendly version of the full guide + reference.
- https://github.com/shotstack/oas-api-definition/tree/main/schemas — raw OpenAPI YAML, source of truth.
CSS naming conventions (alignment, vertical: "center") do not apply. The spec uses precise names that often differ from web/CSS instincts:
| You'd guess (wrong) | API uses (right) |
|---|---|
| `alignment` | `align` |
| `align: "center"` (string on rich-text asset) | `align: { "horizontal": "center", "vertical": "middle" }` (object) |
| `align.vertical: "center"` | `align.vertical: "middle"` |
| `font.name` | `font.family` |
| `duration` | `length` |
| `transitions: [...]` (array) | `transition: { in, out }` (object) |
| `fit: "cover"` (CSS instinct: scale+crop maintaining aspect) | `fit: "crop"`; Shotstack's `cover` STRETCHES without maintaining aspect ratio |
When the API rejects a property, the error message names the field — fix and retry. Don't guess twice.
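You can also catch the names from that table locally, before the API sees them. The Python below is a minimal sketch, not part of the Shotstack CLI: `lint_edit` is a hypothetical helper, and it assumes any object that has an `asset` key is a clip, so a legitimate `duration` inside a rich-text `animation` block is not flagged.

```python
from typing import Any


def lint_edit(node: Any, path: str = "$") -> list[str]:
    """Flag CSS-instinct names from the wrong/right table (hypothetical helper)."""
    warnings: list[str] = []
    if isinstance(node, dict):
        # Heuristic: an object that wraps an asset is treated as a clip.
        is_clip = "asset" in node
        if "alignment" in node:
            warnings.append(f"{path}.alignment: use align")
        if is_clip and "duration" in node:
            warnings.append(f"{path}.duration: use length")
        if is_clip and "transitions" in node:
            warnings.append(f"{path}.transitions: use a single transition object")
        if is_clip and node.get("fit") == "cover":
            warnings.append(f'{path}.fit: "cover" stretches; did you mean "crop"?')
        align = node.get("align")
        if isinstance(align, str):
            warnings.append(f"{path}.align: use an object with horizontal/vertical")
        elif isinstance(align, dict) and align.get("vertical") == "center":
            warnings.append(f'{path}.align.vertical: use "middle"')
        font = node.get("font")
        if isinstance(font, dict) and "name" in font:
            warnings.append(f"{path}.font.name: use font.family")
        # Recurse so nested tracks, clips and assets are checked too.
        for key, value in node.items():
            warnings.extend(lint_edit(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            warnings.extend(lint_edit(item, f"{path}[{index}]"))
    return warnings


clip = {
    "asset": {
        "type": "rich-text",
        "text": "Hello",
        "font": {"name": "Montserrat"},
        "align": {"horizontal": "center", "vertical": "center"},
    },
    "start": 0,
    "duration": 3,
}
for warning in lint_edit(clip):
    print(warning)
```

This is only a pre-check for the most common slips; the published schema and `shotstack validate` remain the authority.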
timeline.tracks is an array. The first element (tracks[0]) is the TOP layer; the last element is the BOTTOM layer. This is opposite to most z-index conventions.
"Tracks are layered on top of each other in the same order they are added to the array with the top most track layered over the top of those below it." — Shotstack docs
Practical rule: put captions, overlays, and titles in early tracks; put video/image backgrounds in later tracks.
```jsonc
{
  "timeline": {
    "tracks": [
      { "clips": [/* TOP: captions */] },
      { "clips": [/* MIDDLE: title overlay */] },
      { "clips": [/* BOTTOM: background video */] }
    ]
  }
}
```
clip.start and clip.length accept these string values in addition to numbers:
| Where | Value | Meaning |
|---|---|---|
| `clip.start` | `"auto"` | Start when the previous clip on the same track finishes |
| `clip.start` | `"alias://<name>"` | Inherit start from another clip with `alias: "<name>"` |
| `clip.length` | `"auto"` | Asset's natural duration. Use for foreground (video, voiceover, scene). |
| `clip.length` | `"end"` | Until the timeline ends, capped at asset duration. Use for background (music, captions, watermark). |
| `clip.length` | `"alias://<name>"` | Inherit length from another clip |

"end" does NOT loop short audio: because it is capped at the asset's duration, a music file shorter than the timeline simply stops early. Use a numeric length if you need precise control.
The alias:// protocol is also used in rich-caption src to auto-transcribe a referenced audio/video clip — see references/caption.md.
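To check where `"start": "auto"` clips will land, the chaining rule can be modelled locally. This Python sketch is a hypothetical helper, not Shotstack's engine. It assumes clips are listed in play order, every length is numeric, and a first clip with `"start": "auto"` begins at 0; alias values are rejected rather than resolved.

```python
def resolve_auto_starts(clips: list[dict]) -> list[float]:
    """Numeric start time for each clip on ONE track (hypothetical local model).

    Assumes clips are listed in play order and every length is numeric;
    a first clip with "start": "auto" is assumed to begin at 0.
    """
    starts: list[float] = []
    previous_end = 0.0
    for clip in clips:
        start, length = clip["start"], clip["length"]
        if not isinstance(length, (int, float)):
            raise ValueError(f"length {length!r} must be numeric in this sketch")
        if start == "auto":
            start = previous_end  # begins when the previous clip on the track finishes
        elif not isinstance(start, (int, float)):
            raise ValueError(f"start {start!r} (e.g. an alias) is not handled here")
        starts.append(float(start))
        previous_end = float(start) + float(length)
    return starts


track = [
    {"start": 0, "length": 3},
    {"start": "auto", "length": 2.5},
    {"start": "auto", "length": 4},
]
print(resolve_auto_starts(track))  # [0.0, 3.0, 5.5]
```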
All asset src URLs must be publicly accessible HTTPS. No local file paths, no data: URIs, no signed URLs that expire mid-render. The render workers fetch assets from the public internet.
For test renders without your own assets, use the placeholder library at https://shotstack-assets.s3.amazonaws.com/ — see references/asset-library.md.
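A quick offline filter catches the obvious src mistakes (local paths, data: URIs, plain HTTP). The Python below is a rough sketch with a hypothetical `src_problems` helper; it cannot prove a URL is publicly reachable, and the signed-URL test is only a heuristic for the `Expires` query parameters that S3 presigned URLs carry.

```python
from urllib.parse import urlparse


def src_problems(src: str) -> list[str]:
    """Rough offline check of an asset src (hypothetical helper).

    It cannot prove a URL is public; it only catches the obvious cases.
    """
    problems: list[str] = []
    parsed = urlparse(src)
    if parsed.scheme != "https":
        problems.append(f"scheme {parsed.scheme or '(none)'!r} is not https")
    if not parsed.netloc:
        problems.append("no host: looks like a local path or data: URI")
    # Heuristic: matches X-Amz-Expires= (SigV4) and Expires= (older S3 signatures).
    if "Expires=" in parsed.query:
        problems.append("looks like a signed URL that may expire mid-render")
    return problems


for src in [
    "https://example.com/footage/clip.mp4",
    "/Users/me/clip.mp4",
    "data:video/mp4;base64,AAAA",
    "https://bucket.s3.amazonaws.com/clip.mp4?X-Amz-Expires=300",
]:
    print(src, src_problems(src))
```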
Clips on the same track must not have overlapping start/length ranges — overlapping clips flicker because the engine can't decide which to display. Anything visible at the same time goes on a separate track. This is the most common structural mistake.
Sequential clips (one finishes, the next starts) can share a track — "start": "auto" chains them. A cross-fade needs two tracks with a small time overlap and a transition on each.
Validate before you render: `shotstack validate <file>` catches same-track overlaps, plus unloaded fonts, non-public src URLs, and wrong property names/enums. It runs offline, with no API key and no credits.
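The same-track rule reduces to an interval check per track. This Python sketch (hypothetical `same_track_overlaps`, not the CLI's implementation) only considers clips whose start and length are both numbers; clips using "auto", "end" or alias values are skipped, since resolving them needs the whole timeline. Clips that merely touch (one starts exactly where the previous ends) are a valid sequence and are not reported.

```python
def same_track_overlaps(timeline: dict) -> list[tuple[int, int, int]]:
    """(track, earlier_clip, later_clip) indices whose numeric ranges overlap.

    Hypothetical helper. Clips whose start or length is a string ("auto",
    "end", "alias://...") are skipped; this sketch does not resolve them.
    """
    found: list[tuple[int, int, int]] = []
    for t, track in enumerate(timeline.get("tracks", [])):
        spans = sorted(
            (clip["start"], clip["start"] + clip["length"], i)
            for i, clip in enumerate(track.get("clips", []))
            if isinstance(clip.get("start"), (int, float))
            and isinstance(clip.get("length"), (int, float))
        )
        furthest_end, furthest_i = None, None
        for start, end, i in spans:
            # Touching (start == previous end) is a clean sequence, not an overlap.
            if furthest_end is not None and start < furthest_end:
                found.append((t, furthest_i, i))
            if furthest_end is None or end > furthest_end:
                furthest_end, furthest_i = end, i
    return found


timeline = {
    "tracks": [
        {"clips": [{"start": 0, "length": 3}, {"start": 3, "length": 2}]},
        {"clips": [{"start": 0, "length": 10}, {"start": 1, "length": 1},
                   {"start": "auto", "length": 2}]},
    ]
}
print(same_track_overlaps(timeline))  # [(1, 0, 1)]
```

Tracking the furthest end seen so far, rather than only comparing neighbours, means a long clip that covers several later ones is reported against each of them.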
Compose from this rather than round-tripping the full schema — these are the values renders actually use. (api.edit.json stays authoritative; shotstack validate checks against it.)
Clip-level (wraps any asset):
| Field | Values |
|---|---|
| `fit` | `crop` (fill + crop, default; CSS object-fit: cover) · `contain` (letterbox) · `cover` (stretch, ignores aspect) · `none` |
| `position` | `center` `top` `bottom` `left` `right` `topLeft` `topRight` `bottomLeft` `bottomRight` |
| `offset` | `{ "x": <number>, "y": <number> }`: fraction of the frame (x × width, y × height); y positive = up. Allowed range ±10; past ±1 the clip is pushed off-frame |
| `scale` / `opacity` | number (1 = the fit result) / 0..1 |
| `effect` | Ken-Burns drift: `zoomIn` `zoomOut` `slideLeft` `slideRight` `slideUp` `slideDown`; each also …Slow / …Fast |
| `filter` | `blur` `boost` `contrast` `darken` `greyscale` `lighten` `muted` `negative` |
| `transition` | `{ "in": …, "out": … }` ↓ |
transition.in / .out — each also takes a Slow/Fast suffix (e.g. fadeSlow, slideUpFast):
none fade reveal wipeLeft wipeRight slideLeft slideRight slideUp slideDown carouselLeft carouselRight carouselUp carouselDown shuffle* (eight corners, e.g. shuffleTopRight) zoom.
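A small name check helps avoid mixing up transitions with rich-text animation presets such as `fadeIn`, which is not in the list above. This Python sketch (hypothetical `is_transition`) encodes the list as written, with two labelled assumptions: the eight shuffle names pair a side with a perpendicular side (as in `shuffleTopRight`), and `none` has no Slow/Fast variant. Check `api.edit.json` for the authoritative enum.

```python
BASE_TRANSITIONS = {
    "none", "fade", "reveal", "wipeLeft", "wipeRight",
    "slideLeft", "slideRight", "slideUp", "slideDown",
    "carouselLeft", "carouselRight", "carouselUp", "carouselDown", "zoom",
}
# Assumption: the eight shuffle corners pair a side with a perpendicular side.
PERPENDICULAR = {"Top": ("Left", "Right"), "Bottom": ("Left", "Right"),
                 "Left": ("Top", "Bottom"), "Right": ("Top", "Bottom")}
SHUFFLES = {f"shuffle{a}{b}" for a, sides in PERPENDICULAR.items() for b in sides}
TRANSITIONS = BASE_TRANSITIONS | SHUFFLES


def is_transition(name: str) -> bool:
    """True if name is a listed transition, optionally with a Slow/Fast suffix."""
    if name in TRANSITIONS:
        return True
    for suffix in ("Slow", "Fast"):
        stem = name[: -len(suffix)]
        # Assumption: "none" has no speed variants.
        if name.endswith(suffix) and stem in TRANSITIONS and stem != "none":
            return True
    return False


print(is_transition("fadeSlow"), is_transition("shuffleTopRight"), is_transition("fadeIn"))
```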
rich-text asset — the styled-text workhorse (use instead of text/title):
```json
{
  "type": "rich-text",
  "text": "UTOPIA",
  "font": { "family": "<loaded-family>", "size": 160, "weight": "700", "color": "#141414", "opacity": 1 },
  "style": { "letterSpacing": 6, "lineHeight": 0.95, "textTransform": "uppercase", "textDecoration": "none" },
  "stroke": { "width": 3, "color": "#000000" },
  "shadow": { "offsetX": 0, "offsetY": 6, "blur": 18, "color": "#000000", "opacity": 0.4 },
  "background": { "color": "#ffffff", "opacity": 1, "borderRadius": 16 },
  "align": { "horizontal": "center", "vertical": "middle" },
  "animation": { "preset": "fadeIn", "duration": 0.6, "style": "word", "direction": "up" }
}
```
align.horizontal = left|center|right; align.vertical = top|middle|bottom (not center). animation.preset = fadeIn slideIn typewriter ascend shift movingLetters; animation.style = character|word; animation.direction = left|right|up|down. Give the clip a width/height box so text wraps and aligns where you expect.
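One way to make those rules hard to break is to build rich-text clips through a helper that puts the box on the clip and rejects `"center"` on the vertical axis before the JSON reaches the API. This Python sketch uses a hypothetical `rich_text_clip` function and keeps only a few of the asset fields shown above; `"<loaded-family>"` stays a placeholder for a family you have loaded via `timeline.fonts[]`.

```python
H_ALIGN = {"left", "center", "right"}
V_ALIGN = {"top", "middle", "bottom"}


def rich_text_clip(text: str, family: str, *, width: int, height: int,
                   start: float = 0, length: float = 3, size: int = 60,
                   horizontal: str = "center", vertical: str = "middle") -> dict:
    """Build a rich-text clip sized by a clip-level box (hypothetical helper)."""
    if horizontal not in H_ALIGN:
        raise ValueError(f"align.horizontal must be one of {sorted(H_ALIGN)}")
    if vertical not in V_ALIGN:
        # The usual slip is "center"; the vertical axis uses "middle".
        raise ValueError(f"align.vertical must be one of {sorted(V_ALIGN)}")
    return {
        "asset": {
            "type": "rich-text",
            "text": text,
            "font": {"family": family, "size": size},
            "align": {"horizontal": horizontal, "vertical": vertical},
        },
        "start": start,
        "length": length,
        # The clip box, not the asset, sets the wrap width and alignment area.
        "width": width,
        "height": height,
    }


lower_third = rich_text_clip("UTOPIA", "<loaded-family>", width=800, height=200)
print(lower_third["width"], lower_third["asset"]["align"])
```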
For motion beyond this (kinetic type, count-ups, shine sweeps, grain, pulsing CTAs) reach for html5 — see references/html5-snippets.md.
position picks one of nine anchor points (center default; top bottom left right topLeft topRight bottomLeft bottomRight); offset nudges from there. offset is a fraction of the output frame, not a centred −1..+1 grid: offset.x positive → right (× frame width), offset.y positive → up (× frame height). Range is ±10; anything past ±1 pushes the clip off-frame.
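The offset rule is plain arithmetic: multiply by the frame size and flip the sign of y for screen coordinates, where y grows downward. This Python sketch (hypothetical `offset_to_pixels`) returns the shift from the anchor point chosen by `position`.

```python
def offset_to_pixels(x: float, y: float, frame_w: int, frame_h: int) -> tuple[float, float]:
    """Screen-space shift (px) for a clip offset; screen y grows downward.

    x is a fraction of frame width (positive = right) and y a fraction of
    frame height (positive = up), so y is negated here.
    """
    if not (-10 <= x <= 10 and -10 <= y <= 10):
        raise ValueError("offset.x and offset.y must stay within ±10")
    return x * frame_w, -y * frame_h


dx, dy = offset_to_pixels(0.25, 0.1, 1280, 720)  # hd frame
print(dx, dy)  # 320.0 -72.0 (320 px right, 72 px up)
```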
Clip-level width/height (pixels) define a bounding box. For image/video, fit fills it — crop (keep aspect, crop overflow) · contain (letterbox) · cover (stretch, distorts) · none. For rich-text, that same clip box sets the text-wrap width and the area align positions within — size text on the clip, not the asset. Without width/height a clip fills the frame, so unsized text centres across the whole output. scale then multiplies the result (uniform on both axes).
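The four fit modes can be compared by computing the drawn size of an asset inside the clip box. This Python sketch (hypothetical `fitted_size`) follows the descriptions above; it assumes `none` means the asset's native pixel size and applies `scale` last to the drawn size. For `crop` the returned size is larger than the box, and the overflow is what gets cropped.

```python
def fitted_size(fit: str, asset_w: float, asset_h: float,
                box_w: float, box_h: float, scale: float = 1.0) -> tuple[float, float]:
    """Drawn size of an image/video inside a clip box (sketch of the fit modes)."""
    if fit == "crop":        # keep aspect, fill the box; overflow is cropped
        s = max(box_w / asset_w, box_h / asset_h)
        w, h = asset_w * s, asset_h * s
    elif fit == "contain":   # keep aspect, fit inside (letterbox)
        s = min(box_w / asset_w, box_h / asset_h)
        w, h = asset_w * s, asset_h * s
    elif fit == "cover":     # stretch to the box, aspect ignored
        w, h = box_w, box_h
    elif fit == "none":      # assumed: native pixel size, no scaling
        w, h = asset_w, asset_h
    else:
        raise ValueError(f"unknown fit {fit!r}")
    return w * scale, h * scale


# A 1920x1080 landscape clip placed in a 1080x1920 portrait box:
for mode in ("crop", "contain", "cover", "none"):
    print(mode, fitted_size(mode, 1920, 1080, 1080, 1920))
```

In the portrait example, `contain` gives 1080×607.5 (letterboxed), while `crop` fills the full 1920 px height and crops most of the width.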
Order of operations: fit → position → offset → rotate → scale. Full reference: references/positioning.md.
Pick output.resolution (preset) OR output.size.width+output.size.height (custom):
| Preset | Pixels @ fps |
|---|---|
| `preview` | 512×288 @ 15 |
| `mobile` | 640×360 @ 25 |
| `sd` | 1024×576 @ 25 |
| `hd` | 1280×720 @ 25 (default) |
| `1080` | 1920×1080 @ 25 |
Custom sizes must be divisible by 2.
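The "preset OR custom size" rule and the even-dimension rule can be checked together. This Python sketch (hypothetical `output_dimensions`) copies the preset table above and treats a missing resolution as `hd`, the listed default; it only resolves dimensions and ignores every other output field.

```python
# Preset table from above: name -> (width, height, fps).
PRESETS = {
    "preview": (512, 288, 15),
    "mobile": (640, 360, 25),
    "sd": (1024, 576, 25),
    "hd": (1280, 720, 25),
    "1080": (1920, 1080, 25),
}


def output_dimensions(output: dict) -> tuple[int, int]:
    """Pixel size an output block resolves to (hypothetical helper)."""
    if "resolution" in output and "size" in output:
        raise ValueError("pick output.resolution OR output.size, not both")
    if "size" in output:
        w, h = output["size"]["width"], output["size"]["height"]
        if w % 2 or h % 2:
            raise ValueError(f"custom size {w}x{h}: both sides must be divisible by 2")
        return w, h
    width, height, _fps = PRESETS[output.get("resolution", "hd")]  # hd is the default
    return width, height


print(output_dimensions({"resolution": "1080"}))                     # (1920, 1080)
print(output_dimensions({"size": {"width": 1080, "height": 1920}}))  # (1080, 1920)
```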
Use only the current asset types; the deprecated ones still parse but should not be used in new templates.
| Type | Purpose |
|---|---|
| `video` | Video file (mp4, mov, webm). |
| `image` | Static image: jpg, png, webp, gif, bmp, tiff. |
| `audio` | Audio clip placed at a specific time on the timeline. |
| `rich-text` | Styled text overlay with full typography control. Use this instead of text/html/title. |
| `svg` | Vector graphics from raw SVG markup. See references/svg.md. |
| `html5` | Self-contained HTML/CSS/JS page rendered in an iframe (motion graphics, charts, animated overlays). Preloads gsap/d3/anime/lottie. See references/html5.md. Never use the deprecated html asset. |
| `rich-caption` | Word-level animated captions sourced from audio, video, or subtitle files. See references/caption.md. |
| `luma` | Luma matte for masking effects. |
| `image-to-video` | AI: animate a still image into a short video clip. Billed per generation. |
| `text-to-image` | AI: generate an image from a text prompt. Billed per generation. |
| `text-to-speech` | AI: generate speech from a text prompt. Billed per generation. |
timeline.fonts[] is a separate field for custom font URLs (not an asset type).
For background music, use an audio asset on its own track with length: "end". Do NOT use timeline.soundtrack — it is deprecated. The audio asset path supports keyframes, custom timing, fades, and effects; soundtrack does not.
The deprecated asset types are text, title, caption, html, and shape. They still parse but produce inferior output. Replace them as follows:
| If you'd use… | Use instead |
|---|---|
| `text` or `title` | `rich-text` |
| `caption` | `rich-caption` |
| `html` | `html5` (for motion graphics or animated overlays) or `rich-text` (for static styled text) |
| `shape` | `svg` with `<rect>`, `<circle>`, `<polygon>` etc. |
| `timeline.soundtrack` | `audio` asset on its own track with `length: "end"` |
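When migrating an older template, the replacement table can drive a simple scan. This Python sketch (hypothetical `deprecated_usage`) reports each deprecated asset type with its suggested replacement and flags `timeline.soundtrack`; it does not rewrite anything, because the right replacement for `html` depends on whether the content moves.

```python
REPLACEMENTS = {
    "text": "rich-text",
    "title": "rich-text",
    "caption": "rich-caption",
    "html": "html5 (motion) or rich-text (static)",
    "shape": "svg",
}


def deprecated_usage(edit: dict) -> list[str]:
    """List deprecated asset types and timeline.soundtrack (hypothetical helper)."""
    timeline = edit.get("timeline", {})
    notes: list[str] = []
    if "soundtrack" in timeline:
        notes.append('timeline.soundtrack -> audio asset on its own track, length "end"')
    for t, track in enumerate(timeline.get("tracks", [])):
        for c, clip in enumerate(track.get("clips", [])):
            kind = clip.get("asset", {}).get("type")
            if kind in REPLACEMENTS:
                notes.append(f"tracks[{t}].clips[{c}]: {kind} -> {REPLACEMENTS[kind]}")
    return notes


edit = {
    "timeline": {
        "soundtrack": {"src": "https://example.com/music.mp3"},
        "tracks": [{"clips": [{"asset": {"type": "title", "text": "Hi"}, "start": 0, "length": 2}]}],
    }
}
print("\n".join(deprecated_usage(edit)))
```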
image-to-video, text-to-speech, and text-to-image are billed per generation even when invoked through the sandbox stage endpoint (which is otherwise free). They are async — the render submits the AI job and waits. Renders containing AI assets take longer.
One overlay is a rich-text job. Several videos "in different styles" is not — making them all rich-text ships one look eight times. Match the asset to the ambition:
| Level | Asset | Use for |
|---|---|---|
| 1 · type & layout | `rich-text` | Titles, lower-thirds, kickers, captions, price/CTA pills. Fast and reliable; the right default for static styled text. |
| 2 · shapes | `svg` | Colour panels, rules, badges, frames, geometric accents behind or around type. |
| 3 · motion graphics | `html5` | Kinetic type, count-ups / price odometers, shine sweeps, animated gradients, film grain, masked reveals, data-driven overlays; anything that should move beyond a transition or a Ken-Burns effect. gsap / anime / d3 / lottie are preloaded. See references/html5.md and the copy-paste clips in references/html5-snippets.md. |
When the brief asks for a range, deliberately spread across the ladder. A strong set: a couple of clean rich-text studio cuts, one or two svg colour-block promos, and several html5 pieces carrying the real motion (kinetic headline, price odometer, shine-swept CTA, grain-graded teaser). Reserve the elaborate html5 treatments for the hero / hype cuts where motion sells the product. If every clip in a "variety" brief is rich-text, you have not delivered variety.
Use custom Google Fonts via timeline.fonts[]. System fonts (Arial, Helvetica, Times New Roman) are NOT installed and will fail with "Font not found".
CRITICAL: Do NOT construct or fabricate Google Fonts URLs from memory. Google rotates them (v26 → v31 → ...) and the hash filenames change with each version. Any URL you reconstruct from training data is almost certainly a 404. Use ONLY the verified entries below, copied verbatim.
See references/fonts.md for how to source fonts from the Studio SDK catalogue and the built-in fallback fonts.
```json
{
  "timeline": {
    "fonts": [
      { "src": "https://fonts.gstatic.com/s/montserrat/v31/JTUSjIg1_i6t8kCHKm45xW5rygbi49c.ttf" }
    ],
    "tracks": [
      {
        "clips": [{
          "asset": {
            "type": "rich-text",
            "text": "Hello",
            "font": { "family": "JTUSjIg1_i6t8kCHKm45xW5rygbi49c", "size": 60, "weight": "700", "color": "#ffffff" }
          },
          "start": 0, "length": 3
        }]
      }
    ]
  }
}
```
The font.family value MUST match the family column in the table above (it's the filename basename without .ttf). If family and the URL's basename don't match, the font will not load.
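Because the family is derived mechanically from the URL, it can be computed rather than typed. This Python sketch applies the basename rule above; `family_from_font_url` and `unmatched_families` are hypothetical helpers. It does not fetch or verify the URL, and it would also report built-in fonts from references/fonts.md as unmatched, since it knows nothing about that list.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def family_from_font_url(src: str) -> str:
    """font.family per the rule above: the URL's .ttf filename without the extension."""
    name = PurePosixPath(urlparse(src).path).name
    if not name.endswith(".ttf"):
        raise ValueError(f"expected a .ttf URL, got {src!r}")
    return name[: -len(".ttf")]


def unmatched_families(edit: dict) -> list[str]:
    """font.family values used by clips that match no timeline.fonts[] entry."""
    timeline = edit["timeline"]
    loaded = {family_from_font_url(font["src"]) for font in timeline.get("fonts", [])}
    missing: list[str] = []
    for track in timeline.get("tracks", []):
        for clip in track.get("clips", []):
            family = clip.get("asset", {}).get("font", {}).get("family")
            if family is not None and family not in loaded:
                missing.append(family)
    return missing


url = "https://fonts.gstatic.com/s/montserrat/v31/JTUSjIg1_i6t8kCHKm45xW5rygbi49c.ttf"
print(family_from_font_url(url))  # JTUSjIg1_i6t8kCHKm45xW5rygbi49c
```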
Common mistakes:
- Reverse track order. `tracks[0]` is the TOP layer, not the bottom. Captions go in early tracks; backgrounds go in late tracks.
- System fonts. Arial, Helvetica, Times New Roman, etc. are not installed. Use Google Fonts via `timeline.fonts[]` (preferred) or one of the built-in fonts in `references/fonts.md`.
- Captions fill the whole frame. A `rich-caption` clip without `width`, `height`, and `fit: "none"` covers the entire output. Use a named preset from `references/caption.md`.
- `<text>` inside an SVG asset. Raw `<text>` is unsupported. Use a `rich-text` asset for any text content; reserve SVG for shapes only.
- Composing custom caption styles when presets exist. The five named presets (Nico, Kai, Kapow, Lovely Little Lychee, Rizz) cover the common styles. Use one verbatim from `references/caption.md` unless the user asks for something specific.
For details beyond this core guide (rich-caption presets, SVG constraints, full font URL list, troubleshooting), see the references/ directory in the Shotstack CLI repo or fetch the topic-specific docs from https://shotstack.io/docs/guide/llms-full.txt.