AetherEngine

A video player engine for Apple platforms.
FFmpeg demuxes. VideoToolbox decodes. AVPlayer handles Dolby Atmos.
You ship the UI.

What it is

A player engine that gets the hard parts right (HDR, Dolby Vision, Dolby Atmos, A/V sync across multiple clocks) and exposes a CALayer plus a handful of async methods. No AVPlayerViewController. No opinionated controls. No analytics. Embed the layer, call play(), read the published properties for state.

You provide the transport bar. You provide the dropdowns. You provide the pretty.

What it handles

Area	Details
Containers	MKV, MP4, WebM, MPEG-TS, AVI, OGG, FLV
HW decode	H.264, HEVC, HEVC Main10 via VideoToolbox
SW decode	AV1 (dav1d), VP9 fallback. Pooled pixel buffers, no per-frame allocations
HDR10	10-bit P010 output, BT.2020 + PQ color tagging on every frame
HDR10+	Per-frame ST 2094-40 dynamic metadata extracted from `AV_PKT_DATA_DYNAMIC_HDR10_PLUS` and attached to every `CMSampleBuffer` via `kCMSampleAttachmentKey_HDR10PlusPerFrameData`
Dolby Vision	Profile 5 / 8.1 / 8.4. Format description tagged as `kCMVideoCodecType_DolbyVisionHEVC` (`'dvh1'`) with a `dvcC` extension built from FFmpeg's DV configuration record so DV-capable TVs switch into DV mode; HDR10 / HLG fallback on non-DV TVs
HLG	Transfer function detected and forwarded
HDR to SDR	Software tonemap via `VTPixelTransferSession` when Match Dynamic Range is off
Audio	AAC, AC3, EAC3, FLAC, MP3, Opus, Vorbis, TrueHD, DTS, ALAC, PCM
Dolby Atmos	EAC3+JOC passthrough via local HLS + AVPlayer with Dolby MAT 2.0 wrapping
Surround	5.1 / 7.1 with correct `AudioChannelLayout` tagging
Subtitles	SubRip / ASS / SSA / WebVTT / mov_text streamed inline; PGS / HDMV PGS / DVB / DVD rendered as `CGImage` with normalised position; sidecar `.srt` / `.ass` / `.vtt` URLs decoded via short-lived context
Seek	Decoder + renderer flush, pre-target frame skip, no "fast forward from keyframe" artifact
Streaming	HTTP Range + chunked delegate reads via `URLSession`
Resilience	Exponential backoff on transient network errors, background pause, display-link aware lifecycle
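
The Resilience row mentions exponential backoff on transient network errors. A minimal sketch of how such a delay schedule could be computed follows; the `BackoffPolicy` type, its field names, and every default value are assumptions made for illustration and are not taken from AetherEngine.

```swift
import Foundation

/// Hypothetical retry policy. Names and defaults are illustrative only.
struct BackoffPolicy {
    var baseDelay: TimeInterval = 0.5   // delay before the first retry
    var multiplier: Double = 2.0        // growth factor per attempt
    var maxDelay: TimeInterval = 8.0    // upper bound on any single delay
    var maxAttempts: Int = 5            // retries allowed before giving up

    /// Delay before retry number `attempt` (0-based),
    /// or nil once the attempts are exhausted.
    func delay(forAttempt attempt: Int) -> TimeInterval? {
        guard attempt >= 0, attempt < maxAttempts else { return nil }
        let raw = baseDelay * pow(multiplier, Double(attempt))
        return min(raw, maxDelay)
    }
}

let policy = BackoffPolicy()
let schedule = (0..<6).map { policy.delay(forAttempt: $0) }
```

A caller would sleep for the returned delay before re-issuing the failed read, and surface the error once `delay(forAttempt:)` returns nil.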

Quick start

import AetherEngine

let player = try AetherEngine()
view.layer.addSublayer(player.videoLayer)

try await player.load(url: videoURL)                 // or
try await player.load(url: videoURL, startPosition: 347.5)

player.play()
player.pause()
player.setRate(1.5)
await player.seek(to: 120)
player.stop()

// Observe (Combine @Published)
player.$state         // .idle, .loading, .playing, .paused, .seeking, .error
player.$currentTime
player.$duration
player.$videoFormat   // .sdr, .hdr10, .hdr10Plus, .dolbyVision, .hlg

player.audioTracks    // [TrackInfo]
player.selectAudioTrack(index: trackID)

// Subtitles, text and bitmap, one published list
player.subtitleTracks                          // [TrackInfo] for the loaded source
player.selectSubtitleTrack(index: streamID)    // embedded, text or bitmap
player.selectSidecarSubtitle(url: srtURL)      // .srt / .ass / .vtt next to the media
player.clearSubtitle()
player.$subtitleCues                           // [SubtitleCue], body is .text(String) or .image(SubtitleImage)
player.$isSubtitleActive                       // host mirror gate
player.$isLoadingSubtitles                     // sidecar fetch + decode in progress

Install via Swift Package Manager:

.package(url: "https://github.com/superuser404notfound/AetherEngine", branch: "main")
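
For context, a host package manifest using that dependency might look like the sketch below. The package and target name `MyPlayerApp` are hypothetical, and the product name `AetherEngine` is an assumption based on the module imported in Quick start; the platform minimums are copied from the Requirements table.

```swift
// swift-tools-version:6.0
import PackageDescription

let package = Package(
    name: "MyPlayerApp",  // hypothetical host package
    platforms: [.iOS(.v16), .tvOS(.v16), .macOS(.v14)],
    dependencies: [
        .package(url: "https://github.com/superuser404notfound/AetherEngine", branch: "main")
    ],
    targets: [
        .target(
            name: "MyPlayerApp",
            dependencies: [
                // Assumes the library product shares the module name.
                .product(name: "AetherEngine", package: "AetherEngine")
            ]
        )
    ]
)
```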

Dolby Atmos pipeline

AVSampleBufferAudioRenderer ignores Atmos metadata. AVPlayer doesn't. So for EAC3+JOC streams, AetherEngine demuxes the EAC3 packets, wraps them into fMP4 with a dec3 box that declares JOC (numDepSub=1, depChanLoc=0x0100), serves the segments from a local HLS server on 127.0.0.1:<port>, and points AVPlayer at the playlist. AVPlayer wraps the bitstream as Dolby MAT 2.0 and the receiver lights up the Atmos indicator.

Demux ──┬─ Video packets ──► Decode queue ──► AVSampleBufferDisplayLayer
        │                                              │
        │                                              │ controlTimebase synced
        │                                              │ to AVPlayer clock
        │                                              │
        └─ EAC3 packets ───► fMP4 muxer ──► HLS server ──► AVPlayer
                                                         │
                                                         └─► receiver / speaker
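
To make the HLS leg of the diagram more concrete, here is a rough sketch of the kind of media playlist a local server could hand to AVPlayer for fMP4 audio segments. The `Segment` type, the `mediaPlaylist` function, and the file names are hypothetical, and the actual playlist AetherEngine emits may differ; only the tags themselves are standard HLS.

```swift
import Foundation

/// Hypothetical description of one fMP4 media segment.
struct Segment {
    let uri: String
    let duration: Double  // seconds
}

/// Builds an HLS media playlist that references an fMP4 init segment
/// via EXT-X-MAP, followed by the media segments in order.
func mediaPlaylist(initURI: String,
                   segments: [Segment],
                   firstSequence: Int,
                   ended: Bool) -> String {
    // Target duration must be at least as long as the longest segment.
    let longest = segments.map(\.duration).max() ?? 1
    let target = Int(longest.rounded(.up))

    var lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        "#EXT-X-TARGETDURATION:\(target)",
        "#EXT-X-MEDIA-SEQUENCE:\(firstSequence)",
        "#EXT-X-MAP:URI=\"\(initURI)\"",
    ]
    for segment in segments {
        lines.append("#EXTINF:" + String(format: "%.3f", segment.duration) + ",")
        lines.append(segment.uri)
    }
    if ended { lines.append("#EXT-X-ENDLIST") }
    return lines.joined(separator: "\n") + "\n"
}

let playlist = mediaPlaylist(
    initURI: "init.mp4",
    segments: [Segment(uri: "seg0.m4s", duration: 2.0),
               Segment(uri: "seg1.m4s", duration: 2.0)],
    firstSequence: 0,
    ended: false
)
```

Leaving out `#EXT-X-ENDLIST` while the demuxer is still producing segments lets the player keep polling the playlist for new entries.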

The AVSampleBufferDisplayLayer is driven by a CMTimebase whose source is bound directly to AVPlayerItem.timebase via CMTimebaseSetSourceTimebase. The HLS pipe takes 2-4 seconds to buffer; during that window the timebase is paused and video holds on frame 1. Once AVPlayer.timeControlStatus flips to .playing and the item timebase is live, the bind is established. From that moment on, video and audio share the same hardware-aware clock, which already accounts for AVR / soundbar Atmos decoder latency, MAT 2.0 unpack delay, pre-roll, and pause/resume, so no periodic drift correction is needed.

If the active output route can't take multichannel (Bluetooth A2DP, HFP, LE, or any route reporting fewer than 6 output channels), AetherEngine skips the Atmos pipeline entirely and routes EAC3 through the regular FFmpeg PCM decoder, so you still get sound instead of silence. If the downstream device advertises Atmos in its EDID but AVPlayer stalls anyway (some AVRs trigger this), a 5-second watchdog falls back to PCM automatically.
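
The two fallback rules above can be sketched as a pair of pure functions. Everything here is hypothetical naming (`RouteKind`, `AudioRoute`, `AudioPath`, both functions); in a real host the route kind and channel count would presumably be read from the audio session, which this sketch deliberately leaves out.

```swift
/// Hypothetical classification of the active output route.
enum RouteKind {
    case hdmi, builtInSpeaker, airPlay
    case bluetoothA2DP, bluetoothHFP, bluetoothLE
}

struct AudioRoute {
    var kind: RouteKind
    var outputChannels: Int
}

enum AudioPath: Equatable {
    case atmosPassthrough  // local HLS + AVPlayer
    case pcmDecode         // regular FFmpeg PCM decoder
}

/// Chooses the audio path for an EAC3+JOC stream on the given route.
func audioPath(for route: AudioRoute) -> AudioPath {
    switch route.kind {
    case .bluetoothA2DP, .bluetoothHFP, .bluetoothLE:
        return .pcmDecode
    case .hdmi, .builtInSpeaker, .airPlay:
        // Fewer than 6 channels means the route can't carry 5.1 or more.
        return route.outputChannels >= 6 ? .atmosPassthrough : .pcmDecode
    }
}

/// Watchdog rule: give up on passthrough if AVPlayer has not started
/// playing within `timeout` seconds.
func shouldFallBackToPCM(secondsWaiting: Double,
                         isPlaying: Bool,
                         timeout: Double = 5) -> Bool {
    !isPlaying && secondsWaiting >= timeout
}
```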

HDR routing

Source	Output pixel format	Tagged as
H.264, HEVC (SDR)	8-bit NV12	BT.709
HEVC Main10 (HDR10), HDR display	10-bit P010	BT.2020 / PQ
HEVC Main10 (HDR10+), HDR display	10-bit P010	BT.2020 / PQ + per-frame ST 2094-40
HEVC Main10 (DV P5/P8), DV display	10-bit P010	BT.2020 / PQ + dvcC + RPU passthrough
HEVC Main10, SDR display	8-bit NV12 (tonemapped)	BT.709
AV1 HDR	10-bit P010	BT.2020 / PQ

HDR to SDR tonemapping runs through a dedicated VTPixelTransferSession with a pre-allocated CVPixelBufferPool, separate from the decompression session so it doesn't interfere with the controlTimebase-driven display path used by Atmos.

On tvOS, the display layer opts into preferredDynamicRange = .high so the compositor doesn't silently clip BT.2020 pixels to Rec.709 after the TV has been told to switch to HDR.
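
As a reading aid, the routing table in this section can be written as a lookup function. This is a sketch with made-up types (`VideoSource`, `DisplayCaps`, `VideoRoute`), not the engine's code. Combinations the table does not list, such as Dolby Vision on an HDR display without DV support, return nil here rather than guessing.

```swift
enum VideoSource {
    case sdr, hevcHDR10, hevcHDR10Plus, hevcDolbyVision, av1HDR
}

/// Hypothetical summary of what the attached display can do.
struct DisplayCaps {
    var supportsHDR: Bool
    var supportsDolbyVision: Bool
}

struct VideoRoute: Equatable {
    var pixelFormat: String
    var tagging: String
}

/// Mirrors the rows of the HDR routing table; nil means "not in the table".
func videoRoute(for source: VideoSource, on display: DisplayCaps) -> VideoRoute? {
    let sdrOut = VideoRoute(pixelFormat: "8-bit NV12", tagging: "BT.709")
    let pq = "BT.2020 / PQ"
    switch source {
    case .sdr:
        return sdrOut
    case .av1HDR:
        // The table lists this row without a display qualifier.
        return VideoRoute(pixelFormat: "10-bit P010", tagging: pq)
    case .hevcHDR10:
        // On an SDR display the frame is tonemapped to 8-bit NV12.
        return display.supportsHDR
            ? VideoRoute(pixelFormat: "10-bit P010", tagging: pq)
            : sdrOut
    case .hevcHDR10Plus:
        return display.supportsHDR
            ? VideoRoute(pixelFormat: "10-bit P010", tagging: pq + " + per-frame ST 2094-40")
            : sdrOut
    case .hevcDolbyVision:
        guard display.supportsHDR else { return sdrOut }
        guard display.supportsDolbyVision else { return nil }
        return VideoRoute(pixelFormat: "10-bit P010", tagging: pq + " + dvcC + RPU passthrough")
    }
}
```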

HDR10+ dynamic metadata

For HDR10+ content the demuxer emits an AVDynamicHDRPlus struct as packet side data on each frame. The decoder serialises it back to the user-data-registered ITU-T T.35 byte form via av_dynamic_hdr_plus_to_t35, stashes it under the packet PTS, and on the way out of VTDecompressionSession pairs it with the matching decoded frame. The bytes are then attached to the outgoing CMSampleBuffer via kCMSampleAttachmentKey_HDR10PlusPerFrameData (introduced in iOS 16 / tvOS 16). The display layer forwards the metadata to the system compositor, which sends it onward over HDMI; HDR10+-capable TVs apply the source's per-scene tone-mapping curves instead of falling back to the static HDR10 base layer. Tonemap-to-SDR drops the metadata since it would be irrelevant on an SDR output.
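
The "stash under the packet PTS, pair on the way out" step can be illustrated with a small keyed store. `HDR10PlusStash` and its method names are hypothetical, the payload bytes below are placeholders rather than real T.35 data, and the engine's own bookkeeping may look different.

```swift
/// Hypothetical store that holds serialised T.35 payloads until the
/// decoded frame with the same PTS comes back from the decoder.
struct HDR10PlusStash {
    private var payloads: [Int64: [UInt8]] = [:]

    /// Called on the input side, in decode order.
    mutating func stash(_ t35: [UInt8], forPTS pts: Int64) {
        payloads[pts] = t35
    }

    /// Called on the output side; removes the entry so the store
    /// does not grow while playback continues.
    mutating func take(forPTS pts: Int64) -> [UInt8]? {
        payloads.removeValue(forKey: pts)
    }

    /// Called on seek or decoder flush, when pending entries are stale.
    mutating func flush() {
        payloads.removeAll()
    }

    var pendingCount: Int { payloads.count }
}

var stash = HDR10PlusStash()
// Packets arrive in decode order, which can differ from PTS order.
stash.stash([0x01, 0x02], forPTS: 3003)  // placeholder bytes
stash.stash([0x03, 0x04], forPTS: 1001)  // placeholder bytes
let firstOut = stash.take(forPTS: 1001)
let missing = stash.take(forPTS: 2002)
```

Keying by PTS rather than by arrival order is what keeps the pairing correct when B-frames make the decoder emit frames in a different order than it received them.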

Dolby Vision signalling

For DV streams the demuxer surfaces an AVDOVIDecoderConfigurationRecord on the codec parameters' coded side data. The decoder packs it into the 24-byte ISO BMFF dvcC box body, attaches it as a sample-description-extension atom alongside hvcC, and promotes the codec type from kCMVideoCodecType_HEVC to kCMVideoCodecType_DolbyVisionHEVC ('dvh1'). On DV-capable displays this is what triggers the TV-side switch to Dolby Vision mode for Profile 5 (no HDR10 fallback), 8.1 (HDR10 backward-compatible) and 8.4 (HLG backward-compatible). On HDR10-only TVs we leave the format as plain 'hvc1' so the existing HDR10 / HLG backward-compatible base layer plays correctly. In DEBUG builds the detection result is always logged, so consumers can verify that the path engages without owning DV hardware.
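
A sketch of the 24-byte packing step, assuming the field widths of the Dolby Vision decoder configuration record as it is commonly written to ISO BMFF (8-bit major and minor version, 7-bit profile, 6-bit level, three 1-bit presence flags, 4-bit base-layer compatibility ID, remaining bits reserved as zero). The `DolbyVisionConfig` struct and `makeDvcCBody` function are hypothetical names, not the engine's.

```swift
/// Hypothetical mirror of the fields carried in FFmpeg's DV configuration record.
struct DolbyVisionConfig {
    var versionMajor: UInt8 = 1
    var versionMinor: UInt8 = 0
    var profile: UInt8                  // 7 bits
    var level: UInt8                    // 6 bits
    var rpuPresent: Bool
    var elPresent: Bool
    var blPresent: Bool
    var blSignalCompatibilityID: UInt8  // 4 bits
}

/// Packs the record into the 24-byte dvcC box body (no box header).
func makeDvcCBody(_ config: DolbyVisionConfig) -> [UInt8] {
    var body = [UInt8](repeating: 0, count: 24)
    let profile = config.profile & 0x7F
    let level = config.level & 0x3F
    let rpu: UInt8 = config.rpuPresent ? 0x04 : 0
    let el: UInt8 = config.elPresent ? 0x02 : 0
    let bl: UInt8 = config.blPresent ? 0x01 : 0

    body[0] = config.versionMajor
    body[1] = config.versionMinor
    // 7 bits of profile, then the top bit of the 6-bit level.
    body[2] = (profile << 1) | (level >> 5)
    // Low 5 bits of level, then the rpu / el / bl flags.
    body[3] = ((level & 0x1F) << 3) | rpu | el | bl
    // 4-bit compatibility ID in the high nibble; the rest stays reserved (zero).
    body[4] = (config.blSignalCompatibilityID & 0x0F) << 4
    return body
}

// Example: a Profile 8 stream with an HDR10-compatible base layer.
let dvcC = makeDvcCBody(DolbyVisionConfig(
    profile: 8, level: 6,
    rpuPresent: true, elPresent: false, blPresent: true,
    blSignalCompatibilityID: 1
))
```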

Subtitles

Subtitle packets are routed through the same demux loop as audio and video. No second AVIO connection, no full-file scan. Each packet decodes inline through avcodec_decode_subtitle2, and the result lands in a single published [SubtitleCue] list:

Text codecs (SubRip / ASS / SSA / WebVTT / mov_text) → SubtitleCue.body = .text(String). ASS dialogue headers and override blocks ({\an8}, {\b1}, ...) are stripped; \N becomes a real newline so the host can render with regular text layout.
Bitmap codecs (PGS / HDMV PGS / DVB / DVD) → .image(SubtitleImage). The indexed pixel plane is walked through its palette, premultiplied against alpha, and wrapped as a CGImage. Position is normalised in [0..1] against the source video frame so the host scales to any on-screen rect.
Sidecar files (a separate .srt / .ass / .vtt URL) → selectSidecarSubtitle(url:) opens its own short-lived AVFormatContext, decodes the whole file once, and atomically swaps the result into subtitleCues.
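
The text-codec clean-up described above might look roughly like this. The sketch assumes the decoder hands back ASS events as nine comma-separated fields with the dialogue text last, which is how recent libavcodec versions format them; the function name `plainText(fromASSEvent:)` is hypothetical and the engine's own stripping may handle more cases.

```swift
/// Strips the ASS event header and `{...}` override blocks,
/// and turns `\N` into a real newline.
func plainText(fromASSEvent event: String) -> String {
    // Assumed layout: ReadOrder,Layer,Style,Name,MarginL,MarginR,MarginV,Effect,Text
    // The text itself may contain commas, so split at most 8 times.
    let fields = event.split(separator: ",", maxSplits: 8,
                             omittingEmptySubsequences: false)
    let text = fields.count == 9 ? Array(fields[8]) : Array(event)

    var result = ""
    var inOverride = false
    var index = 0
    while index < text.count {
        let character = text[index]
        if inOverride {
            // Skip everything up to the closing brace.
            if character == "}" { inOverride = false }
            index += 1
        } else if character == "{" {
            inOverride = true
            index += 1
        } else if character == "\\", index + 1 < text.count, text[index + 1] == "N" {
            result.append("\n")
            index += 2
        } else {
            result.append(character)
            index += 1
        }
    }
    return result
}

let cleaned = plainText(
    fromASSEvent: "0,0,Default,,0,0,0,,{\\an8}Hello, {\\b1}world{\\b0}\\Nagain"
)
```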

A single packet that carries multiple rects (PGS often emits signs/songs at the top alongside dialogue at the bottom) becomes multiple cues at the same time range, and the host renders all of them. Cues are inserted in sorted order; backward seeks dedupe by start|end so the list doesn't grow on rewind.
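
One way to get both behaviours described here (several cues per packet, no growth on rewind) is to dedupe per packet rather than per cue. The sketch below does that with a binary-search insert; `Cue`, `CueList`, and the choice to key the whole packet by its shared start|end are assumptions made for illustration.

```swift
struct Cue: Equatable {
    let start: Double
    let end: Double
    let text: String
}

/// Hypothetical sorted cue list with per-packet deduplication.
struct CueList {
    private(set) var cues: [Cue] = []
    private var seenKeys: Set<String> = []

    /// Inserts all cues decoded from one packet. They are assumed to share
    /// one time range. Returns false if that range was already inserted,
    /// for example after a backward seek replays the same packet.
    @discardableResult
    mutating func insert(packetCues: [Cue]) -> Bool {
        guard let first = packetCues.first else { return false }
        let key = "\(first.start)|\(first.end)"
        guard seenKeys.insert(key).inserted else { return false }

        // Binary search for the first cue that starts after this packet.
        var low = 0
        var high = cues.count
        while low < high {
            let mid = (low + high) / 2
            if cues[mid].start <= first.start {
                low = mid + 1
            } else {
                high = mid
            }
        }
        cues.insert(contentsOf: packetCues, at: low)
        return true
    }
}

var list = CueList()
let dialogue = Cue(start: 10, end: 12, text: "dialogue")
let sign = Cue(start: 2, end: 4, text: "sign at top")
let song = Cue(start: 2, end: 4, text: "song at bottom")
list.insert(packetCues: [dialogue])
list.insert(packetCues: [sign, song])
let replayed = list.insert(packetCues: [dialogue])  // same packet after a rewind
```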

The host stays in charge of the actual paint: text styling, overlay layout, fade transitions, position scaling against the on-screen video rect.
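
On the host side, scaling a normalised bitmap-cue position onto the on-screen video rect is a small piece of arithmetic. This sketch assumes the normalised rect and the video rect use the same origin convention (top-left here); the function name is hypothetical and the shape of `SubtitleImage` is not assumed beyond "a rect in [0..1]".

```swift
import Foundation

/// Maps a rect expressed in [0..1] video-frame coordinates onto the
/// rect the video actually occupies on screen.
func onScreenRect(forNormalized normalized: CGRect, in videoRect: CGRect) -> CGRect {
    CGRect(
        x: videoRect.minX + normalized.minX * videoRect.width,
        y: videoRect.minY + normalized.minY * videoRect.height,
        width: normalized.width * videoRect.width,
        height: normalized.height * videoRect.height
    )
}

// A letterboxed 1920x800 picture inside a 1920x1080 view.
let videoRect = CGRect(x: 0, y: 140, width: 1920, height: 800)
let cueRect = onScreenRect(
    forNormalized: CGRect(x: 0.25, y: 0.75, width: 0.5, height: 0.125),
    in: videoRect
)
```

Scaling against the video rect rather than the full view keeps bitmap subtitles inside the picture when the content is letterboxed or pillarboxed.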

Architecture

Sources/AetherEngine/
├── AetherEngine.swift             Public API + demux/decode orchestration + subtitle stream decode
├── PlayerState.swift              PlaybackState, VideoFormat, TrackInfo, SubtitleCue, SubtitleImage
├── Demuxer/
│   ├── Demuxer.swift              libavformat wrapper
│   └── AVIOReader.swift           URLSession → avio_alloc_context
├── Decoder/
│   ├── VideoDecoder.swift         VideoToolbox + HDR tonemap
│   ├── SoftwareVideoDecoder.swift dav1d / libavcodec fallback
│   └── SubtitleDecoder.swift      Sidecar URL one-shot decode (text only)
├── Renderer/
│   └── SampleBufferRenderer.swift Display layer + B-frame reorder
└── Audio/
    ├── AudioDecoder.swift         libswresample → PCM
    ├── AudioOutput.swift          AVSampleBufferAudioRenderer
    ├── HLSAudioEngine.swift       AVPlayer driver for Atmos passthrough
    ├── HLSAudioServer.swift       Local HLS HTTP server
    └── FMP4AudioMuxer.swift       EAC3 → fMP4 with dec3/JOC

Dependencies

Package	License	Purpose
FFmpegBuild	LGPL-3.0	Slim FFmpeg 7.1 + dav1d 1.5
VideoToolbox	System	Hardware decode, tonemap
AVFoundation	System	Audio renderer, AVPlayer, sync
CoreMedia	System	Sample buffers, timing

Non-goals

Things AetherEngine deliberately doesn't do, so you don't have to read the source to find out:

No built-in UI. No controls, no transport bar, no pretty HUD.
No analytics, telemetry, or session reporting. Wire your own to the @Published state.
No playlist / queue management. Call load(url:) when you want the next one.
No subtitle overlay. The engine decodes packets and emits SubtitleCue (text or CGImage with normalised position); your UI paints them with whatever style and animation you want.
No Metal shaders. Everything renders through Apple's native display stack.
No third-party networking. URLSession handles bytes; TLS / HTTP-3 / proxies / MDM rules ride for free.

Requirements

Requirement	Min
iOS	16.0
tvOS	16.0
macOS	14.0
Swift	6.0
Xcode	16.0

Used by

Sodalite: native Jellyfin client for Apple TV.

Built with

AetherEngine is vibe-coded, designed and shipped by Vincent Herbst in close pair-programming with Claude (Anthropic). The commit log is the receipt: nearly every commit carries a Co-Authored-By: Claude trailer.

License

LGPL-3.0 with Apple Store / DRM Exception. The exception clause grants explicit permission to distribute through application stores (Apple App Store, TestFlight, etc.) whose terms otherwise conflict with LGPL §4–6. Modifications to the engine itself still have to be released under LGPL.
