fix: resolve camera preview and export pipeline issues #1679

Open · hidumou wants to merge 1 commit into CapSoftware:main from hidumou:fix/camera-and-export-pipeline
Conversation

hidumou commented Mar 25, 2026

Summary

This PR fixes three issues in the desktop recording/export pipeline:

  1. Camera preview did not re-open when the same camera was already active.
  2. Windows camera exports could show snow/noisy artifacts because camera buffers were read with the wrong stride/layout assumptions.
  3. 4K 60fps export could fail with `Failed to export recording: Encode: Invalid argument` because encoded packets could reach the MP4 muxer with invalid timestamp ordering (pts < dts).

Changes

  • Re-open the camera preview window when the selected camera is already active but the preview window is closed.
  • Respect camera buffer stride/layout on Windows during FFmpeg conversion and texture upload paths.
  • Tighten the libx264 high-throughput export configuration used for 4K60 software fallback by disabling B-frames.
  • Add a packet timestamp safety fix before MP4 muxing so packets are not written with pts < dts.

Root Cause

For the 4K60 export failure, logs showed non-monotonic timestamp ordering during software H.264 export. The encoder/muxer path could produce packets where pts became smaller than dts, which causes FFmpeg/MP4 muxing to fail with Invalid argument.
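The two-stage repair this implies can be sketched as a minimal standalone function (a hypothetical simplification for illustration; the PR's actual code lives in base.rs and operates on FFmpeg packets, not plain integers):

```rust
// Sketch of the two-stage timestamp repair before muxing.
// Stage 1 keeps DTS strictly monotonic; stage 2 enforces pts >= dts,
// which MP4 muxing requires for every packet.
fn fix_timestamps(pts: &mut i64, dts: &mut i64, last_dts: &mut Option<i64>) {
    // Stage 1: bump DTS past the previously written packet if needed.
    if let Some(prev) = *last_dts {
        if *dts <= prev {
            *dts = prev + 1;
        }
    }
    // Stage 2: lift PTS so it never falls behind DTS.
    if *pts < *dts {
        *pts = *dts;
    }
    *last_dts = Some(*dts);
}

fn main() {
    let mut last = None;
    // Encoder emits dts=5 twice; the second packet's DTS is bumped to 6,
    // and its PTS (5) is lifted to match.
    let (mut p1, mut d1) = (5, 5);
    fix_timestamps(&mut p1, &mut d1, &mut last);
    let (mut p2, mut d2) = (5, 5);
    fix_timestamps(&mut p2, &mut d2, &mut last);
    assert_eq!((p2, d2), (6, 6));
    println!("ok");
}
```

Note that stage 2 must run even when stage 1 does not fire, which is exactly the case the PR's second guard covers.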

For the Windows camera artifact issue, camera frames were previously treated as tightly packed buffers, but real device buffers can contain row padding/stride. That caused corrupted image interpretation in export/output paths.
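The stride/padding issue can be made concrete with a small sketch of the inference idea for NV12 (a hypothetical `infer_nv12_stride` helper, simplified from the PR's more general `infer_camera_buffer_layout`):

```rust
// Sketch of stride inference for an NV12 buffer: a full-height luma plane
// plus a half-height interleaved chroma plane, so total_rows = h + h/2.
// If the buffer length divides evenly by the row count, each row (including
// any driver padding) occupies data_len / total_rows bytes.
fn infer_nv12_stride(width: usize, height: usize, data_len: usize) -> Option<usize> {
    let total_rows = height + height / 2;
    if width == 0 || total_rows == 0 || data_len % total_rows != 0 {
        return None;
    }
    let stride = data_len / total_rows;
    // Stride must cover at least one row of pixels (1 byte/px for NV12 luma).
    (stride >= width).then_some(stride)
}

fn main() {
    // 1920x1080 NV12, tightly packed: stride equals width.
    assert_eq!(infer_nv12_stride(1920, 1080, 1920 * 1620), Some(1920));
    // 1918-wide frame padded out to 1920-byte rows: stride exceeds width.
    assert_eq!(infer_nv12_stride(1918, 1080, 1920 * 1620), Some(1920));
    // Buffer length not divisible by the row count: inference bails out.
    assert_eq!(infer_nv12_stride(1920, 1080, 1920 * 1620 + 7), None);
    println!("ok");
}
```

Treating such a padded buffer as tightly packed shifts every row, which is what produced the snow/noise artifacts.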

Validation

  • cargo fmt --all
  • cargo check -p cap-enc-ffmpeg -p cap-export
  • cargo check --manifest-path apps/desktop/src-tauri/Cargo.toml

Additionally, the failing local 4K60 export project was re-run and progressed well past the previous early failure range without immediately reproducing Encode: Invalid argument.

Greptile Summary

This PR addresses three distinct bugs in the recording/export pipeline: camera preview re-open on same-device re-selection, Windows camera buffer stride handling, and MP4 muxing failures from pts < dts ordering violations during 4K60 software H.264 export.

  • Camera window fix (lib.rs): Correctly re-opens the preview window when the same camera is already active by unwrapping skip_camera_window once and reusing it in both the early-return path and the standard path. Clean and correct.
  • Timestamp fix (base.rs): Adds a second pts < dts guard after the existing DTS monotonicity block. This correctly catches the case where the encoder emits pts < dts without any DTS ordering violation (first block doesn't fire). Paired with the bf=0 change in h264.rs, this should eliminate Invalid argument errors from the MP4 muxer.
  • B-frame disable (h264.rs): bf=0 removes B-frames from the high-throughput software path, eliminating the primary source of PTS reordering. b-adapt=0 is redundant once bf=0 is set but harmless.
  • Stride-aware camera buffers (win.rs): The largest change introduces infer_camera_buffer_layout to derive per-frame buffer stride from data_len / total_rows, then threads this stride through copy_plane, convert_uyvy_to_yuyv, and flip_camera_buffer_into. The approach is sound for standard Windows MF devices. One behavioral change to be aware of: upload_mf_buffer_to_texture now returns an error immediately for MJPEG/H264 frames (previously continued with incorrect raw-pixel assumptions), which is better fail-fast behavior but could affect MJPEG camera live preview if that path was previously relied upon.
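For intuition on the bf=0 change: B-frames make decode order differ from display order, so DTS must lag PTS by the reorder delay, and mishandling that offset is a classic source of muxer timestamp errors. An illustrative sketch (standard codec behavior, not the PR's encoder code):

```rust
// Display sequence I0 B1 B2 P3 is decoded as I0 P3 B1 B2: the P-frame must
// be decoded before the B-frames that reference it. PTS carries display
// time; DTS carries decode time, shifted so dts <= pts for every packet.
fn main() {
    let pts: [i64; 4] = [0, 3, 1, 2]; // display times, listed in decode order
    let dts: [i64; 4] = [-1, 0, 1, 2]; // decode times, offset by the reorder delay
    for i in 0..4 {
        assert!(dts[i] <= pts[i]); // the invariant MP4 muxing requires
    }
    // With bf=0 there is no reordering: pts == dts and no offset is needed,
    // which removes this entire class of ordering bugs from the 4K60 path.
    println!("ok");
}
```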

Confidence Score: 4/5

  • Safe to merge; all three fixes are logically correct with one behavioral change in the MJPEG texture upload path that warrants verification.
  • The timestamp and B-frame fixes are well-reasoned and correctly scoped. The stride inference approach is sound for the stated target (Windows MF camera devices). The one open question — whether any in-use camera streams MJPEG through upload_mf_buffer_to_texture — should be confirmed before merging, as that path now returns a hard error instead of silently processing garbage data.
  • crates/recording/src/output_pipeline/win.rs — specifically upload_mf_buffer_to_texture for MJPEG camera compatibility.

Important Files Changed

  • apps/desktop/src-tauri/src/lib.rs: Extracts skip_camera_window from the Option earlier so it can be reused in both the early-return path (camera already active, window closed) and the normal path. Clean fix for the camera preview re-open regression.
  • crates/enc-ffmpeg/src/base.rs: Adds a second pts < dts guard after the existing DTS monotonicity fix. The new guard covers the case where DTS was not bumped but the encoder still emitted a packet with pts < dts. The fix is sound but may need documentation to explain why two separate guards are needed.
  • crates/enc-ffmpeg/src/video/h264.rs: Adds bf=0 and changes b-adapt from 1 to 0 for the high-throughput software path. Setting bf=0 disables B-frames entirely, which eliminates a source of pts/dts reordering. b-adapt=0 is redundant once bf=0 is set, but harmless.
  • crates/recording/src/output_pipeline/win.rs: Large refactor adding stride-aware camera buffer handling. The new infer_camera_buffer_layout heuristic derives stride from data_len / total_rows and works correctly for common semi-planar (NV12/NV21) and packed formats. upload_mf_buffer_to_texture now returns an error for MJPEG/H264 frames (previously processed incorrectly as raw pixel data), which is better fail-fast behavior but a behavioral change.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Camera Frame Arrives] --> B{PixelFormat?}
    B -->|MJPEG| C[decode_mjpeg_frame\nFFmpeg MJPEG decoder]
    B -->|Other| D[infer_camera_buffer_layout\nderives stride from data_len / total_rows]
    D -->|None: MJPEG/H264 or\nbad divisibility| E[Return Error]
    D -->|Some layout| F{UYVY422?}
    F -->|Yes| G[convert_uyvy_to_yuyv\nstride-aware + flip support]
    F -->|No| H[Use raw data\nwith layout stride]
    G --> I[copy_plane with\nnew src_stride param]
    H --> I
    I --> J[ffmpeg::frame::Video returned\nto encoder pipeline]

    K[Packet encoded] --> L{DTS monotonic?}
    L -->|dts ≤ last_dts| M[Bump DTS to last_dts + 1\nAdjust PTS if pts < new DTS]
    L -->|OK| N{pts < dts?}
    M --> N
    N -->|Yes| O[Set pts = dts]
    N -->|No| P[write_interleaved to MP4]
    O --> P
```

Reviews (1): Last reviewed commit: "fix: resolve camera preview and export p..."

Greptile also left 3 inline comments on this PR.


Comment on lines +1850 to +1867
```rust
let layout = infer_camera_buffer_layout(
    frame.pixel_format,
    frame.width as usize,
    frame.height as usize,
    original_data.len(),
)
.ok_or_else(|| {
    windows::core::Error::new(
        windows::core::HRESULT(-1),
        format!(
            "Failed to infer camera buffer layout for {:?} {}x{} ({} bytes)",
            frame.pixel_format,
            frame.width,
            frame.height,
            original_data.len()
        ),
    )
})?;
```

P1 MJPEG cameras will now hard-fail in upload_mf_buffer_to_texture

infer_camera_buffer_layout always returns None for PixelFormat::MJPEG and PixelFormat::H264, so this function will now return an error immediately for any camera that streams in MJPEG format. Before this PR, the function continued (incorrectly, producing garbage), but now it fails outright.

camera_frame_to_ffmpeg correctly short-circuits MJPEG at the top of the function before reaching the layout inference. upload_mf_buffer_to_texture has no equivalent guard.

If any MJPEG cameras are used for the live preview path (which goes through texture upload), the preview will break silently. Consider adding a similar early-return guard:

```rust
// MJPEG/H264 frames cannot be uploaded as raw pixel textures
if matches!(
    frame.pixel_format,
    cap_camera_windows::PixelFormat::MJPEG | cap_camera_windows::PixelFormat::H264
) {
    return Err(windows::core::Error::new(
        windows::core::HRESULT(-1),
        "MJPEG/H264 cannot be uploaded as a raw texture".to_string(),
    ));
}
```

If MJPEG was already broken in this path before (the old bytes_per_pixel = _ => 2 was clearly wrong for compressed data), this is acceptable fail-fast behavior — but it should be documented explicitly.


Comment on lines +1149 to +1186
```rust
fn infer_camera_buffer_layout(
    pixel_format: cap_camera_windows::PixelFormat,
    width: usize,
    height: usize,
    data_len: usize,
) -> Option<CameraBufferLayout> {
    use cap_camera_windows::PixelFormat;

    if width == 0 || height == 0 || data_len == 0 {
        return None;
    }

    let total_rows = match pixel_format {
        PixelFormat::NV12
        | PixelFormat::NV21
        | PixelFormat::YUV420P
        | PixelFormat::YV12
        | PixelFormat::P010 => height + (height / 2),
        PixelFormat::YUYV422
        | PixelFormat::UYVY422
        | PixelFormat::GRAY16
        | PixelFormat::RGB565
        | PixelFormat::ARGB
        | PixelFormat::RGB32
        | PixelFormat::RGB24
        | PixelFormat::BGR24
        | PixelFormat::GRAY8 => height,
        PixelFormat::MJPEG | PixelFormat::H264 => return None,
    };

    if total_rows == 0 || data_len % total_rows != 0 {
        return None;
    }

    let primary_stride = data_len / total_rows;

    match pixel_format {
        PixelFormat::NV12 | PixelFormat::NV21 => {
```
P2 Stride inference may fail for devices with asymmetric UV plane alignment

infer_camera_buffer_layout derives primary_stride from data_len / total_rows, which assumes both the luma and chroma planes in planar formats (YUV420P, YV12) use strides consistent with a single stride value. Specifically, for YUV420P it assumes secondary_stride = primary_stride / 2.

This assumption holds for the vast majority of Windows camera drivers (MF aligns all planes to the same width multiple), but a device that provides a Y plane with stride 1936 and U/V planes with stride 1024 (a different alignment unit) would fail the data_len % total_rows != 0 check and cause the frame to be dropped with an error.

The resulting error is logged clearly, so failed frames are visible. This is a reasonable trade-off given that such cameras are rare, but it's worth noting as a known limitation of the stride inference approach.
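A quick numeric check of the failure mode using the strides given above (hypothetical device values, not taken from a real driver):

```rust
// Y plane: 1080 rows at stride 1936; U and V planes: 540 rows each at
// stride 1024. The heuristic assumes one consistent stride and requires
// data_len to divide evenly by total_rows = height + height / 2.
fn main() {
    let data_len: usize = 1080 * 1936 + 2 * (540 * 1024);
    let total_rows: usize = 1080 + 1080 / 2; // 1620
    // Asymmetric plane alignment leaves a nonzero remainder, so
    // infer_camera_buffer_layout would reject this frame.
    assert_ne!(data_len % total_rows, 0);
    println!("remainder = {}", data_len % total_rows); // prints "remainder = 540"
}
```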


Comment on lines +88 to +92
```rust
if let (Some(pts), Some(dts)) = (self.packet.pts(), self.packet.dts())
    && pts < dts
{
    self.packet.set_pts(Some(dts));
}
```
P2 Second pts < dts guard is correct but worth a comment

The existing DTS monotonicity block (lines 76–86) already adjusts PTS when DTS is bumped. This new block catches the distinct case where the encoder emits a packet with pts < dts without any DTS ordering violation (i.e., the first block didn't fire). This is a real scenario with libx264 at certain configurations.

A short comment would help future readers understand why two separate guards exist:

```rust
// Secondary guard: encoder may emit pts < dts even when DTS is
// already monotonic (e.g. during stream init or B-frame drain).
if let (Some(pts), Some(dts)) = (self.packet.pts(), self.packet.dts())
    && pts < dts
{
    self.packet.set_pts(Some(dts));
}
```


@richiemcilroy (Member) commented:

Hey @hidumou, was this an issue you have personally experienced?
