Web gpu maskeditor rendering #6812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…equidistant points using equidistant point sampling on splines, including arc length sampling for equidistance in curves
…rush cross artefacts, fixed brush size
…dedicated file, fixed errors
…ht mouse button gesture not expanding brush to max size
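The commits above describe equidistant point sampling on splines with arc-length sampling for curves. A minimal sketch of those two ideas — centripetal Catmull-Rom evaluation between the middle two control points, and fixed-spacing resampling of a dense polyline — is below. The names (`catmullRom`, `resampleEquidistant`) and the remainder convention are illustrative assumptions, not the PR's actual API.

```typescript
type Pt = { x: number; y: number }

// Centripetal Catmull-Rom (alpha = 0.5) via the Barry-Goldman pyramid.
// Evaluates the segment between p1 and p2 at u in [0, 1]; the epsilon in
// `knot` guards against coincident control points.
function catmullRom(p0: Pt, p1: Pt, p2: Pt, p3: Pt, u: number, alpha = 0.5): Pt {
  const knot = (a: Pt, b: Pt, t: number) =>
    t + Math.max(Math.pow(Math.hypot(b.x - a.x, b.y - a.y), alpha), 1e-6)
  const t0 = 0
  const t1 = knot(p0, p1, t0)
  const t2 = knot(p1, p2, t1)
  const t3 = knot(p2, p3, t2)
  const t = t1 + u * (t2 - t1) // map u onto the knot interval [t1, t2]
  const lerp = (a: Pt, b: Pt, ta: number, tb: number): Pt => {
    const w = (t - ta) / (tb - ta)
    return { x: a.x + (b.x - a.x) * w, y: a.y + (b.y - a.y) * w }
  }
  const a1 = lerp(p0, p1, t0, t1)
  const a2 = lerp(p1, p2, t1, t2)
  const a3 = lerp(p2, p3, t2, t3)
  const b1 = lerp(a1, a2, t0, t2)
  const b2 = lerp(a2, a3, t1, t3)
  return lerp(b1, b2, t1, t2)
}

// Walk a dense polyline and emit a point every `spacing` units of arc length.
// `startOffset` is the distance still owed from the previous segment, so
// spacing stays consistent across segment boundaries; assumes spacing > 0.
function resampleEquidistant(
  dense: Pt[],
  spacing: number,
  startOffset = 0
): { points: Pt[]; remainder: number } {
  const points: Pt[] = []
  let needed = startOffset // distance until the next emitted sample
  for (let i = 1; i < dense.length; i++) {
    let prev = dense[i - 1]
    const next = dense[i]
    let seg = Math.hypot(next.x - prev.x, next.y - prev.y)
    while (seg >= needed) {
      const w = seg > 0 ? needed / seg : 0
      prev = { x: prev.x + (next.x - prev.x) * w, y: prev.y + (next.y - prev.y) * w }
      points.push(prev)
      seg -= needed
      needed = spacing
    }
    needed -= seg
  }
  // remainder = arc length already traveled past the last emitted sample
  return { points, remainder: spacing - needed }
}
```

For example, resampling a 10-unit horizontal line at spacing 3 (first sample owed after 3 units) yields points at x = 3, 6, 9 with 1 unit of leftover arc length carried into the next segment.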
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The PR introduces GPU-accelerated brush rendering via WebGPU to the mask editor. New dependencies (TypeGPU, WebGPU types) and configuration enable GPU pipelines (shaders, renderer). Stroke processing gains equidistant point resampling using Catmull-Rom spline interpolation. Brush controls shift from smoothing precision to step size. Canvas history now supports ImageBitmap for efficient GPU texture transfers. Multiple test suites validate the new stroke and brush utilities.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User Input
    participant UI as UI (Component)
    participant Draw as useBrushDrawing
    participant GPU as GPUBrushRenderer
    participant Canvas as CPU Canvas
    participant History as Canvas History
    User->>UI: Start drawing stroke
    UI->>Draw: startDrawing()
    Draw->>Draw: Initialize StrokeProcessor
    loop For each mouse move
        User->>UI: Mouse position
        UI->>Draw: handleDrawing(point)
        Draw->>Draw: StrokeProcessor.addPoint(point)
        Draw->>Draw: resampleSegment(Catmull-Rom)
        alt GPU Path
            Draw->>GPU: gpuDrawPoint(points)
            GPU->>GPU: renderStrokeToAccumulator()
        else CPU Path (fallback)
            Draw->>Canvas: Direct canvas rendering
        end
    end
    User->>UI: End drawing
    UI->>Draw: drawEnd()
    Draw->>GPU: compositeStroke(targetView)
    GPU->>GPU: Copy accum texture → main texture
    Draw->>Draw: updateGPUFromCanvas()
    Draw->>History: saveState(ImageBitmap)
    History->>History: Store or replace state
    Note over Draw,History: On undo/redo
    History->>Draw: restoreState(ImageBitmap)
    Draw->>GPU: copyGpuToCanvas()
    GPU->>Canvas: readback → blit to canvas
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
🎨 Storybook Build Status

✅ Build completed successfully! ⏰ Completed at: 11/21/2025, 08:38:58 PM UTC
🎉 Your Storybook is ready for review!
🎭 Playwright Test Results

⏰ Completed at: 11/21/2025, 08:48:48 PM UTC
📊 Test reports are available per browser configuration.
Bundle Size Report

Summary

| Category | Description | Size | Baseline | Δ | Chunks |
| --- | --- | --- | --- | --- | --- |
| App Entry Points | Main entry bundles and manifests | 3.18 MB | 3.13 MB | 🔴 +44 kB | 3 added / 3 removed |
| Graph Workspace | Graph editor runtime, canvas, workflow orchestration | 945 kB | 945 kB | ⚪ 0 B | 1 added / 1 removed |
| Views & Navigation | Top-level views, pages, and routed surfaces | 7.97 kB | 7.97 kB | ⚪ 0 B | 1 added / 1 removed |
| Panels & Settings | Configuration panels, inspectors, and settings screens | 306 kB | 306 kB | ⚪ 0 B | 6 added / 6 removed |
| UI Components | Reusable component library chunks | 141 kB | 141 kB | ⚪ 0 B | 6 added / 6 removed |
| Data & Services | Stores, services, APIs, and repositories | 12.5 kB | 12.5 kB | ⚪ 0 B | 3 added / 3 removed |
| Utilities & Hooks | Helpers, composables, and utility bundles | 2.94 kB | 2.94 kB | ⚪ 0 B | 1 added / 1 removed |
| Vendor & Third-Party | External libraries and shared vendor chunks | 5.7 MB | 5.32 MB | 🔴 +374 kB | 3 added / 3 removed |
| Other | Bundles that do not match a named category | 3.87 MB | 3.87 MB | ⚪ 0 B | 19 added / 19 removed |
```ts
if (store.tgpuRoot) return

try {
  const root = await TGPU.init()
```
[security] high Priority
Issue: Missing WebGPU feature capability checking before device usage
Context: Direct GPU device initialization without checking if WebGPU is supported, which can cause runtime errors on unsupported browsers/devices
Suggestion: Add feature detection with graceful fallback: if (!navigator.gpu) { /* fallback to canvas */ } before TGPU.init()
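A minimal sketch of that guard, not the PR's actual code; the helper name is illustrative, and `requestAdapter()` resolving to `null` is treated as "no usable GPU even though the API exists":

```typescript
// Detect WebGPU support before any TGPU.init() call; returns false when the
// API is absent (older browsers, Node) or when no adapter is available.
async function isWebGpuAvailable(): Promise<boolean> {
  const nav = (globalThis as any).navigator
  if (!nav || !('gpu' in nav)) return false // API absent: fall back to canvas
  try {
    const adapter = await nav.gpu.requestAdapter()
    return adapter !== null
  } catch {
    return false
  }
}

// Illustrative usage:
// if (await isWebGpuAvailable()) { await TGPU.init() } else { /* CPU path */ }
```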
```ts
this.device.queue.writeBuffer(this.uniformBuffer, 0, buffer)

// 2. Batch Instance Data
const batchSize = Math.min(points.length, MAX_STROKES)
```
[performance] high Priority
Issue: Missing GPU resource bounds checking for MAX_STROKES limit
Context: StrokeProcessor and GPUBrushRenderer can exceed MAX_STROKES=10000 limit without validation, potentially causing buffer overflow or silent truncation
Suggestion: Add validation in renderStrokeInternal: if (points.length > MAX_STROKES) { /* batch process or warn */ }
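One way to avoid silent truncation is to split the point list into `MAX_STROKES`-sized batches before writing instance data. A generic sketch; the `chunk` helper and the usage comment are illustrative, not the PR's API:

```typescript
const MAX_STROKES = 10000 // mirrors the limit discussed in this review

// Split any point list into batches no larger than `size`, so each batch fits
// the instance buffer and no points are dropped.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}

// Illustrative usage inside renderStrokeInternal:
// for (const batch of chunk(points, MAX_STROKES)) drawBatch(batch)
```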
```wgsl
let a = color.a;
if (a > 0.0) {
  r = r / a;
```
[quality] medium Priority
Issue: Potential division by zero in shader readback
Context: Alpha unpremultiplication 'r = r / a' can cause undefined behavior when alpha is very small but not exactly zero
Suggestion: Add epsilon check: if (a > 0.001) { r = r / a; } else { r = 0.0; } to prevent numerical instability
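The suggested epsilon guard, shown as a CPU-side TypeScript analogue of the WGSL fix; the function name and the 0.001 threshold are illustrative:

```typescript
// Un-premultiply a color channel only when alpha is safely above an epsilon;
// near-zero alpha would otherwise amplify numerical noise into huge values.
function unpremultiply(channel: number, alpha: number, eps = 0.001): number {
  return alpha > eps ? channel / alpha : 0
}
```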
```ts
  ? this.erasePipeline
  : this.compositePipeline

const bindGroup0 = this.device.createBindGroup({
```
[performance] high Priority
Issue: Missing cleanup and memory leak for GPU bind groups
Context: bind groups are created repeatedly in compositeStroke, blitToCanvas without tracking or cleanup, causing GPU memory accumulation
Suggestion: Cache bind groups by key or implement proper cleanup in destroy() method to prevent GPU memory leaks
```ts
drawShape(point, interpolatedOpacity)
if (renderer && compositionOp === CompositionOperation.SourceOver) {
  gpuRender(points)
} else {
```
[architecture] high Priority
Issue: Mixed GPU and CPU rendering paths lack proper fallback coordination
Context: GPU path failures silently fall back to CPU rendering without state synchronization, potentially causing visual inconsistencies and performance issues
Suggestion: Implement proper error boundaries and state synchronization between GPU/CPU paths, or fail fast with clear user feedback when GPU initialization fails
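One coordinated-fallback shape this could take, as a hedged sketch: if the GPU path throws, disable it for the rest of the session and replay the same points on the CPU path so no stroke data is lost. `gpuRender`/`cpuRender` are illustrative stand-ins for the real render functions:

```typescript
type RenderFn = (points: number[][]) => void

// Wraps both paths; returns which path actually handled the stroke so the
// caller can keep GPU and CPU state synchronized (or surface a warning).
function createCoordinatedRenderer(gpuRender: RenderFn, cpuRender: RenderFn) {
  let gpuHealthy = true
  return (points: number[][]): 'gpu' | 'cpu' => {
    if (gpuHealthy) {
      try {
        gpuRender(points)
        return 'gpu'
      } catch {
        gpuHealthy = false // fail over once, then stay on the CPU path
      }
    }
    cpuRender(points)
    return 'cpu'
  }
}
```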
| "semver": "^7.7.2", | ||
| "three": "^0.170.0", | ||
| "tiptap-markdown": "^0.8.10", | ||
| "typegpu": "catalog:", |
[quality] medium Priority
Issue: Missing TypeGPU library usage validation and version dependencies
Context: Adding typegpu as a production dependency without proper capability detection increases bundle size for users who cannot use WebGPU
Suggestion: Consider making TypeGPU an optional dynamic import or add capability detection to avoid loading on unsupported devices
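A sketch of the dynamic-import idea: the loader is injected and cached, so the heavy module is fetched at most once and only after capability detection succeeds. `GpuBackend` and `createLazyBackend` are illustrative names, not the PR's API:

```typescript
type GpuBackend = { init: () => Promise<unknown> }

// Returns a getter that runs the loader at most once; callers that lack
// WebGPU simply never invoke it, so the module is never downloaded.
function createLazyBackend(load: () => Promise<GpuBackend>) {
  let cached: Promise<GpuBackend> | null = null
  return (): Promise<GpuBackend> => {
    if (!cached) cached = load()
    return cached
  }
}

// Illustrative usage:
// const getBackend = createLazyBackend(() => import('typegpu'))
```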
```ts
// Generate dense points for the segment
const densePoints: Point[] = []

const samples = 20
```
[performance] medium Priority
Issue: Fixed sample count causes unnecessary computation for short segments
Context: Using fixed 20 samples for all segments regardless of length wastes computation on short segments and may undersample long segments
Suggestion: Calculate sample count dynamically based on segment length: const samples = Math.max(2, Math.ceil(segmentLength / minSampleDistance))
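The suggested dynamic sampling as a small helper: short segments get the minimum of 2 samples, longer segments scale with arc length. The function name and the default `minSampleDistance` are illustrative:

```typescript
// Pick a sample count proportional to segment length instead of a fixed 20;
// the floor of 2 guarantees at least the segment endpoints are sampled.
function computeSampleCount(segmentLength: number, minSampleDistance = 2): number {
  return Math.max(2, Math.ceil(segmentLength / minSampleDistance))
}
```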
```diff
@@ -6,7 +6,6 @@
   "lib": [
     "ES2023",
     "ES2023.Array",
-    "ESNext.Iterator",
```
[quality] low Priority
Issue: Removing ESNext.Iterator from TypeScript lib array limits modern iterator support
Context: Removing ESNext.Iterator might break compatibility with advanced iterator features that may be used elsewhere in the codebase
Suggestion: Verify this removal is intentional and doesn't break iterator-related functionality elsewhere
```ts
import { StrokeProcessor } from './StrokeProcessor'
import { getEffectiveBrushSize, getEffectiveHardness } from './brushUtils'

// GPU Resources
```
[architecture] critical Priority
Issue: Global module state creates memory leaks and prevents multiple component instances
Context: Global variables for GPU resources (device, renderer, textures) at module level prevent proper cleanup and break component isolation
Suggestion: Move all GPU state into a composable or service class that can be properly instantiated, destroyed, and scoped to component lifecycle
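One shape the suggested refactor could take, sketched with placeholder type parameters instead of real `GPUDevice`/`GPUTexture` types: each editor instance owns its resources, `init` is idempotent, and `destroy` releases everything tracked. This is an illustration of the lifecycle pattern, not the PR's code:

```typescript
// Instance-scoped GPU resource holder: no module-level globals, so multiple
// mask editors can coexist and each can be torn down independently.
class GpuResources<TDevice, TTexture> {
  private device: TDevice | null = null
  private textures: TTexture[] = []

  async init(create: () => Promise<TDevice>): Promise<void> {
    if (this.device) return // idempotent: repeated init is a no-op
    this.device = await create()
  }

  // Register a texture so destroy() can release it later.
  track(texture: TTexture): TTexture {
    this.textures.push(texture)
    return texture
  }

  destroy(release: (t: TTexture) => void): void {
    this.textures.forEach(release)
    this.textures = []
    this.device = null // accidental use after destroy fails fast
  }
}
```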
Comprehensive PR Review
This review is generated by Claude. It may not always be accurate, as with human reviewers. If you believe that any of the comments are invalid or incorrect, please state why for each. For others, please implement the changes in one way or another.
Review Summary
PR: Web gpu maskeditor rendering (#6812)
Impact: 2685 additions, 344 deletions across 23 files
Issue Distribution
- Critical: 1
- High: 3
- Medium: 3
- Low: 1
Category Breakdown
- Architecture: 2 issues
- Security: 1 issue
- Performance: 3 issues
- Code Quality: 3 issues
Key Findings
Architecture & Design
This PR introduces a comprehensive WebGPU rendering system for the mask editor, which is architecturally sound but has some critical design issues:
- Global State Management (CRITICAL): GPU resources are managed at module level rather than component lifecycle, creating memory leaks and preventing multiple instances
- Mixed Rendering Paths: GPU and CPU rendering paths lack proper coordination and error handling
Security Considerations
The main security concern is the lack of WebGPU feature detection, which could cause runtime errors on unsupported browsers rather than graceful degradation.
Performance Impact
The implementation shows good GPU optimization patterns but has several performance concerns:
- Missing bounds checking for MAX_STROKES limit could cause buffer overflows
- GPU bind groups are created repeatedly without caching, causing memory accumulation
- Fixed sample counts in stroke processing waste computation on short segments
Integration Points
The integration follows ComfyUI patterns well, properly using the store system and Vue composables. The TypeGPU library integration is reasonable but could be optimized for bundle size.
Positive Observations
- Excellent WebGPU pipeline architecture with proper render and compute pass separation
- Good use of TypeGPU for type-safe shader development
- Comprehensive stroke smoothing with Catmull-Rom splines
- Proper alpha blending and premultiplied alpha handling
- Well-structured composable pattern following Vue 3 best practices
- Extensive test coverage for new functionality
Next Steps
- Address critical global state management issues before merge
- Add proper WebGPU capability detection with fallback
- Implement GPU resource cleanup and memory management
- Consider bundle size optimization for TypeGPU dependency
This is a comprehensive automated review. For architectural decisions requiring human judgment, please request additional manual review.
Actionable comments posted: 1
🧹 Nitpick comments (22)
src/composables/maskeditor/useCanvasHistory.ts (3)
7-9: Consider extracting a shared history state type alias

The `{ mask: ImageData | ImageBitmap; rgb: ImageData | ImageBitmap }` shape is now repeated for `states`, the `restoreState` parameter, and implicitly at call sites. Extracting a `type CanvasHistoryState = { mask: ImageData | ImageBitmap; rgb: ImageData | ImageBitmap }` (or similar) would reduce duplication and keep future changes to this structure localized.

Also applies to: 126-129, 163-163
58-61: saveState behavior with provided data vs. context availability

The new `saveState(providedMaskData?, providedRgbData?)` overload is useful, and the cleanup of shifted ImageBitmaps via `close()` is good for avoiding leaks. Two behavioral points to double-check:

- `if (!maskCtx || !rgbCtx || !maskCanvas || !rgbCanvas) return` short-circuits even when both `providedMaskData`/`providedRgbData` are supplied, which prevents using history purely as a bitmap container if the 2D contexts/canvases are not ready yet.
- The provided-data branch only triggers when both `providedMaskData` and `providedRgbData` are truthy; mixed cases (one provided, one not) silently fall back to reading both from the canvases.

If callers are expected to sometimes push GPU-produced bitmaps before 2D contexts exist, or supply only one channel, you may want to relax the context guard for the "provided" path and/or support per-channel overrides.

Also applies to: 76-90, 95-103
134-146: ImageBitmap restore/cleanup semantics and environment robustness

The ImageBitmap branches in `restoreState` and the explicit `close()` calls in both the rotation and `clearStates` paths look correct and should help manage GPU/bitmap memory. A couple of softer considerations:

- `state.mask instanceof ImageBitmap` / `state.rgb instanceof ImageBitmap` assumes `ImageBitmap` exists at runtime; tests now provide a shim, but if this composable is ever used in SSR/Node without that shim, you may want a defensive `typeof ImageBitmap !== 'undefined'` check around the instanceofs.
- `clearRect` currently uses the bitmap's `width`/`height`; if canvases can be resized independently of stored states, you might consider clearing based on the current canvas size instead to avoid leaving stale pixels outside the bitmap area.

Also, since histories now actively `close()` bitmaps, it's worth documenting that ownership of passed-in ImageBitmaps is transferred to `useCanvasHistory` so callers don't reuse a closed bitmap later.

Also applies to: 149-154
src/components/maskeditor/MaskEditorContent.vue (1)

158-168: Consider adding error handling for GPU initialization.

While the code checks for availability, GPU initialization can fail for various reasons (unsupported hardware, driver issues, etc.). Consider wrapping the GPU initialization in a try-catch block and gracefully degrading to CPU-only rendering if GPU setup fails.

```diff
 // Initialize GPU resources
 if (toolManager?.brushDrawing) {
-  await toolManager.brushDrawing.initGPUResources()
-  if (gpuCanvasRef.value && toolManager?.brushDrawing.initPreviewCanvas) {
-    // Match preview canvas resolution to mask canvas
-    gpuCanvasRef.value.width = maskCanvasRef.value.width
-    gpuCanvasRef.value.height = maskCanvasRef.value.height
-
-    toolManager.brushDrawing.initPreviewCanvas(gpuCanvasRef.value)
+  try {
+    await toolManager.brushDrawing.initGPUResources()
+    if (gpuCanvasRef.value && toolManager?.brushDrawing.initPreviewCanvas) {
+      // Match preview canvas resolution to mask canvas
+      gpuCanvasRef.value.width = maskCanvasRef.value.width
+      gpuCanvasRef.value.height = maskCanvasRef.value.height
+
+      toolManager.brushDrawing.initPreviewCanvas(gpuCanvasRef.value)
+    }
+  } catch (error) {
+    console.warn('[MaskEditorContent] GPU initialization failed, falling back to CPU rendering:', error)
   }
 }
```

src/composables/maskeditor/StrokeProcessor.test.ts (1)
63-73: Align debug logging threshold with `toBeCloseTo` tolerance

Right now you only log when `|d - spacing| > 0.5`, but `toBeCloseTo(spacing, 1)` will already fail for differences ≥ ~0.05. That means some failing cases won't emit the helpful debug logs.

To keep logs aligned with the assertion, you could tighten the threshold, for example:

```diff
-    if (Math.abs(d - spacing) > 0.5) {
+    if (Math.abs(d - spacing) > 0.05) {
```

This way, any distance that can cause the expectation to fail will also log diagnostic information.
src/composables/maskeditor/brushUtils.ts (1)
1-34: Consider clamping hardness inputs/outputs to the documented [0, 1] range

The math for `getEffectiveBrushSize`/`getEffectiveHardness` is clear and keeps the "hard core" radius stable as you grow the soft falloff. Right now, though, both functions assume `hardness` is already in `[0, 1]` (as per the JSDoc) and don't guard against out-of-range values.

If upstream code ever passes hardness < 0 or > 1, you could end up with negative or >1 effective hardness, which would contradict the return contract and could produce odd gradients.

If you want these helpers to be self-contained and defensive, consider clamping:

```diff
 export function getEffectiveBrushSize(size: number, hardness: number): number {
-  // Scale factor for maximum softness
-  const MAX_SCALE = 1.5
-  const scale = 1.0 + (1.0 - hardness) * (MAX_SCALE - 1.0)
+  // Clamp hardness to [0, 1] to keep behavior predictable
+  const clampedHardness = Math.min(Math.max(hardness, 0), 1)
+  const MAX_SCALE = 1.5
+  const scale = 1.0 + (1.0 - clampedHardness) * (MAX_SCALE - 1.0)
   return size * scale
 }

 export function getEffectiveHardness(
   size: number,
   hardness: number,
   effectiveSize: number
 ): number {
   if (effectiveSize <= 0) return 0
-  // Adjust hardness to maintain the physical radius of the hard core
-  return (size * hardness) / effectiveSize
+  const clampedHardness = Math.min(Math.max(hardness, 0), 1)
+  const raw = (size * clampedHardness) / effectiveSize
+  // Clamp to [0, 1] to match the documented range
+  return Math.min(Math.max(raw, 0), 1)
 }
```

Not mandatory if the store already guarantees valid ranges, but it will make these utilities safer to reuse in other contexts.

src/components/maskeditor/BrushSettingsPanel.vue (1)
src/components/maskeditor/BrushSettingsPanel.vue (1)
55-62: Brush size/step size wiring looks good; consider localizing the “Stepsize” label
:max="500"on thickness matches the 1–500 clamp insetBrushSize, and thestepSizeslider range 1–100 matchessetBrushStepSize, so the store and UI stay in sync.- The new
onStepSizeChangehandler is correctly delegated tostore.setBrushStepSize.- For consistency with the rest of the panel and i18n, consider replacing the hard-coded
label="Stepsize"with a translation key once one exists (and maybe “Step size” spacing in the string).Also applies to: 83-89, 122-124
src/composables/maskeditor/StrokeProcessor.ts (1)
1-110: StrokeProcessor spline + resampling logic looks solid; consider explicit reset semantics

The Catmull-Rom windowing and `resampleSegment` integration handle both dragged strokes and single clicks correctly, and the sliding window in `addPoint` plus the flush loop in `endStroke` gives full coverage of all segments without duplication.

One thing to watch: internal state (`controlPoints`, `remainder`, `isFirstPoint`, `hasProcessedSegment`) is never cleared after `endStroke()`. If a single `StrokeProcessor` instance can be reused for multiple strokes, you may want either:

- to document this as "one instance per stroke", or
- a `reset()` that clears the buffers/flags (optionally called from `endStroke`) so reuse is safe.

You could also consider lifting `const samples = 20` to a constructor parameter or class constant if you plan to tune smoothness later, but it's fine as-is.

src/composables/maskeditor/splineUtils.ts (1)
1-125: Spline utilities are correct and robust for stroke processing

The centripetal Catmull-Rom implementation and `resampleSegment` logic look mathematically sound, including safeguards for coincident control points and zero-length segments. The remainder contract also matches how `StrokeProcessor` carries spacing across segments.

If you ever profile this hot path, you could hoist `getT`/`interp` or inline the `add`/`mul` math to reduce tiny allocation/closure overhead, but that's a micro-optimization and not required.

src/composables/maskeditor/gpu/brushShaders.ts (1)
1-171: Shader logic looks coherent; please verify brush size semantics and enum/dep wiring

Overall the WGSL templates and TypeGPU wiring look consistent:

- Vertex shader expands a quad around the stroke point and converts to NDC.
- Fragment shader correctly switches between square (Chebyshev) and circle (Euclidean) distance, applies hardness + `fwidth`-based AA, and returns premultiplied color.
- Blit/composite shaders are standard full-screen triangle passes.
- Readback computes un-premultiplied RGBA and packs to `u32` with sane indexing.

A few things worth double-checking:

- In `vs`, the comment says "Convert diameter to radius" but the code uses `let radius = size * pressure;` (no `/ 2.0`). If the `size` vertex attribute is actually a diameter, this will double the brush radius; if it is already a radius, the comment should be updated to avoid confusion.
- `globals.brushShape == 1u` for the square path ties the shader to the exact numeric value of your `BrushShape` enum on the TS side. Consider at least documenting that coupling (or mapping to a simpler `u32`/bool uniform) so future enum changes don't silently break shape selection.
- `readbackShader` assumes `outputBuf.length >= width * height`; ensure the compute dispatch and buffer allocation on the TS side always match `textureDimensions(inputTex)` to avoid out-of-bounds writes.
- Lint hints show unresolved imports for `'typegpu'`/`'typegpu/data'` and `./gpuSchema` in the analysis environment; please confirm those modules are in your dependencies and correctly resolved by your bundler/tsconfig.
20-26: Brush size/step size and new GPU/clear state are consistent; minor default/value nits

- `setBrushSize` now clamps to `[1, 500]`, which correctly matches the updated thickness slider range.
- `setBrushStepSize` clamps to `[1, 100]`, aligned with the new `stepSize` slider; the various usages of `stepSize` in `brushSettings`, `resetBrushToDefault`, and `resetState` are consistent type-wise.
- The only behavioral nit is that the initial `brushSettings` uses `stepSize: 10` while both reset paths use `stepSize: 5`. If you expect a single canonical default, you may want to align these values; if not, a brief comment would help explain the difference.
- `clearTrigger` + `triggerClear()` and the exported `tgpuRoot` ref give the rest of the mask editor a clean way to react to clears and GPU root setup; once the GPU root type is stable, tightening `tgpuRoot` away from `any` would improve type safety.

Also applies to: 53-54, 72-75, 115-137, 175-177, 183-190, 211-273
src/composables/maskeditor/useBrushDrawing.ts (9)
20-36: Global GPU state is module-scoped; consider instance-scoping for multiple editors

`device`, `renderer`, textures, and readback buffers are all module-level singletons. This works if you only ever mount one mask editor, but will couple all instances to the same GPU state and preview context (and any destroy/clear calls) if multiple editors are ever used concurrently or in quick succession.

If multi-instance support is a goal, consider moving these into per-`useBrushDrawing` instance state (or a dedicated GPU service with clear ownership and reference counting).
196-208: Brush settings cache: ensure new fields are persisted explicitly

You load `stepSize` from cached brush settings but rely on `Brush`'s structure staying in sync with the store state (`store.brushSettings`). If new fields are later added to `Brush`, they won't be persisted unless explicitly handled.

Not urgent, but it might be safer to persist only the fields you care about (size, opacity, hardness, type, stepSize) and ignore/derive the rest, to decouple the localStorage format from store internals.
210-216: Clear events should also clear preview and accumulator

The `clearGPU` watcher zeros `maskTexture` and `rgbTexture`, but it doesn't reset the stroke accumulator or preview canvas. If the user clears the canvas mid-stroke, the preview may still display stale stroke data until the next draw.

You could also reset stroke/preview state when clearing:

```diff
 watch(
   () => store.clearTrigger,
   () => {
     clearGPU()
+    resetDirtyRect()
+    if (renderer && previewContext) {
+      renderer.clearPreview(previewContext)
+    }
   }
 )
```

This keeps GPU and CPU views visually consistent after a clear.
697-721: `drawLine` spacing reuse via `lineRemainder` is good; consider guarding zero-length input

The `resampleSegment` call plus `lineRemainder` reuse gives consistent spacing across chained Shift+click segments. One small edge case: if `p1` and `p2` are identical (degenerate line), you still go through GPU/CPU work even though there's nothing to draw.

You could early-return for zero-length segments to skip unnecessary work:

```diff
-  async function drawLine(…
-  ): Promise<void> {
-    // Generate equidistant points using segment resampling
+  async function drawLine(…
+  ): Promise<void> {
+    if (p1.x === p2.x && p1.y === p2.y) {
+      return
+    }
+    // Generate equidistant points using segment resampling
```

Not critical, but a simple micro-optimization.
727-787: `startDrawing` GPU/CPU flow is well structured; minor note on RAF branch

The overall start-of-stroke flow (reset dirty rect, prepare GPU accumulator, compute spacing from `stepSize`, initialize `StrokeProcessor`, and hide the main canvas while preview is active) is coherent and guarded with try/catch.

Note that the `handleDrawing` branch that calls `gpuDrawPoint` only executes when `!isDrawing.value`, and then immediately bails inside the RAF if `!isDrawing.value` is still true. That effectively makes `gpuDrawPoint` unreachable under normal drawing, so this function is now dead code.

If it's no longer needed, consider removing `gpuDrawPoint` (and the special diff>20 path) to reduce complexity.
1011-1142: GPU readback + dirty-rect cropping are correct; potential future perf win

`copyGpuToCanvas` correctly:

- Lazily (re)allocates storage + staging buffers when the canvas size changes.
- Uses the compute `prepareReadback` path to write into the storage buffer, then copies to map-readable staging buffers.
- Crops the `putImageData` to the current `dirtyRect`, falling back to the full canvas when invalid.

Note that the compute shader always processes the full texture, even if the dirty rect is small, so large canvases will still pay O(W×H) at stroke end. If readback becomes a bottleneck, you might later consider tiling or a rect-restricted readback, but the current implementation is correct and probably fine for now.
1147-1173: `destroy` cleans resources but leaves some globals non-null

`destroy` releases renderer buffers, textures, readback buffers, and the TypeGPU root, and zeroes `currentBufferSize`, which is good. Two small follow-ups:

- Set `renderer = null` and `previewContext = null` so accidental use after destroy fails fast.
- Optionally `resetDirtyRect()` to avoid stale state if the composable is reused without a full reload.

Example:

```diff
 function destroy(): void {
   renderer?.destroy()
+  renderer = null
   …
-  currentBufferSize = 0
+  currentBufferSize = 0
+  previewContext = null
+  resetDirtyRect()
 }
```

This makes reuse safer in long-lived SPA sessions.
1247-1260: Preview canvas init assumes GPU is ready; call ordering matters

`initPreviewCanvas` returns early if `device` or `getContext('webgpu')` is unavailable, which is fine. Just be aware that callers must ensure `initGPUResources` (or at least `initTypeGPU`) has completed first, otherwise `device` will be null and the preview will silently never initialize.

If this has bitten UX already, you could log a warning when `device` is missing so mis-ordering is easier to debug.
1373-1394: `clearGPU` only clears textures; consider clearing preview and accumulator too

`clearGPU` writes zeros into both `maskTexture` and `rgbTexture`, which is correct for the backing textures. However, `currentStrokeTexture` (in `GPUBrushRenderer`) and the preview canvas can still contain the last in-progress stroke.

You already clear the preview in `drawEnd`; doing something similar in `clearGPU` would keep everything in sync:

```diff
 function clearGPU() {
   if (!device || !maskTexture || !rgbTexture || !store.maskCanvas) return
   …
   device.queue.writeTexture(…rgbTexture…)
+
+  if (renderer && previewContext) {
+    renderer.clearPreview(previewContext)
+  }
 }
```

This avoids "ghost" strokes remaining in the preview after an external clear.
src/composables/maskeditor/gpu/GPUBrushRenderer.ts (2)
571-675: Blit + composite for preview are correct; note repeated uniform packing

`blitToCanvas` correctly:

- Optionally draws the background texture first (for erasing over existing content), otherwise clears the target.
- Then composites the accumulated stroke using either the erase or composite preview pipeline.
- Repacks uniforms in the same layout as `compositeStroke`.

Functionally this is solid. If you later profile and find uniform packing/`ArrayBuffer` allocation to be hot, you could extract a small helper to reuse a single scratch buffer instead of allocating a new one on each call, but that's an optimization only.
717-723: `destroy` releases GPU resources cleanly

Destroying all buffers and the accumulation texture in `destroy` is correct and important for long-running sessions. Optionally, you might also set `this.currentStrokeTexture = null` to make accidental use after destroy more obvious, but functionally this is fine as-is.
```ts
private renderStrokeInternal(
  targetView: GPUTextureView,
  pipeline: GPURenderPipeline,
  points: { x: number; y: number; pressure: number }[],
  settings: {
    size: number
    opacity: number
    hardness: number
    color: [number, number, number]
    width: number
    height: number
    brushShape: number
  }
) {
  if (points.length === 0) return

  // 1. Update Uniforms
  const buffer = new ArrayBuffer(UNIFORM_SIZE)
  const f32 = new Float32Array(buffer)
  const u32 = new Uint32Array(buffer)

  f32[0] = settings.color[0]
  f32[1] = settings.color[1]
  f32[2] = settings.color[2]
  f32[3] = settings.opacity
  f32[4] = settings.hardness
  f32[5] = 0 // Padding
  f32[6] = settings.width
  f32[7] = settings.height
  u32[8] = settings.brushShape
  this.device.queue.writeBuffer(this.uniformBuffer, 0, buffer)

  // 2. Batch Instance Data
  const batchSize = Math.min(points.length, MAX_STROKES)
  const iData = new Float32Array(batchSize * 4)
  for (let i = 0; i < batchSize; i++) {
    iData[i * 4 + 0] = points[i].x
    iData[i * 4 + 1] = points[i].y
    iData[i * 4 + 2] = settings.size
    iData[i * 4 + 3] = points[i].pressure
  }
  this.device.queue.writeBuffer(this.instanceBuffer, 0, iData)

  // 3. Render Pass
  const encoder = this.device.createCommandEncoder()
  const pass = encoder.beginRenderPass({
    colorAttachments: [
      {
        view: targetView,
        loadOp: 'load',
        storeOp: 'store'
      }
    ]
  })

  pass.setPipeline(pipeline)
  pass.setBindGroup(0, this.uniformBindGroup)
  pass.setVertexBuffer(0, this.quadVertexBuffer)
  pass.setVertexBuffer(1, this.instanceBuffer)
  pass.setIndexBuffer(this.indexBuffer, 'uint16')
  pass.drawIndexed(6, batchSize)
  pass.end()

  this.device.queue.submit([encoder.finish()])
}
```
🧩 Analysis chain
MAX_STROKES silently truncates long strokes; batch large point sets instead
In renderStrokeInternal you clamp to:
const batchSize = Math.min(points.length, MAX_STROKES)
…
pass.drawIndexed(6, batchSize)

Any points beyond MAX_STROKES are never rendered, so very long strokes will be visibly truncated on the GPU path.
A straightforward fix is to batch in chunks of MAX_STROKES within a single render pass:
- const batchSize = Math.min(points.length, MAX_STROKES)
- const iData = new Float32Array(batchSize * 4)
- for (let i = 0; i < batchSize; i++) {
- iData[i * 4 + 0] = points[i].x
- iData[i * 4 + 1] = points[i].y
- iData[i * 4 + 2] = settings.size
- iData[i * 4 + 3] = points[i].pressure
- }
- this.device.queue.writeBuffer(this.instanceBuffer, 0, iData)
-
- // 3. Render Pass
- const encoder = this.device.createCommandEncoder()
- const pass = encoder.beginRenderPass({ … })
- …
- pass.drawIndexed(6, batchSize)
- pass.end()
- this.device.queue.submit([encoder.finish()])
+ const encoder = this.device.createCommandEncoder()
+ const pass = encoder.beginRenderPass({
+ colorAttachments: [
+ {
+ view: targetView,
+ loadOp: 'load',
+ storeOp: 'store'
+ }
+ ]
+ })
+
+ pass.setPipeline(pipeline)
+ pass.setBindGroup(0, this.uniformBindGroup)
+ pass.setVertexBuffer(0, this.quadVertexBuffer)
+ pass.setIndexBuffer(this.indexBuffer, 'uint16')
+
+ for (let offset = 0; offset < points.length; offset += MAX_STROKES) {
+ const batchSize = Math.min(MAX_STROKES, points.length - offset)
+ const iData = new Float32Array(batchSize * 4)
+ for (let i = 0; i < batchSize; i++) {
+ const p = points[offset + i]
+ iData[i * 4 + 0] = p.x
+ iData[i * 4 + 1] = p.y
+ iData[i * 4 + 2] = settings.size
+ iData[i * 4 + 3] = p.pressure
+ }
+ this.device.queue.writeBuffer(this.instanceBuffer, 0, iData)
+ pass.setVertexBuffer(1, this.instanceBuffer)
+ pass.drawIndexed(6, batchSize)
+ }
+
+ pass.end()
+ this.device.queue.submit([encoder.finish()])

This preserves the existing MAX_STROKES buffer size while correctly rendering arbitrarily long strokes.
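The chunking arithmetic can also be isolated as a small pure function, which keeps the render loop simple and makes the batching unit-testable; the helper name below is hypothetical:

```typescript
// Hypothetical helper: split a point count into [offset, batchSize] pairs,
// each no larger than maxPerBatch, covering every point exactly once.
function strokeBatches(
  pointCount: number,
  maxPerBatch: number
): Array<[offset: number, batchSize: number]> {
  const batches: Array<[number, number]> = [];
  for (let offset = 0; offset < pointCount; offset += maxPerBatch) {
    batches.push([offset, Math.min(maxPerBatch, pointCount - offset)]);
  }
  return batches;
}
```

One design caveat when applying this: GPUQueue.writeBuffer executes in queue order before any subsequently submitted command buffer, so writing the same instance buffer several times while recording a single pass would leave only the last write visible to all draws. Submitting one encoder per chunk (as the AI-agent prompt in this comment describes) sidesteps that ordering hazard.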
🏁 Script executed:
#!/bin/bash
# Find MAX_STROKES definition
rg -n 'MAX_STROKES' src/composables/maskeditor/gpu/GPUBrushRenderer.ts

Length of output: 135
🏁 Script executed:
#!/bin/bash
# Examine the full renderStrokeInternal method and surrounding context
cat src/composables/maskeditor/gpu/GPUBrushRenderer.ts | head -600 | tail -150

Length of output: 4274
🏁 Script executed:
#!/bin/bash
# Check if there are any other batching or stroke rendering methods
rg -n 'renderStroke|batch|MAX_STROKES' src/composables/maskeditor/gpu/GPUBrushRenderer.ts -A2 -B2

Length of output: 1566
Batch long strokes in chunks instead of silently truncating at MAX_STROKES
The current implementation silently drops all points beyond MAX_STROKES (10,000):
const batchSize = Math.min(points.length, MAX_STROKES) // Clamps to 10,000
const iData = new Float32Array(batchSize * 4)
for (let i = 0; i < batchSize; i++) { // Never reaches points beyond index 10,000
// ...
}

Points beyond the limit are never copied to the instance buffer or rendered. Implement the proposed batching approach to process all points while respecting the buffer size:
- const batchSize = Math.min(points.length, MAX_STROKES)
- const iData = new Float32Array(batchSize * 4)
- for (let i = 0; i < batchSize; i++) {
- iData[i * 4 + 0] = points[i].x
- iData[i * 4 + 1] = points[i].y
- iData[i * 4 + 2] = settings.size
- iData[i * 4 + 3] = points[i].pressure
- }
- this.device.queue.writeBuffer(this.instanceBuffer, 0, iData)
-
- // 3. Render Pass
- const encoder = this.device.createCommandEncoder()
- const pass = encoder.beginRenderPass({ … })
- …
- pass.drawIndexed(6, batchSize)
- pass.end()
- this.device.queue.submit([encoder.finish()])
+ const encoder = this.device.createCommandEncoder()
+ const pass = encoder.beginRenderPass({
+ colorAttachments: [
+ {
+ view: targetView,
+ loadOp: 'load',
+ storeOp: 'store'
+ }
+ ]
+ })
+
+ pass.setPipeline(pipeline)
+ pass.setBindGroup(0, this.uniformBindGroup)
+ pass.setVertexBuffer(0, this.quadVertexBuffer)
+ pass.setIndexBuffer(this.indexBuffer, 'uint16')
+
+ for (let offset = 0; offset < points.length; offset += MAX_STROKES) {
+ const batchSize = Math.min(MAX_STROKES, points.length - offset)
+ const iData = new Float32Array(batchSize * 4)
+ for (let i = 0; i < batchSize; i++) {
+ const p = points[offset + i]
+ iData[i * 4 + 0] = p.x
+ iData[i * 4 + 1] = p.y
+ iData[i * 4 + 2] = settings.size
+ iData[i * 4 + 3] = p.pressure
+ }
+ this.device.queue.writeBuffer(this.instanceBuffer, 0, iData)
+ pass.setVertexBuffer(1, this.instanceBuffer)
+ pass.drawIndexed(6, batchSize)
+ }
+
+ pass.end()
+ this.device.queue.submit([encoder.finish()])

🤖 Prompt for AI Agents
In src/composables/maskeditor/gpu/GPUBrushRenderer.ts around lines 505-569, the
current code truncates points to MAX_STROKES and drops the rest; change this to
process points in chunks of up to MAX_STROKES so all points are rendered.
Implement a loop over offsets (for offset = 0; offset < points.length; offset +=
MAX_STROKES) that computes batchSize = Math.min(points.length - offset,
MAX_STROKES), builds an iData Float32Array for points[offset ..
offset+batchSize-1] (or fills it with a small inner loop), calls
device.queue.writeBuffer(this.instanceBuffer, 0, iData), then creates an
encoder, begins a render pass, binds pipeline/buffers, issues
pass.drawIndexed(6, batchSize), ends the pass and submits the encoder; repeat
for each chunk until all points are processed.
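The per-chunk iData construction the prompt describes can likewise be factored into a pure function; the name and signature below are hypothetical, mirroring the 4-floats-per-instance layout (x, y, size, pressure) from the code under review:

```typescript
type StrokePoint = { x: number; y: number; pressure: number };

// Hypothetical sketch: pack one chunk of stroke points into the
// 4-floats-per-instance layout expected by the instance buffer.
function packInstanceChunk(
  points: StrokePoint[],
  offset: number,
  batchSize: number,
  brushSize: number
): Float32Array {
  const iData = new Float32Array(batchSize * 4);
  for (let i = 0; i < batchSize; i++) {
    const p = points[offset + i];
    iData[i * 4 + 0] = p.x;
    iData[i * 4 + 1] = p.y;
    iData[i * 4 + 2] = brushSize;
    iData[i * 4 + 3] = p.pressure;
  }
  return iData;
}
```

Each chunk's array would then be handed to device.queue.writeBuffer before that chunk's drawIndexed call.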
re-create a PR to trigger Claude Review for #6767
┆Issue is synchronized with this Notion page by Unito