docs: OpenXLA performance table and the transferable-precision decision record (#449) by inureyes · Pull Request #562 · lablup/mlxcel

inureyes · 2026-07-01T03:30:44Z

Closes #517. Part of #513.

What

Docs for the OpenXLA performance profiling and the transferable-precision decision.

mlxcel-xla/README.md: adds a "Performance and the low-precision decision" section covering the measured f32/f16 decode-step and end-to-end table (Metal + CPU, MLX as reference); the profiling finding that the Metal decode is GPU-kernel-bound (a 13-thread CPU beats the Metal GPU on the same StableHLO graph, so the bottleneck is IREE's metal-spirv codegen, not host/invoke overhead or bandwidth); the split between transferable graph-level work and non-transferable per-backend codegen; and the decision.
docs/adr/0004-...: a dated decision-record addendum.
Drive-by: removed two em dashes that the #514 Precision section left in the README.

Decision recorded

Graph-level low precision is in scope: f16/bf16 landed (#514, #515), gives ~1.9x on Metal, is token-exact, and transfers to every IREE target (the NPU entry ticket).
int8/int4 weight quantization (#516) is the NPU lever, but its bandwidth payoff cannot be shown on a compute-bound Metal decode and needs an actual NPU to measure, so it is deferred to a hardware-gated follow-up (only token-exactness would be verifiable on Metal). #516 stays open, and epic #513 stays open on it.
Hand-tuning IREE's metal-spirv codegen to chase MLX is out of scope (upstream IREE; MLX owns Apple-Silicon performance).
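
To illustrate why the int8/int4 bandwidth payoff cannot show up on a compute-bound decode, here is a minimal roofline-style sketch. It is not code from this PR or from mlxcel: the function names are hypothetical, every number is made up for illustration rather than measured, and it assumes quantization only shrinks the weight bytes read per step while leaving compute cost unchanged (it ignores KV-cache and activation traffic).

```python
def step_time_s(weight_bytes, flops, bandwidth_bytes_per_s, achieved_flops_per_s):
    """Roofline-style lower bound: the slower of memory traffic and compute."""
    memory_s = weight_bytes / bandwidth_bytes_per_s
    compute_s = flops / achieved_flops_per_s
    return max(memory_s, compute_s)


def quantization_speedup(weight_bytes, flops, bandwidth, achieved_flops, bytes_ratio):
    """Speedup from scaling weight bytes by bytes_ratio with unchanged compute."""
    before = step_time_s(weight_bytes, flops, bandwidth, achieved_flops)
    after = step_time_s(weight_bytes * bytes_ratio, flops, bandwidth, achieved_flops)
    return before / after


# Hypothetical numbers, NOT measurements from this PR.
WEIGHT_BYTES = 2e9   # weight bytes read per decode step
FLOPS = 2e9          # arithmetic per decode step
BANDWIDTH = 100e9    # memory bandwidth in bytes/s

# Slow generated kernels: compute dominates, so halving weight bytes changes nothing.
compute_bound = quantization_speedup(WEIGHT_BYTES, FLOPS, BANDWIDTH, 20e9, 0.5)

# Fast kernels: memory traffic dominates, so halving weight bytes halves the step time.
bandwidth_bound = quantization_speedup(WEIGHT_BYTES, FLOPS, BANDWIDTH, 2e12, 0.5)

print(f"compute-bound speedup:   {compute_bound:.2f}x")
print(f"bandwidth-bound speedup: {bandwidth_bound:.2f}x")
```

Under these assumptions the compute-bound case sees no speedup from smaller weights, while the bandwidth-bound case approaches the bytes ratio. That is the shape of the argument for measuring #516 on hardware where decode is actually bandwidth-bound.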

Docs-only; no code change.

…on record (#449)

Document where the OpenXLA decode time goes and why the epic invests in graph-level precision, not Metal kernels.

Adds a "Performance and the low-precision decision" section to the mlxcel-xla README (the measured f32/f16 decode-step and end-to-end table across Metal and CPU with MLX as reference; the profiling finding that the Metal decode is GPU-kernel-bound, with a 13-thread CPU beating the Metal GPU on the same graph; the transferable graph-level vs non-transferable per-backend-codegen split) and a dated decision-record addendum to ADR 0004.

Records the decision: graph-level low precision is in scope (f16/bf16 landed in #514/#515); int8/int4 weight quantization (#516) is the NPU lever but its bandwidth payoff cannot be shown on a compute-bound Metal decode and needs an actual NPU to measure, so it is deferred to a hardware-gated follow-up; hand-tuning IREE's metal-spirv codegen to chase MLX is out of scope.

Docs-only.

inureyes added the type:docs, area:inference, and platform:macos labels Jul 1, 2026

inureyes force-pushed the feat/517-xla-perf-docs branch from 583cf99 to ef5f58d July 1, 2026 03:32

inureyes merged commit 4e770ec into main Jul 1, 2026
5 checks passed

inureyes deleted the feat/517-xla-perf-docs branch July 1, 2026 03:33

inureyes mentioned this pull request Jul 1, 2026

epic: transferable low-precision performance for the OpenXLA backend (#449) #513

Open

4 tasks
