
Optimized RVV q1_0 dot #31

Open
pl752 wants to merge 3 commits into PrismML-Eng:prism from pl752:perf/q1_0_rvv_dot

Conversation

@pl752 pl752 commented May 4, 2026

Continuation of #10 for the RISC-V V extension.

Implemented two fixed-VLEN kernels, loosely inspired by the AVX2 implementation.
VLA (vector-length-agnostic) code causes severe overhead, and this task only has two realistic VL combinations (in its simple form).

Benchmarks were performed with:
OrangePi RV2 SBC (Ky X1 / SpacemiT K1), 8 GB
Armbian Debian trixie rolling release, kernel 6.18.26-current-spacemit
Built with the official SpacemiT toolchain, but IME wasn't used.
Command: llama-bench -m Bonsai-1.7B.gguf -p 64 -n 16 -t 8 -r 3 -fa 1 -mmp 0
Perplexity over 5x512 chunks: Mean KLD 0.00027, PPL 21.09, Same top-p 99.22%

| Flow   | pp 64 t/s | tg 16 t/s | Speedup     |
|--------|-----------|-----------|-------------|
| Scalar | 1.19      | 0.94      | 1.0x / 1.0x |
| VL128* | 6.08      | 4.65      | 5.1x / 4.9x |
| VL256  | 10.56     | 7.84      | 8.9x / 8.3x |

\* forced VLEN-128 kernel with LMUL=2; for VLEN >= 256: LMUL=1

As always, I would appreciate your feedback

@github-actions bot added the ggml label May 4, 2026
@khosravipasha
Collaborator

Thanks, that's impressive speed on such a device :)

Do people need a special setup to build and run this, or do the standard llama.cpp build tools work?

Would be happy to merge it into our fork; I don't have a similar device to test it myself, though. Will review more closely later this week.

For some reason I stopped getting email notifications from GitHub.

Copilot AI left a comment

Pull request overview

Adds a RISC-V RVV-specific implementation for the q1_0 × q8_0 dot product in the CPU backend, continuing the codebase’s architecture-specific quantized dot-product optimizations.

Changes:

  • Added two fixed-width RVV kernels for ggml_vec_dot_q1_0_q8_0 targeting 128-bit and 256-bit vector configurations.
  • Added RVV runtime dispatch in the RISC-V quantized dot-product path.
  • Updated the RISC-V fallback aliasing so this path can call the true generic implementation when needed.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
ggml/src/ggml-cpu/arch/riscv/quants.c Adds the new RVV q1_0×q8_0 kernels, helper tables, and runtime dispatch logic.
ggml/src/ggml-cpu/arch-fallback.h Removes the RISC-V alias for the q1_0 generic dot product so the arch-specific implementation can fall back correctly.


Comment thread ggml/src/ggml-cpu/arch/riscv/quants.c Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@pl752
Author

pl752 commented May 5, 2026

I don't actually know if llama.cpp accounts for Zvl64b; it seems to be for embedded or 32-bit cores.

@khosravipasha
Collaborator

khosravipasha commented May 5, 2026

Yeah, Copilot might be confused.

Saw a similar PR in main llama.cpp, ggml-org#22500.
Is that related?
