profiling radeon 7900xt (20gb vram)

# /s/bee-rocm/build/bin/llama-server --version
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 20464 MiB):
  Device 0: AMD Radeon RX 7900 XT, gfx1100 (0x1100), VMM: no, Wave Size: 32, VRAM: 20464 MiB
version: 9459 (07ac3cec6)
built with GNU 16.1.0 for Linux x86_64

# 1. Baseline – no DFlash
## remove draft model / --spec-type dflash / other --spec-* DFlash args
 export GGML_DFLASH_PROFILE=1
 /s/bee-rocm/build/bin/llama-server \
 -m /root/.models/dflash/Qwen3.6-27B-Q4_K_M.gguf \                                                  
 -md /root/.models/dflash/dflash-draft-3.6-q4_k_m.gguf \
 --host 0.0.0.0 --port 8080 \                                              
 --jinja --metrics -ngl all -ngld all \
 --reasoning on -fa on --mlock --no-mmap -np 1 \
 -ctk turbo4 -ctv turbo4 \
 -lv 3 --log-timestamps --log-prefix --log-colors off \
 -c 128000 
##result
 tokens: 481 
 wall tokens/s: 37.03
 decode tokens/s: 41.01
# 2. DFlash default
## same command, with DFlash enabled, but no forced --spec-draft-n-max 3
added 
 --spec-type dflash \
 --spec-dflash-cross-ctx 512 \
 --spec-draft-ngl 999 \      
##result
tokens: 479
wall tokens/s: 38.07
decode tokens/s: 42.30

# 3. DFlash with n-max 3
## same as #2, but add --spec-draft-n-max 3
##result
tokens: 481
wall tokens/s: 37.22
decode tokens/s: 41.14

Test: Write a complete Python 3 module implementing a doubly-linked list with the following methods: append, prepend, insert_at, remove_at, find, reverse, to_list, length, is_empty, iter. Include comprehensive docstrings, type hints, and pytest unit tests for every method. Return only the code, no commentary.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

profiling radeon 7900xt (20gb vram) #40

/s/bee-rocm/build/bin/llama-server --version

1. Baseline – no DFlash

remove draft model / --spec-type dflash / other --spec-* DFlash args

2. DFlash default

same command, with DFlash enabled, but no forced --spec-draft-n-max 3

3. DFlash with n-max 3

same as #2, but add --spec-draft-n-max 3

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

profiling radeon 7900xt (20gb vram) #40

Description

/s/bee-rocm/build/bin/llama-server --version

1. Baseline – no DFlash

remove draft model / --spec-type dflash / other --spec-* DFlash args

2. DFlash default

same command, with DFlash enabled, but no forced --spec-draft-n-max 3

3. DFlash with n-max 3

same as #2, but add --spec-draft-n-max 3

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions