Eval bug: DFlash keeps outputting "/" on my multi-GPU system #42

@IndigoFloyd

Description

Name and Version

9868 (3706706)

Operating systems

Linux

GGML backends

CUDA

Hardware

RTX 2080Ti 22GB + Tesla P100 16GB

Models

No response

Problem description & steps to reproduce

Command

CUDA_VISIBLE_DEVICES=0,1 GGML_DFLASH_PROFILE=1 GGML_DFLASH_DEBUG=1 \
  ./build-cuda-bee/bin/llama-server \
  -m /home/zhn/lmcpp/Unsloth-Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf \
  --spec-draft-model /home/zhn/lmcpp/Unsloth-Qwen3.6-27B-GGUF/Qwen3.6-27B-DFlash-IQ4_XS.gguf \
  --spec-type dflash --spec-dflash-cross-ctx 1024 --spec-branch-budget 0 \
  --port 8083 -np 1 --kv-unified \
  -ngl all --spec-draft-ngl all -sm layer -mg 0 \
  -b 2048 -ub 512 --ctx-size 100000 \
  --cache-type-k q4_0 --cache-type-v q4_0 --flash-attn on --cache-ram 0 \
  --jinja --reasoning on --chat-template-kwargs '{"preserve_thinking":true}' \
  --temp 0.6 --top-k 20 --top-p 1.0 --min-p 0.0

First Bad Commit

No response

Relevant log output

Log:

0.14.967.966 I dflash profile: draft ctx=382 cross_len=382 n_draft=15 produced=0 cross=0.008 ms batch=0.002 ms decode=80.408 ms argmax=4.843 ms total=85.261 ms gpu_ring=1 graph_reuse=18
0.14.967.986 I srv  dflash_log_r: dflash reduced verifier decision: graph_enabled=0 view_enabled=0 view_start=0 n_tokens=1 top_k=1 reason=no-eligible-slot
0.14.973.151 I srv  update_slots:   verify ubatch: 1 tok, 5.2ms (5.16ms/tok)
0.15.029.661 I dflash profile: ring_write requested=1 written=1 cpu_copy=0.000 ms gpu_enqueue=0.083 ms gpu_sync=0.040 ms ring_filled_before=382 committed_before=382 gpu=1
0.15.030.349 I dflash profile: kv_cache_update requested=1 update=1 ok=1 time=0.458 ms ring_pos=383 filled=383 committed=383
0.15.030.599 I srv  update_slots: spec cycle (1 slots): draft=85.3ms verify=5.2ms accept=0.0ms other=57.5ms replay_sync=0.0ms recurrent_backup=0.0ms backup_enqueue=0.0ms backup_sync=0.0ms backup_layers=0 backup_tensors=0 backup_cuda_d2d=0 backup_fallback=0 tape_record=0.0ms total=147.9ms
0.15.116.926 E dflash: invalid reduced-logits token -1 in draft at row=1/16 (top_k=1 committed=383 cross_len=383)
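The last log line reports a drafted token id of -1, which cannot be a valid vocabulary id. With top_k=1, the reduced logits are effectively an argmax. The log does not confirm the cause, but one hypothetical way an argmax can return -1 is shown below. Suppose the reduction starts from id -1 and a best value of -inf, and replaces it only on a strict ">" comparison. If every logit in a row is NaN or -inf, nothing ever wins, so -1 comes back. The Python sketch illustrates that failure mode and adds a defensive guard that truncates the draft at the first out-of-range id. Both `reduced_argmax` and `truncate_draft` are hypothetical names and are not the DFlash implementation in this fork.

```python
import math
from typing import List, Sequence


def reduced_argmax(logits: Sequence[float]) -> int:
    """Hypothetical top-1 reduction (assumption, not the real DFlash kernel).

    Starts from id -1 / value -inf and only updates on a strict '>'.
    NaN compares False against everything, so an all-NaN (or all -inf)
    row never updates and the function returns -1.
    """
    best_id, best_val = -1, -math.inf
    for i, v in enumerate(logits):
        if v > best_val:
            best_id, best_val = i, v
    return best_id


def truncate_draft(draft: Sequence[int], n_vocab: int) -> List[int]:
    """Hypothetical guard: keep drafted tokens up to the first invalid id."""
    out: List[int] = []
    for tok in draft:
        if not 0 <= tok < n_vocab:
            break  # drop this token and everything drafted after it
        out.append(tok)
    return out


# A row whose logits are all NaN reduces to -1 (the NaN source is an assumption).
nan_row = [math.nan] * 8
print(reduced_argmax(nan_row))                   # -1
print(reduced_argmax([0.1, 2.5, -1.0]))          # 1
print(truncate_draft([42, -1, 7], n_vocab=100))  # [42]
```

If this hypothesis holds, dumping the drafter's logits for the failing row and checking them for NaN would confirm or rule it out. A guard like `truncate_draft` would only hide the symptom, not fix the underlying bad logits.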
