Add M5 Max 40 GPU (128 GB) benchmark data and visualization #97
Open
Chida82 wants to merge 2 commits into
Was your laptop connected to a power outlet? My numbers are a bit higher (+2 TPS):
ctx_tokens,prefill_tokens,prefill_tps,gen_tokens,gen_tps,kvcache_bytes
2048,2048,375.87,128,31.32,52184460
4096,2048,339.83,128,30.95,80373132
6144,2048,327.62,128,30.91,108561804
8192,2048,311.13,128,27.62,136750476
10240,2048,302.70,128,25.06,164939148
12288,2048,311.74,128,24.94,193127820
14336,2048,315.33,128,24.78,221316492
16384,2048,316.74,128,24.77,249505164
18432,2048,307.98,128,24.79,277693836
20480,2048,304.57,128,26.49,305882508
22528,2048,289.75,128,29.63,334071180
24576,2048,279.28,128,29.63,362259852
26624,2048,272.07,128,29.18,390448524
28672,2048,267.91,128,29.12,418637196
30720,2048,264.20,128,28.83,446825868
32768,2048,261.81,128,28.78,475014540
34816,2048,256.72,128,28.24,503203212
36864,2048,254.37,128,28.27,531391884
38912,2048,253.33,128,28.08,559580556
40960,2048,248.62,128,28.23,587769228
43008,2048,247.22,128,27.73,615957900
45056,2048,244.05,128,27.89,644146572
47104,2048,242.79,128,27.84,672335244
49152,2048,240.79,128,27.87,700523916
51200,2048,233.38,128,27.57,728712588
53248,2048,241.35,128,27.46,756901260
55296,2048,235.96,128,27.34,785089932
57344,2048,240.54,128,26.84,813278604
59392,2048,231.79,128,27.11,841467276
61440,2048,228.84,128,27.25,869655948
63488,2048,229.56,128,26.94,897844620
65536,2048,227.26,128,26.81,926033292
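To check a claim like "+2 TPS" against another run, the two CSVs can be compared row by row on `ctx_tokens`. The following is a minimal sketch, not part of the repository: the helper names (`load_rows`, `mean_gen_tps`, `gen_tps_delta`, `kv_bytes_per_token`) are hypothetical, and it assumes both files use the column header shown above. The first three rows of the data above are inlined so the block runs on its own.

```python
import csv
import io

# First three rows of the CSV above, inlined so the sketch is self-contained.
SAMPLE = """ctx_tokens,prefill_tokens,prefill_tps,gen_tokens,gen_tps,kvcache_bytes
2048,2048,375.87,128,31.32,52184460
4096,2048,339.83,128,30.95,80373132
6144,2048,327.62,128,30.91,108561804
"""


def load_rows(text):
    """Parse benchmark CSV text into a list of dicts with numeric values."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "ctx_tokens": int(rec["ctx_tokens"]),
            "prefill_tps": float(rec["prefill_tps"]),
            "gen_tps": float(rec["gen_tps"]),
            "kvcache_bytes": int(rec["kvcache_bytes"]),
        })
    return rows


def mean_gen_tps(rows):
    """Average generation speed (tokens/s) over all context sizes."""
    return sum(r["gen_tps"] for r in rows) / len(rows)


def gen_tps_delta(mine, theirs):
    """Per-context gen_tps difference (mine - theirs) for shared ctx_tokens."""
    other = {r["ctx_tokens"]: r["gen_tps"] for r in theirs}
    return {
        r["ctx_tokens"]: r["gen_tps"] - other[r["ctx_tokens"]]
        for r in mine
        if r["ctx_tokens"] in other
    }


def kv_bytes_per_token(rows):
    """Average KV cache growth per context token between first and last row."""
    first, last = rows[0], rows[-1]
    return (last["kvcache_bytes"] - first["kvcache_bytes"]) / (
        last["ctx_tokens"] - first["ctx_tokens"]
    )


if __name__ == "__main__":
    rows = load_rows(SAMPLE)
    print(f"mean gen_tps: {mean_gen_tps(rows):.2f}")
    print(f"KV cache bytes per token: {kv_bytes_per_token(rows):.0f}")
```

To compare two real runs, read each file's text (for example with `open(path).read()`) and pass both parsed lists to `gen_tps_delta`. In the data above, `kvcache_bytes` grows by a constant 28188672 bytes per 2048-token step, which is 13764 bytes per token.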

Thanks for the awesome work. Here is the benchmark for my M5 Max with an 18-core CPU and a 40-core GPU.
Hardware: Apple M5 Max (18-core CPU, 40-core GPU), 128 GB unified memory. Metal backend, IQ2XXS w2Q2K imatrix GGUF.
Ran the command from speed-bench/README.md:
./ds4-bench \
  -m ds4flash.gguf \
  --prompt-file speed-bench/promessi_sposi.txt \
  --ctx-start 2048 \
  --ctx-max 65536 \
  --step-incr 2048 \
  --gen-tokens 128
Then:
python3 speed-bench/plot_speed.py speed-bench/m5_max_40_gpu.csv --title "M5 Max 40 Gpu t/s"
Other info:
ds4-bench: context buffers 1311.89 MiB (ctx=65665, backend=metal, prefill_chunk=2048, raw_kv_rows=2304, compressed_kv_rows=16418)
ds4: Metal device Apple M5 Max, 128.00 GiB RAM
ds4: requesting Metal residency (may take tens of seconds)... done
ds4: warming Metal model views... done
ds4: Metal model views created in 2.354 ms, residency requested in 277.150 ms, warmup 4.174 ms (mapped 82697.67 MiB from offset 5.08 MiB)
ds4: Metal mapped mmaped model as 2 overlapping shared buffers
ds4: metal backend initialized for graph diagnostics