1cat vllm #50
Merged
✅ AccelMark Validation: All submissions valid. See the workflow run for details.
Adds the AccelMark runner for the 1Cat-vLLM community fork that re-enables AWQ 4-bit inference on Volta (SM70) Tesla V100 via lmdeploy TurboMind kernels and the FLASH_ATTN_V100 attention backend.

What is included:

* runners/nvidia_onecat_vllm_a43d1bcf/: runner.py, meta.json (with hardware_label="NVIDIA V100 (SM70)" and suite_support self-declaration), requirements.txt, README.md
* configs/runner_configs/runner_nvidia_onecat_vllm_a43d1bcf.yaml.example

The README platforms matrix updates automatically: the hardware label is taken from meta.hardware_label rather than the catalogue default, so the V100-specific row is rendered correctly without touching schema/platforms.json or any shared file.

Capability flags:

* SUPPORTED_PRECISIONS drops BF16 (V100 has no native BF16 datapath).
* SUPPORTED_QUANTIZATION_BACKENDS lists only AWQ, the fork's headline contribution; FP8 KV cache and other formats are intentionally not exposed by default.
* Auto-injects attention_backend=FLASH_ATTN_V100 unless the user overrides it.
* Suite F (Qwen2.5-0.5B-Instruct on a consumer/edge GPU) is marked unsupported: 1Cat-vLLM targets dense + MoE on 4 x V100, not edge inference.

Initial commit, not yet validated end-to-end on hardware; all applicable suites are marked "pending".

Co-authored-by: Cursor <cursoragent@cursor.com>
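For reviewers who want a concrete picture of the capability flags and the label fallback described above, here is a minimal sketch. It is not the code in runner.py. The constant names SUPPORTED_PRECISIONS and SUPPORTED_QUANTIZATION_BACKENDS, the attention_backend key, and the hardware_label field come from this description. The helper functions, the "FP16" precision value, and the shape of the config and meta dictionaries are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Flag names are from the PR description. The exact values are assumed:
# the description only says BF16 is dropped and AWQ is the sole backend.
SUPPORTED_PRECISIONS = ("FP16",)            # assumption: FP16 remains; BF16 omitted
SUPPORTED_QUANTIZATION_BACKENDS = ("AWQ",)  # only AWQ is exposed by default
DEFAULT_ATTENTION_BACKEND = "FLASH_ATTN_V100"


def check_request(precision: str, quantization: Optional[str] = None) -> None:
    """Hypothetical helper: reject precisions or backends the runner does not declare."""
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision}")
    if quantization is not None and quantization not in SUPPORTED_QUANTIZATION_BACKENDS:
        raise ValueError(f"unsupported quantization backend: {quantization}")


def build_engine_kwargs(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: inject the default attention backend unless the user set one."""
    kwargs = dict(user_config)  # copy so the caller's config is not mutated
    kwargs.setdefault("attention_backend", DEFAULT_ATTENTION_BACKEND)
    return kwargs


def resolve_hardware_label(meta: Dict[str, Any], catalogue_default: str) -> str:
    """Hypothetical helper: prefer meta's hardware_label over the catalogue default."""
    return meta.get("hardware_label") or catalogue_default
```

With these assumptions, build_engine_kwargs({}) yields attention_backend set to FLASH_ATTN_V100, while a config that already names a backend passes through unchanged. Using setdefault keeps the user's explicit choice authoritative, which matches the "unless the user overrides it" behaviour stated above.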