[CI Failure]: mi325_8: Language Models Tests (Hybrid) %N #29462

@AndreasKaratzas

Name of failing test

uv pip install --system --no-build-isolation 'git+https://github.com/state-spaces/mamba@v2.2.5' && uv pip install --system --no-build-isolation 'git+https://github.com/Dao-AILab/causal-conv1d@v1.5.2' && pytest -v -s models/language/generation -m hybrid_model --num-shards=$BUILDKITE_PARALLEL_JOB_COUNT --shard-id=$BUILDKITE_PARALLEL_JOB

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

Parallel test execution in Shard 4 - C++ extension compilation failure during test runtime

Failure: RuntimeError during JIT compilation - "Error compiling objects for extension"

Stack trace highlights:

  • torch/utils/cpp_extension.py:2612 in _run_ninja_build
  • _write_ninja_file_and_compile_objects → ninja build process
  • Extension compilation through setuptools/Cython build_ext

Configuration: Parallel pytest execution (shard 4 of multi-shard run)

Likely cause: JIT compilation failure for PyTorch custom extensions on ROCm. When vLLM imports model code, PyTorch attempts to compile custom CUDA/ROCm kernels on-the-fly using ninja. The compilation crashes on ROCm, possibly due to:

  1. missing ROCm compilation toolchain components (hipcc, rocm-dev packages)
  2. incompatible compiler flags between CUDA and ROCm compilation paths
  3. parallel shard conflicts where multiple test processes simultaneously attempt to compile the same extension to the same cache location, or
  4. insufficient memory/resources during parallel compilation.
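
One quick way to test the first hypothesis is to check, inside the CI container, whether the build tools are visible on PATH before pytest starts. This is a minimal sketch, assuming `hipcc` and `ninja` are the relevant executables; the helper name `missing_tools` is hypothetical and not part of vLLM or PyTorch.

```python
import shutil


def missing_tools(tools=("hipcc", "ninja")):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is on PATH.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all checked build tools found on PATH")
```

A tool reported here is only missing from PATH, not necessarily from the image. ROCm installs commonly place `hipcc` under `/opt/rocm/bin`, so a PATH problem and a missing package would look the same in this check.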

The "One of the processes failed with 1" message indicates that a process failed during the build phase.
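
If the shard-conflict hypothesis (cause 3) is the right one, and the build really goes through PyTorch's JIT loader in `torch.utils.cpp_extension`, one possible mitigation is to give each shard its own build cache. PyTorch reads the `TORCH_EXTENSIONS_DIR` environment variable to decide where JIT-built extensions are placed. The sketch below derives a per-shard directory from `BUILDKITE_PARALLEL_JOB`, the variable already used in the failing command; the function name `shard_extensions_dir` and the base directory are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def shard_extensions_dir(base, env=None):
    """Build a per-shard cache path such as <base>/shard-4."""
    env = os.environ if env is None else env
    # Fall back to "0" when not running under a parallel Buildkite job.
    shard_id = env.get("BUILDKITE_PARALLEL_JOB", "0")
    return Path(base) / f"shard-{shard_id}"


if __name__ == "__main__":
    # Hypothetical base directory; any location writable by the job works.
    base = Path(tempfile.gettempdir()) / "torch_extensions"
    target = shard_extensions_dir(base)
    target.mkdir(parents=True, exist_ok=True)
    # Must be set before the extension is built to take effect.
    os.environ["TORCH_EXTENSIONS_DIR"] = str(target)
    print("TORCH_EXTENSIONS_DIR =", target)
```

This only helps if the shards actually share a cache location. If each shard already runs in its own container with a private filesystem, the cause is more likely one of the other three.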

📝 History of failing test

AMD-CI Buildkite build references:

  • 1041
  • 1077
  • 1088
  • 1109
  • 1111

CC List.

No response

Metadata

Assignees

No one assigned

    Labels

    ci-failure: Issue about an unexpected test failure in CI

    Type

    No type

    Projects

    Status

    No status

    Status

    In progress

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests