Conversation

@pallavisharma6802

What does this PR do?

Fixes a crash when using device_map="auto" in environments where "cpu" is missing from the inferred max_memory dictionary.

Bug

When max_memory is not explicitly provided, accelerate.get_max_memory() may return a dictionary without a "cpu" key (depending on the backend/environment).
_get_device_map then indexes inferred_max_memory["cpu"] unconditionally to apply the scaling factor, raising KeyError: 'cpu'.
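A minimal reproduction of the failure mode, independent of transformers internals (the dict contents and the 0.9 factor are illustrative):

```python
# get_max_memory() can return a dict keyed only by accelerator indices,
# e.g. {0: <bytes>}, with no "cpu" entry on some backends.
max_memory = {0: 8 * 1024**3}  # GPU 0 only, no "cpu" key

try:
    # Unconditional scaling of the "cpu" entry, as in the buggy path:
    max_memory["cpu"] = int(max_memory["cpu"] * 0.9)
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'cpu'
```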

Fix

  • Guard access to "cpu" in inferred_max_memory before applying the scaling factor.
  • Preserve existing behavior when "cpu" is present.
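A sketch of the guard described above; the function name and the 0.9 scaling factor are illustrative stand-ins, not the exact transformers code:

```python
def scale_cpu_memory(inferred_max_memory, scale=0.9):
    # Only scale the "cpu" entry when it exists; get_max_memory() may
    # omit it on some backends, so unconditional indexing would raise
    # a KeyError. Entries for other devices are left untouched.
    if "cpu" in inferred_max_memory:
        inferred_max_memory["cpu"] = int(inferred_max_memory["cpu"] * scale)
    return inferred_max_memory
```

With a "cpu" entry present, behavior is unchanged; without one, the dict passes through untouched instead of crashing.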

Tests

  • Added tests/utils/test_device_map_cpu_guard.py to ensure device_map="auto" does not crash when "cpu" is absent from max_memory.

Fixes #42994

Before submitting

  • Did you read the contributor guideline?
  • Did you write any new necessary tests?

Who can review?

@Cyrilvallez

@pallavisharma6802 (Author)

Thanks for the report! This PR adds a defensive fix for the device_map="auto" crash and includes a regression test.
Happy to make any adjustments if needed.
