Skip to content

Fix patch for rocm6.3#793

Merged
al-rigazzi merged 4 commits intodevelopfrom
fix_rocm63
Feb 5, 2026
Merged

Fix patch for rocm6.3#793
al-rigazzi merged 4 commits intodevelopfrom
fix_rocm63

Conversation

@al-rigazzi
Copy link
Collaborator

@al-rigazzi al-rigazzi commented Feb 3, 2026

The header for ROCM 6.3.4 is not compatible with PyTorch 2.7.1 regex. Additionally, an old patch is not useful anymore and has been removed.

@al-rigazzi al-rigazzi requested review from MattToast and ashao February 3, 2026 16:12
Copy link
Member

@MattToast MattToast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small question about the new patch!

Also you mentioned:

Additionally, an old patch is not useful anymore and has been removed.

but it doesn't look like anything is removed in this PR. Do you still want to add that to this PR?

"regex": ".*string.*",
"replacement": ""
"regex": "\\[ \\]\\+",
"replacement": "[ \\t]+"
Copy link
Member

@MattToast MattToast Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not wrong and not sure if this matters to you, but wanted to sanity check real quick:

The way that we currently patch the cmake files is kinda weird, where characters need to be escaped multiple times with the multiple hops from json -> py str -> IO stream, so for instance if we have these patches:

// patches.json (be sure to scrub the comments)
[
    {
        "description": "Fix LoadHIP.cmake regex to match tabs in ROCm version header",
        "source_file": "../package/libtorch/share/cmake/Caffe2/public/LoadHIP.cmake",
        "regex": "\\[ \\]\\+",
        "replacement": "[ \\t]+"
    },
    {
        "description": "What I think you might have meant for the replacement",
        "source_file": "",
        "regex": "",
        "replacement": "[ \\\\t]+"  // <-- Note the extra `\\` here
    }
]

and this little patcher script that roughly matches what we do in smartsim._core._install.redisaiBuilder.RedisAIBuilder._patch_source_files

from smartsim._core._install.mlpackages import RAIPatch
import json

def main() -> int:
    with open("./patches.json", "r") as f:
        curr_kw, fix_kw = json.load(f)
    curr_p = RAIPatch(**curr_kw)
    fix_p = RAIPatch(**fix_kw)

    with (
        open("./orig.cmake", "r") as fi,    # <-- Exact copy of the `LoadHIP.cmake` file
        open("./out.cmake.txt", "w") as fo  # <-- Some random output file
    ):
        for l in fi:
            if m := curr_p.regex.search(l):
                fo.write(f"- | {m.string}")
                fo.write(f"+ | {curr_p.regex.sub(curr_p.replacement, l)}")
                #                                ^^^^^^^^^^^^^^^^^^
                # Using the current replacement as written in this PR
                fo.write(f"? | {curr_p.regex.sub(fix_p.replacement, l)}")
                #                                ^^^^^^^^^^^^^^^^^
                # Using an updated replacement that I _think_ is what you intended
            else:
                fo.write(f"  | {l}")

    return 0

if __name__ == "__main__":
    raise SystemExit(main())

will result in the following being written to a file

  |   string(STRIP ${TEMP2} ROCM_VERSION_DEV_MAJOR)
- |   string(REGEX MATCH "${ROCM_LIB_NAME}_VERSION_MINOR[ ]+[0-9]+" TEMP1 ${ROCM_HEADER_CONTENT})
+ |   string(REGEX MATCH "${ROCM_LIB_NAME}_VERSION_MINOR[ 	]+[0-9]+" TEMP1 ${ROCM_HEADER_CONTENT})
? |   string(REGEX MATCH "${ROCM_LIB_NAME}_VERSION_MINOR[ \t]+[0-9]+" TEMP1 ${ROCM_HEADER_CONTENT})
  |   string(REPLACE "${ROCM_LIB_NAME}_VERSION_MINOR" "" TEMP2 ${TEMP1})

where the + line (as currently written) is writing the literal <tab> character, and the ? line (possibly intended instead) is writing the \t escape sequence.

Copy link
Collaborator Author

@al-rigazzi al-rigazzi Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we identified that in the rocm_version.h file for ROCm 6.3.4, the only way of parsing it correctly (with PT 2.7.1) was to replace the space with a real tab - I can try with the escaped version.

The diff collapsed the two changes, basically I removed the replacement of .string. and added the new one, but in the diff it looks like I have updated the .string. rule.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff collapsed the two changes, basically I removed the replacement of .string. and added the new one, but in the diff it looks like I have updated the .string. rule.

Ahhh that makes sense!!

And yeah, I've got maybe a slight preference for using \t over the literal <tab> character, just for future debugging sake, but not enough to overly push the issue. Give it a try, but if it doesn't work immediately this is good enough imo just because all of these files get cleaned up with smart build anyway.

@al-rigazzi al-rigazzi requested a review from MattToast February 4, 2026 15:41
Copy link
Member

@MattToast MattToast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending \t vs <tab> character experiment!!

@al-rigazzi
Copy link
Collaborator Author

LGTM pending \t vs character experiment!!

OK, I tried, it works and who am I to say no to MattToast?

@al-rigazzi al-rigazzi merged commit 63c4098 into develop Feb 5, 2026
25 checks passed
@al-rigazzi al-rigazzi deleted the fix_rocm63 branch February 5, 2026 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants