@marc-chevalier marc-chevalier commented Nov 20, 2025

We are on aarch64.

When a function needs stack extension, we build a stack that has this shape:

// Remove the extension of the caller's frame used for inline type unpacking
//
// Right now the stack looks like this:
//
// |   Arguments from caller   |
// |---------------------------|  <-- caller's SP
// |        Saved LR #1        |
// |        Saved FP #1        |
// |---------------------------|
// |    Extension space for    |
// |  inline arg (un)packing   |
// |---------------------------|  <-- start of this method's frame
// |        Saved LR #2        |
// |        Saved FP #2        |
// |---------------------------|  <-- FP
// |          sp_inc           |
// |       method locals       |
// |---------------------------|  <-- SP
//
// There are two copies of FP and LR on the stack. They will be identical at
// first, but that can change.
// If the caller has been deoptimized, LR #1 will be patched to point at the
// deopt blob, and LR #2 will still point into the old method.
// If the saved FP (x29) was not used as the frame pointer, but to store an
// oop, the GC will be aware only of FP #2 as the spilled location of x29 and
// will fix only this one.
//
// When restoring, one must then load FP #2 into x29, and LR #1 into x30,
// while keeping in mind that from the scalarized entry point, there will be
// only one copy of each.
//
// The sp_inc stack slot holds the total size of the frame including the
// extension space minus two words for the saved FP and LR. That is how to
// find LR #1. FP #2 is always located just after sp_inc.
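
To make the slot arithmetic in the comment above concrete, here is a minimal toy model, not HotSpot code. It assumes the stack is a vector of words where a higher index is a higher address, that "pointers" are indices, and that sp_inc is stored in bytes. All names (ToyFrame, ToySlots, make_extended_frame, locate_slots) are hypothetical, and stack alignment is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: none of these names exist in HotSpot.
constexpr std::intptr_t kWordSize = sizeof(std::intptr_t);

struct ToyFrame {
  std::vector<std::intptr_t> stack;  // index = word address
  std::size_t sp;                    // lowest word of the callee frame
  std::size_t fp;                    // slot holding Saved FP #2
};

struct ToySlots {
  std::size_t sender_sp;  // caller's SP
  std::size_t lr1;        // Saved LR #1
  std::size_t fp1;        // Saved FP #1
  std::size_t lr2;        // Saved LR #2
  std::size_t fp2;        // Saved FP #2
};

// Build an extended frame with `locals` words of locals and `extension`
// words of (un)packing space, saving the same FP/LR pair twice.
ToyFrame make_extended_frame(std::size_t locals, std::size_t extension,
                             std::intptr_t saved_fp, std::intptr_t saved_lr) {
  const std::size_t base = 1;  // one unused word below SP
  // locals + sp_inc + FP/LR #2 + extension + FP/LR #1
  const std::size_t frame_words = locals + 1 + 2 + extension + 2;
  ToyFrame f;
  f.stack.assign(base + frame_words + 2, 0);  // two words of caller arguments on top
  f.sp = base;
  f.fp = base + locals + 1;
  // sp_inc: whole frame minus the two words for saved FP and LR, in bytes.
  f.stack[f.fp - 1] = static_cast<std::intptr_t>(frame_words - 2) * kWordSize;
  f.stack[f.fp] = saved_fp;      // FP #2
  f.stack[f.fp + 1] = saved_lr;  // LR #2
  const std::size_t sender_sp = f.sp + frame_words;
  f.stack[sender_sp - 2] = saved_fp;  // FP #1
  f.stack[sender_sp - 1] = saved_lr;  // LR #1
  return f;
}

// Recover every slot from SP, FP and the sp_inc word just below FP #2.
ToySlots locate_slots(const ToyFrame& f) {
  const std::intptr_t sp_inc_bytes = f.stack[f.fp - 1];
  const std::size_t frame_words =
      static_cast<std::size_t>(sp_inc_bytes / kWordSize) + 2;
  const std::size_t sender_sp = f.sp + frame_words;
  return ToySlots{sender_sp, sender_sp - 1, sender_sp - 2, f.fp + 1, f.fp};
}
```

In this model, patching `stack[lr1]` (as deoptimization would) or `stack[fp2]` (as the GC would) makes the two copies diverge, which is why restoring has to pick the right copy of each.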

Currently, when leaving the frame, we use LR §1 (I use § instead of # so that GitHub does not render it as a PR reference) as the return address (because it can be patched for deoptimization), and FP §2 to restore x29 (because when it contains an oop, the GC is only aware of this copy).

In our failing case, we have a C2-compiled frame that is being deoptimized when returning from a call to an interpreted method. During deoptimization, the function frame::sender_for_compiled_frame(RegisterMap*) const is used to find where on the stack rfp (x29) is saved.

inline frame frame::sender_for_compiled_frame(RegisterMap* map) const {

Actually, this function is a bit more general: it computes the sender frame of a compiled frame and builds the RegisterMap. The problem is that during deoptimization, this function locates the wrong save of rfp (FP §1), because the C2 frame is being modified by the deoptimization process and is no longer recognized as a C2-compiled method that needs stack repair. In this modified frame, the sender's sp is correctly known (or the deoptimization mechanism would not work), and the saved FP is taken just 2 words above: that is FP §1. On top of that, if rfp contained an oop and the GC moved the pointed-to object during the call we are returning from, the value we get for rfp is no longer valid.

The good news is that the GC also finds the saved location of rfp through the same function. The bad news is that the GC sees the C2 frame correctly, so for the GC sender_for_compiled_frame locates FP §2. We can follow a few ideas:

  • make the deoptimized frame bottom under FP/LR §2. This is not possible, for many reasons: we need LR §1, we need to remove the whole frame to find the sender's frame...
  • make sender_for_compiled_frame detect when the deoptimized frame is that of a C2-compiled method that needs stack repair. No idea how to do that! Also, it seems brittle, and more complicated than the next solution.
  • always pick FP §1: since the deoptimized frame will pick FP §1, we can make sure that a regular C2 frame also uses FP §1. It is the simplest solution and the one I explain below.

In JDK-8365996, the problem was pretty much the opposite: remove_frame was using FP §1 to restore rfp but the GC only updates FP §2. So the solution was to restore from FP §2:

https://github.com/openjdk/valhalla/pull/1540/files#diff-0f4150a9c607ccd590bf256daa800c0276144682a92bc6bdced5e8bc1bb81f3aR6140-R6145

Here the solution is to revert this part (restore rfp from FP §1) and let the GC know about FP §1 only, in sender_for_compiled_frame. Overall, let's never speak about FP/LR §2. This way, we always have the sender's sp, the saved LR and the saved FP consecutively. FP/LR §2 is only needed to make space between the unpacked arguments and the locals, as there would be between regular arguments and locals. They could hold dummy values, and we should probably implement that.
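
The invariant behind this choice is that the GC and whoever restores rfp must agree on a single slot. The following toy model is a sketch under that assumption only. It is not HotSpot code and every name in it is hypothetical. It shows the failing combination (GC fixes FP §2, restore reads FP §1) next to the consistent one (both use FP §1).

```cpp
#include <cassert>
#include <cstdint>

// Toy model only; these names do not exist in HotSpot.
enum class Slot { FP1, FP2 };

struct SavedFp {
  std::intptr_t fp1;  // copy next to the sender's sp
  std::intptr_t fp2;  // copy next to sp_inc
  std::intptr_t& at(Slot s) { return s == Slot::FP1 ? fp1 : fp2; }
};

// The GC moves the object and fixes only the slot the frame walker reports.
void gc_relocate(SavedFp& saved, Slot reported, std::intptr_t new_oop) {
  saved.at(reported) = new_oop;
}

// Value that ends up in x29 when the frame is left or deoptimized.
std::intptr_t restore_rfp(SavedFp& saved, Slot read_from) {
  return saved.at(read_from);
}
```

If the GC is told about FP §2 while the restore reads FP §1, `restore_rfp` returns the pre-move value. When both sides use FP §1, it returns the relocated one.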

This makes virtual thread tests fail massively, surely because of a mismatch between our choice of FP §1 or §2. Let's problem-list these for now... To help with that, I introduced frame::compiled_frame_details() const to do all these little tricks and return the locations of LR/FP §1 and the sender's sp at once, so that frame users don't have to figure out the internal structure.

I found the extraction of compiled_frame_details a bit risky, so I proceeded in steps: first writing the function, calling it from sender_for_compiled_frame, and comparing the results with the old way of getting everything. These temporary asserts were never triggered, so I then used the value returned by compiled_frame_details and removed the now-useless code from sender_for_compiled_frame. So I'm rather confident it does as well as before.
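
The stepwise extraction described above can be sketched roughly as follows. This is an illustration of the technique, not the code from the patch: ToyFramePointers, toy_frame_details and the two callers are hypothetical, and the pointer arithmetic is a simplified stand-in for the real frame computation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the struct returned by the extracted helper.
struct ToyFramePointers {
  std::intptr_t* sender_sp;
  std::intptr_t* saved_fp_addr;
  std::intptr_t* sender_pc_addr;
};

// Step 1: write the extracted helper.
ToyFramePointers toy_frame_details(std::intptr_t* sp, std::intptr_t frame_words) {
  ToyFramePointers p;
  p.sender_sp = sp + frame_words;
  p.sender_pc_addr = p.sender_sp - 1;  // saved LR sits just under the sender's sp
  p.saved_fp_addr = p.sender_sp - 2;   // saved FP sits under the saved LR
  return p;
}

// Step 2: the caller keeps its inline computation and only checks that the
// helper agrees with it.
std::intptr_t* toy_sender_sp_checked(std::intptr_t* sp, std::intptr_t frame_words) {
  std::intptr_t* old_sender_sp = sp + frame_words;
  std::intptr_t* old_pc_addr = old_sender_sp - 1;
  std::intptr_t* old_fp_addr = old_sender_sp - 2;
  ToyFramePointers p = toy_frame_details(sp, frame_words);
  assert(p.sender_sp == old_sender_sp);
  assert(p.sender_pc_addr == old_pc_addr);
  assert(p.saved_fp_addr == old_fp_addr);
  (void)old_pc_addr;  // silence unused warnings when asserts are compiled out
  (void)old_fp_addr;
  (void)p;
  return old_sender_sp;
}

// Step 3: once the asserts have never fired, drop the inline computation and
// use the helper's result directly.
std::intptr_t* toy_sender_sp(std::intptr_t* sp, std::intptr_t frame_words) {
  return toy_frame_details(sp, frame_words).sender_sp;
}
```

Running the test suite at step 2 exercises both computations on every call, so any divergence shows up as an assertion failure before the old code is deleted.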

I don't have much opinion about the names of compiled_frame_details and CompiledFramePointers, feel free to suggest better if you have a better idea.

Thanks,
Marc


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

  • JDK-8367151: [lworld] CorrectlyRestoreRfp.java triggers "bad oop found" during deoptimization (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1751/head:pull/1751
$ git checkout pull/1751

Update a local copy of the PR:
$ git checkout pull/1751
$ git pull https://git.openjdk.org/valhalla.git pull/1751/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1751

View PR using the GUI difftool:
$ git pr show -t 1751

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1751.diff

bridgekeeper bot commented Nov 20, 2025

👋 Welcome back mchevalier! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Nov 20, 2025

@marc-chevalier This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8367151: [lworld] CorrectlyRestoreRfp.java triggers "bad oop found" during deoptimization

Reviewed-by: thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the lworld branch:

  • c73220a: 8372345: [lworld] Problem list JDK-8372341

Please see this link for an up-to-date comparison between the source branch of this pull request and the lworld branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the lworld branch, type /integrate in a new comment.

@marc-chevalier marc-chevalier marked this pull request as ready for review November 20, 2025 12:18
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 20, 2025
mlbridge bot commented Nov 20, 2025

Webrevs

@TobiHartmann TobiHartmann left a comment

Thanks for the detailed explanation, this is great for future reference!

The changes look good to me. Just a few comments.

I don't have much opinion about the names of compiled_frame_details and CompiledFramePointers, feel free to suggest better if you have a better idea.

Naming is fine with me but do we even need to factor this logic out? Do you expect it to be used in more places in the future?

cfp.sender_pc_addr = (address*)(l_sender_sp - frame::return_addr_offset);

#ifdef ASSERT
// when the stack was extnded (so LR #1 and LR #2 are distinct) and LR #1 was patched
Suggested change
// when the stack was extnded (so LR #1 and LR #2 are distinct) and LR #1 was patched
// when the stack was extended (so LR #1 and LR #2 are distinct) and LR #1 was patched

// find FP/LR #1. This size is expressed in bytes. Be careful when using it
// from C++ in pointer arithmetic; you might need to divide it by wordSize.
//
// TODO 8371993 store fake values instyead of LR/FP#2
Suggested change
// TODO 8371993 store fake values instyead of LR/FP#2
// TODO 8371993 store fake values instead of LR/FP#2

@TobiHartmann

There seems to be a merge conflict.

openjdk bot commented Nov 21, 2025

@marc-chevalier this pull request can not be integrated into lworld due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8367151
git fetch https://git.openjdk.org/valhalla.git lworld
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge lworld"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Nov 21, 2025
marc-chevalier commented Nov 21, 2025

Naming is fine with me but do we even need to factor this logic out? Do you expect it to be used in more places in the future?

I suspect we might need a similar thing in virtual threads. I've seen other places where we do this trick of finding the increment and fixing the framesize to find the sp of the caller. For instance

template<typename FKind>
inline frame FreezeBase::sender(const frame& f) {
  assert(FKind::is_instance(f), "");
  if (FKind::interpreted) {
    return frame(f.sender_sp(), f.interpreter_frame_sender_sp(), f.link(), f.sender_pc());
  }
  intptr_t** link_addr = link_address<FKind>(f);
  intptr_t* sender_sp = (intptr_t*)(link_addr + frame::sender_sp_offset); // f.unextended_sp() + (fsize/wordSize); //
  address sender_pc = ContinuationHelper::return_address_at(sender_sp - 1);
  assert(sender_sp != f.sp(), "must have changed");
  int slot = 0;
  CodeBlob* sender_cb = CodeCache::find_blob_and_oopmap(sender_pc, slot);
  // Repair the sender sp if the frame has been extended
  if (sender_cb->is_nmethod()) {
    sender_sp = f.repair_sender_sp(sender_sp, link_addr);
  }
  return sender_cb != nullptr
    ? frame(sender_sp, sender_sp, *link_addr, sender_pc, sender_cb,
            slot == -1 ? nullptr : sender_cb->oop_map_for_slot(slot, sender_pc),
            false /* on_heap ? */)
    : frame(sender_sp, sender_sp, *link_addr, sender_pc);
}

This looks a lot like the code from sender_for_compiled_frame that I've extracted. I think it's a bit subtle and worth delegating to a common method. It is true that it uses repair_sender_sp, but so does my new compiled_frame_details, and there is still work to do around it.

@TobiHartmann

Right, that makes sense to me. @pchilano might want to re-use that code when fixing the Virtual Threads part.

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Nov 21, 2025
@marc-chevalier

/integrate

Thanks @TobiHartmann!

openjdk bot commented Nov 24, 2025

Going to push as commit 405db7a.
Since your change was applied there have been 2 commits pushed to the lworld branch:

  • a483e8c: 8209554: [lworld] ClassCastException thrown for JCK test instead of expected IllegalArgumentException
  • c73220a: 8372345: [lworld] Problem list JDK-8372341

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 24, 2025
@openjdk openjdk bot closed this Nov 24, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 24, 2025
openjdk bot commented Nov 24, 2025

@marc-chevalier Pushed as commit 405db7a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
