From e9ddb3484092f20a4f55e1a5f564cb9b5bbcbb4e Mon Sep 17 00:00:00 2001
From: Lotus Fenn <lotus@nexthop.ai>
Date: Wed, 29 Oct 2025 01:35:49 +0000
Subject: [PATCH] Add a patch for printing the AMD Zen CPU reset reason

If I intentionally trigger a CPU soft reset I see this:
```
admin@gold208-dut:~$ sudo dmesg | grep -i reason
[    0.635233] x86/amd: Previous system reset reason [0x00080800]: software wrote 0x6 to reset control register 0xCF9
```

If I intentionally trigger the CPU FCH Watchdog, I see this:
```
admin@gold208-dut:~$ sudo dmesg | grep reason
[    0.632563] x86/amd: Previous system reset reason [0x02000800]: hardware watchdog timer expired
```
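
As a rough illustration of how the two values above decode, here is a
userspace sketch (not the kernel code). `decode_reset_reason` and the
two-entry table are made up for this example; the strings and the
all-bits-set check are copied from the patches below.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of s5_reset_reason_txt[] from patch 1/2. */
static const char *const reason_txt[32] = {
	[19] = "software wrote 0x6 to reset control register 0xCF9",
	[25] = "hardware watchdog timer expired",
};

/*
 * Print one line per set bit that has a known reason string.
 * Returns the number of lines printed.
 */
static int decode_reset_reason(uint32_t value)
{
	int i, n = 0;

	/* All bits set is treated as an error response, as in patch 2/2. */
	if (value == UINT32_MAX)
		return 0;

	for (i = 0; i < 32; i++) {
		if (!(value & (UINT32_C(1) << i)))
			continue;

		/* Bits without a string in the table are skipped silently. */
		if (reason_txt[i]) {
			printf("[0x%08x]: %s\n", (unsigned int)value, reason_txt[i]);
			n++;
		}
	}

	return n;
}
```

Calling `decode_reset_reason(0x00080800)` reports bit 19 and
`decode_reset_reason(0x02000800)` reports bit 25. Bit 11 is also set in
both samples, but the table has no entry for it, so nothing is printed
for that bit.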

Upstream from here:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=ab8131028710d009ab93d6bffd2a2749ade909b0

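The first patch had to be adapted to the v6.1 kernel we use: `fch.h` does
not exist in v6.1, so the patch adds the whole file (5 constants), and the
`amd.c` hunks were refreshed to match the v6.1 context.

The second patch is the follow-up fix, upstream commit
e9576e078220c50ace9e9087355423de23e25fa5 ("x86/CPU/AMD: Ignore invalid
reset reason value").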

Signed-off-by: Nate White <nate@nexthop.ai>
---
 ...-Print-the-reason-for-the-last-reset.patch | 519 ++++++++++++++++++
 ...MD-Ignore-invalid-reset-reason-value.patch |  64 +++
 patches-sonic/series                          |   4 +
 3 files changed, 587 insertions(+)
 create mode 100644 patches-sonic/0001-x86-CPU-AMD-Print-the-reason-for-the-last-reset.patch
 create mode 100644 patches-sonic/0002-x86-CPU-AMD-Ignore-invalid-reset-reason-value.patch

diff --git a/patches-sonic/0001-x86-CPU-AMD-Print-the-reason-for-the-last-reset.patch b/patches-sonic/0001-x86-CPU-AMD-Print-the-reason-for-the-last-reset.patch
new file mode 100644
index 000000000..7f247882d
--- /dev/null
+++ b/patches-sonic/0001-x86-CPU-AMD-Print-the-reason-for-the-last-reset.patch
@@ -0,0 +1,519 @@
+From d0dc75304bc26dfb821f710ddf0d428484620af1 Mon Sep 17 00:00:00 2001
+From: Yazen Ghannam <yazen.ghannam@amd.com>
+Date: Tue, 22 Apr 2025 18:48:30 -0500
+Subject: [PATCH 1/2] x86/CPU/AMD: Print the reason for the last reset
+
+[ Upstream commit ab8131028710d009ab93d6bffd2a2749ade909b0 ]
+
+The following register contains bits that indicate the cause for the
+previous reset.
+
+  PMx000000C0 (FCH::PM::S5_RESET_STATUS)
+
+This is useful for debug. The reasons for reset are broken into 6 high level
+categories. Decode it by category and print during boot.
+
+Specifics within a category are split off into debugging documentation.
+
+The register is accessed indirectly through a "PM" port in the FCH. Use
+MMIO access in order to avoid restrictions with legacy port access.
+
+Use a late_initcall() to ensure that MMIO has been set up before trying to
+access the register.
+
+This register was introduced with AMD Family 17h, so avoid access on older
+families. There is no CPUID feature bit for this register.
+
+  [ bp: Simplify the reason dumping loop.
+    - merge a fix to not access an array element after the last one:
+      https://lore.kernel.org/r/20250505133609.83933-1-superm1@kernel.org
+      Reported-by: James Dutton <james.dutton@gmail.com>
+      ]
+
+  [ mingo:
+    - Use consistent .rst formatting
+    - Fix 'Sleep' class field to 'ACPI-State'
+    - Standardize pin messages around the 'tripped' verbiage
+    - Remove reference to ring-buffer printing & simplify the wording
+    - Use curly braces for multi-line conditional statements ]
+
+Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
+Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
+Signed-off-by: Ingo Molnar <mingo@kernel.org>
+Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
+Link: https://lore.kernel.org/20250422234830.2840784-6-superm1@kernel.org
+---
+ Documentation/arch/x86/amd-debugging.rst | 368 +++++++++++++++++++++++
+ arch/x86/include/asm/amd/fch.h           |  13 +
+ arch/x86/kernel/cpu/amd.c                |  54 ++++
+ 3 files changed, 435 insertions(+)
+ create mode 100644 Documentation/arch/x86/amd-debugging.rst
+ create mode 100644 arch/x86/include/asm/amd/fch.h
+
+diff --git a/Documentation/arch/x86/amd-debugging.rst b/Documentation/arch/x86/amd-debugging.rst
+new file mode 100644
+index 000000000000..d92bf59d62c7
+--- /dev/null
++++ b/Documentation/arch/x86/amd-debugging.rst
+@@ -0,0 +1,368 @@
++.. SPDX-License-Identifier: GPL-2.0
++
++Debugging AMD Zen systems
+++++++++++++++++++++++++++
++
++Introduction
++============
++
++This document describes techniques that are useful for debugging issues with
++AMD Zen systems.  It is intended for use by developers and technical users
++to help identify and resolve issues.
++
++S3 vs s2idle
++============
++
++On AMD systems, it's not possible to simultaneously support suspend-to-RAM (S3)
++and suspend-to-idle (s2idle).  To confirm which mode your system supports you
++can look at ``cat /sys/power/mem_sleep``.  If it shows ``s2idle [deep]`` then
++*S3* is supported.  If it shows ``[s2idle]`` then *s2idle* is
++supported.
++
++On systems that support *S3*, the firmware will be utilized to put all hardware into
++the appropriate low power state.
++
++On systems that support *s2idle*, the kernel will be responsible for transitioning devices
++into the appropriate low power state. When all devices are in the appropriate low
++power state, the hardware will transition into a hardware sleep state.
++
++After a suspend cycle you can tell how much time was spent in a hardware sleep
++state by looking at ``cat /sys/power/suspend_stats/last_hw_sleep``.
++
++This flowchart explains how the AMD s2idle suspend flow works.
++
++.. kernel-figure:: suspend.svg
++
++This flowchart explains how the AMD s2idle resume flow works.
++
++.. kernel-figure:: resume.svg
++
++s2idle debugging tool
++=====================
++
++As there are a lot of places that problems can occur, a debugging tool has been
++created at
++`amd-debug-tools <https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/>`_
++that can help test for common problems and offer suggestions.
++
++If you have an s2idle issue, it's best to start with this and follow instructions
++from its findings.  If you continue to have an issue, raise a bug with the
++report generated from this script to
++`drm/amd gitlab <https://gitlab.freedesktop.org/drm/amd/-/issues/new?issuable_template=s2idle_BUG_TEMPLATE>`_.
++
++Spurious s2idle wakeups from an IRQ
++===================================
++
++Spurious wakeups will generally have an IRQ set to ``/sys/power/pm_wakeup_irq``.
++This can be matched to ``/proc/interrupts`` to determine what device woke the system.
++
++If this isn't enough to debug the problem, then the following sysfs files
++can be set to add more verbosity to the wakeup process: ::
++
++  # echo 1 | sudo tee /sys/power/pm_debug_messages
++  # echo 1 | sudo tee /sys/power/pm_print_times
++
++After making those changes, the kernel will display messages that can
++be traced back to kernel s2idle loop code as well as display any active
++GPIO sources while waking up.
++
++If the wakeup is caused by the ACPI SCI, additional ACPI debugging may be
++needed.  These commands can enable additional trace data: ::
++
++  # echo enable | sudo tee /sys/module/acpi/parameters/trace_state
++  # echo 1 | sudo tee /sys/module/acpi/parameters/aml_debug_output
++  # echo 0x0800000f | sudo tee /sys/module/acpi/parameters/debug_level
++  # echo 0xffff0000 | sudo tee /sys/module/acpi/parameters/debug_layer
++
++Spurious s2idle wakeups from a GPIO
++===================================
++
++If a GPIO is active when waking up the system ideally you would look at the
++schematic to determine what device it is associated with. If the schematic
++is not available, another tactic is to look at the ACPI _EVT() entry
++to determine what device is notified when that GPIO is active.
++
++For a hypothetical example, say that GPIO 59 woke up the system.  You can
++look at the SSDT to determine what device is notified when GPIO 59 is active.
++
++First convert the GPIO number into hex. ::
++
++  $ python3 -c "print(hex(59))"
++  0x3b
++
++Next determine which ACPI table has the ``_EVT`` entry. For example: ::
++
++  $ sudo grep EVT /sys/firmware/acpi/tables/SSDT*
++  grep: /sys/firmware/acpi/tables/SSDT27: binary file matches
++
++Decode this table::
++
++  $ sudo cp /sys/firmware/acpi/tables/SSDT27 .
++  $ sudo iasl -d SSDT27
++
++Then look at the table and find the matching entry for GPIO 0x3b. ::
++
++  Case (0x3B)
++  {
++      M000 (0x393B)
++      M460 ("    Notify (\\_SB.PCI0.GP17.XHC1, 0x02)\n", Zero, Zero, Zero, Zero, Zero, Zero)
++      Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake
++  }
++
++You can see in this case that the device ``\_SB.PCI0.GP17.XHC1`` is notified
++when GPIO 59 is active. It's obvious this is an XHCI controller, but to go a
++step further you can figure out which XHCI controller it is by matching it to
++ACPI.::
++
++  $ grep "PCI0.GP17.XHC1" /sys/bus/acpi/devices/*/path
++  /sys/bus/acpi/devices/device:2d/path:\_SB_.PCI0.GP17.XHC1
++  /sys/bus/acpi/devices/device:2e/path:\_SB_.PCI0.GP17.XHC1.RHUB
++  /sys/bus/acpi/devices/device:2f/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1
++  /sys/bus/acpi/devices/device:30/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1.CAM0
++  /sys/bus/acpi/devices/device:31/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1.CAM1
++  /sys/bus/acpi/devices/device:32/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT2
++  /sys/bus/acpi/devices/LNXPOWER:0d/path:\_SB_.PCI0.GP17.XHC1.PWRS
++
++Here you can see it matches to ``device:2d``. Look at the ``physical_node``
++to determine what PCI device that actually is. ::
++
++  $ ls -l /sys/bus/acpi/devices/device:2d/physical_node
++  lrwxrwxrwx 1 root root 0 Feb 12 13:22 /sys/bus/acpi/devices/device:2d/physical_node -> ../../../../../pci0000:00/0000:00:08.1/0000:c2:00.4
++
++So there you have it: the PCI device associated with this GPIO wakeup was ``0000:c2:00.4``.
++
++The ``amd_s2idle.py`` script will capture most of these artifacts for you.
++
++s2idle PM debug messages
++========================
++
++During the s2idle flow on AMD systems, the ACPI LPS0 driver is responsible
++to check all uPEP constraints.  Failing uPEP constraints does not prevent
++s0i3 entry.  This means that if some constraints are not met, it is possible
++the kernel may attempt to enter s2idle even if there are some known issues.
++
++To activate PM debugging, either specify ``pm_debug_messages`` kernel
++command-line option at boot or write to ``/sys/power/pm_debug_messages``.
++Unmet constraints will be displayed in the kernel log and can be
++viewed by logging tools that process kernel ring buffer like ``dmesg`` or
++``journalctl``.
++
++If the system freezes on entry/exit before these messages are flushed, a
++useful debugging tactic is to unbind the ``amd_pmc`` driver to prevent
++notification to the platform to start s0i3 entry.  This will stop the
++system from freezing on entry or exit and let you view all the failed
++constraints. ::
++
++  cd /sys/bus/platform/drivers/amd_pmc
++  ls | grep AMD | sudo tee unbind
++
++After doing this, run the suspend cycle and look specifically for errors around: ::
++
++  ACPI: LPI: Constraint not met; min power state:%s current power state:%s
++
++Historical examples of s2idle issues
++====================================
++
++To help understand the types of issues that can occur and how to debug them,
++here are some historical examples of s2idle issues that have been resolved.
++
++Core offlining
++--------------
++An end user had reported that taking a core offline would prevent the system
++from properly entering s0i3.  This was debugged using internal AMD tools
++to capture and display a stream of metrics from the hardware showing what changed
++when a core was offlined.  It was determined that the hardware didn't get
++notification the offline cores were in the deepest state, and so it prevented
++CPU from going into the deepest state. The issue was debugged to a missing
++command to put cores into C3 upon offline.
++
++`commit d6b88ce2eb9d2 ("ACPI: processor idle: Allow playing dead in C3 state") <https://git.kernel.org/torvalds/c/d6b88ce2eb9d2>`_
++
++Corruption after resume
++-----------------------
++A big problem that occurred with Rembrandt was that there was graphical
++corruption after resume.  This happened because of a misalignment of PSP
++and driver responsibility.  The PSP will save and restore DMCUB, but the
++driver assumed it needed to reset DMCUB on resume.
++This actually was a misalignment for earlier silicon as well, but was not
++observed.
++
++`commit 79d6b9351f086 ("drm/amd/display: Don't reinitialize DMCUB on s0ix resume") <https://git.kernel.org/torvalds/c/79d6b9351f086>`_
++
++Back to Back suspends fail
++--------------------------
++When using a wakeup source that triggers the IRQ to wakeup, a bug in the
++pinctrl-amd driver may capture the wrong state of the IRQ and prevent the
++system going back to sleep properly.
++
++`commit b8c824a869f22 ("pinctrl: amd: Don't save/restore interrupt status and wake status bits") <https://git.kernel.org/torvalds/c/b8c824a869f22>`_
++
++Spurious timer based wakeup after 5 minutes
++-------------------------------------------
++The HPET was being used to program the wakeup source for the system, however
++this was causing a spurious wakeup after 5 minutes.  The correct alarm to use
++was the ACPI alarm.
++
++`commit 3d762e21d5637 ("rtc: cmos: Use ACPI alarm for non-Intel x86 systems too") <https://git.kernel.org/torvalds/c/3d762e21d5637>`_
++
++Disk disappears after resume
++----------------------------
++After resuming from s2idle, the NVME disk would disappear.  This was due to the
++BIOS not specifying the _DSD StorageD3Enable property.  This caused the NVME
++driver not to put the disk into the expected state at suspend and to fail
++on resume.
++
++`commit e79a10652bbd3 ("ACPI: x86: Force StorageD3Enable on more products") <https://git.kernel.org/torvalds/c/e79a10652bbd3>`_
++
++Spurious IRQ1
++-------------
++A number of Renoir, Lucienne, Cezanne, & Barcelo platforms have a
++platform firmware bug where IRQ1 is triggered during s0i3 resume.
++
++This was fixed in the platform firmware, but a number of systems didn't
++receive any more platform firmware updates.
++
++`commit 8e60615e89321 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN") <https://git.kernel.org/torvalds/c/8e60615e89321>`_
++
++Hardware timeout
++----------------
++The hardware performs many actions besides accepting the values from
++amd-pmc driver.  As the communication path with the hardware is a mailbox,
++it's possible that it might not respond quickly enough.
++This issue manifested as a failure to suspend: ::
++
++  PM: dpm_run_callback(): acpi_subsys_suspend_noirq+0x0/0x50 returns -110
++  amd_pmc AMDI0005:00: PM: failed to suspend noirq: error -110
++
++The timing problem was identified by comparing the values of the idle mask.
++
++`commit 3c3c8e88c8712 ("platform/x86: amd-pmc: Increase the response register timeout") <https://git.kernel.org/torvalds/c/3c3c8e88c8712>`_
++
++Failed to reach hardware sleep state with panel on
++--------------------------------------------------
++On some Strix systems certain panels were observed to block the system from
++entering a hardware sleep state if the internal panel was on during the sequence.
++
++Even though the panel got turned off during suspend it exposed a timing problem
++where an interrupt caused the display hardware to wake up and block low power
++state entry.
++
++`commit 40b8c14936bd2 ("drm/amd/display: Disable unneeded hpd interrupts during dm_init") <https://git.kernel.org/torvalds/c/40b8c14936bd2>`_
++
++Runtime power consumption issues
++================================
++
++Runtime power consumption is influenced by many factors, including but not
++limited to the configuration of the PCIe Active State Power Management (ASPM),
++the display brightness, the EPP policy of the CPU, and the power management
++of the devices.
++
++ASPM
++----
++For the best runtime power consumption, ASPM should be programmed as intended
++by the BIOS from the hardware vendor.  To accomplish this the Linux kernel
++should be compiled with ``CONFIG_PCIEASPM_DEFAULT`` set to ``y`` and the
++sysfs file ``/sys/module/pcie_aspm/parameters/policy`` should not be modified.
++
++Most notably, if L1.2 is not configured properly for any devices, the SoC
++will not be able to enter the deepest idle state.
++
++EPP Policy
++----------
++The ``energy_performance_preference`` sysfs file can be used to set a bias
++of efficiency or performance for a CPU.  This has a direct relationship on
++the battery life when more heavily biased towards performance.
++
++
++BIOS debug messages
++===================
++
++Most OEM machines don't have a serial UART for outputting kernel or BIOS
++debug messages. However BIOS debug messages are useful for understanding
++both BIOS bugs and bugs with the Linux kernel drivers that call BIOS AML.
++
++As the BIOS on most OEM AMD systems are based off an AMD reference BIOS,
++the infrastructure used for exporting debugging messages is often the same
++as AMD reference BIOS.
++
++Manually Parsing
++----------------
++There is generally an ACPI method ``\M460`` that different paths of the AML
++will call to emit a message to the BIOS serial log. This method takes
++7 arguments, with the first being a string and the rest being optional
++integers::
++
++  Method (M460, 7, Serialized)
++
++Here is an example of a string that BIOS AML may call out using ``\M460``::
++
++  M460 ("  OEM-ASL-PCIe Address (0x%X)._REG (%d %d)  PCSA = %d\n", DADR, Arg0, Arg1, PCSA, Zero, Zero)
++
++Normally when executed, the ``\M460`` method would populate the additional
++arguments into the string.  In order to get these messages from the Linux
++kernel a hook has been added into ACPICA that can capture the *arguments*
++sent to ``\M460`` and print them to the kernel ring buffer.
++For example the following message could be emitted into kernel ring buffer::
++
++  extrace-0174 ex_trace_args         :  "  OEM-ASL-PCIe Address (0x%X)._REG (%d %d)  PCSA = %d\n", ec106000, 2, 1, 1, 0, 0
++
++In order to get these messages, you need to compile with ``CONFIG_ACPI_DEBUG``
++and then turn on the following ACPICA tracing parameters.
++This can be done either on the kernel command line or at runtime:
++
++* ``acpi.trace_method_name=\M460``
++* ``acpi.trace_state=method``
++
++NOTE: These can be very noisy at bootup. If you turn these parameters on
++the kernel command line, please also consider turning up ``CONFIG_LOG_BUF_SHIFT``
++to a larger size such as 17 to avoid losing early boot messages.
++
++Tool assisted Parsing
++---------------------
++As mentioned above, parsing by hand can be tedious, especially with a lot of
++messages.  To help with this, a tool has been created at
++`amd-debug-tools <https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/>`_
++to help parse the messages.
++
++Random reboot issues
++====================
++
++When a random reboot occurs, the high-level reason for the reboot is stored
++in a register that will persist onto the next boot.
++
++There are 6 classes of reasons for the reboot:
++ * Software induced
++ * Power state transition
++ * Pin induced
++ * Hardware induced
++ * Remote reset
++ * Internal CPU event
++
++.. csv-table::
++   :header: "Bit", "Type", "Reason"
++   :align: left
++
++   "0",  "Pin",      "thermal pin BP_THERMTRIP_L was tripped"
++   "1",  "Pin",      "power button was pressed for 4 seconds"
++   "2",  "Pin",      "shutdown pin was tripped"
++   "4",  "Remote",   "remote ASF power off command was received"
++   "9",  "Internal", "internal CPU thermal limit was tripped"
++   "16", "Pin",      "system reset pin BP_SYS_RST_L was tripped"
++   "17", "Software", "software issued PCI reset"
++   "18", "Software", "software wrote 0x4 to reset control register 0xCF9"
++   "19", "Software", "software wrote 0x6 to reset control register 0xCF9"
++   "20", "Software", "software wrote 0xE to reset control register 0xCF9"
++   "21", "ACPI-state", "ACPI power state transition occurred"
++   "22", "Pin",      "keyboard reset pin KB_RST_L was tripped"
++   "23", "Internal", "internal CPU shutdown event occurred"
++   "24", "Hardware", "system failed to boot before failed boot timer expired"
++   "25", "Hardware", "hardware watchdog timer expired"
++   "26", "Remote",   "remote ASF reset command was received"
++   "27", "Internal", "an uncorrected error caused a data fabric sync flood event"
++   "29", "Internal", "FCH and MP1 failed warm reset handshake"
++   "30", "Internal", "a parity error occurred"
++   "31", "Internal", "a software sync flood event occurred"
++
++This information is read by the kernel at bootup and printed into
++the syslog. When a random reboot occurs this message can be helpful
++to determine the next component to debug.
+diff --git a/arch/x86/include/asm/amd/fch.h b/arch/x86/include/asm/amd/fch.h
+new file mode 100644
+index 000000000000..2cf5153edbc2
+--- /dev/null
++++ b/arch/x86/include/asm/amd/fch.h
+@@ -0,0 +1,13 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++#ifndef _ASM_X86_AMD_FCH_H_
++#define _ASM_X86_AMD_FCH_H_
++
++#define FCH_PM_BASE			0xFED80300
++
++/* Register offsets from PM base: */
++#define FCH_PM_DECODEEN			0x00
++#define FCH_PM_DECODEEN_SMBUS0SEL	GENMASK(20, 19)
++#define FCH_PM_SCRATCH			0x80
++#define FCH_PM_S5_RESET_STATUS		0xC0
++
++#endif /* _ASM_X86_AMD_FCH_H_ */
+diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
+index 823f44f7bc94..c3194101a92a 100644
+--- a/arch/x86/kernel/cpu/amd.c
++++ b/arch/x86/kernel/cpu/amd.c
+@@ -9,6 +9,7 @@
+ #include <linux/sched/clock.h>
+ #include <linux/random.h>
+ #include <linux/topology.h>
++#include <asm/amd/fch.h>
+ #include <asm/processor.h>
+ #include <asm/apic.h>
+ #include <asm/cacheinfo.h>
+@@ -1216,3 +1217,56 @@ void amd_check_microcode(void)
+ 	if (cpu_feature_enabled(X86_FEATURE_ZEN2))
+ 		on_each_cpu(zenbleed_check_cpu, NULL, 1);
+ }
++
++static const char * const s5_reset_reason_txt[] = {
++	[0]  = "thermal pin BP_THERMTRIP_L was tripped",
++	[1]  = "power button was pressed for 4 seconds",
++	[2]  = "shutdown pin was tripped",
++	[4]  = "remote ASF power off command was received",
++	[9]  = "internal CPU thermal limit was tripped",
++	[16] = "system reset pin BP_SYS_RST_L was tripped",
++	[17] = "software issued PCI reset",
++	[18] = "software wrote 0x4 to reset control register 0xCF9",
++	[19] = "software wrote 0x6 to reset control register 0xCF9",
++	[20] = "software wrote 0xE to reset control register 0xCF9",
++	[21] = "ACPI power state transition occurred",
++	[22] = "keyboard reset pin KB_RST_L was tripped",
++	[23] = "internal CPU shutdown event occurred",
++	[24] = "system failed to boot before failed boot timer expired",
++	[25] = "hardware watchdog timer expired",
++	[26] = "remote ASF reset command was received",
++	[27] = "an uncorrected error caused a data fabric sync flood event",
++	[29] = "FCH and MP1 failed warm reset handshake",
++	[30] = "a parity error occurred",
++	[31] = "a software sync flood event occurred",
++};
++
++static __init int print_s5_reset_status_mmio(void)
++{
++	unsigned long value;
++	void __iomem *addr;
++	int i;
++
++	if (!cpu_feature_enabled(X86_FEATURE_ZEN))
++		return 0;
++
++	addr = ioremap(FCH_PM_BASE + FCH_PM_S5_RESET_STATUS, sizeof(value));
++	if (!addr)
++		return 0;
++
++	value = ioread32(addr);
++	iounmap(addr);
++
++	for (i = 0; i < ARRAY_SIZE(s5_reset_reason_txt); i++) {
++		if (!(value & BIT(i)))
++			continue;
++
++		if (s5_reset_reason_txt[i]) {
++			pr_info("x86/amd: Previous system reset reason [0x%08lx]: %s\n",
++				value, s5_reset_reason_txt[i]);
++		}
++	}
++
++	return 0;
++}
++late_initcall(print_s5_reset_status_mmio);
+-- 
+2.43.0
+
diff --git a/patches-sonic/0002-x86-CPU-AMD-Ignore-invalid-reset-reason-value.patch b/patches-sonic/0002-x86-CPU-AMD-Ignore-invalid-reset-reason-value.patch
new file mode 100644
index 000000000..911861ce2
--- /dev/null
+++ b/patches-sonic/0002-x86-CPU-AMD-Ignore-invalid-reset-reason-value.patch
@@ -0,0 +1,64 @@
+From aa429e1fbeaa168555aea99038f30a0e05b369e5 Mon Sep 17 00:00:00 2001
+From: Yazen Ghannam <yazen.ghannam@amd.com>
+Date: Mon, 21 Jul 2025 18:11:54 +0000
+Subject: [PATCH 2/2] x86/CPU/AMD: Ignore invalid reset reason value
+
+[ Upstream commit e9576e078220c50ace9e9087355423de23e25fa5 ]
+
+The reset reason value may be "all bits set", e.g. 0xFFFFFFFF. This is a
+commonly used error response from hardware. This may occur due to a real
+hardware issue or when running in a VM.
+
+The user will see all reset reasons reported in this case.
+
+Check for an error response value and return early to avoid decoding
+invalid data.
+
+Also, adjust the data variable type to match the hardware register size.
+
+Fixes: ab8131028710 ("x86/CPU/AMD: Print the reason for the last reset")
+Reported-by: Libing He <libhe@redhat.com>
+Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
+Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Cc: stable@vger.kernel.org
+Link: https://lore.kernel.org/20250721181155.3536023-1-yazen.ghannam@amd.com
+---
+ arch/x86/kernel/cpu/amd.c | 8 ++++++--
+ 1 file changed, 6 insertions(+), 2 deletions(-)
+
+diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
+index c3194101a92a..b40f841479f7 100644
+--- a/arch/x86/kernel/cpu/amd.c
++++ b/arch/x86/kernel/cpu/amd.c
+@@ -1243,8 +1243,8 @@ static const char * const s5_reset_reason_txt[] = {
+ 
+ static __init int print_s5_reset_status_mmio(void)
+ {
+-	unsigned long value;
+ 	void __iomem *addr;
++	u32 value;
+ 	int i;
+ 
+ 	if (!cpu_feature_enabled(X86_FEATURE_ZEN))
+@@ -1257,12 +1257,16 @@ static __init int print_s5_reset_status_mmio(void)
+ 	value = ioread32(addr);
+ 	iounmap(addr);
+ 
++	/* Value with "all bits set" is an error response and should be ignored. */
++	if (value == U32_MAX)
++		return 0;
++
+ 	for (i = 0; i < ARRAY_SIZE(s5_reset_reason_txt); i++) {
+ 		if (!(value & BIT(i)))
+ 			continue;
+ 
+ 		if (s5_reset_reason_txt[i]) {
+-			pr_info("x86/amd: Previous system reset reason [0x%08lx]: %s\n",
++			pr_info("x86/amd: Previous system reset reason [0x%08x]: %s\n",
+ 				value, s5_reset_reason_txt[i]);
+ 		}
+ 	}
+-- 
+2.43.0
+
diff --git a/patches-sonic/series b/patches-sonic/series
index 8d862b559..9f293cfba 100644
--- a/patches-sonic/series
+++ b/patches-sonic/series
@@ -201,6 +201,10 @@ cisco-npu-disable-other-bars.patch
 0001-fix-os-crash-caused-by-optoe-when-class-switch.patch
 0001-tty-8250-HSUART-DMA-be-deactivated-for-DNV-CPU.patch
 
+# Nexthop patches
+0001-x86-CPU-AMD-Print-the-reason-for-the-last-reset.patch
+0002-x86-CPU-AMD-Ignore-invalid-reset-reason-value.patch
+
 # Fix to avoid kernel panic on Kernel 6.1.94
 # https://github.com/sonic-net/sonic-buildimage/issues/20901
 #PCI-ASPM-Fix-link-state-exit-during-switch-upstream.patch # Upstreamed