Commit fcd21dd

feat(linux): Add s2idle docs
1 parent ea4cabe commit fcd21dd

3 files changed

+307
-0
lines changed

configs/AM62LX/AM62LX_linux_toc.txt

Lines changed: 1 addition & 0 deletions
@@ -76,6 +76,7 @@ linux/Foundational_Components_Power_Management
7676
linux/Foundational_Components/Power_Management/pm_overview
7777
linux/Foundational_Components/Power_Management/pm_cpuidle
7878
linux/Foundational_Components/Power_Management/pm_am62lx_low_power_modes
79+
linux/Foundational_Components/Power_Management/pm_psci_s2idle
7980
linux/Foundational_Components/Power_Management/pm_wakeup_sources
8081

8182
#linux/Foundational_Components/System_Security/SELinux
Lines changed: 305 additions & 0 deletions
@@ -0,0 +1,305 @@
1+
.. _pm_s2idle_psci:
2+
3+
#############################################
4+
Suspend-to-Idle (S2Idle) and PSCI Integration
5+
#############################################
6+
7+
**********************************
8+
Suspend-to-Idle (S2Idle) Overview
9+
**********************************
10+
11+
Suspend-to-Idle (s2idle), also known as "freeze," is a generic, pure software, light-weight variant of system suspend.
12+
In this state, the Linux kernel freezes user space tasks, suspends devices, and then puts all CPUs into their deepest available idle state.
13+
14+
**********************************
15+
S2Idle vs Deep Sleep (mem)
16+
**********************************
17+
18+
The Linux kernel has sleep states that are global low-power states of the entire system in which user space
19+
code cannot be executed and the overall system activity is significantly reduced.
20+
There are different types of sleep states, as described in the kernel
21+
`documentation <https://docs.kernel.org/admin-guide/pm/sleep-states.html>`__.
22+
System sleep states can be selected using the sysfs entry :file:`/sys/power/mem_sleep`.
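Reading this sysfs entry returns the available suspend variants on one line, with the currently selected variant enclosed in square brackets (for example ``s2idle [deep]``). The following is a minimal sketch of how a script might interpret that string. The function name ``parse_mem_sleep`` and the sample string are illustrative assumptions and not part of any TI or kernel tooling.

```python
from typing import List, Optional, Tuple


def parse_mem_sleep(contents: str) -> Tuple[List[str], Optional[str]]:
    """Split a mem_sleep-style string into (available variants, selected variant).

    The selected variant is the token wrapped in square brackets.
    """
    available: List[str] = []
    selected: Optional[str] = None
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active variant
            selected = token
        available.append(token)
    return available, selected


# Sample string (assumed for illustration): both variants offered, "deep" selected.
variants, current = parse_mem_sleep("s2idle [deep]")
print(variants, current)
```

On a target, the string would come from reading the sysfs entry, and writing one of the listed variant names back to the same entry selects it.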
23+
24+
On the TI K3 AM62L platform, the ``s2idle`` and ``deep`` states are currently supported.
25+
Both of them can achieve similar power savings (e.g., by suspending to RAM / putting DDR into Self-Refresh).
26+
The primary differences lie in the software execution flow, specifically how CPUs are managed and which
27+
PSCI APIs are invoked.
28+
29+
.. list-table:: S2Idle vs Deep Sleep
30+
:widths: 20 40 40
31+
:header-rows: 1
32+
33+
* - Feature
34+
- s2idle (Suspend-to-Idle)
35+
- deep (Suspend-to-RAM)
36+
37+
* - **Kernel String**
38+
- ``s2idle`` or ``freeze``
39+
- ``deep`` or ``mem``
40+
41+
* - **Non-boot CPUs**
42+
- **Online**: Non-boot CPUs are put into a deep idle state but remain logically online.
43+
- **Offline**: Non-boot CPUs are hot-unplugged (removed) from the system via ``CPU_OFF``.
44+
45+
* - **Entry Path**
46+
- **cpuidle**: Uses the standard CPU idle framework and its governors. It runtime-suspends each driver to make sure it is idle.
47+
- **suspend_ops**: Uses platform-specific suspend operations: each driver's suspend ops run first, and finally the PSCI ``SYSTEM_SUSPEND`` call is made.
48+
No governors exist to make any decisions.
49+
50+
* - **PSCI Call**
51+
- ``CPU_SUSPEND``: Invoked for every core (Last Man Standing logic coordinates the cluster/system depth).
52+
- ``SYSTEM_SUSPEND``: Typically invoked by the last active CPU after others are offlined.
53+
54+
* - **Resume Flow**
55+
- **Fast**: CPUs exit the idle loop immediately upon interrupt. Context is preserved.
56+
- **Slow**: Kernel must serially bring secondary CPUs back online (Hotplug). Kernel must recreate
57+
threads, re-enable interrupts, resume each driver and restore per-CPU state for every non-boot core.
58+
59+
* - **Latency**
60+
- Lower
61+
- Higher, primarily due to the overhead of **CPU Hotplug** for non-boot CPUs
62+
63+
*******************
64+
PSCI as the Enabler
65+
*******************
66+
67+
The Power State Coordination Interface (PSCI) is an ARM-defined standard that acts as the fundamental
68+
enabler for s2idle on all ARM platforms that support it. PSCI defines a standardized firmware interface that allows the
69+
Operating System (OS) to request power states without needing intimate knowledge of the underlying
70+
SoC.
71+
72+
**s2idle Call Flow:**
73+
74+
.. code-block:: text
75+
76+
Linux Kernel PSCI Firmware (TF-A)
77+
============ ====================
78+
79+
1. Freeze tasks
80+
|
81+
v
82+
2. Suspend devices
83+
|
84+
v
85+
3. cpuidle driver -----------> CPU_SUSPEND (SMC/HVC)
86+
(per CPU) |
87+
| v
88+
| Coordinate power
89+
| state requests
90+
| |
91+
| v
92+
| CPU enters low-power
93+
| hardware state
94+
|
95+
|<--------- Resume ---------
96+
|
97+
v
98+
4. Resume devices
99+
|
100+
v
101+
5. Thaw tasks
102+
103+
The ``cpuidle`` driver calls the PSCI ``CPU_SUSPEND`` API to transition the CPUs into a low-power state.
104+
The effectiveness of s2idle depends heavily on the PSCI implementation's ability to coordinate these
105+
requests and enter the deepest possible hardware state.
106+
107+
************************
108+
OS Initiated (OSI) Mode
109+
************************
110+
111+
PSCI 1.0 introduced **OS Initiated (OSI)** mode, which shifts the responsibility of power state coordination from the platform firmware to the Operating System.
112+
113+
In the default **Platform Coordinated (PC)** mode, the OS independently requests a state for each core. The firmware then aggregates these requests (voting) to
114+
determine if a cluster or the system can be powered down.
115+
116+
In **OS Initiated (OSI)** mode, the OS explicitly manages the hierarchy. The OS determines when the last core in a power domain (e.g., a cluster) is going idle
117+
and explicitly requests the power-down of that domain.
118+
119+
Why OSI?
120+
========
121+
122+
OSI mode allows the OS to make better power decisions because it has visibility into:
123+
* **Task Scheduling:** The OS knows when other cores will wake up.
124+
* **Wakeup Latencies:** The OS can respect Quality of Service (QoS) latency constraints more accurately.
125+
* **Usage Patterns:** The OS can predict idle duration better than firmware.
126+
127+
OSI Sequence
128+
============
129+
130+
The coordination in OSI mode follows a specific "Last Man Standing" sequence. The OS tracks the state of all cores in a topology node (e.g., a cluster).
131+
132+
.. code-block:: text
133+
134+
OSI "Last Man Standing" Flow
135+
136+
Cluster with 2 Cores OS Action PSCI Request
137+
==================== ========= =============
138+
139+
1. Core 0,1: ACTIVE
140+
|
141+
| Core 0 becomes idle
142+
v
143+
2. Core 0: IDLE --> OS requests local --> CPU_SUSPEND
144+
Core 1: ACTIVE Core Power Down (Core PD only)
145+
Cluster stays ON
146+
|
147+
| Core 1 (LAST) becomes idle
148+
v
149+
3. Core 0,1: IDLE --> OS recognizes --> CPU_SUSPEND
150+
"Last Man" scenario (Composite State)
151+
Requests Composite:
152+
- Core 1: PD Core: PD
153+
- Cluster: PD Cluster: PD
154+
- System: PD System: PD
155+
|
156+
v
157+
4. Firmware Verification --> PSCI firmware checks
158+
& System Power Down all cores/clusters idle
159+
If verified: Power down
160+
entire system
161+
If not: Deny request
162+
(race condition)
163+
164+
**Detailed Steps:**
165+
166+
1. **First Core Idle:** When the first core in a cluster goes idle, the OS requests a local idle state
167+
for that core (e.g., Core Power Down) but keeps the cluster running.
168+
169+
2. **Last Core Idle:** When the *last* active core in the cluster is ready to go idle, the OS recognizes
170+
that the entire cluster, and potentially the system, can now be powered down.
171+
172+
3. **Composite Request:** The last core issues a ``CPU_SUSPEND`` call that requests a **composite state**:
173+
174+
* **Core State:** Power Down
175+
* **Cluster State:** Power Down
176+
* **System State:** Power Down (as demonstrated in the diagram)
177+
178+
4. **Firmware Enforcement:** The PSCI firmware verifies that all other cores and clusters in the requested node are indeed idle.
179+
If they are not, the request is denied (to prevent race conditions).
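The sequence above can be modelled as a small bookkeeping exercise. The sketch below is a toy model only: ``ClusterIdleTracker`` and ``firmware_accepts`` are hypothetical names invented for illustration, and they do not correspond to kernel or TF-A functions. It assumes a single cluster, so the last core to go idle may request the core, cluster, and system levels together.

```python
class ClusterIdleTracker:
    """Toy model of OS-side idle tracking for one cluster in OSI mode."""

    def __init__(self, num_cores):
        self.idle = [False] * num_cores

    def enter_idle(self, core):
        """Mark a core idle and return the power levels its request covers."""
        self.idle[core] = True
        if all(self.idle):
            # Last man standing: request the composite state.
            return ("core", "cluster", "system")
        # Other cores are still running: request a core-local state only.
        return ("core",)

    def wake(self, core):
        """Mark a core as running again."""
        self.idle[core] = False


def firmware_accepts(levels, other_cores_idle):
    """Toy model of the firmware check on a request.

    A request that reaches beyond the core level is denied when any
    other core in the node is still running.
    """
    if "cluster" in levels or "system" in levels:
        return all(other_cores_idle)
    return True


tracker = ClusterIdleTracker(num_cores=2)
first = tracker.enter_idle(0)   # Core 0 idles while Core 1 is active
last = tracker.enter_idle(1)    # Core 1 is the last man
print(first, last)
print(firmware_accepts(last, other_cores_idle=[True]))   # consistent view
print(firmware_accepts(last, other_cores_idle=[False]))  # race: Core 0 woke up
```

The second ``firmware_accepts`` call illustrates the race the firmware check guards against: the OS believed Core 0 was idle, but it had already woken up by the time the composite request arrived.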
180+
181+
***********************************
182+
Understanding the Suspend Parameter
183+
***********************************
184+
185+
The ``power_state`` parameter passed to ``CPU_SUSPEND`` is the key to requesting these states.
186+
In OSI mode, this parameter must encode the intent for the entire hierarchy.
187+
188+
Power State Parameter Encoding
189+
================================
190+
191+
The ``power_state`` parameter is a 32-bit value defined by the ARM PSCI specification (ARM DEN0022C).
192+
It has two encoding formats, controlled by the platform's build configuration.
193+
194+
Standard Format
195+
===============
196+
197+
This is the default format used by most platforms:
198+
199+
.. code-block:: text
200+
201+
31 26 25 24 23 17 16 15 0
202+
+---------------+------+----------------+----+----------------------+
203+
| Reserved | Pwr | Reserved | ST | State ID |
204+
| (must be 0) | Level| (must be 0) | | (platform-defined) |
205+
+---------------+------+----------------+----+----------------------+
206+
207+
.. list-table:: Standard Format Bit Fields
208+
:widths: 20 80
209+
:header-rows: 1
210+
211+
* - Bit Field
212+
- Description
213+
214+
* - **[31:26]**
215+
- **Reserved**: Must be zero.
216+
217+
* - **[25:24]**
218+
- **Power Level**: Indicates the deepest power domain level that can be powered down.
219+
220+
* ``0``: CPU/Core level
221+
* ``1``: Cluster level
222+
* ``2``: System level
223+
* ``3``: Higher levels (platform-specific)
224+
225+
* - **[23:17]**
226+
- **Reserved**: Must be zero.
227+
228+
* - **[16]**
229+
- **State Type (ST)**: Type of power state.
230+
231+
* ``0``: Standby or Retention (low latency, context preserved)
232+
* ``1``: Power Down (higher latency, may lose context)
233+
234+
* - **[15:0]**
235+
- **State ID**: Platform-specific identifier for the requested power state. The OS and
236+
platform firmware must agree on the meaning of these values, typically defined through
237+
device tree bindings.
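As a worked illustration of the bit fields in the table above, the following sketch packs the three fields into a 32-bit value. The function name ``encode_power_state`` and the constants are assumptions made for this example. They are not identifiers taken from the kernel or TF-A sources.

```python
POWER_LEVEL_SHIFT = 24   # Power Level occupies bits [25:24]
STATE_TYPE_SHIFT = 16    # State Type occupies bit [16]
STATE_ID_MASK = 0xFFFF   # State ID occupies bits [15:0]


def encode_power_state(power_level, power_down, state_id):
    """Pack the standard-format fields into a 32-bit power_state value.

    Reserved bits [31:26] and [23:17] are left at zero.
    """
    if not 0 <= power_level <= 3:
        raise ValueError("power_level must fit in two bits (0..3)")
    if not 0 <= state_id <= STATE_ID_MASK:
        raise ValueError("state_id must fit in 16 bits")
    state_type = 1 if power_down else 0
    return (
        (power_level << POWER_LEVEL_SHIFT)
        | (state_type << STATE_TYPE_SHIFT)
        | state_id
    )


# Cluster-level (1) power-down request with a made-up State ID of 0x0022.
print(hex(encode_power_state(power_level=1, power_down=True, state_id=0x0022)))
```

Rejecting out-of-range inputs keeps a caller from accidentally setting reserved bits, which the table above requires to be zero.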
238+
239+
**OSI Mode Consideration:**
240+
241+
In OSI mode, the OS is responsible for tracking which cores are idle. When the last core
242+
in a cluster issues this ``CPU_SUSPEND`` call with Power Level = 1, the PSCI firmware:
243+
244+
1. Verifies that all other cores in the cluster are already in a low-power state
245+
2. If verified, powers down the entire cluster
246+
3. If not verified (race condition), denies the request with an error code
247+
248+
The State ID field is platform-defined and typically documented in the device tree
249+
``idle-state`` nodes using the ``arm,psci-suspend-param`` property. This mechanism,
250+
leveraging ``cpuidle`` and ``s2idle``, allows the kernel to abstract complex platform-specific
251+
low-power modes into a generic framework. The ``idle-state`` nodes in the Device Tree define these power states,
252+
including their entry/exit latencies and target power consumption, enabling the ``cpuidle`` governor to make informed
253+
decisions about which idle state to enter based on system load and predicted idle duration.
254+
255+
The ``arm,psci-suspend-param`` property then directly maps these idle states to the corresponding PSCI ``power_state`` parameter values that the firmware understands.
256+
257+
Example: System Suspend (Standard Format)
258+
=========================================
259+
260+
When the OS targets a system-wide suspend state (e.g., Suspend-to-RAM), the ``power_state`` parameter is constructed to target the highest power level.
261+
Consider the example value **0x02012234**:
262+
263+
.. list-table:: Power State Parameter Breakdown (0x02012234)
264+
:widths: 20 20 20 40
265+
:header-rows: 1
266+
267+
* - Field
268+
- Bits
269+
- Value
270+
- Meaning
271+
272+
* - Reserved
273+
- [31:26]
274+
- 0
275+
- Must be zero
276+
277+
* - Power Level
278+
- [25:24]
279+
- 2
280+
- System level
281+
282+
* - Reserved
283+
- [23:17]
284+
- 0
285+
- Must be zero
286+
287+
* - State Type
288+
- [16]
289+
- 1
290+
- Power Down
291+
292+
* - State ID
293+
- [15:0]
294+
- 0x2234
295+
- Platform-specific (e.g., "S2RAM")
296+
297+
**Interpretation:**
298+
299+
* **Power Level = 2** tells the firmware that a system-level transition is requested.
300+
* **State Type = 1** indicates a power-down state.
301+
* **State ID = 0x2234** is the platform-specific identifier for this system state.
302+
303+
In the context of **s2idle**, if the OS determines that all constraints are met for system suspension,
304+
the last active CPU (Last Man) will invoke ``CPU_SUSPEND`` with this parameter. The PSCI firmware then
305+
coordinates the final steps to suspend the system (e.g., placing DDR in self-refresh and powering down the SoC).
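The breakdown of ``0x02012234`` can be checked mechanically. The sketch below splits a standard-format value back into its fields. The helper name ``decode_power_state`` and the dictionary keys are illustrative assumptions, not an existing API.

```python
RESERVED_MASK = 0xFCFE0000  # bits [31:26] and [23:17], which must be zero


def decode_power_state(value):
    """Split a standard-format power_state value into its fields."""
    return {
        "power_level": (value >> 24) & 0x3,   # bits [25:24]
        "state_type": (value >> 16) & 0x1,    # bit [16]
        "state_id": value & 0xFFFF,           # bits [15:0]
        "reserved_ok": (value & RESERVED_MASK) == 0,
    }


fields = decode_power_state(0x02012234)
print(fields["power_level"])      # 2 (system level)
print(fields["state_type"])       # 1 (power down)
print(hex(fields["state_id"]))    # 0x2234
print(fields["reserved_ok"])      # True
```

The same helper can be pointed at any ``arm,psci-suspend-param`` value to see which power level and state type it requests, provided the platform uses the standard format.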

source/linux/Foundational_Components_Power_Management.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ Power Management
1414
Foundational_Components/Power_Management/pm_rtc_ddr
1515
Foundational_Components/Power_Management/pm_low_power_modes
1616
Foundational_Components/Power_Management/pm_am62lx_low_power_modes
17+
Foundational_Components/Power_Management/pm_psci_s2idle
1718
Foundational_Components/Power_Management/pm_wakeup_sources
1819
Foundational_Components/Power_Management/pm_sw_arch
1920
Foundational_Components/Power_Management/pm_debug
