|
| 1 | +.. _instrumentation: |
| 2 | + |
| 3 | +Instrumentation |
| 4 | +############### |
| 5 | + |
| 6 | +Overview |
| 7 | +******** |
| 8 | + |
| 9 | +The instrumentation subsystem provides compiler-managed runtime system instrumentation capabilities |
| 10 | +for Zephyr applications. It enables developers to trace function calls, observe context switches, |
| 11 | +and profile application performance with minimal manual instrumentation effort. |
| 12 | + |
| 13 | +Unlike the :ref:`tracing <tracing>` subsystem, which provides RTOS-aware tracing with structured |
| 14 | +event APIs, the instrumentation subsystem works at a lower level by leveraging compiler |
| 15 | +instrumentation hooks. This approach makes it possible to capture virtually any function entry and |
| 16 | +exit events without requiring manual tracing calls in the code. |
| 17 | + |
| 18 | +.. admonition:: Tracing vs. Instrumentation |
| 19 | + :class: hint |
| 20 | + |
| 21 | + **When to use Tracing**: Choose the tracing subsystem when you need RTOS-aware event tracing |
| 22 | + (e.g. thread switches, semaphore operations, etc.) and want to minimize overhead. |
| 23 | + |
| 24 | + **When to use Instrumentation**: Choose instrumentation when you need a detailed view of |
| 25 | + function-level execution to better understand code flow, or to identify performance bottlenecks |
| 26 | + without adding manual trace points. |
| 27 | + |
| 28 | +The instrumentation subsystem relies on compiler support for automatic function instrumentation. |
| 29 | +When enabled, the compiler automatically inserts calls to special instrumentation handler functions |
| 30 | +at the entry and exit of every function in your application (excluding those explicitly marked with |
| 31 | +``__no_instrumentation__``). Currently, only GCC is supported with the ``-finstrument-functions`` |
| 32 | +compiler flag. |
| 33 | + |
| 34 | +The subsystem initializes automatically after RAM initialization and uses trigger/stopper functions |
| 35 | +to control when recording is active. The default trigger and stopper functions are both set to |
| 36 | +``main()`` (configurable via Kconfig), meaning instrumentation captures the entire execution from |
| 37 | +when ``main()`` starts until it returns. |
| 38 | + |
| 39 | +The recorded data is stored in RAM and can be accessed from a host computer thanks to a UART backend |
| 40 | +that exposes a set of simple commands. :zephyr_file:`scripts/instrumentation/zaru.py` script allows |
| 41 | +to execute these commands through a high-level command-line interface and makes it easy to obtain |
| 42 | +data in a format suitable for further analysis (e.g. using `Perfetto`_). |
| 43 | + |
| 44 | +Operational Modes |
| 45 | +***************** |
| 46 | + |
| 47 | +The instrumentation subsystem supports two modes that can be enabled independently or together: |
| 48 | + |
| 49 | +Callgraph Mode (Tracing) |
| 50 | +======================== |
| 51 | + |
| 52 | +In callgraph mode (enabled with :kconfig:option:`CONFIG_INSTRUMENTATION_MODE_CALLGRAPH`), the |
| 53 | +subsystem records function entry and exit events along with timestamps and context information in a |
| 54 | +memory buffer. This enables: |
| 55 | + |
| 56 | +- Reconstruction of the complete function call graph |
| 57 | +- Observation of thread context switches |
| 58 | +- Analysis of execution flow and timing relationships |
| 59 | + |
| 60 | +The trace buffer can operate in ring buffer mode (default, overwrites old entries) or fixed buffer |
| 61 | +mode (stops when full). Buffer size is configurable via |
| 62 | +:kconfig:option:`CONFIG_INSTRUMENTATION_MODE_CALLGRAPH_TRACE_BUFFER_SIZE`. |
| 63 | + |
| 64 | +.. code-block:: console |
| 65 | + :caption: Example of callgraph mode output. See :ref:`zaru_usage` for more details. |
| 66 | +
|
| 67 | + $ ./scripts/instrumentation/zaru.py trace |
| 68 | +
|
| 69 | + Thread Name Thread ID CPU Mode Timestamp Function(s) |
| 70 | + ------------------------------------------------------------------------------------------------ |
| 71 | + ... (truncated) ... |
| 72 | +
|
| 73 | + main 0x20001a38 0) 0 | 187837720 ns | sys_dlist_append(); |
| 74 | + main 0x20001a38 0) 0 | 188802680 ns | }; /* z_priq_simple_add */ |
| 75 | + main 0x20001a38 0) 0 | 189282840 ns | }; /* add_to_waitq_locked */ |
| 76 | + main 0x20001a38 0) 0 | 189770000 ns | add_thread_timeout(); |
| 77 | + main 0x20001a38 0) 0 | 190732920 ns | }; /* pend_locked */ |
| 78 | + main 0x20001a38 0) 0 | 191198480 ns | k_spin_release(); |
| 79 | + main 0x20001a38 0) 0 | 192125560 ns | z_swap() { |
| 80 | + main 0x20001a38 0) 0 | 192590080 ns | k_spin_release(); |
| 81 | + main 0x20001a38 0) 0 | 193520000 ns | z_swap_irqlock() { |
| 82 | + main 0x20001a38 0) 0 | 193987840 ns | __set_BASEPRI() { |
| 83 | + main 0x20001a38 0) 0 | 194474640 ns | /* --> Scheduler switched OUT from thread 'main' */ |
| 84 | + thread-none none-thread 0) 0 | 195178000 ns | /* <-- Scheduler switched IN thread 'thread-none' */ |
| 85 | + thread-none none-thread 0) 0 | 195851520 ns | z_thread_entry() { |
| 86 | + thread-none none-thread 0) 0 | 196312600 ns | k_sched_current_thread_query() { |
| 87 | + thread-none none-thread 0) 0 | 196774680 ns | z_impl_k_sched_current_thread_query(); |
| 88 | + thread-none none-thread 0) 0 | 197694480 ns | }; /* k_sched_current_thread_query */ |
| 89 | + thread_A 0x200000d8 0) 7 | 198160000 ns | thread_A() { |
| 90 | + thread_A 0x200000d8 0) 7 | 198443400 ns | get_sem_and_exec_function() { |
| 91 | + thread_A 0x200000d8 0) 7 | 198727440 ns | k_sem_take() { |
| 92 | + thread_A 0x200000d8 0) 7 | 199011840 ns | z_impl_k_sem_take() { |
| 93 | + thread_A 0x200000d8 0) 7 | 199397520 ns | k_spin_lock() { |
| 94 | + thread_A 0x200000d8 0) 7 | 199784200 ns | __get_BASEPRI(); |
| 95 | + thread_A 0x200000d8 0) 7 | 200557840 ns | __set_BASEPRI_MAX(); |
| 96 | + thread_A 0x200000d8 0) 7 | 201333640 ns | __ISB(); |
| 97 | + thread_A 0x200000d8 0) 7 | 202111360 ns | z_spinlock_validate_pre(); |
| 98 | + thread_A 0x200000d8 0) 7 | 202891000 ns | z_spinlock_validate_post(); |
| 99 | + thread_A 0x200000d8 0) 7 | 203664760 ns | }; /* k_spin_lock */ |
| 100 | + thread_A 0x200000d8 0) 7 | 204058000 ns | k_spin_unlock() { |
| 101 | + thread_A 0x200000d8 0) 7 | 204450840 ns | __set_BASEPRI(); |
| 102 | + thread_A 0x200000d8 0) 7 | 205231640 ns | __ISB(); |
| 103 | + thread_A 0x200000d8 0) 7 | 206009600 ns | }; /* k_spin_unlock */ |
| 104 | + thread_A 0x200000d8 0) 7 | 206291600 ns | }; /* z_impl_k_sem_take */ |
| 105 | + thread_A 0x200000d8 0) 7 | 206572920 ns | }; /* k_sem_take */ |
| 106 | +
|
| 107 | + ... (truncated) ... |
| 108 | +
|
| 109 | +Statistical Mode (Profiling) |
| 110 | +============================ |
| 111 | + |
| 112 | +In statistical mode (enabled with :kconfig:option:`CONFIG_INSTRUMENTATION_MODE_STATISTICAL`), the |
| 113 | +subsystem accumulates timing statistics for each unique function executed between the trigger and |
| 114 | +stopper points. This provides total execution time per function and helps identify performance |
| 115 | +bottlenecks. The subsystem tracks up to |
| 116 | +:kconfig:option:`CONFIG_INSTRUMENTATION_MODE_STATISTICAL_MAX_NUM_FUNC` unique functions. |
| 117 | + |
| 118 | +.. code-block:: console |
| 119 | + :caption: Example of statistical mode output (top 10 most expensive functions). See |
| 120 | + :ref:`zaru_usage` for more details. |
| 121 | +
|
| 122 | + $ ./scripts/instrumentation/zaru.py profile -n 10 |
| 123 | +
|
| 124 | + 9.45% 0000061d main |
| 125 | + 6.00% 0000049d k_msleep |
| 126 | + 5.98% 00000469 k_sleep |
| 127 | + 5.95% 0000aea1 z_impl_k_sleep |
| 128 | + 5.93% 0000ad6d z_tick_sleep |
| 129 | + 5.66% 00000431 k_sem_take |
| 130 | + 5.65% 00007e65 z_impl_k_sem_take |
| 131 | + 5.51% 0000ac29 z_pend_curr |
| 132 | + 2.83% 000063ed sys_clock_isr |
| 133 | + 2.67% 0000d361 sys_clock_announce |
| 134 | +
|
| 135 | +Configuration |
| 136 | +************* |
| 137 | + |
| 138 | +Enable instrumentation with: |
| 139 | + |
| 140 | +.. code-block:: kconfig |
| 141 | +
|
| 142 | + CONFIG_INSTRUMENTATION=y |
| 143 | + CONFIG_INSTRUMENTATION_MODE_CALLGRAPH=y # For tracing |
| 144 | + CONFIG_INSTRUMENTATION_MODE_STATISTICAL=y # For profiling |
| 145 | +
|
| 146 | +The instrumentation subsystem uses :ref:`retained memory <retention_api>` to persist trigger/stopper |
| 147 | +function addresses across reboots. This must be configured in the devicetree: |
| 148 | + |
| 149 | +.. code-block:: devicetree |
| 150 | +
|
| 151 | + / { |
| 152 | + sram@2003FC00 { |
| 153 | + compatible = "zephyr,memory-region", "mmio-sram"; |
| 154 | + reg = <0x2003FC00 DT_SIZE_K(1)>; |
| 155 | + zephyr,memory-region = "RetainedMem"; |
| 156 | +
|
| 157 | + retainedmem { |
| 158 | + compatible = "zephyr,retained-ram"; |
| 159 | + status = "okay"; |
| 160 | +
|
| 161 | + instrumentation_triggers: retention@0 { |
| 162 | + compatible = "zephyr,retention"; |
| 163 | + status = "okay"; |
| 164 | + reg = <0x0 0x10>; |
| 165 | + }; |
| 166 | + }; |
| 167 | + }; |
| 168 | + }; |
| 169 | +
|
| 170 | + /* Adjust main SRAM to exclude retained region */ |
| 171 | + &sram0 { |
| 172 | + reg = <0x20000000 DT_SIZE_K(255)>; |
| 173 | + }; |
| 174 | +
|
| 175 | +See the :zephyr:code-sample:`instrumentation` sample for complete configuration examples. |
| 176 | +Additional options include buffer sizes, trigger functions, and function/file exclusion lists (see |
| 177 | +Kconfig options starting with :kconfig:option-regex:`CONFIG_INSTRUMENTATION_*`). |
| 178 | + |
| 179 | +.. _zaru_usage: |
| 180 | + |
| 181 | +``zaru.py`` Usage |
| 182 | +***************** |
| 183 | + |
| 184 | +The ``zaru.py`` command-line tool (located in :zephyr_file:`scripts/instrumentation/zaru.py`) |
| 185 | +provides an interface for controlling instrumentation and extracting data from the target over UART. |
| 186 | + |
| 187 | +The tool offers several commands: |
| 188 | + |
| 189 | +- ``status``: Check if the target device supports callgraph (tracing) and statistical (profiling) |
| 190 | + modes. |
| 191 | +- ``trace``: Capture and display function call traces. |
| 192 | +- ``profile``: Capture and display function profiling data. |
| 193 | +- ``reboot``: Reboot the target device. |
| 194 | + |
| 195 | +You can get help for each command by running ``zaru.py <command> --help``. |
| 196 | + |
| 197 | +By default, ``zaru.py`` attempts to connect to the target device using ``/dev/ttyACM0``. You can |
| 198 | +specify a different serial port using the ``--serial`` option: |
| 199 | + |
| 200 | +.. code-block:: console |
| 201 | +
|
| 202 | + $ ./scripts/instrumentation/zaru.py --serial /dev/ttyACM1 status |
| 203 | +
|
| 204 | +The ``--build-dir`` option can be used to specify the Zephyr build directory, which is needed to |
| 205 | +locate the ELF file for symbol resolution. If not provided, ``zaru.py`` will attempt to find it |
| 206 | +automatically. |
| 207 | + |
| 208 | +See the :zephyr:code-sample:`instrumentation` sample documentation for detailed usage instructions. |
| 209 | + |
| 210 | +Limitations and Considerations |
| 211 | +****************************** |
| 212 | + |
| 213 | +Compiler support |
| 214 | + The instrumentation subsystem requires GCC with ``-finstrument-functions`` support. Other |
| 215 | + compilers are not supported. |
| 216 | + |
| 217 | +Stack size requirements |
| 218 | + Instrumentation adds overhead to every function call, which increases stack usage. You will likely |
| 219 | + need to increase thread stack sizes to accommodate the additional space required by instrumentation |
| 220 | + handlers and nested function calls. |
| 221 | + |
| 222 | +Execution overhead |
| 223 | + All function calls incur instrumentation overhead. Code size will increase due to added |
| 224 | + instrumentation calls, and performance will be impacted. |
| 225 | + |
| 226 | +Initialization constraints |
| 227 | + Code that runs before RAM initialization (e.g., early boot functions) is not captured as it runs |
| 228 | + before the instrumentation subsystem is initialized. |
| 229 | + |
| 230 | +To reduce overhead, use trigger/stopper functions to instrument only code regions of interest, and |
| 231 | +exclude performance-critical functions via |
| 232 | +:kconfig:option:`CONFIG_INSTRUMENTATION_EXCLUDE_FUNCTION_LIST` and |
| 233 | +:kconfig:option:`CONFIG_INSTRUMENTATION_EXCLUDE_FILE_LIST`. |
| 234 | + |
| 235 | +API Reference |
| 236 | +************* |
| 237 | + |
| 238 | +.. doxygengroup:: instrumentation_api |
| 239 | + |
| 240 | +.. _Perfetto: https://perfetto.dev/ |
0 commit comments