|
| 1 | +/// |
| 2 | +/// Copyright (c) 2025 Arm Limited. |
| 3 | +/// |
| 4 | +/// SPDX-License-Identifier: MIT |
| 5 | +/// |
| 6 | +/// Permission is hereby granted, free of charge, to any person obtaining a copy |
| 7 | +/// of this software and associated documentation files (the "Software"), to |
| 8 | +/// deal in the Software without restriction, including without limitation the |
| 9 | +/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or |
| 10 | +/// sell copies of the Software, and to permit persons to whom the Software is |
| 11 | +/// furnished to do so, subject to the following conditions: |
| 12 | +/// |
| 13 | +/// The above copyright notice and this permission notice shall be included in all |
| 14 | +/// copies or substantial portions of the Software. |
| 15 | +/// |
| 16 | +/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
| 17 | +/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
| 18 | +/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
| 19 | +/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
| 20 | +/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
| 21 | +/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE |
| 22 | +/// SOFTWARE. |
| 23 | +/// |
| 24 | +namespace arm_compute |
| 25 | +{ |
| 26 | +/** |
| 27 | +@page profiling Profiling |
| 28 | + |
| 29 | +@tableofcontents |
| 30 | + |
| 31 | +@section profiling_overview Overview |
| 32 | +The Arm Compute Library includes a built-in profiling system based on the [Perfetto](https://perfetto.dev/) tracing framework. It allows developers to collect detailed performance data for CPU and GPU workloads executed via the library. |
| 33 | +The profiler can capture timing information for individual functions, kernels, and operations, providing insights into execution time, resource usage, and potential bottlenecks. The collected data can be visualized using the Perfetto UI, enabling developers to analyze performance characteristics and optimize their applications effectively. |
| 34 | + |
| 35 | + |
| 36 | +@section profiling_build How to Build |
| 37 | + |
| 38 | +Profiling is controlled at build time via the SCons-based build system. The parameters below are consumed by the build system and mapped to preprocessor definitions. |
| 39 | + |
| 40 | +@subsection profiling_build_flags Build Flags |
| 41 | + |
| 42 | +- `profile=1`: Enable the profiler (defines `ACL_PROFILE_ENABLE`). |
| 43 | +- `profile_level={0|1}`: Select profiling level (defines `ACL_PROFILE_LEVEL`). |
| 44 | + - 0: Basic CPU tracing (low overhead). |
| 45 | + - 1: Experimental GPU tracing (reconstructs GPU spans; higher overhead). |
| 46 | +- `profile_backend=perfetto`: Select the profiler backend. Perfetto is currently the only backend and is the default. |
| 47 | +- `profile_mode={kInProcessBackend|kSystemBackend}`: Perfetto mode (defines `ACL_PROFILE_MODE`). |
| 48 | +- `profile_size={16384|32768|65536|131072}`: Trace buffer size in KB (defines `ACL_PROFILE_SIZE_KB`). |
| 49 | + |
| 50 | +@subsection profiling_build_cmd Examples |
| 51 | + |
| 52 | +@code{.bash} |
| 53 | +# Linux In-Process profiling with GPU OpenCL(TM) shaders info. |
| 54 | +scons profile=1 profile_level=1 profile_mode=kInProcessBackend \ |
| 55 | + profile_size=16384 opencl=1 neon=1 os=linux arch=arm64-v8a -j$(nproc) |
| 56 | + |
| 57 | +# Android(TM) System profiling for CPU platforms. |
| 58 | +scons profile=1 profile_level=0 profile_mode=kSystemBackend \ |
| 59 | + profile_size=32768 opencl=0 neon=1 os=android arch=arm64-v8a -j$(nproc) |
| 60 | +@endcode |
| 61 | + |
| 62 | +@section profiling_modes System vs In-Process |
| 63 | + |
| 64 | +- `kInProcessBackend`: |
| 65 | + - Tracing engine lives inside the process; no setup or external dependency. |
| 66 | + - Good for local runs; writes `acl.pftrace` at exit. |
| 67 | +- `kSystemBackend`: |
| 68 | + - Connects to system Perfetto daemon; supports multi-process/system-wide traces. |
| 69 | + - On Linux, the `traced` and `traced_probes` binaries must be installed and started manually. |
| 70 | + - Requires a running Perfetto service and appropriate permissions. |
| 71 | + |
| 72 | +@section profiling_os_compat OS Compatibility Table |
| 73 | + |
| 74 | +| Platform    | In-Process   | System        | |
| 75 | +|-------------|--------------|---------------| |
| 76 | +| Linux       | OK (CPU+GPU) | OK (CPU)      | |
| 77 | +| Android(TM) | OK (CPU+GPU) | OK (CPU)      | |
| 78 | +| macOS       | OK (CPU)     | Not supported | |
| 79 | + |
| 80 | +@section profiling_limits Limitations |
| 81 | + |
| 82 | +- GPU timestamp reconstruction creates a synthetic GPU timeline and is currently incompatible with System mode (`profile_mode=kSystemBackend`). |
| 83 | +- GPU profiling at `profile_level=1` inserts OpenCL(TM) timing events; this can conflict with other GPU profiling tools. Do not enable simultaneously with external profilers (e.g. Arm NN GPU profiler). |
| 84 | +- At `profile_level=1`, GPU event timestamps are gathered only after the OpenCL(TM) scheduler (`CLScheduler::sync()`) has waited for the queued work to finish; they are then projected onto the CPU timeline. This assumes the workload is GPU-bound and the CPU thread is blocked waiting for the GPU. If the CPU does other work before calling `sync()`, the reported GPU spans are shifted later by that extra CPU time, so they appear to start after the point at which they actually ran. |
| 85 | + |
| 86 | +*/ |
| 87 | +} // namespace arm_compute |
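
The build flags and limitations documented in this page can be checked before invoking the build. The following is a minimal sketch, not part of the library: the function name `acl_profile_scons_args` and its error messages are hypothetical, and only the `profile*` parameters and their allowed values are taken from the page above. Rejecting `profile_level=1` together with `kSystemBackend` is a choice made by this sketch, based on the documented incompatibility; the build system itself may behave differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validates the documented profiling options and prints
# the matching scons arguments. Not part of the Arm Compute Library.

acl_profile_scons_args() {
    local level="$1" mode="$2" size_kb="$3"

    # profile_level accepts 0 (CPU tracing) or 1 (experimental GPU tracing).
    case "$level" in
        0|1) ;;
        *) echo "error: profile_level must be 0 or 1" >&2; return 1 ;;
    esac

    # profile_mode accepts the two documented Perfetto modes.
    case "$mode" in
        kInProcessBackend|kSystemBackend) ;;
        *) echo "error: unknown profile_mode '$mode'" >&2; return 1 ;;
    esac

    # profile_size accepts only the documented buffer sizes (in KB).
    case "$size_kb" in
        16384|32768|65536|131072) ;;
        *) echo "error: unsupported profile_size '$size_kb'" >&2; return 1 ;;
    esac

    # GPU timestamp reconstruction (level 1) is documented as incompatible
    # with System mode, so this sketch rejects the combination early.
    if [ "$level" = "1" ] && [ "$mode" = "kSystemBackend" ]; then
        echo "error: profile_level=1 is incompatible with kSystemBackend" >&2
        return 1
    fi

    echo "profile=1 profile_level=$level profile_mode=$mode profile_size=$size_kb"
}
```

A possible use is `scons $(acl_profile_scons_args 0 kSystemBackend 32768) opencl=0 neon=1 os=android arch=arm64-v8a`, which fails before starting the build if an unsupported combination is requested.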