Commit 8069f67

docs: Added profiler section in the documentation
Resolves: COMPMID-8345
Change-Id: I718fe7dc951d37efd22f6ebe1d32e1af89ed49c2
Signed-off-by: Walid BEN ROMDHANE <walid.benromdhane@arm.com>
1 parent 1375d3b commit 8069f67

3 files changed: 89 additions, 0 deletions


docs/Doxyfile

Lines changed: 1 addition & 0 deletions
@@ -801,6 +801,7 @@ INPUT = ./docs/user_guide/introduction.dox \
 ./docs/user_guide/conv2d_heuristic.dox \
 ./docs/user_guide/operator_list.dox \
 ./docs/user_guide/tests.dox \
+./docs/user_guide/profiling.dox \
 ./docs/user_guide/advanced.dox \
 ./docs/user_guide/release_version_and_change_log.dox \
 ./docs/user_guide/errata.dox \

docs/DoxygenLayout.xml

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ SOFTWARE.
 <tab type="user" url="@ref conv2d_heuristic" title="Convolution 2D heuristic"/>
 <tab type="user" url="@ref operators_list" title="Operator List"/>
 <tab type="user" url="@ref tests" title="Validation and benchmarks"/>
+<tab type="user" url="@ref profiling" title="Profiling"/>
 <tab type="user" url="@ref advanced" title="Advanced"/>
 <tab type="user" url="@ref versions_changelogs" title="Release Versions and Changelog"/>
 <tab type="user" url="@ref errata" title="Errata"/>

docs/user_guide/profiling.dox

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+///
+/// Copyright (c) 2025 Arm Limited.
+///
+/// SPDX-License-Identifier: MIT
+///
+/// Permission is hereby granted, free of charge, to any person obtaining a copy
+/// of this software and associated documentation files (the "Software"), to
+/// deal in the Software without restriction, including without limitation the
+/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+/// sell copies of the Software, and to permit persons to whom the Software is
+/// furnished to do so, subject to the following conditions:
+///
+/// The above copyright notice and this permission notice shall be included in all
+/// copies or substantial portions of the Software.
+///
+/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+/// SOFTWARE.
+///
+namespace arm_compute
+{
+/**
+@page profiling Profiling
+
+@tableofcontents
+
+@section profiling_overview Overview
+The Arm Compute Library includes a built-in profiling system based on the [Perfetto](https://perfetto.dev/) tracing framework. It allows developers to collect detailed performance data for CPU and GPU workloads executed via the library.
+The profiler can capture timing information for individual functions, kernels, and operations, providing insights into execution time, resource usage, and potential bottlenecks. The collected data can be visualized using the Perfetto UI, enabling developers to analyze performance characteristics and optimize their applications effectively.
+
+
+@section profiling_build How to Build
+
+Profiling is controlled at build time via the SCons-based build system. The parameters below are consumed by the build system and mapped to preprocessor definitions.
+
+@subsection profiling_build_flags Build Flags
+
+- `profile=1`: Enable the profiler (defines `ACL_PROFILE_ENABLE`).
+- `profile_level={0|1}`: Select the profiling level (defines `ACL_PROFILE_LEVEL`).
+  - 0: Basic CPU tracing (low overhead).
+  - 1: Experimental GPU tracing (reconstructs GPU spans; higher overhead).
+- `profile_backend=perfetto`: Select the profiler backend. `perfetto` is currently the only supported backend and is the default.
+- `profile_mode={kInProcessBackend|kSystemBackend}`: Select the Perfetto mode (defines `ACL_PROFILE_MODE`).
+- `profile_size={16384|32768|65536|131072}`: Set the trace buffer size in KB (defines `ACL_PROFILE_SIZE_KB`).
+
+@subsection profiling_build_cmd Examples
+
+@code{.bash}
+# Linux: in-process profiling, including GPU OpenCL(TM) shader information.
+scons profile=1 profile_level=1 profile_mode=kInProcessBackend \
+      profile_size=16384 opencl=1 neon=1 os=linux arch=arm64-v8a -j$(nproc)
+
+# Android(TM): system-mode profiling, CPU only.
+scons profile=1 profile_level=0 profile_mode=kSystemBackend \
+      profile_size=32768 opencl=0 neon=1 os=android arch=arm64-v8a -j$(nproc)
+@endcode
+
+@section profiling_modes System vs In-Process
+
+- `kInProcessBackend`:
+  - The tracing engine runs inside the process; no setup or external dependencies are required.
+  - Good for local runs; writes `acl.pftrace` at exit.
+- `kSystemBackend`:
+  - Connects to the system Perfetto daemon; supports multi-process and system-wide traces.
+  - On Linux, requires the `traced` and `traced_probes` binaries to be installed and started manually.
+  - Requires a running Perfetto service and appropriate permissions.
+
+@section profiling_os_compat OS Compatibility Table
+
+| Platform    | In-Process   | System        |
+|-------------|--------------|---------------|
+| Linux       | OK (CPU+GPU) | OK (CPU)      |
+| Android(TM) | OK (CPU+GPU) | OK (CPU)      |
+| macOS       | OK (CPU)     | Not supported |
+
+@section profiling_limits Limitations
+
+- GPU timestamp reconstruction creates a synthetic GPU timeline and is currently incompatible with System mode (`profile_mode=kSystemBackend`).
+- GPU profiling at `profile_level=1` inserts OpenCL(TM) timing events; this can conflict with other GPU profiling tools. Do not enable it at the same time as an external profiler (e.g. the Arm NN GPU profiler).
+- At `profile_level=1`, GPU event timestamps are gathered only after the OpenCL(TM) scheduler (`CLScheduler::sync()`) has waited for the queued work to finish; they are then projected onto the CPU timeline. This assumes the workload is GPU-bound and the CPU thread is blocked waiting for the GPU. If the CPU does other work before calling `sync()`, the reported GPU spans are shifted (delayed) by that extra CPU time and appear later than they actually occurred.
+
+*/
+} // namespace arm_compute
