Commit b642940

Add GSoC final presentation and wrap-up blog (#346)
* Add GSoC final presentation and wrap-up blog
* Update spellings
1 parent 66c9baf commit b642940

5 files changed

+121
-4
lines changed

.github/actions/spelling/allow/terms.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -138,6 +138,7 @@ pubpic
138138
recomputations
139139
ROOFIT
140140
Sacado
141+
Sema
141142
SKLLVM
142143
SNL
143144
SNSFPI

_data/crconlist2025.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -55,11 +55,11 @@
5555
5656
This project achieved Clad support for both forward and reverse mode differentiation
5757
of common OpenMP directives (parallel, parallel for) and clauses (private,
58-
firstprivate, lastprivate, shared, atomic, reduction) by implementing OpenMP-related
58+
firstprivate, shared, reduction) by implementing OpenMP-related
5959
AST parsing and designing corresponding differentiation strategies. Additional
6060
contributions include example applications and comprehensive tests.
6161
62-
# slides: /assets/presentations/...
62+
slides: /assets/presentations/Jiayang_Li_GSoS25_final.pdf
6363

6464
- title: "Using ROOT in the field of Genome Sequencing"
6565
speaker:

_data/standing_meetings.yml

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
date: 2025-11-13 15:00:00 +0200
88
speaker: "Abhinav Kumar"
99
link: "[Slides](/assets/presentations/Abhinav_Kumar_GSoC25_final.pdf)"
10-
- title: "Summary: Activity analysis for reverse-mode differentiation of (CUDA) GPU kernels"
10+
- title: "Summary: Activity analysis for reverse-mode differentiation of (CUDA) GPU kernels"
1111
date: 2025-11-13 15:30:00 +0200
1212
speaker: "Maksym Andriichuk"
1313
link: "[Slides](/assets/presentations/Maksym_Andriichuk_final_gsoc_atomic.pdf)"
@@ -453,4 +453,7 @@
453453
date: 2025-11-13 16:20:00 +0200
454454
speaker: "Aditya Pandey"
455455
link: "[Slides](/assets/presentations/Aditya_Pandey_GSoC2025_final.pdf)"
456-
456+
- title: "Final Presentation: Enable Automatic Differentiation of OpenMP Programs with Clad"
457+
date: 2025-11-13 15:40:00 +0200
458+
speaker: "Jiayang Li"
459+
link: "[Slides](/assets/presentations/Jiayang_Li_GSoS25_final.pdf)"
Lines changed: 113 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,113 @@
1+
---
2+
title: "Wrapping Up GSoC 2025: Enable Automatic Differentiation of OpenMP Programs with Clad"
3+
layout: post
4+
excerpt: "A summary of my GSoC 2025 project, which added OpenMP support to Clad to enable automatic differentiation of multi-threaded C++ programs."
5+
sitemap: false
6+
author: Jiayang Li
7+
permalink: blogs/gsoc25_jiayangli_wrapup_blog/
8+
banner_image: /images/blog/gsoc-banner.png
9+
date: 2025-11-14
10+
tags: gsoc llvm clad openmp automatic-differentiation
11+
---
12+
13+
## 1. Overview
14+
15+
Clad is a source-to-source AD library, implemented as a Clang plugin, that constructs derivative code directly on the Abstract Syntax Tree (AST), performing AD at compile-time to generate precise C++ derivative functions.
16+
17+
The core goal of this GSoC project was: **To implement automatic differentiation in Clad for OpenMP-parallelized programs, ensuring the generated derivative code also preserves the parallel structure.** This allows for leveraging multi-core parallel acceleration while obtaining derivatives.
18+
19+
## 2. Technical Challenges
20+
21+
Extending automatic differentiation to OpenMP programs presented several key challenges:
22+
23+
1. **Complexity of OpenMP Features and Nodes in Clang AST:** OpenMP is represented in Clang through a series of specialized AST nodes and relies on `CapturedStmt` to capture external variables used within parallel regions. Constructing these nodes requires:
24+
- Correctly handling various OpenMP directives and clauses.
25+
- Properly establishing the capture list for the parallel region.
26+
- Ensuring the newly generated derivative function's AST is compatible with Clang's semantic checks and code generation pipeline.
27+
2. **Variable Scopes and Data Attributes in OpenMP Regions:** In OpenMP, variables can be declared with attributes like `shared`, `private`, `firstprivate`, etc. In the forward and reverse computation paths, each thread's read/write access and variable lifetimes differ. To ensure derivative correctness, we needed to:
28+
- Accurately model the visibility of each variable in different scopes within the original program.
29+
- Maintain this scope relationship when generating derivative code.
30+
- Correctly map forward-pass variables and intermediate results to the reverse-pass.
31+
3. **Thread-Safe Storage of the "Tape" (Intermediate Values):** Reverse-mode AD requires recording intermediate results for use during backpropagation. In a parallel environment:
32+
- Each thread must maintain its own stack of intermediate values.
33+
- Threads must not interfere with each other, avoiding race conditions.
34+
- The design must guarantee the thread-private nature of the tape and consistency in access order.
35+
4. **Determinism of Schedule Replay and Reverse Traversal Order:** Backpropagation for an OpenMP parallel loop demands that the reverse pass strictly reproduces the thread-iteration allocation of the forward pass. Furthermore, each thread must execute its iterations in the *reverse* order to ensure dependencies are met. This requires:
36+
- Theoretically understanding and formalizing iteration chunking and thread mapping under static scheduling.
37+
- Implementing a lightweight runtime helper interface to return identical iteration chunks for both forward and reverse passes.
38+
- Traversing these chunks in reverse order during the reverse pass.
39+
40+
The common theme of these challenges is the need to simultaneously manage Clang AST implementation details, OpenMP's parallel semantics, and the mathematical correctness of automatic differentiation.
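
To make the fourth challenge more concrete, here is a minimal sketch of how a contiguous static partition could be computed so that the forward and reverse passes agree on who owns which iterations. The name `static_chunk` and its signature are hypothetical and are not Clad's actual runtime helper. For `schedule(static)` without a chunk size, the OpenMP specification only asks for chunks of approximately equal size, so the sketch fixes one explicit rule instead of relying on the runtime's choice.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical helper (an assumption for illustration, not Clad's interface).
// Returns the half-open range [begin, end) owned by thread `tid` when `n`
// iterations are split into contiguous blocks among `num_threads` threads.
// The first `n % num_threads` threads receive one extra iteration.
std::pair<long, long> static_chunk(long n, int num_threads, int tid) {
  assert(n >= 0 && num_threads > 0 && tid >= 0 && tid < num_threads);
  long base = n / num_threads;   // iterations every thread gets
  long extra = n % num_threads;  // leftover iterations to spread out
  long begin = tid * base + std::min<long>(tid, extra);
  long end = begin + base + (tid < extra ? 1 : 0);
  return {begin, end};
}
```

Because the result depends only on `n`, `num_threads`, and `tid`, calling the same function in the reverse pass yields the same range, which can then be walked from `end - 1` down to `begin`.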
41+
42+
## 3. Implementation Methods and Key Designs
43+
44+
### 3.1. Theoretical Basis and Overall Strategy
45+
46+
The theoretical foundation for this project was primarily drawn from the [paper](https://arxiv.org/pdf/2111.01861) on OpenMP AD by the Tapenade team. They provided a systematic demonstration and implementation of OpenMP AD on the Fortran platform, offering a complete theoretical framework for designing reverse-mode in a parallel context, including:
47+
48+
- How to organize forward and reverse computation order in a multi-threaded scenario.
49+
- How to handle data attributes corresponding to different OpenMP clauses.
50+
- How to safely replay iteration intervals under static scheduling.
51+
52+
Building on this, the project migrated these concepts to the Clad/Clang ecosystem. Since OpenMP constructs are converted to AST nodes during Clang's semantic analysis phase, the overall strategy was:
53+
54+
- Extend Clad's AST visitors with `Visit` methods for OpenMP-related nodes.
55+
- Use a specialized differentiator for OpenMP loops instead of reusing the logic for standard `for` loops.
56+
- Use Clang's native OpenMP construction interfaces to create new parallel regions and capture lists, allowing the generated derivative function to integrate naturally with the existing compiler pipeline.
57+
58+
### 3.2. Forward-Mode Support ([#1491](https://github.com/vgvassilev/clad/pull/1491))
59+
60+
A key characteristic of forward-mode is that the derivative propagation's execution order is identical to the original program. Therefore, for forward-mode with OpenMP, there were two crucial observations:
61+
62+
1. **The parallel structure can be directly reused:** The original OpenMP parallel region, loop partitioning, and thread scheduling can all be preserved.
63+
2. **Variable scope relationships** in the derivative function can mirror the original function; we only need to introduce corresponding derivative variables in the same scope.
64+
65+
Based on these observations, the forward-mode implementation primarily involved:
66+
67+
- Adding corresponding derivative variables and accumulation logic to the existing OpenMP parallel loop.
68+
- For variables with a `reduction` clause, synchronously adding a corresponding derivative reduction item in the derivative function. This ensures that derivative accumulation is also parallel-safe in a multi-threaded environment.
69+
- Utilizing existing Clad infrastructure to generate consistent forward-mode derivative functions for different parameter types (scalars, arrays, etc.).
70+
71+
Overall, OpenMP support for forward-mode was structurally clear and relatively straightforward to implement, and it was demonstrated during the midterm presentation.
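
The forward-mode idea can be illustrated with a small hand-written example. This is a sketch of what a tangent function might look like, not actual Clad output. The names `sum_squares`, `sum_squares_tangent`, `dx`, and `d_sum` are chosen for illustration only. The point is that the derivative variable `d_sum` lives in the same scope as `sum` and gets its own entry in the `reduction` clause.

```cpp
#include <cassert>
#include <vector>

// Original function: f(x) = sum_i x[i] * x[i], parallelized with a reduction.
double sum_squares(const std::vector<double>& x) {
  double sum = 0.0;
  const long n = static_cast<long>(x.size());
#pragma omp parallel for reduction(+ : sum)
  for (long i = 0; i < n; ++i)
    sum += x[i] * x[i];
  return sum;
}

// Hand-written forward-mode (tangent) version, assumed for illustration.
// `dx` is the seed direction; the result is the directional derivative.
double sum_squares_tangent(const std::vector<double>& x,
                           const std::vector<double>& dx) {
  assert(x.size() == dx.size());
  double sum = 0.0;
  double d_sum = 0.0;  // derivative of `sum`, declared in the same scope
  const long n = static_cast<long>(x.size());
  // The parallel structure is reused; d_sum joins the reduction list.
#pragma omp parallel for reduction(+ : sum, d_sum)
  for (long i = 0; i < n; ++i) {
    d_sum += dx[i] * x[i] + x[i] * dx[i];  // product rule on x[i] * x[i]
    sum += x[i] * x[i];
  }
  return d_sum;
}
```

Compiled with OpenMP enabled (for example `-fopenmp`), both loops run in parallel; without it, the pragmas are ignored and the code runs serially with the same result.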
72+
73+
### 3.3. Reverse-Mode Support ([#1641](https://github.com/vgvassilev/clad/pull/1641))
74+
75+
Reverse-mode was significantly more complex, mainly due to the need for fine-grained control over execution order and intermediate state. To address this, several key designs were implemented within Clad:
76+
77+
1. **Specialized "Canonical Loop" Differentiator for OpenMP Loops:** Stacking OpenMP logic directly onto the existing `VisitForStmt` would be overly intrusive and would not provide good control over loop partitioning among threads. Therefore, we introduced a helper function specifically for processing OpenMP loops, used to construct a canonical `for` loop form suitable for OpenMP task division. This "canonical loop" serves both for forward-pass analysis (e.g., iteration counts, tape layout) and as a structural guarantee for the subsequent reverse-pass replay.
78+
2. **Two-Pass Traversal to Build Forward and Reverse Parallel Regions and Capture Lists:** Because Clang's OpenMP implementation uses `CapturedStmt` to capture external variables, constructing a parallel region requires correctly building its capture list. In reverse-mode, we need to generate *both* a forward-pass OpenMP region and a reverse-pass OpenMP region. We adopted a "two-pass" strategy:
79+
- **First Pass:** Construct the forward-pass parallel region body according to the differentiation logic and use Clang's OpenMP Sema interface to finalize the region and build its capture list.
80+
- **Second Pass:** Construct the body of the reverse pass parallel region in a similar manner, but only for capturing variables; the function body still uses the one generated in the first pass.
81+
3. **Scope Transformation and Variable Attribute Mapping:** To correctly set the scope of each derivative variable, we needed a clear understanding of variable scope relationships inside and outside OpenMP regions, as well as the correspondence of attributes like `private`, `shared`, and `firstprivate` between the forward and reverse phases. The project used a "scope transformation" mechanism to map the lifecycle of variables from the forward pass to the corresponding structures in the reverse pass, thus ensuring:
82+
- Each thread in the reverse pass accesses the intermediate values *it* produced during the forward pass.
83+
- The accumulation of reduction variables in the reverse pass follows correct inter-thread semantics.
84+
4. **Tape Storage and Schedule Replay:** Reverse-mode uses tapes to record intermediate results from each iteration. In the OpenMP context, these tapes were designed to be thread-private to avoid conflicts.
85+
- Each thread maintains an independent stack for intermediate values.
86+
- Their isolation is guaranteed using OpenMP's thread-private mechanism.
87+
- Simultaneously, to make the reverse-pass iteration match the forward pass exactly, the project designed a small runtime helper interface. This interface reproduces the iteration chunks each thread received during the forward pass (based on static scheduling). The reverse pass calls the same interface to get these chunks and traverses them in reverse order, achieving a "schedule replay" without logging the entire schedule.
88+
89+
Through these designs, reverse-mode for OpenMP scenarios was successfully implemented in Clad, migrating the concepts from theory to a practical compiler plugin.
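
The tape and schedule-replay design can be sketched by hand as follows. This is not the code Clad generates, and it deliberately swaps in a simpler mechanism: instead of OpenMP thread-private storage, it keeps one tape per logical "slot" and parallelizes over slots, so it stays correct even if the runtime provides fewer threads than expected. The names `slot_range` and `cubes_gradient` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical contiguous partition of [0, n) into `slots` blocks.
static void slot_range(long n, int slots, int s, long& begin, long& end) {
  long base = n / slots, extra = n % slots;
  begin = s * base + (s < extra ? s : extra);
  end = begin + base + (s < extra ? 1 : 0);
}

// Hand-written reverse-mode gradient of f(x) = sum_i x[i]^3 (illustration).
std::vector<double> cubes_gradient(const std::vector<double>& x, int slots) {
  assert(slots > 0);
  const long n = static_cast<long>(x.size());
  std::vector<std::vector<double>> tapes(slots);  // one private tape per slot
  std::vector<double> d_x(x.size(), 0.0);
  double sum = 0.0;

  // Forward pass: each slot pushes its intermediate values onto its own tape.
#pragma omp parallel for schedule(static) reduction(+ : sum)
  for (int s = 0; s < slots; ++s) {
    long begin, end;
    slot_range(n, slots, s, begin, end);
    for (long i = begin; i < end; ++i) {
      double t = x[i] * x[i];
      tapes[s].push_back(t);
      sum += t * x[i];
    }
  }
  (void)sum;  // the primal value is not needed by this sketch's caller

  const double d_sum = 1.0;  // seed for the output adjoint
  // Reverse pass: recompute the same ranges and walk them backwards, so the
  // pops happen in exactly the reverse of the push order.
#pragma omp parallel for schedule(static)
  for (int s = 0; s < slots; ++s) {
    long begin, end;
    slot_range(n, slots, s, begin, end);
    for (long i = end - 1; i >= begin; --i) {
      assert(!tapes[s].empty());
      double t = tapes[s].back();
      tapes[s].pop_back();
      double d_t = d_sum * x[i];               // adjoint of t in sum += t * x[i]
      d_x[i] += d_sum * t + d_t * 2.0 * x[i];  // total: 3 * x[i]^2
    }
  }
  return d_x;
}
```

No schedule is logged here: both passes derive the iteration ranges from the same pure function, which is the essence of replaying a static schedule.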
90+
91+
## 4. Future Work
92+
93+
Due to time and project scope limitations, the current implementation primarily focuses on common OpenMP directives and clauses under static scheduling. Several areas can be extended and polished in the future:
94+
95+
1. **Support for Dynamic Scheduling:** The current schedule replay mechanism relies on the assumption of static scheduling. For dynamically scheduled loops, the runtime must record the actual iteration chunks executed by each thread, and the reverse pass must replay this record.
96+
2. **Support for More OpenMP Clauses and Directives:** The current work focused on common constructs like `parallel`, `for`, and `reduction`. This can be gradually expanded to:
97+
- More fine-grained parallel directives like `atomic` and `simd`.
98+
- More complex nested parallel structures.
99+
- Exploring specialized optimization strategies for these directives while guaranteeing mathematical correctness.
100+
3. **Explore AD for OpenMP Target Offloading Scenarios:** As OpenMP's support for accelerators like GPUs matures, a natural direction is extending AD to OpenMP `target` offloading scenarios, allowing code executed on accelerators to also be automatically differentiated.
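
For the dynamic-scheduling item above, one possible shape of the required record is a per-thread chunk log. The sketch below is purely an assumption about how such a log could look; `ChunkLog`, `record`, and `replay_reversed` are hypothetical names and nothing like this exists in the current implementation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical per-thread log of the chunks that thread actually executed.
struct ChunkLog {
  std::vector<std::pair<long, long>> chunks;  // half-open ranges [begin, end)

  // Called in the forward pass each time the thread receives a chunk.
  void record(long begin, long end) {
    assert(begin <= end);
    chunks.emplace_back(begin, end);
  }

  // Called in the reverse pass: visits the most recent chunk first and,
  // inside each chunk, the highest iteration index first.
  template <typename F>
  void replay_reversed(F&& visit) const {
    for (auto it = chunks.rbegin(); it != chunks.rend(); ++it)
      for (long i = it->second - 1; i >= it->first; --i)
        visit(i);
  }
};
```

Compared with static replay, this costs memory proportional to the number of chunks handed out, which is the trade-off the static-only design avoids.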
101+
102+
## 5. Summary and Acknowledgments
103+
104+
Most of this project's work took place at the Clang AST and semantic analysis level, which is where I learned the most. I had never had the opportunity to develop and debug a compiler before, and this project allowed me to dive deep into many of Clang's internal details. At the same time, translating theoretical concepts from papers into a concrete code implementation was an extremely challenging but fascinating task. This GSoC gave me my first systematic experience of participating in an open-source compiler project from start to finish, completing the full loop from theory to engineering. This will be of long-term value for my future research and work in compilers, automatic differentiation, and high-performance computing.
105+
106+
Throughout the entire GSoC period, from application to completion, I experienced many things both inside and outside the project. I am extremely grateful to my mentors, Vassil Vassilev and Martin Vassilev, and collaborator, Petro. They remained patient and helpful when I encountered difficulties. I sincerely thank them for their continuous guidance and support!
107+
108+
## Related links
109+
110+
- [LLVM Project](https://github.com/llvm/llvm-project)
111+
- [Clad Repository](https://github.com/vgvassilev/clad)
112+
- [My GitHub](https://github.com/Errant404)
113+
- [Tapenade](https://gitlab.inria.fr/tapenade/tapenade)
1.68 MB
Binary file not shown.
