Commit e3b6f95

Updated the content of each module
1 parent ea59485 commit e3b6f95

12 files changed: +44 -189 lines changed

HIP_Course.md

Lines changed: 0 additions & 20 deletions
This file was deleted.

README.md

Lines changed: 29 additions & 12 deletions
@@ -2,9 +2,9 @@
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![CUDA](https://img.shields.io/badge/CUDA-12.9.1-76B900?logo=nvidia)](https://developer.nvidia.com/cuda-toolkit)
-[![ROCm](https://img.shields.io/badge/ROCm-latest-red?logo=amd)](https://rocmdocs.amd.com/)
+[![ROCm](https://img.shields.io/badge/ROCm-7.0-red?logo=amd)](https://rocmdocs.amd.com/)
 [![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?logo=docker)](https://www.docker.com/)
-[![Examples](https://img.shields.io/badge/Examples-70%2B-green)](modules/)
+[![Examples](https://img.shields.io/badge/Examples-71-green)](modules/)
 [![CI](https://img.shields.io/badge/CI-GitHub%20Actions-2088FF?logo=github-actions)](https://github.com/features/actions)
 
 **A comprehensive, hands-on educational project for mastering GPU programming with CUDA and HIP**
@@ -35,7 +35,7 @@
 **GPU Programming 101** is a complete educational resource for learning modern GPU programming. This project provides:
 
 - **9 comprehensive modules** covering beginner to expert topics
-- **70+ working code examples** in both CUDA and HIP
+- **71 working code examples** in both CUDA and HIP
 - **Cross-platform support** for NVIDIA and AMD GPUs
 - **Production-ready development environment** with Docker
 - **Professional tooling** including profilers, debuggers, and CI/CD
@@ -197,10 +197,11 @@ This architectural knowledge is essential for writing efficient GPU code and is
 |---------|-------------|
 | 🎯 **Complete Curriculum** | 9 progressive modules from basics to advanced topics |
 | 💻 **Cross-Platform** | Full CUDA and HIP support for NVIDIA and AMD GPUs |
-| 🐳 **Docker Ready** | Complete containerized development environment |
-| 🔧 **Production Quality** | Professional build systems, testing, and profiling |
+| 🐳 **Docker Ready** | Complete containerized development environment with CUDA 12.9.1 & ROCm 7.0 |
+| 🔧 **Production Quality** | Professional build systems, auto-detection, testing, and profiling |
 | 📊 **Performance Focus** | Optimization techniques and benchmarking throughout |
 | 🌐 **Community Driven** | Open source with comprehensive contribution guidelines |
+| 🧪 **Advanced Libraries** | Support for Thrust, MIOpen, and production ML frameworks |
 
 ## 🚀 Quick Start
 
@@ -224,7 +225,7 @@ cd modules/module1 && make && ./build/01_vector_addition_cuda
 For direct system installation:
 
 ```bash
-# Prerequisites: CUDA 11.0+ or ROCm 5.0+, GCC 7+, Make
+# Prerequisites: CUDA 12.0+ or ROCm 7.0+, GCC 9+, Make
 
 # Clone and build
 git clone https://github.com/AIComputing101/gpu-programming-101.git
@@ -265,7 +266,7 @@ Our comprehensive curriculum progresses from fundamental concepts to production-
 | [**Module 8**](modules/module8/) | 🚀 Expert | 10-12h | **Domain Applications** | ML, Scientific Computing | 4 |
 | [**Module 9**](modules/module9/) | 🚀 Expert | 6-8h | **Production Deployment** | Libraries, Integration, Scaling | 4 |
 
-**📈 Progressive Learning Path: 70+ Examples • 50+ Hours • Beginner to Expert**
+**📈 Progressive Learning Path: 71 Examples • 50+ Hours • Beginner to Expert**
 
 ### Learning Progression
 
@@ -313,7 +314,7 @@ Module 5: Performance Tuning
 ### Software Requirements
 
 #### Operating System Support
-- **Linux** (Recommended): Ubuntu 22.04 LTS, RHEL 8/9, SLES 15 SP5
+- **Linux** (Recommended): Ubuntu 22.04/24.04 LTS, RHEL 8/9, SLES 15 SP5
 - **Windows**: Windows 10/11 with WSL2 recommended for optimal compatibility
 - **macOS**: macOS 12+ (Metal Performance Shaders for basic GPU compute)
 
@@ -322,7 +323,7 @@ Module 5: Performance Tuning
 - **Driver Requirements**:
 - Linux: 550.54.14+ for CUDA 12.4+
 - Windows: 551.61+ for CUDA 12.4+
-- **ROCm Platform**: 6.0+ (Docker uses ROCm latest)
+- **ROCm Platform**: 7.0+ (Docker uses ROCm 7.0)
 - **Driver Requirements**: Latest AMDGPU-PRO or open-source AMDGPU drivers
 - **Kernel Support**: Linux kernel 5.4+ recommended
 
@@ -338,6 +339,8 @@ Module 5: Performance Tuning
 - **Profiling**: Nsight Compute, Nsight Systems (NVIDIA), rocprof (AMD)
 - **Debugging**: cuda-gdb, rocgdb, compute-sanitizer
 - **Libraries**: cuBLAS, cuFFT, rocBLAS, rocFFT (for advanced modules)
+- **ML Libraries**: Thrust (NVIDIA), MIOpen (AMD) for deep learning applications
+- **System Management**: NVML (NVIDIA), ROCm SMI (AMD) for hardware monitoring
 
 ### Performance Expectations by Hardware Tier
 
@@ -381,28 +384,42 @@ Experience the full development environment with zero setup:
 - 📦 Isolated and reproducible builds
 - 🧹 Easy cleanup when done
 
+**Container Specifications:**
+- **CUDA**: NVIDIA CUDA 12.9.1 on Ubuntu 22.04
+- **ROCm**: AMD ROCm 7.0 on Ubuntu 24.04
+- **Libraries**: Production-ready toolchains with debugging support
+
 **[📖 Complete Docker Guide →](docker/README.md)**
 
 ## 🔧 Build System
 
+Our advanced build system features automatic GPU vendor detection and optimized configurations:
+
 ### Project-Wide Commands
 ```bash
-make all # Build all modules
+make all # Build all modules with auto-detection
 make test # Run comprehensive tests
 make clean # Clean all artifacts
-make check-system # Verify GPU setup
+make check-system # Verify GPU setup and dependencies
 make status # Show module completion status
 ```
 
 ### Module-Specific Commands
 ```bash
 cd modules/module1/examples
-make # Build all examples in module
+make # Build all examples with vendor auto-detection
 make test # Run module tests
 make profile # Performance profiling
 make debug # Debug builds with extra checks
 ```
 
+### Advanced Build Features
+- **Automatic GPU Detection**: Detects NVIDIA/AMD hardware and builds accordingly
+- **Production Optimization**: `-O3`, fast math, architecture-specific optimizations
+- **Debug Support**: Full debugging symbols and validation checks
+- **Library Management**: Automatic detection of optional dependencies (NVML, MIOpen)
+- **Cross-Platform**: Single Makefile supports both CUDA and HIP builds
+
 ## Performance Expectations
 
 | Module Level | Typical GPU Speedup | Memory Efficiency | Code Quality |
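
The README changes above describe automatic GPU vendor detection but do not show how it is done. The following is a minimal sketch only, assuming detection is based on which compiler (`nvcc` or `hipcc`) is found on `PATH` and that `nvcc` wins when both are present. The function names `detect_gpu_vendor` and `select_compiler` are hypothetical and are not taken from the repository's Makefiles.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: infer the GPU toolchain from the compilers on PATH.
# This checks for installed compilers, not for physical hardware; the
# repository's real detection logic may differ.
detect_gpu_vendor() {
    if command -v nvcc >/dev/null 2>&1; then
        echo "nvidia"      # assumption: prefer CUDA when both toolchains exist
    elif command -v hipcc >/dev/null 2>&1; then
        echo "amd"
    else
        echo "none"
    fi
}

# Map a detected vendor to the compiler a build script could invoke.
# Prints an empty string when no supported toolchain was found.
select_compiler() {
    case "$1" in
        nvidia) echo "nvcc" ;;
        amd)    echo "hipcc" ;;
        *)      echo "" ;;
    esac
}

vendor="$(detect_gpu_vendor)"
echo "Detected GPU toolchain: ${vendor}"
```

A build script could call `select_compiler "$(detect_gpu_vendor)"` and stop with an error message when the result is empty, so that a single entry point covers both CUDA and HIP builds.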

diagnose_rocm.sh

Lines changed: 0 additions & 142 deletions
This file was deleted.

modules/module1/content.md

Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
 # Module 1: Foundations of GPU Programming with CUDA and HIP
 *Heterogeneous Data Parallel Computing*
 
-> Environment note: Examples are validated in containers using CUDA 12.9.1 (Ubuntu 22.04) and ROCm latest (rocm/dev-ubuntu-22.04:latest). Using Docker is recommended for a consistent setup.
+> Environment note: Examples are validated in containers using CUDA 12.9.1 (Ubuntu 22.04) and ROCm 7.0 (Ubuntu 24.04). The advanced build system automatically detects your GPU vendor and optimizes accordingly. Using Docker is recommended for a consistent setup.
 
 ## Learning Objectives
 After completing this module, you will be able to:
@@ -127,7 +127,7 @@ nvidia-smi
 wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
 sudo dpkg -i cuda-keyring_1.1-1_all.deb
 sudo apt-get update
-sudo apt-get -y install cuda-toolkit-12-4
+sudo apt-get -y install cuda-toolkit-12-6
 
 # Add to PATH
 echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
@@ -145,9 +145,9 @@ nvidia-smi
 
 **Step 1: Install ROCm**
 ```bash
-# Ubuntu 22.04
-wget https://repo.radeon.com/amdgpu-install/6.0/ubuntu/jammy/amdgpu-install_6.0.60000-1_all.deb
-sudo apt install ./amdgpu-install_6.0.60000-1_all.deb
+# Ubuntu 22.04/24.04
+wget https://repo.radeon.com/amdgpu-install/7.0/ubuntu/jammy/amdgpu-install_7.0.60000-1_all.deb
+sudo apt install ./amdgpu-install_7.0.60000-1_all.deb
 sudo amdgpu-install --usecase=hiplibsdk,rocm
 
 # Add user to video group

modules/module2/content.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Module 2: Advanced GPU Memory Management and Optimization
 *Mastering GPU Memory Hierarchies and Performance Optimization*
 
-> Environment note: Examples are tested in Docker containers with CUDA 12.9.1 and ROCm latest (rocm/dev-ubuntu-22.04:latest). Prefer Docker for reproducible builds.
+> Environment note: Examples are tested in Docker containers with CUDA 12.9.1 (Ubuntu 22.04) and ROCm 7.0 (Ubuntu 24.04). The improved build system automatically optimizes memory access patterns. Prefer Docker for reproducible builds.
 
 ## Learning Objectives
 After completing this module, you will be able to:

modules/module3/content.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Module 3: Advanced GPU Algorithms and Parallel Patterns
 *Mastering High-Performance Parallel Computing Algorithms*
 
-> Environment note: Use the provided Docker images (CUDA 12.9.1, ROCm latest) for consistent toolchains across platforms.
+> Environment note: Use the provided Docker images (CUDA 12.9.1 on Ubuntu 22.04, ROCm 7.0 on Ubuntu 24.04) with automatic GPU detection for consistent toolchains across platforms.
 
 ## Learning Objectives
 After completing this module, you will be able to:

modules/module4/content.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Module 4: Advanced GPU Programming - Multi-GPU, Streams, and Scalability
 
-> Environment note: Examples are validated with CUDA 12.9.1 and ROCm latest in Docker containers. Multi-GPU sections may require appropriate hardware and drivers.
+> Environment note: Examples are validated with CUDA 12.9.1 (Ubuntu 22.04) and ROCm 7.0 (Ubuntu 24.04) in Docker containers. Multi-GPU sections may require appropriate hardware and drivers. Auto-detection build system optimizes for your platform.
 
 ## Overview
 

modules/module5/content.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Module 5: Performance Considerations and GPU Optimization
 
-> Environment note: Examples and profiling workflows are validated using Docker images with CUDA 12.9.1 and ROCm latest (rocm/dev-ubuntu-22.04:latest) for consistent toolchains.
+> Environment note: Examples and profiling workflows are validated using Docker images with CUDA 12.9.1 (Ubuntu 22.04) and ROCm 7.0 (Ubuntu 24.04) for consistent toolchains. Enhanced build system includes profiling integrations.
 
 ## Table of Contents
 1. [Introduction to GPU Performance Optimization](#introduction)

modules/module6/content.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Module 6: Fundamental Parallel Algorithms - Comprehensive Guide
 
-> Environment note: The examples and benchmarks in this module are tested in Docker with CUDA 12.9.1 and ROCm latest to ensure reproducibility.
+> Environment note: The examples and benchmarks in this module are tested in Docker with CUDA 12.9.1 (Ubuntu 22.04) and ROCm 7.0 (Ubuntu 24.04) to ensure reproducibility. Recent algorithm fixes improve performance.
 
 ## Introduction
 

modules/module7/content.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Module 7: Advanced Algorithmic Patterns - Comprehensive Guide
 
-> Environment note: Use the provided Docker environment (CUDA 12.9.1, ROCm latest) for consistent builds and tools across platforms.
+> Environment note: Use the provided Docker environment (CUDA 12.9.1 on Ubuntu 22.04, ROCm 7.0 on Ubuntu 24.04) for consistent builds and tools across platforms. Recent algorithmic pattern fixes included.
 
 ## Introduction
 
