Title: Fundamentals of Accelerated Computation Using CUDA C/C++
University: Armenian Slavonic University – Lectures & Labs (14 Days)
Instructor: Gagik Hakobyan
- CUDA programming model overview
- Host vs Device
- GPU architecture fundamentals
- Thread hierarchy overview
- SIMD architecture and instructions pipeline
- Threads, blocks, grids: structure and enumeration
- Launch configuration and kernel invocation
- Thread indexing patterns
- Warp definition and behavior
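The thread-indexing pattern above can be sketched with the classic vector-add kernel (a minimal illustration; `vecAdd` and the pointer names are placeholders, not course-provided code):

```cuda
// Minimal vector-add kernel illustrating the grid/block/thread hierarchy.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    // Global index: block offset plus thread offset within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}

// Host-side launch: round the grid size up so every element is covered.
// vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
```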
- Control flow: `if`, `else`, `for`, `while`
- Loop unrolling
- Divergence impact and avoidance
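A small sketch of divergence and unrolling (kernel names are illustrative): branching per thread splits every warp, while branching per warp keeps all 32 lanes on one path; `#pragma unroll` asks the compiler to replicate a loop body.

```cuda
__global__ void divergent(float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Bad: even/odd lanes alternate within each warp, so both branches
    // execute serially for every warp.
    if (i % 2 == 0) out[i] = 1.0f; else out[i] = 2.0f;
}

__global__ void uniform(float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Better: branch on the warp index, so all 32 lanes agree.
    if ((i / 32) % 2 == 0) out[i] = 1.0f; else out[i] = 2.0f;

    // Loop unrolling hint: the compiler replicates the body 4 times.
    #pragma unroll 4
    for (int k = 0; k < 8; ++k)
        out[i] += k;
}
```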
- Paged, pinned, and mapped memory
- Unified memory
- Allocation strategies
- Bank conflicts
- Synchronized memory access
- Shared, constant, and pitched memory
- Memory padding
- Global memory usage
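Shared memory, padding, and bank conflicts come together in the standard tiled-transpose sketch below (a common illustration, not course-provided code): one extra column of padding shifts each row into a different bank.

```cuda
#define TILE 32

// Shared-memory transpose tile; the +1 column of padding shifts each row
// into a different bank, avoiding 32-way bank conflicts on column reads.
__global__ void transposeTile(const float* in, float* out, int n) {
    __shared__ float tile[TILE][TILE + 1];    // padded to dodge bank conflicts
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n)
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];
    __syncthreads();                          // all loads finish before reuse
    x = blockIdx.y * TILE + threadIdx.x;      // transposed block origin
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n)
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}
```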
- Streams and concurrent execution
- Events and synchronization
- Streamed read/write patterns
- `cudaMemcpy`: sync vs async
- Async kernel launches
- Stream dependencies
- Event-based timing
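A host-side sketch of streams, async copies, and event timing (assumes `d_in`, `d_out`, `h_in`, `h_out`, `bytes`, and `kernel` are defined elsewhere; they are placeholders here):

```cuda
cudaStream_t s;
cudaEvent_t start, stop;
cudaStreamCreate(&s);
cudaEventCreate(&start);
cudaEventCreate(&stop);

// Async copies need pinned (page-locked) host memory to truly overlap.
cudaEventRecord(start, s);
cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, s);
kernel<<<grid, block, 0, s>>>(d_in, d_out);        // async launch in stream s
cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, s);
cudaEventRecord(stop, s);

cudaEventSynchronize(stop);                        // wait for the stream
float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);            // elapsed time in ms
```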
- Warp shuffle functions
- Intra-warp communication
- Parallel reduction
- Performance tuning
- Warp vote functions
- Inter-thread data exchange
- Cooperative operations
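A warp-level sum reduction with shuffle intrinsics is the canonical example of intra-warp data exchange; the sketch below (illustrative names) also notes a vote intrinsic in a comment.

```cuda
// Warp-level sum reduction using shuffle-down; no shared memory needed.
__inline__ __device__ float warpReduceSum(float v) {
    // Each step halves the exchange distance; lane 0 ends with the full sum.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffff, v, offset);
    return v;
}

__global__ void reduceSum(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;
    v = warpReduceSum(v);
    // Vote intrinsic example: __all_sync(0xffffffff, v != 0.0f) is true
    // only if every active lane holds a nonzero value.
    if ((threadIdx.x & 31) == 0)       // one atomicAdd per warp
        atomicAdd(out, v);
}
```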
- Hamming distance matching
- Bitwise ops
- Matrix multiplication
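Hamming-distance matching reduces to XOR plus a population count; a minimal kernel sketch (names are placeholders) over bit vectors packed into 32-bit words:

```cuda
// Hamming distance between two packed bit vectors:
// XOR exposes differing bits, __popc counts them per 32-bit word.
__global__ void hammingDistance(const unsigned int* a, const unsigned int* b,
                                unsigned int* dist, int words) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < words)
        atomicAdd(dist, (unsigned int)__popc(a[i] ^ b[i]));
}
```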
- Texture memory
- Surface memory
- Filtering & addressing
- Zoom/image processing
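Filtering and addressing modes are configured when creating a texture object; a host-side sketch using the texture-object API (assumes `cuArray` is a previously filled `cudaArray`; all names are placeholders):

```cuda
cudaResourceDesc res = {};
res.resType = cudaResourceTypeArray;
res.res.array.array = cuArray;             // previously filled cudaArray

cudaTextureDesc texDesc = {};
texDesc.addressMode[0] = cudaAddressModeClamp;  // out-of-range x clamps to edge
texDesc.addressMode[1] = cudaAddressModeClamp;
texDesc.filterMode = cudaFilterModeLinear;      // hardware bilinear filtering
texDesc.readMode = cudaReadModeElementType;
texDesc.normalizedCoords = 1;                   // coordinates in [0, 1]

cudaTextureObject_t texObj = 0;
cudaCreateTextureObject(&texObj, &res, &texDesc, nullptr);

// In a kernel: float val = tex2D<float>(texObj, u, w);  // filtered fetch
```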
- Graph recording
- Kernel + memory op capture
- Graph launch
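Recording kernels and memory ops into a graph is typically done via stream capture; a host-side sketch (assumes `d_in`, `d_out`, `h_in`, `h_out`, `bytes`, and `kernel` exist; they are placeholders):

```cuda
cudaGraph_t graph;
cudaGraphExec_t exec;
cudaStream_t s;
cudaStreamCreate(&s);

// Capture: the work below is recorded into the graph, not executed yet.
cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, s);
kernel<<<grid, block, 0, s>>>(d_in, d_out);
cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, s);
cudaStreamEndCapture(s, &graph);

cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
for (int iter = 0; iter < 100; ++iter)
    cudaGraphLaunch(exec, s);              // low launch overhead per replay
cudaStreamSynchronize(s);
```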
- L1/L2 cache
- Persistent cache
- Memory throughput
- cuRAND (random generation)
- cuBLAS (linear algebra)
- cuFFT (FFT)
- Monte Carlo π estimation
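Monte Carlo π estimation pairs naturally with the cuRAND host API: generate 2·N uniform floats on the device, then count points inside the unit quarter-circle. A sketch under those assumptions (`d_xy`, `d_hits`, and `N` are placeholders):

```cuda
#include <curand.h>

// Count sample points (x, y) falling inside the unit quarter-circle.
__global__ void countInside(const float* xy, unsigned int* hits, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = xy[2 * i], y = xy[2 * i + 1];
        if (x * x + y * y <= 1.0f) atomicAdd(hits, 1u);
    }
}

// Host side (sketch):
// curandGenerator_t gen;
// curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
// curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);
// curandGenerateUniform(gen, d_xy, 2 * N);   // fills 2*N floats in (0, 1]
// countInside<<<(N + 255) / 256, 256>>>(d_xy, d_hits, N);
// pi ≈ 4.0 * hits / N
```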
- Lecture 1 – Introduction & Vector Add
- Lecture 2 – Memory & Kernel Basics
- Lecture 3 – Control Flow & Atomics
- Lecture 4 – Warp Programming (Advanced)
- Lecture 5 – Libraries (Skip)
- Lecture 6 – Streams & Host Code
```shell
# Enable debugging and break on kernel launch
cuda-gdb
(cuda-gdb) set cuda break_on_launch application
# Inspect the current focus: device, SM, warp, lane, block, thread
(cuda-gdb) cuda device sm warp lane block thread
# Use 'step' to go line by line through device code
```
---
## 📝 CUDA Exam Topics
The final exam covers both theory and practical knowledge. Key areas include:
- **Kernels & Launch** — syntax, launch parameters, thread indexing
- **Warp & Operations** — warp execution, divergence, shuffle/vote intrinsics
- **Shared Memory** — access, `__syncthreads()`, optimization
- **Paged vs Pinned Memory** — allocation, performance
- **Atomic Ops & Global Memory** — preventing race conditions
- **Mapped Memory** — zero-copy, host/device mapping
- **Memory Transfers & Async Execution** — `cudaMemcpy`, stream overlap
- **Streams & Events** — concurrency, timing, dependencies
- **CUDA Graphs** — record, launch, optimize workflows
- **Texture Memory** — filtering, binding, addressing
- **Bank Conflicts & Cache** — tuning L1/L2, avoiding conflicts
🧠 Tip: Practice writing and debugging CUDA kernels. Focus on memory strategies and performance tuning.