gagikh/cuda


CUDA Course (Compute Unified Device Architecture)

Title: Fundamentals of Accelerated Computation Using CUDA C/C++
University: Armenian Slavonic University – Lectures & Labs (14 Days)
Instructor: Gagik Hakobyan


📘 Course Outline

Day 1: CUDA Basics and Programming Model

  • CUDA programming model overview
  • Host vs Device
  • GPU architecture fundamentals
  • Thread hierarchy overview

Day 2: Thread Hierarchy & Execution Model

  • SIMD architecture and instruction pipeline
  • Threads, blocks, grids: structure and enumeration
  • Launch configuration and kernel invocation
  • Thread indexing patterns
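
As an illustration of the launch configuration and 1D indexing topics above, here is a minimal sketch. The kernel name `writeGlobalIndex`, the helper `countIndexMismatches`, and the sizes used are assumptions made for this example, not course material.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread stores its own global index, so out[i] == i after the launch.
__global__ void writeGlobalIndex(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global 1D thread index
    if (i < n) {                                    // the last block may be partly idle
        out[i] = i;
    }
}

// Hypothetical helper: launches the kernel and returns the number of wrong
// entries, or -1 if a CUDA call fails.
int countIndexMismatches(int n, int threadsPerBlock) {
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) return -1;

    writeGlobalIndex<<<blocks, threadsPerBlock>>>(dOut, n);

    std::vector<int> hOut(n, -1);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hOut[i] != i) ++mismatches;
    }
    return mismatches;
}

int main() {
    // 1000 elements with 256 threads per block gives 4 blocks (1024 threads).
    printf("mismatches: %d\n", countIndexMismatches(1000, 256));
    return 0;
}
```

The bounds check matters because the grid is rounded up to whole blocks, so some threads in the last block have no element to process.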

Day 3: Warp-Level Execution and Control Flow

  • Warp definition and behavior
  • Control flow: if, else, for, while
  • Loop unrolling
  • Divergence impact and avoidance

Day 4: CUDA Memory Types and Management

  • Pageable, pinned, and mapped memory
  • Unified memory
  • Allocation strategies

Day 5: Memory Conflicts and Shared Memory

  • Bank conflicts
  • Synchronized memory access
  • Shared, constant, and pitched memory
  • Memory padding
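
One common way to see bank conflicts and padding together is a tiled matrix transpose. The sketch below assumes a square `n` x `n` matrix and a 32 x 32 tile; the names `transposeTile` and `countTransposeMismatches` are hypothetical and chosen only for this example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 32

// Transposes a square n x n matrix through a shared-memory tile.
__global__ void transposeTile(const float *in, float *out, int n) {
    // The extra column shifts each row by one bank, so reading a tile
    // column does not hit the same bank 32 times.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) tile[threadIdx.y][threadIdx.x] = in[y * n + x];

    __syncthreads();  // the whole tile must be written before it is read

    x = blockIdx.y * TILE + threadIdx.x;  // swap the block coordinates
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical helper: returns the number of wrong elements, or -1 on error.
int countTransposeMismatches(int n) {
    size_t bytes = size_t(n) * n * sizeof(float);
    std::vector<float> hIn(size_t(n) * n), hOut(size_t(n) * n, -1.0f);
    for (size_t i = 0; i < hIn.size(); ++i) hIn[i] = float(i);

    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return -1; }
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    transposeTile<<<grid, block>>>(dIn, dOut, n);

    cudaError_t err = cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            if (hOut[size_t(r) * n + c] != hIn[size_t(c) * n + r]) ++mismatches;
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", countTransposeMismatches(64));
    return 0;
}
```

Declaring the tile as `tile[TILE][TILE]` instead should give the same result but slower shared-memory reads, which makes it a useful before/after comparison in a profiler.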

Day 6: Streams and Events

  • Global memory usage
  • Streams and concurrent execution
  • Events and synchronization
  • Streamed read/write patterns

Day 7: Asynchronous Execution Techniques

  • cudaMemcpy: sync vs async
  • Async kernel launches
  • Stream dependencies
  • Event-based timing
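
The following sketch combines the Day 7 topics: asynchronous copies and a kernel launch queued on one stream, timed with a pair of events. The kernel `scale`, the helper `timedRoundTrip`, and the buffer size are assumptions for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: copies n floats to the device, doubles them, copies
// them back, and returns the elapsed time in milliseconds (-1 on failure).
float timedRoundTrip(int n, float *firstValue) {
    size_t bytes = size_t(n) * sizeof(float);
    float *hData = nullptr, *dData = nullptr;
    // Pinned host memory lets cudaMemcpyAsync run asynchronously.
    if (cudaMallocHost(&hData, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&dData, bytes) != cudaSuccess) {
        cudaFreeHost(hData);
        return -1.0f;
    }
    for (int i = 0; i < n; ++i) hData[i] = 1.0f;

    cudaStream_t stream;
    cudaEvent_t start, stop;
    cudaStreamCreate(&stream);
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Work on one stream runs in issue order, so the events bracket
    // the copy, the kernel, and the copy back.
    cudaEventRecord(start, stream);
    cudaMemcpyAsync(dData, hData, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(dData, n, 2.0f);
    cudaMemcpyAsync(hData, dData, bytes, cudaMemcpyDeviceToHost, stream);
    cudaEventRecord(stop, stream);

    cudaEventSynchronize(stop);  // block the host until 'stop' has completed
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    *firstValue = hData[0];

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(dData);
    cudaFreeHost(hData);
    return ms;
}

int main() {
    float first = 0.0f;
    float ms = timedRoundTrip(1 << 20, &first);
    printf("first element: %.1f, elapsed: %.3f ms\n", first, ms);
    return 0;
}
```

The calls return to the host immediately; only `cudaEventSynchronize` waits, which is why the host reads `hData` after that point and not before.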

Day 8: Warp-Level Intrinsics – Reduction

  • Warp shuffle functions
  • Intra-warp communication
  • Parallel reduction
  • Performance tuning
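
A warp-level sum is a compact way to see shuffle functions and parallel reduction working together. This is a minimal sketch for a single full warp of 32 threads; `warpReduceSum`, `sumOneWarp`, and `sumOneToThirtyTwo` are names chosen for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Tree reduction inside one warp: each step adds the value held by the
// lane 'offset' positions higher, halving the distance every iteration.
__device__ float warpReduceSum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        val += __shfl_down_sync(0xffffffffu, val, offset);
    }
    return val;  // only lane 0 is guaranteed to hold the full sum
}

// Assumes a launch with exactly one block of 32 threads.
__global__ void sumOneWarp(const float *in, float *out) {
    float v = warpReduceSum(in[threadIdx.x]);
    if (threadIdx.x == 0) *out = v;
}

// Hypothetical helper: sums 1 + 2 + ... + 32 on the device.
float sumOneToThirtyTwo() {
    float hIn[32];
    for (int i = 0; i < 32; ++i) hIn[i] = float(i + 1);

    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, sizeof(hIn)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&dOut, sizeof(float)) != cudaSuccess) {
        cudaFree(dIn);
        return -1.0f;
    }
    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);

    sumOneWarp<<<1, 32>>>(dIn, dOut);

    float result = -1.0f;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}

int main() {
    printf("sum = %.1f\n", sumOneToThirtyTwo());  // 32 * 33 / 2 = 528
    return 0;
}
```

No shared memory or `__syncthreads()` is needed here because the data moves directly between registers of threads in the same warp. A block-wide reduction would still need a second stage to combine the per-warp results.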

Day 9: Warp-Level Data Exchange

  • Warp vote functions
  • Inter-thread data exchange
  • Cooperative operations

Day 10: Practical Algorithms

  • Hamming distance matching
  • Bitwise ops
  • Matrix multiplication

Day 11: Textures and Surfaces

  • Texture memory
  • Surface memory
  • Filtering & addressing
  • Zoom/image processing

Day 12: CUDA Graph API

  • Graph recording
  • Kernel + memory op capture
  • Graph launch

Day 13: Cache Behavior and Optimization

  • L1/L2 cache
  • Persistent cache
  • Memory throughput

Day 14: CUDA Libraries

  • cuRAND (random generation)
  • cuBLAS (linear algebra)
  • cuFFT (FFT)
  • Monte Carlo π estimation
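
For the Monte Carlo π topic, one possible approach is sketched below using the cuRAND device API from `curand_kernel.h`. The course may well use the host API or a different layout; the names `countHits` and `estimatePi`, the seed, and the sample counts are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Each thread draws points in the unit square and counts how many fall
// inside the quarter circle of radius 1.
__global__ void countHits(unsigned long long seed, int samplesPerThread,
                          unsigned long long *hits) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, tid, 0, &state);  // same seed, one subsequence per thread

    unsigned long long local = 0;
    for (int s = 0; s < samplesPerThread; ++s) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) ++local;
    }
    atomicAdd(hits, local);  // one atomic per thread, not one per sample
}

// Hypothetical helper: returns the estimate, or -1 on failure.
double estimatePi(int blocks, int threads, int samplesPerThread,
                  unsigned long long seed) {
    unsigned long long *dHits = nullptr;
    if (cudaMalloc(&dHits, sizeof(unsigned long long)) != cudaSuccess) return -1.0;
    cudaMemset(dHits, 0, sizeof(unsigned long long));

    countHits<<<blocks, threads>>>(seed, samplesPerThread, dHits);

    unsigned long long hHits = 0;
    cudaError_t err = cudaMemcpy(&hHits, dHits, sizeof(hHits),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dHits);
    if (err != cudaSuccess) return -1.0;

    double total = double(blocks) * threads * samplesPerThread;
    return 4.0 * double(hHits) / total;  // area ratio of quarter circle to square
}

int main() {
    printf("pi is approximately %f\n", estimatePi(64, 256, 1000, 1234ULL));
    return 0;
}
```

The estimate is statistical, so the printed digits vary with the seed and sample count. The error shrinks roughly with the square root of the number of samples.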

📚 Recommended Resources

CUDA Programming Guides

Oxford CUDA Course by Mike Giles

Labs & Exercises

Extra Reading

NVIDIA Course Materials


🛠 CUDA Debugging Tips

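# Build with host and device debug info first, e.g. nvcc -g -G
# Start the debugger (./app is a placeholder for your own binary)
cuda-gdb ./app
# Break at the start of every kernel the application launches
set cuda break_on_launch application
# Show the current focus: device, SM, warp, lane, block, thread
cuda device sm warp lane block thread
# Use 'step' to go line by line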

📝 CUDA Exam Topics

The final exam covers both theory and practical skills. Key areas include:

  • Kernels & Launch: syntax, launch parameters, thread indexing
  • Warp & Operations: warp execution, divergence, shuffle/vote intrinsics
  • Shared Memory: access, __syncthreads(), optimization
  • Pageable vs Pinned Memory: allocation, performance
  • Atomic Ops & Global Memory: preventing race conditions
  • Mapped Memory: zero-copy, host/device mapping
  • Memory Transfers & Async Execution: cudaMemcpy, stream overlap
  • Streams & Events: concurrency, timing, dependencies
  • CUDA Graphs: record, launch, optimize workflows
  • Texture Memory: filtering, binding, addressing
  • Bank Conflicts & Cache: tuning L1/L2, avoiding conflicts

🧠 Tip: Practice writing and debugging CUDA kernels. Focus on memory strategies and performance tuning.
