![PyTorch Developer Podcast artwork](https://is3-ssl.mzstatic.com/image/thumb/Podcasts115/v4/5d/4e/01/5d4e0127-9482-b8e3-6f59-59eaf50a21d9/mza_11274935194810526674.jpg/100x100bb.jpg)
Just enough CUDA to be dangerous
PyTorch Developer Podcast
English - May 11, 2021 13:00 - 16 minutes - 15.1 MB - ★★★★★ - 35 ratings - Technology: deep learning, machine learning, pytorch
Previous Episode: Inference mode
Next Episode: Functionalization
Ever wanted to learn about CUDA but not sure where to start? In this sixteen-minute episode I try to jam in as much CUDA knowledge as could reasonably be expected in a podcast. You won't know how to write a kernel after this episode, but you'll know what a GPU is, what the general CUDA programming model looks like, why asynchronous execution makes everything complicated, and some general principles PyTorch abides by when designing CUDA kernels.
Further reading:
- PyTorch docs on CUDA semantics: https://pytorch.org/docs/stable/notes/cuda.html
- The book I was recommended for learning CUDA when I first showed up at PyTorch: Programming Massively Parallel Processors https://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0128119861
- The environment variable that makes CUDA synchronous is CUDA_LAUNCH_BLOCKING=1.
- cuda-memcheck is also useful for debugging CUDA problems: https://docs.nvidia.com/cuda/cuda-memcheck/index.html
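Since CUDA kernel launches are asynchronous, a kernel error is often reported at some later, unrelated API call rather than at the line that launched the failing kernel. The CUDA_LAUNCH_BLOCKING tip above forces every launch to be synchronous so the error surfaces at the real culprit. A minimal usage sketch (the script name here is a placeholder for whatever program you are debugging):

```shell
# Force each CUDA kernel launch to block until the kernel completes,
# so errors are reported at the launch site. Debugging only: this
# serializes CPU and GPU work and is much slower.
CUDA_LAUNCH_BLOCKING=1 python your_script.py  # your_script.py is a placeholder
```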