Previous Episode: Inference mode
Next Episode: Functionalization

Ever wanted to learn about CUDA but not sure where to start? In this sixteen-minute episode, I try to jam in as much CUDA knowledge as can reasonably fit in a podcast. You won't know how to write a kernel after this episode, but you'll know what a GPU is, what the general CUDA programming model looks like, why asynchronous execution makes everything complicated, and some general principles PyTorch abides by when designing CUDA kernels.
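To give a taste of the asynchrony point: here is a minimal sketch (assuming PyTorch and a CUDA-capable GPU) of why naive wall-clock timing of CUDA code is misleading. Kernel launches return as soon as they are queued on the device, so you must synchronize before measuring.

```python
import time
import torch

x = torch.randn(4096, 4096, device="cuda")

start = time.time()
y = x @ x  # queues a matmul kernel; returns before the GPU finishes
queued = time.time() - start

torch.cuda.synchronize()  # block until all queued work on the device completes
finished = time.time() - start

print(f"launch returned after {queued:.6f}s; GPU actually done after {finished:.6f}s")
```

The first number is typically tiny, because the Python call only enqueued work; the second reflects the real compute time.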

Further reading:

PyTorch docs on CUDA semantics: https://pytorch.org/docs/stable/notes/cuda.html
The book I was recommended for learning CUDA when I first showed up at PyTorch: Programming Massively Parallel Processors https://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0128119861
The environment variable that makes CUDA execution synchronous is CUDA_LAUNCH_BLOCKING=1. cuda-memcheck is also useful for debugging CUDA problems: https://docs.nvidia.com/cuda/cuda-memcheck/index.html
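As a sketch of how CUDA_LAUNCH_BLOCKING is typically used for debugging (assuming a script named train.py that hits a CUDA error): with synchronous launches, the error surfaces at the Python line that actually caused it, rather than at some later synchronization point.

```python
# The variable is read when CUDA initializes, so set it before any CUDA
# work happens; the safest habit is to set it before importing torch.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # subsequent kernel launches now block until completion
```

Setting it in the shell instead (CUDA_LAUNCH_BLOCKING=1 python train.py) achieves the same thing without editing the script.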