
PyTorch Developer Podcast

82 episodes - English - Latest episode: 3 months ago - ★★★★★ - 35 ratings

The PyTorch Developer Podcast is a place for the PyTorch dev team to do bite sized (10-20 min) topics about all sorts of internal development topics in PyTorch.

Tags: Technology · deep learning · machine learning · pytorch
Links: Homepage · Apple Podcasts · Google Podcasts · Overcast · Castro · Pocket Casts · RSS feed

Episodes

XLA

June 17, 2021 13:00 - 15 minutes - 14.6 MB

What's PyTorch XLA? Why should you care? How is it implemented? How does PyTorch XLA trade off functionality versus ease of performance debugging? What are some new developments in this space? Further reading. XLA's repo has lots of really good docs. Check out https://github.com/pytorch/xla/blob/master/OP_LOWERING_GUIDE.md and also the main https://github.com/pytorch/xla/blob/master/README.md Alex Suhan's RFC about lazy core https://github.com/pytorch/rfcs/pull/18
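
As a taste of the lazy tensor model discussed in this episode, here is a minimal sketch, assuming the torch_xla package is installed (e.g., on a TPU VM); xm.xla_device() and xm.mark_step() are the classic torch_xla entry points:

```python
import torch
import torch_xla.core.xla_model as xm  # requires the torch_xla package

device = xm.xla_device()           # an XLA device: TPU if one is available
x = torch.randn(2, 2, device=device)
y = (x @ x).sum()                  # recorded lazily into an XLA graph, not yet run
xm.mark_step()                     # cut the trace here; compile and execute it
print(y)                           # reading the value also forces execution
```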

TH

June 16, 2021 13:00 - 11 minutes - 10.2 MB

What is TH? Why might you care? What is so horrible about it? What the heck is the generic/ folder? Why are we porting everything to C++? What are some downsides of having ported all our TH code to C++? Further reading.  The TH to ATen porting guide has lots of explanations of old school TH idioms https://github.com/pytorch/pytorch/wiki/TH-to-ATen-porting-guide Old notes about refcounting in TH https://github.com/pytorch/pytorch/blob/master/aten/src/README.md

TorchScript

June 15, 2021 13:00 - 19 minutes - 18.2 MB

There is a really good TorchScript overview at https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md and in this 20-minute episode, I want to give you some of the highlights from that document.
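
For a concrete starting point, here is a minimal sketch of scripting a function and inspecting the IR that the OVERVIEW document describes:

```python
import torch

@torch.jit.script
def f(x: torch.Tensor) -> torch.Tensor:
    # the TorchScript compiler parses this Python and produces a typed IR graph
    return torch.relu(x) + 1.0

print(f.graph)             # the IR described in OVERVIEW.md
print(f(torch.randn(3)))   # scripted functions are callable like ordinary ones
```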

CMake

June 14, 2021 13:00 - 17 minutes - 16.3 MB

Why is PyTorch's build so g-dang complicated? How can you avoid having to deal with cmake at all? And if you do have to deal with cmake, what are the most important things to know? And if you were going to improve our cmake, how would you go about doing it... Further reading. The official CMake documentation is a great help and well worth reading https://cmake.org/documentation If you work in torch/csrc chances are you'll need to edit this file https://github.com/pytorch/pytorch/blob/master/tools...

torchdeploy

June 11, 2021 13:00 - 13 minutes - 12.5 MB

torchdeploy is a way of running multiple Python interpreters inside the same process. It can be used to deploy Python PyTorch programs in situations where the GIL, rather than the CPython interpreter itself, is the bottleneck. How does it work, and what kind of challenges does it pose for people who want to write code that calls from C++ to Python? Further reading. How the torchdeploy build system works https://dev-discuss.pytorch.org/t/torch-deploy-the-build/238 Description of the single interpreter per Tens...

C++ frontend

June 10, 2021 13:00 - 17 minutes - 15.7 MB

What's the C++ frontend? Why is avoiding templates so important? Why is Tensor a reference type? How do we simulate keyword arguments in C++? Where did the nn Module support in the C++ API come from? Why did we reimplement all modules in C++? How are modules implemented in C++? What are some performance challenges of writing Python in C++, and how are we working around them? Further reading. C++ frontend tutorial https://pytorch.org/tutorials/advanced/cpp_frontend.html Writing Python in C...

PyObject preservation

June 09, 2021 13:00 - 16 minutes - 14.9 MB

Given two separately refcounted objects, how can you arrange for each of them to stay live so long as the other is live? Why doesn't just having a strong-strong or strong-weak reference between the two objects work? What is object resurrection in CPython? What's a finalizer and why does it make things more complicated? How does Python GC work? Further reading. PyObject preservation PR https://github.com/pytorch/pytorch/pull/56017 Sam Gross's original PoC, which works fine if the two objec...
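
As a toy illustration of the object resurrection mentioned above (not the PR's actual mechanism), a finalizer can create a fresh strong reference, and CPython will notice and keep the object alive:

```python
import gc

graveyard = []

class Zombie:
    def __del__(self):
        # resurrect: create a new strong reference from inside the finalizer
        graveyard.append(self)

z = Zombie()
del z                     # __del__ runs, but the object survives in graveyard
gc.collect()
print(len(graveyard))     # 1: the finalizer resurrected the object
```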

Mobile selective build

June 08, 2021 13:00 - 16 minutes - 14.7 MB

What is mobile selective build? Why are we so obsessed with reducing binary size? How does selective build work? Why doesn't static linking just work? Why can't you just read out the ops used in a TorchScript model to determine what operators you actually need? What are the tradeoffs of statically determining the operator dependency graph versus tracing? What's up with the SELECTIVE_NAME macro? How the heck does selective build work at all when you have multiple mobile apps in a single Buck ...

torch.nn

June 07, 2021 13:00 - 14 minutes - 13.1 MB

What goes into the implementation of torch.nn? Why do NN modules exist in the first place? What's the function of Parameter? How do modules actually track all the parameters in question? What is all of the goop in the top level NN module class? What are some new developments in torch.nn modules? What are some open problems with our modules? Further reading: Implementation of nn.Module https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py nn.Module is complicated and th...
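
As a quick reminder of the machinery under discussion, here is a minimal module; assigning an nn.Parameter to an attribute is what registers it, via bookkeeping in nn.Module's __setattr__:

```python
import torch
from torch import nn

class Affine(nn.Module):
    def __init__(self, n: int):
        super().__init__()
        # nn.Module.__setattr__ intercepts these assignments and records them
        self.weight = nn.Parameter(torch.randn(n))
        self.bias = nn.Parameter(torch.zeros(n))

    def forward(self, x):
        return x * self.weight + self.bias

m = Affine(3)
print([name for name, _ in m.named_parameters()])  # ['weight', 'bias']
```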

Code generation

June 04, 2021 13:00 - 16 minutes - 15.4 MB

Why does PyTorch use code generation as part of its build process? Why doesn't it use C++ templates? What things is code generation used for? What are the pros/cons of using code generation? What are some other ways to do the same things we currently do with code generation? Further reading. Top level file for the new code generation pipeline https://github.com/pytorch/pytorch/blob/master/tools/codegen/gen.py Out of tree external backend code generation from Brian Hirsh: https://github.com...

Why is autograd so complicated

June 03, 2021 13:00 - 15 minutes - 14.4 MB

Why is autograd so complicated? What are the constraints and features that go into making it complicated? What's up with it being written in C++? What's with derivatives.yaml and code generation? What's going on with views and mutation? What's up with hooks and anomaly mode? What's reentrant execution? Why is it relevant to checkpointing? What's the distributed autograd engine? Further reading. Autograd notes in the docs https://pytorch.org/docs/stable/notes/autograd.html derivatives.yaml...
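
A small sketch of two of the features mentioned here, tensor hooks and anomaly mode:

```python
import torch

x = torch.randn(3, requires_grad=True)
# a hook observes (and may replace) the gradient flowing back into x
x.register_hook(lambda grad: print("grad wrt x:", grad))

# anomaly mode records forward stack traces so backward errors are attributable
with torch.autograd.set_detect_anomaly(True):
    (x * x).sum().backward()
```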

__torch_function__

June 02, 2021 13:00 - 17 minutes - 15.6 MB

What is __torch_function__? Why would I want to use it? What does it have to do with keeping extra metadata on Tensors or torch.fx? How is it implemented? Why is __torch_function__ a really popular way of extending functionality in PyTorch? What makes it different from the dispatcher extensibility mechanism? What are some downsides of it being written this way? What are we doing about it? Further reading. __torch_function__ RFC: https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-fu...
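
The canonical usage pattern, per the RFC: subclass Tensor and intercept every torch.* call before it hits the dispatcher:

```python
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        print(f"intercepted: {func.__name__}")
        # delegate to the default implementation to compute the real result
        return super().__torch_function__(func, types, args, kwargs)

t = torch.randn(3).as_subclass(LoggingTensor)
u = torch.add(t, 1)   # prints "intercepted: add"
```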

TensorIterator

June 01, 2021 13:00 - 17 minutes - 16.3 MB

You walk into the whiteboard room to do a technical interview. The interviewer looks you straight in the eye and says, "OK, can you show me how to add the elements of two lists together?" Confused, you write down a simple for loop that iterates through each element and adds them together. Your interviewer rubs his hands together evilly and cackles, "OK, let's make it more complicated." What does TensorIterator do? Why the heck is TensorIterator so complicated? What's going on with broadcast...
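
The whiteboard answer, for reference, really is just a loop; TensorIterator exists because the real problem layers broadcasting, type promotion, memory overlap checks, and parallelization on top of it:

```python
def add_lists(a, b):
    # the naive interview answer: pointwise add, equal lengths assumed
    assert len(a) == len(b)
    return [x + y for x, y in zip(a, b)]

print(add_lists([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```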

native_functions.yaml

May 28, 2021 13:00 - 15 minutes - 14.2 MB

What does native_functions.yaml have to do with the TorchScript compiler? What multiple use cases is native_functions.yaml trying to serve? What's up with the JIT schema type system? Why isn't it just Python types? What the heck is the (a!) thingy inside the schema? Why is it important that I actually annotate all of my functions accurately with this information? Why is my seemingly BC change to native_functions.yaml actually breaking people's code? Do I have to understand the entire compile...

Serialization

May 27, 2021 13:00 - 17 minutes - 15.7 MB

What is serialization? Why do I care about it? How is serialization done in general in Python? How does pickling work? How does PyTorch implement pickling for its objects? What are some pitfalls of pickling implementation? What does backwards compatibility and forwards compatibility mean in the context of serialization? What's the difference between directly pickling and using torch.save/load? So what the heck is up with JIT/TorchScript serialization? Why did we use zip files? What were some...
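
A minimal sketch of the user-facing API discussed here; torch.save layers on top of pickle and (since PyTorch 1.6) writes a zip-based container:

```python
import io
import torch

t = torch.arange(4)

buf = io.BytesIO()
torch.save(t, buf)        # pickles the tensor into a zip-based container
buf.seek(0)
print(torch.load(buf))    # tensor([0, 1, 2, 3])
```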

Continuous integration

May 26, 2021 13:00 - 16 minutes - 15.5 MB

How is our CI put together? What is the history of the CI? What constraints is the CI under? Why does the CI use Docker? Why are build and test split into two phases? Why are some parts of the CI so convoluted? How does the HUD work? What kinds of configurations is PyTorch tested under? How did we decide what configurations to test? What are some of the weird CI configurations? What's up with the XLA CI? What's going on with the Facebook internal builds? Further reading. The CI HUD for ...

Stacked diffs and ghstack

May 25, 2021 13:00 - 12 minutes - 11.1 MB

What's a stacked diff? Why might you want to do it? What does the workflow for stacked diffs with ghstack look like? How do you use interactive rebase to edit earlier diffs in your stack? How can you actually submit a stacked diff to PyTorch? What are some things to be aware of when using ghstack? Further reading. The ghstack repository https://github.com/ezyang/ghstack/ A decent explanation of how the stacked diff workflow works on Phabricator, including how to do rebases https://kurtisnusbaum...

Shared memory

May 24, 2021 13:00 - 10 minutes - 9.85 MB

What is shared memory? How is it used in your operating system? How is it used in PyTorch? What's shared memory good for in deep learning? Why use multiple processes rather than one process on a single node? What's the point of PyTorch's shared memory manager? How are allocators for shared memory implemented? How does CUDA shared memory work? What is the difference between CUDA shared memory and CPU shared memory? How did we implement safer CUDA shared memory? Further reading. Implementati...
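
A minimal sketch of CPU shared memory in action: move a tensor's storage into shared memory, and a child process's in-place writes become visible to the parent:

```python
import torch
import torch.multiprocessing as mp

def worker(t):
    t.add_(1)   # in-place write to the shared storage

if __name__ == "__main__":
    t = torch.zeros(4)
    t.share_memory_()                        # move storage into shared memory
    p = mp.Process(target=worker, args=(t,))
    p.start()
    p.join()
    print(t)    # tensor([1., 1., 1., 1.]): the child's write is visible
```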

Automatic mixed precision

May 21, 2021 13:00 - 14 minutes - 12.9 MB

What is automatic mixed precision? How is it implemented? What does it have to do with mode dispatch keys and fallthrough kernels? What are AMP policies? How is its cast caching implemented? How does torchvision also support AMP? What's up with Intel's CPU autocast implementation? Further reading. Autocast implementation lives at https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/autocast_mode.cpp How to add autocast implementations to custom operators that are out of tree https://...
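
A minimal sketch of the user-facing side (needs a CUDA device): the autocast context applies the per-op AMP policies, and GradScaler guards against fp16 gradient underflow:

```python
import torch

if torch.cuda.is_available():
    model = torch.nn.Linear(8, 2).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler()

    x = torch.randn(16, 8, device="cuda")
    with torch.cuda.amp.autocast():    # ops run in fp16/fp32 per the AMP policy
        loss = model(x).sum()
    scaler.scale(loss).backward()      # scale loss so fp16 grads don't underflow
    scaler.step(opt)                   # unscales grads, then steps the optimizer
    scaler.update()
```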

Conjugate views

May 20, 2021 13:00 - 15 minutes - 14.2 MB

What are complex numbers? What is conjugation? Why is conjugation so common in linear algebra? Why would we like conjugation to behave similarly to transposition (and why is matrix multiply with a transposed input so fast?) What is a conjugate view? How is it implemented? What's the relationship between views, laziness and call-by-name evaluation? Further reading. Pull request that adds conjugate views https://github.com/pytorch/pytorch/pull/54987 The idea of conjugate views originally ca...
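
In releases where the conjugate-view PR has landed, conj is constant-time: it returns a view with a conjugate bit set, materialized only on demand:

```python
import torch

x = torch.tensor([1 + 2j, 3 - 1j])
y = x.conj()                # O(1): a view with the conjugate bit set
print(y.is_conj())          # True: no data has been copied yet
print(y.resolve_conj())     # tensor([1.-2.j, 3.+1.j]): materialize on demand
```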

History and constraints of Tensor

May 19, 2021 13:00 - 14 minutes - 13.5 MB

What historical constraints and design choices led to the design of Tensor/Storage (and their Impl variants) as they are today? Why do we use intrusive refcounting? Why are we trying to get rid of virtual methods on TensorImpl? Why are there so many frickin' bitfields? Further reading. PyTorch internals blog post http://blog.ezyang.com/2019/05/pytorch-internals/ Writing Python in C++, a manifesto https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto) At time of writ...

How new operators are authored

May 18, 2021 13:00 - 15 minutes - 14.2 MB

What's the general process by which a new operator is added to PyTorch? Why is this actually something of a rare occurrence? How do you integrate an operator with the rest of PyTorch's system so it can be run end-to-end? What should I expect if I'm writing a CPU and CUDA kernel? What tools are available to me to make the job easier? How can I debug my kernels? How do I test them? Further reading. The README for the native/ directory, where all kernels get put https://github.com/pytorch/pyt...

The life and death of Variable

May 17, 2021 13:00 - 15 minutes - 14.2 MB

What is a Variable? Why did it exist as a wrapper in the first place? Why did it get removed? How did we remove it? What are some of the lingering consequences of its removal? Further reading: The release notes of PyTorch 0.4 do a good job explaining the user visible consequences of the removal, at the time, including how we "simulate" concepts on Variable that don't make sense anymore https://pytorch.org/blog/pytorch-0_4_0-migration-guide/ Part 1: Removal of Variable wrapper in C++ https...

Backend extensibility

May 14, 2021 13:00 - 15 minutes - 14 MB

What's the current state of backend extensibility? How did PyTorch evolve from being a CPU and CUDA only framework to also support AMD ROCm and XLA? What are some problems with adding an out-of-tree backend, and what's some work to make it better? Further reading: Script for HIPifying PyTorch's source when enabling ROCm https://github.com/pytorch/pytorch/blob/master/tools/amd_build/build_amd.py PyTorch/XLA https://github.com/pytorch/xla/ Brian Hirsh's spec on what out-of-tree backend c...

The road to structured kernels

May 13, 2021 13:00 - 16 minutes - 15.2 MB

Structured kernels are a new way to write kernels in PyTorch. Why did they take so long? What finally convinced us that we should do them? Why did it end up taking me the better part of a year to only be half done with them? Further reading: Structured kernels RFC https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md Taxonomy of PyTorch operators by shape behavior http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/ Bra...

Functionalization

May 12, 2021 13:00 - 14 minutes - 12.9 MB

Functionalization is the process by which we remove mutation from autograd graphs in PyTorch, leaving us with a purely functional graph that we can execute in the normal way. Why do we need to do functionalization? What makes it not so easy to do? How do we do it? And how does it compare to mutation removal that you might see in a compiler? Further reading: Section 3.1 of this paper on PyTorch AD https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf predates our implement...
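
The transformation itself is easy to state on a toy program (a sketch of the idea, not the actual pass): rewrite each mutation as the out-of-place op plus a rebinding, so the graph becomes purely functional:

```python
import torch

def with_mutation(x):
    y = x.clone()
    y.add_(1)        # in-place: mutates y's storage
    return y

def functionalized(x):
    y = x.clone()
    y = y.add(1)     # out-of-place: y is rebound to a fresh tensor
    return y

x = torch.randn(3)
assert torch.allclose(with_mutation(x), functionalized(x))
```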

Just enough CUDA to be dangerous

May 11, 2021 13:00 - 16 minutes - 15.1 MB

Ever wanted to learn about CUDA but not sure where to start? In this sixteen minute episode I try to jam in as much CUDA knowledge as could be reasonably expected in a podcast. You won't know how to write a kernel after this episode, but you'll know about what a GPU is, what the general CUDA programming model is, why asynchronous execution makes everything complicated, and some general principles PyTorch abides by when designing CUDA kernels. Further reading: PyTorch docs on CUDA semantics...
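
The asynchrony pitfall in miniature: kernel launches only queue work on a stream, so timing (or reading results) requires synchronization, e.g. with CUDA events:

```python
import torch

if torch.cuda.is_available():
    x = torch.randn(4096, 4096, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()
    y = x @ x                    # queued on the current stream; returns at once
    end.record()
    torch.cuda.synchronize()     # wait for the queued work to actually finish
    print(start.elapsed_time(end), "ms")
```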

Inference mode

May 10, 2021 13:00 - 14 minutes - 13.2 MB

What's inference mode? Why doesn't my code run fast if I use no_grad or make sure requires_grad=False? How come inference mode is safe but AutoNonVariableTypeMode is not? Further reading: Inference mode RFC https://github.com/.../rfc0011/RFC-0011-InferenceMode.md Inference mode docs for C++ frontend users https://github.com/.../cpp/source/notes/inference_mode.rst Tracking issue for Python frontend support https://github.com/pytorch/pytorch/issues/56608
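
Once the Python frontend support tracked above landed, usage looks like this: tensors created under inference mode skip view and version-counter bookkeeping and never record autograd metadata:

```python
import torch

x = torch.ones(3, requires_grad=True)
with torch.inference_mode():
    y = x * 2                 # no autograd metadata is recorded at all
print(y.requires_grad)        # False: y is an inference tensor
```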

Vectorization

May 07, 2021 13:00 - 14 minutes - 13.4 MB

What is vectorization? How do you use it in PyTorch? What are some of the traps and pitfalls of writing vectorized code in PyTorch? Further reading: native/cpu README https://github.com/.../aten/src/ATen/native/cpu/README.md Vec256 classes https://github.com/.../tree/master/aten/src/ATen/cpu/vec256 AVX512 support tracking issue https://github.com/pytorch/pytorch/issues/56187

Dynamic library structure

May 06, 2021 13:00 - 14 minutes - 13.6 MB

Why is PyTorch split into so many libraries? What's the point of these splits? What do Windows, mobile and CUDA have to do with the library splits? Further reading: c10 folder architecture description https://github.com/.../wiki/Software-Architecture-for-c10 Implementation of the TORCH_API visibility macros https://github.com/.../blob/master/c10/macros/Export.h An example of virtual call based hook to break library structure https://github.com/pytorch/pytorch/blob/master/c10/core/impl/De...

History and constraints of the dispatcher

May 05, 2021 05:20 - 17 minutes - 16.2 MB

Why is the dispatcher the way it is today? How did it evolve over time, and what constraints got added so that it is the kind of complicated piece it is today? Further reading. How the dispatcher actually works http://blog.ezyang.com/.../lets-talk-about-the-pytorch.../ Zachary DeVito's original version of ATen, before it got merged back into PyTorch mainline https://github.com/zdevito/ATen The multiple dispatch patch: https://github.com/pytorch/pytorch/pull/25653

Binding C++ objects to Python

May 04, 2021 01:19 - 13 minutes - 12.2 MB

In this episode, we will discuss how to bind a C++ object in Python. We'll try to answer the following questions: How does pybind11 do it? What's different about how we implement it for Tensor? What are some downsides of the approach? Note from the future: I recorded and then decided I didn't like my follow-up episode about how to preserve PyObjects even when they go dead in Python. Maybe some day! Further reading. Python bindings for Tensor in PyTorch https://github.com/.../csrc/autograd...