What's the current state of backend extensibility? How did PyTorch evolve from being a CPU- and CUDA-only framework into one that also supports AMD ROCm and XLA? What are some problems with adding an out-of-tree backend, and what work is underway to make it better?
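
For a sense of what an out-of-tree backend looks like mechanically: PyTorch reserves the PrivateUse1 dispatch key for external backends, so a vendor can register operator kernels against it without patching PyTorch's source tree. Below is a minimal, hypothetical sketch of that registration path; the kernel body is a made-up placeholder, and a real integration would also need allocator, copy, and device-guard hooks.

```cpp
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical out-of-tree kernel for aten::add.Tensor. A real backend
// would launch its own device kernel here; this placeholder just falls
// back to the CPU implementation to keep the sketch self-contained.
at::Tensor my_backend_add(const at::Tensor& self, const at::Tensor& other,
                          const at::Scalar& alpha) {
  return at::add(self.cpu(), other.cpu(), alpha);
}

// PrivateUse1 is the dispatch key PyTorch sets aside for external
// backends, letting this extension register kernels out of tree.
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("add.Tensor", my_backend_add);
}
```

The pain point discussed in the episode is that every operator has to be wired up this way (or generated), which is part of what the out-of-tree codegen work linked below tries to address.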

Further reading:

Script for HIPifying PyTorch's source when enabling ROCm: https://github.com/pytorch/pytorch/blob/master/tools/amd_build/build_amd.py
PyTorch/XLA: https://github.com/pytorch/xla/
Brian Hirsh's spec on what out-of-tree backend codegen looks like: https://github.com/pytorch/xla/issues/2871