Functionalization is the process by which we remove mutation from autograd graphs in PyTorch, leaving us with a purely functional graph that we can execute in the normal way. Why do we need functionalization? What makes it hard to do? How do we do it? And how does it compare to the mutation removal you might see in a compiler?

Further reading:

Section 3.1 of this paper on PyTorch AD (https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf) predates our implementation of inplace autograd but accurately reports the subtleties and correctly predicts the implementation strategy we ended up taking
RFC to generalize the functionalization mechanism to be available to arbitrary backends: https://github.com/pytorch/rfcs/pull/19
Code that handles lazily updating views when the base is updated: https://github.com/pytorch/pytorch/blob/e5e095cbe4dbc5a601f98e6134dcbd59c6342d7d/torch/csrc/autograd/variable.cpp#L556-L603