Serialization

PyTorch Developer Podcast

English - May 27, 2021 13:00 - 17 minutes

What is serialization? Why do I care about it? How is serialization done in general in Python? How does pickling work? How does PyTorch implement pickling for its objects? What are some pitfalls of implementing pickling? What do backward compatibility and forward compatibility mean in the context of serialization? What's the difference between pickling directly and using torch.save/load? So what the heck is up with JIT/TorchScript serialization? Why did we use zip files? What were some design principles for the serialization format? Why are there two implementations of serialization in PyTorch? Does the fact that PyTorch uses pickling for serialization mean that our serialization format is insecure?
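To make "how does pickling work" and the security question concrete, here is a minimal standard-library sketch. An object's `__reduce__` (or `__reduce_ex__`) returns a callable plus an argument tuple, and `pickle.loads` rebuilds the object by calling that callable. The same freedom is what makes unpickling untrusted data unsafe, because the stream itself names what gets called. `Point` and `Payload` are hypothetical classes for illustration only. They are not PyTorch code, although PyTorch's Tensor hooks into this same mechanism through `__reduce_ex__` (see the code pointer under Further reading).

```python
import pickle

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce__(self):
        # "To rebuild me, call Point(x, y)": pickle stores a reference to
        # the class (by module and name) plus this argument tuple.
        return (Point, (self.x, self.y))

class Payload:
    def __reduce__(self):
        # Nothing restricts which callable is named here; eval is used to
        # show that unpickling can evaluate an arbitrary Python expression.
        return (eval, ("6 * 7",))

p = pickle.loads(pickle.dumps(Point(1, 2)))
print(p.x, p.y)  # 1 2

# The loader calls eval("6 * 7") and hands back its result, not a Payload.
result = pickle.loads(pickle.dumps(Payload()))
print(result)  # 42
```

A loader that wants to be safer has to restrict which globals it will resolve, for example by overriding `pickle.Unpickler.find_class`, as the Python `pickle` documentation describes.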

Further reading.

TorchScript serialization design doc https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/docs/serialization.md
Evolution of serialization formats over time https://github.com/pytorch/pytorch/issues/31877

Code pointers:
Tensor __reduce_ex__ https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/_tensor.py#L97-L178
Python side serialization https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/serialization.py#L384-L499
C++ side serialization https://github.com/pytorch/pytorch/tree/master/torch/csrc/jit/serialization
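As a rough illustration of why a zip container pairs well with pickle, here is a standard-library-only sketch. The object graph goes into a pickle, while large raw buffers are written as separate zip entries and referenced from the pickle through persistent IDs (`Pickler.persistent_id` / `Unpickler.persistent_load`). `Blob`, `save`, `load`, and the entry names `data.pkl` and `data/<key>` are hypothetical, chosen only to loosely resemble a `torch.save` archive. This is not PyTorch's implementation, and the real format (see the design doc and `torch/serialization.py` linked above) carries more metadata than this.

```python
import io
import pickle
import zipfile

class Blob:
    """Hypothetical stand-in for a tensor's storage: a raw byte buffer."""
    def __init__(self, data: bytes):
        self.data = data

class BlobPickler(pickle.Pickler):
    def __init__(self, file, archive: zipfile.ZipFile):
        super().__init__(file)
        self.archive = archive
        self.keys = {}  # id(blob) -> entry key, so a shared blob is written once

    def persistent_id(self, obj):
        # Called for every object; returning None means "pickle it normally".
        if not isinstance(obj, Blob):
            return None
        if id(obj) not in self.keys:
            key = str(len(self.keys))
            self.keys[id(obj)] = key
            self.archive.writestr(f"data/{key}", obj.data)  # raw bytes, not pickled
        return ("blob", self.keys[id(obj)])

class BlobUnpickler(pickle.Unpickler):
    def __init__(self, file, archive: zipfile.ZipFile):
        super().__init__(file)
        self.archive = archive
        self.loaded = {}  # entry key -> Blob, so shared references stay shared

    def persistent_load(self, pid):
        kind, key = pid
        if kind != "blob":
            raise pickle.UnpicklingError(f"unknown persistent id {pid!r}")
        if key not in self.loaded:
            self.loaded[key] = Blob(self.archive.read(f"data/{key}"))
        return self.loaded[key]

def save(obj) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        pkl = io.BytesIO()
        BlobPickler(pkl, archive).dump(obj)  # writes data/<key> entries as a side effect
        archive.writestr("data.pkl", pkl.getvalue())
    return buf.getvalue()

def load(raw: bytes):
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        return BlobUnpickler(io.BytesIO(archive.read("data.pkl")), archive).load()

shared = Blob(b"\x00\x01\x02\x03")
raw = save({"weight": shared, "alias": shared, "name": "layer1"})
restored = load(raw)
print(zipfile.ZipFile(io.BytesIO(raw)).namelist())  # ['data/0', 'data.pkl']
print(restored["weight"] is restored["alias"])  # True
```

Keeping the bytes out of the pickle means they can be read (or skipped) without running the unpickler over them, and the id-keyed caches keep two references to one `Blob` shared after loading. The idea that this loosely mirrors how shared tensor storage survives a round trip is an assumption of this sketch, not a description of PyTorch's code.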