I spoke to Trevor Chow about existential risks from AI and techniques for aligning artificial intelligence with human goals. Specifically, we talked about:

An introduction to existential risk from Artificial Intelligence
Existing methods for alignment of AI models
Why RLHF might fail in large language models
Whether interpretability research might scale
New methods being developed to make larger models safer
Regulatory frameworks for the future of AI
