Today’s guest is Stuart Russell, and when it comes to AI you might just say, “he wrote the book.” In fact, Stuart is a co-author of “Artificial Intelligence: A Modern Approach,” the standard textbook used to teach AI at universities across the world. He has also written multiple books for general audiences, testifying to both his range and his prolific body of work.

Stuart is currently a Professor of Computer Science at UC Berkeley (where he is also Director of the Center for Human Compatible AI) and has been a renowned voice in the AI field for years. 

In his latest book, “Human Compatible: AI and the Problem of Control,” Stuart argues that if we continue to design AI by optimizing for fixed objectives (the standard approach), then superhuman AI built on that foundation will produce disastrous consequences that unfold outside of our control. Stuart calls this “the King Midas problem” of AI.

Thankfully, he proposes a new approach, derived from inverse reinforcement learning and designated “provably beneficial AI,” that just might save us from this fate. In this model, AI is designed to (1) optimize for human preferences, (2) remain inherently uncertain about what those preferences are, and (3) defer to human behavior as the source of information for learning them over time.

So how do we get to a place where this model becomes the industry standard? Stuart walks us through the practical mechanics of standing it up. We’ll discuss the behavioral and collective challenge of identifying human preferences, as well as the research steps that must happen first to change the industry’s foundation for building AI.

We also couldn’t end the conversation without briefly touching on the opportunity to promote human thriving in a new paradigm for the future of work. Whether you’re a casual observer or have been working in the field for years, my guess is you will come away from this conversation with a better understanding of how we should, and indeed must, think about controlling systems with capabilities that exceed our own.

Show Notes: 

3:15 - the economic potential of general purpose AI

7:50 - explanation of the standard AI model

12:40 - fixed objective failure in social media context

16:45 - introduction to provably beneficial AI

25:10 - understanding human preferences through behavior

37:15 - considering a normative framework for negotiating shared preferences

42:00 - standing up beneficial AI in practice 

51:15 - mistakes made by regulators

53:25 - how to consider an “off switch” in the context of integrated systems

56:10 - maximizing human potential in the future of work