AI has made humbling progress in the last decade, matching or exceeding human abilities in many domains. This lecture will identify drivers of this progress and explore what lies ahead. There is a breakneck race to build autonomous, goal-driven AI systems, though these systems could cause large-scale harm if humans fail to control them. Owain will outline the challenge of developing safe AI systems. There has been meaningful progress on this challenge, but we currently lack a detailed solution or plan.
Language models can behave in deceptive and harmful ways even after our best safety techniques have been applied. Owain will present failure cases in public models such as Claude and Gemini, showing that safety does not generalize to all contexts. Aligned models can also be corrupted. Training on small, narrow datasets can transform them from reliably helpful to broadly malicious ("emergent misalignment"). By looking at the internal mechanisms of these models, we can start to gain a richer understanding of how this corruption occurs.
Frontier language models reason aloud before answering questions. Currently, this reasoning is in plain English, allowing us to catch models in attempts to cheat or take shortcuts. This visibility may be fragile. Future models may reason in opaque formats or encode malicious plans inside normal-looking text. Current models exhibit what Owain refers to as "subliminal learning". Preferences can be transferred between models via apparently meaningless numbers. This includes innocuous preferences (like a love for eagles) but also malicious preferences towards humans.
Considered the “Godfather of AI,” Geoffrey Hinton is a distinguished British-Canadian computer scientist and cognitive psychologist, celebrated for his ground-breaking work in artificial neural networks. As a leading figure in tackling AI's most profound and disruptive dilemmas, Hinton has played pivotal roles at Google and the University of Toronto. With over 800,000 citations and collaborations with the most influential minds in AI, his career continues to be a relentless pursuit of innovation and excellence in the field.
Owain Evans is a leading machine learning researcher specializing in AI alignment and AGI risk. His current work focuses on emergent misalignment, deception, and situational awareness in advanced AI systems. He is the founder of Truthful AI, a research non-profit based in Berkeley, and an affiliate of the Center for Human-Compatible AI at UC Berkeley. Previously, he conducted alignment research at the University of Oxford and earned his PhD at MIT. His work has been featured in The Economist, BBC News, and the Financial Times. He has served as an advisor to nonprofits and foundations in the AI safety space. A frequent speaker at major academic and industry events, he has presented at over 20 conferences and is recognized for his expertise and leadership in the field of AI safety.
John W. H. Bassett Theatre is located inside the Metro Toronto Convention Centre (MTCC) in downtown Toronto. The MTCC is a large complex, so it might be helpful to enter through the North Building at 255 Front Street West, where the John Bassett Theatre is located. Signs within the building will direct you to the theatre.
The MTCC and John W. H. Bassett Theatre are fully accessible, with ramps and elevators for those requiring assistance.
Featuring expert perspectives from the following leaders in AI research and innovation:
Owain Evans
Berkeley AI Expert
Founder of Truthful AI
Alexey Rubtsov
Senior Research Associate, Global Risk Institute
Tenured Associate Professor of Mathematical Finance, TMU
David Duvenaud
Associate Professor in Computer Science and Statistics, University of Toronto
Founding Member, The Vector Institute
Canada CIFAR AI Chair
Director, The AI Safety Foundation
Moderated by award-winning Canadian journalist, Farah Nasser.
Featuring expert perspectives from the following leaders in AI research and innovation:
Owain Evans
Berkeley AI Expert
Founder of Truthful AI
Jodie Wallis
Global Chief AI Officer, Manulife
Duncan Cass-Beggs
Executive Director of the Global AI Risks Initiative, CIGI
Moderated by award-winning Canadian journalist, Farah Nasser.