A year ago, Jan Leike co-led OpenAI’s since-disbanded superalignment team with the company’s co-founder and chief scientist, Ilya Sutskever. As the most ambitious of the company’s three safety teams, the superalignment group was focused on ensuring that if AI systems surpass human-level intelligence, they remain under human control. But in May, Leike made a dramatic exit, accusing OpenAI of prioritizing “shiny products” over safety. He wrote that his team had been struggling to access the computing power needed for its research—despite OpenAI dedicating 20% of its total computing resources to “solving the problem of superintelligence alignment,” which was the remit of Leike’s team. (OpenAI’s CEO Sam Altman responded by thanking Leike and saying “we’re committed to doing [more].” In August, Altman said the company was committed to allocating “at least 20% of the computing resources to safety efforts across the entire company.”)
Leike now helps lead alignment efforts at competing firm Anthropic. Although a lot has changed in the past few months, Leike’s mission to solve the alignment problem remains unaltered. For Leike, it’s not about which company wins. It’s about ensuring humanity navigates what he sees as an impending transformation driven by superintelligent AI systems.
“I think we made a lot of good progress in the last year,” Leike tells TIME—pointing in particular to an area known to alignment researchers as scalable oversight, which investigates techniques for empowering humans to give better feedback to AI models on complex tasks. The hope is that these tactics will allow humans to guide future systems, even as they do things we cannot fully comprehend. In the future, Leike says he believes aligning larger systems will increasingly be automated by smaller, trusted models, as the science of alignment becomes “more and more mature.”
“You always feel like you’re in a race against time,” he says. “But I’m optimistic we can figure this problem out.”
*Disclosure: OpenAI and TIME have a licensing and technology agreement that allows OpenAI to access TIME’s archives. Investors in Anthropic also include Salesforce, where TIME co-chair and owner Marc Benioff is CEO.