On Thursday, the Financial Times reported that OpenAI has dramatically shortened its safety testing timeline.
Eight people who are either staff at the company or third-party testers told FT that they had "just days" to complete evaluations on new models -- a process they say they would normally be given "several months" for.
Evaluations are meant to surface model risks and other harms, such as whether a user could jailbreak a model into providing instructions for creating a bioweapon. For comparison, sources told FT that OpenAI gave them six months to review GPT-4 before it was released -- and that they found concerning capabilities only after two months.
Sources added that OpenAI's tests are not as thorough as they used to be, and that testers lack the time and resources needed to properly catch and mitigate risks. "We had more thorough safety testing when [the technology] was less important," one person currently testing o3, the full version of o3-mini, told FT, describing the shift as "reckless" and "a recipe for disaster."
The sources attributed the rush to OpenAI's desire to maintain a competitive edge, especially as open-weight models from competitors such as Chinese AI startup DeepSeek gain ground. OpenAI is rumored to be releasing o3 as soon as next week, a deadline that FT's sources say has compressed testing to under a week.
The shift underscores the fact that there is still no government regulation of AI models, including any requirement to disclose model harms. Companies including OpenAI signed voluntary agreements with the Biden administration to conduct routine testing with the US AI Safety Institute, but those agreements have quietly fallen away as the Trump administration has reversed or dismantled Biden-era AI policy infrastructure.
However, during the open comment period for the Trump administration's forthcoming AI Action Plan, OpenAI advocated for a similar arrangement to avoid navigating a patchwork of state-by-state legislation.
Outside the US, the EU AI Act will require companies to risk-test their models and document the results.
"We have a good balance of how fast we move and how thorough we are," Johannes Heidecke, head of safety systems at OpenAI, told FT. Testers themselves seemed alarmed, though, especially considering other holes in the process, including evaluating the less-advanced versions of the models that are then released to the public or referencing an earlier model's capabilities rather than testing the new one itself.
Other experts in the field share the sources' anxiety.
As Shayne Longpre, an AI researcher at MIT, said, evolving AI systems are gaining access to more data streams and, with the ongoing explosion of AI agents, to more software tools. This means "the surface area for flaws in AI systems is growing larger and larger," he explained. Longpre recently co-authored a call from researchers at MIT and Stanford asking AI companies to "invest in the needs of third-party, independent researchers" to better support AI testing.
"As [AI systems] become more capable, they are being used in new, often dangerous, and unexpected ways, from AI therapists dispensing medical advice, acting as human companions and romantic partners, or writing critical software security code. De-risking these systems can take significant time, and require subject matter expertise from dozens of disciplines," Longpre noted.
With more people using AI tools every day, Longpre notes, internal testing teams aren't sufficient. "More time to investigate these systems for AI safety and security issues is important. But even more important is the need to prioritize truly third-party access and testing: only the broader community of users, academics, journalists, and white-hat hackers can scale to cover the surface area of flaws, expertise, and diverse languages these systems now serve."
To support this, Longpre suggests companies create bug bounty and disclosure programs for multiple types of AI flaws, open red-teaming to a wider range of testers, and extend legal protections to those testers' findings.