The US government can now review every frontier AI model built in America before you ever get to use it. On Tuesday, the National Institute of Standards and Technology announced that Google DeepMind, Microsoft, and xAI have signed formal agreements giving the government pre-release access to their most powerful AI models for national security testing. Combined with existing deals involving OpenAI and Anthropic, that means every major American AI lab now participates in voluntary government evaluation before launch.
This isn’t a law. Nobody was forced. And that’s what makes it significant. The AI industry just voluntarily built the checkpoint that regulators couldn’t pass through Congress.
What CAISI Actually Does With These Models
The agreements route through the Center for AI Standards and Innovation (CAISI), an office within NIST, which is itself part of the US Department of Commerce. Under the new deals, Google, Microsoft, and xAI will hand over unreleased versions of their frontier models for what NIST calls “pre-deployment evaluations and targeted research to better assess frontier AI capabilities and advance the state of AI security.”
Translation: the government gets to stress-test these models for dangerous capabilities (cyberattack potential, biosecurity risks, weapons design assistance, deception) before they reach the public. CAISI has already completed more than 40 such evaluations, and some of the models it tested have never shipped. Draw your own conclusions about what was found.
Mythos Made This Inevitable
The timing is not subtle. Last month, Anthropic disclosed that its unreleased Mythos model, described as its most capable ever, performed so well on offensive cybersecurity benchmarks that the company chose not to release it publicly. Mythos reportedly identified thousands of zero-day vulnerabilities across every major operating system and web browser in just weeks of testing.
That disclosure sent shockwaves through Washington and beyond. The UK’s National Health Service immediately ordered hundreds of its open-source GitHub repositories taken private. The White House began weighing a formal review process for frontier AI. And now, three more labs have signed onto a framework that makes pre-release government review the de facto industry standard.
Anthropic’s decision to withhold Mythos essentially proved the thesis that AI models can become dangerous enough to warrant government oversight before deployment — not after. Every lab that signed these agreements is implicitly acknowledging that thesis.
The Surprise Is xAI
Google and Microsoft signing government review agreements is predictable. Both companies have massive defense contracts, regulatory exposure in the EU, and institutional incentives to play nice with Washington. The real story is Elon Musk’s xAI joining the same framework.
Musk has spent years railing against government regulation of AI. He sued OpenAI partly over governance concerns. He’s called AI safety regulation an existential threat to innovation. And now his own company is voluntarily handing unreleased Grok models to a government lab for testing.
There are two ways to read this. The charitable interpretation: Musk genuinely believes frontier models need external review, and xAI’s participation is consistent with his stated concerns about AI risk. The cynical interpretation: xAI needs government contracts — the Pentagon just cleared eight companies for classified AI deployment, and xAI is one of them — and signing a CAISI agreement is the cost of admission.
Both interpretations are probably true simultaneously. That’s how Washington works.
Voluntary Today, Mandatory Tomorrow
Here’s what nobody in the AI industry wants to say out loud: voluntary pre-release review is the precursor to mandatory pre-release review. Once every major lab participates, the political argument for codifying it into law becomes trivially easy. “The industry already does this” is the most persuasive sentence a regulator can write in a rule proposal.
The EU is already further along this path. The AI Act requires conformity assessments for high-risk AI systems. China requires algorithm registration and security reviews before deployment. The US has been the outlier, relying on executive orders and voluntary commitments rather than legislation. These CAISI agreements fill the gap without Congress having to do anything, which is exactly how the most durable regulatory frameworks tend to start.
For startups and smaller AI companies, this creates a two-tier system. Labs with the resources and relationships to engage with CAISI get a de facto seal of approval. Everyone else operates without one. That’s not necessarily a problem today — the agreements are voluntary, and CAISI doesn’t block releases. But if these voluntary reviews become mandatory, the compliance burden could be the moat that separates frontier labs from everyone else.
What 40+ Secret Evaluations Tell You
The most underreported detail in NIST’s announcement is the number: more than 40 evaluations already completed, including on unreleased models. That means the US government has been quietly reviewing frontier AI capabilities for months, possibly longer, with results that are not public.
Some of those evaluations likely covered Mythos. Some likely covered models from OpenAI and Google that were subsequently modified or shelved. The government now holds a body of evidence about frontier AI capabilities that no external researcher, journalist, or policymaker has access to. It knows which models were too dangerous to ship. It knows which capabilities were dialed back. And none of that information is available to the public that these models will eventually serve.
This is the trade-off the industry has accepted: transparency with the government in exchange for opacity with everyone else. Whether that’s the right bargain depends entirely on whether you trust NIST more than you trust the labs themselves.
The Verdict
The era of “ship it and see what happens” in frontier AI is over. Every major American AI lab now voluntarily submits its most powerful models for government review before the public gets access. The system is informal, non-binding, and entirely dependent on goodwill — which means it will either become law within two years or collapse the first time a lab decides the competitive cost of waiting for review is too high.
For now, though, the signal is clear: the companies building the most powerful AI systems on Earth have collectively decided that someone outside their walls should check their work before release. That Elon Musk’s company is among them tells you everything about where the industry consensus has landed — even the loudest critic of AI regulation just signed up to be regulated.