Restricted AI Models and Opaque Benchmarks Threaten the Emerging AI Insurance Market

A new category of deliberately restricted frontier AI models, combined with unreliable evaluation methods, is undermining insurers' ability to price AI risk accurately, according to Gallagher Re.

By: R&I Editorial Team | June 10, 2026

A Gallagher Re report warns that Anthropic’s restricted release of its Mythos large language model raises a fundamental underwriting question: how can insurers assess and price AI risk when the most advanced models are unavailable for independent scrutiny?

The release of Mythos under a restricted access arrangement establishes a fourth category of frontier AI model alongside open-source, open-weight and fully proprietary releases: one accessible only to a select group of vetted partners. The implications extend to coverage availability, premium pricing and the long-term insurability of AI-related risks.

Restricted Access Revives the Dynamics Behind the Last Cyber Hard Market

Anthropic’s stated rationale for limiting distribution of Mythos is the model’s reported ability to detect software vulnerabilities, create exploits and chain attacks across operating systems and browsers.

The UK AI Security Institute evaluated the model and found that it performed strongly on offensive cyber tasks. Gallagher Re draws a direct parallel to the conditions that preceded the 2017 WannaCry and NotPetya attacks, when leaked NSA-developed exploits lowered the barriers to entry for sophisticated attacks, made destructive campaigns more automated and scalable, and reduced friction around ransomware payments.

The report argues that AI-enabled attacks replicate all three ingredients and add a fourth: speed. Data cited in the report shows that mean time-to-exploitation for disclosed vulnerabilities has compressed from more than two years in 2018 to roughly 10 hours in 2026.
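To put those two figures on the same scale, the short Python sketch below converts both exploitation windows to hours. It assumes "more than two years" means exactly two 365-day years, so the resulting compression factor is a lower bound; the function name is illustrative, not from the report.

```python
# Rough scale of the compression in mean time-to-exploitation cited by Gallagher Re.
# Assumption: "more than two years" is treated as exactly two 365-day years,
# so the ratio computed here is a lower bound.

HOURS_PER_YEAR = 365 * 24  # ignores leap days


def compression_factor(before_hours: float, after_hours: float) -> float:
    """Return how many times shorter the later exploitation window is."""
    return before_hours / after_hours


window_2018 = 2 * HOURS_PER_YEAR  # two years expressed in hours
window_2026 = 10                  # "roughly 10 hours"

factor = compression_factor(window_2018, window_2026)
print(f"2018 window: {window_2018:,} hours")
print(f"2026 window: {window_2026} hours")
print(f"Compression: at least ~{factor:,.0f}x")
```

Under that assumption, the window defenders have to patch a disclosed vulnerability has shrunk by a factor of roughly 1,750 or more.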

Gallagher Re also notes that restriction may only delay the threat, not eliminate it. The UK AI Security Institute’s own testing found that OpenAI’s GPT-5.5 performed comparably to Mythos on the same offensive cyber evaluations, meaning models with similar capabilities are already publicly available. Gallagher Re describes DeepSeek v4 as placing Chinese labs roughly six months behind their U.S. counterparts, suggesting that any window of protection afforded by Anthropic’s access controls is narrow.

Benchmark-Based Evaluation Falls Short of What Insurers Need

A separate but related problem identified by Gallagher Re is that standard AI model evaluation methods are poorly suited to insurance underwriting.

Most models are assessed using static benchmarks that measure accuracy on controlled datasets, but the report argues these tools are “not designed to predict real-world loss.” Several widely used benchmarks are now considered saturated, with top scores clustering near the ceiling. Gemini 3.1 Pro scores approximately 95% on GPQA Diamond, which measures advanced scientific reasoning and knowledge, and Claude Sonnet 4.5 scores approximately 98% on HumanEval, which measures programming capability. At those levels, the benchmarks offer little ability to differentiate risk quality between insureds.

The report identifies five specific gaps in current evaluation practice:

Benchmarks measure performance rather than behavior, meaning high-scoring models can still hallucinate, make inconsistent decisions or misinterpret instructions in ways that generate legal and regulatory liability.
Models are increasingly trained on benchmark material itself, inflating scores without improving real-world reliability.
Narrow evaluation encourages behavioral homogenization across models, which raises concentration risk for insurers whose portfolios may be heavily exposed to a small number of shared foundation models.
Current methods also do not measure whether failures could be correlated across multiple insureds simultaneously.
Finally, the practical input space for deployed AI is effectively infinite, and static guardrails cannot cover it.
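The correlation and concentration gaps are easiest to see in a toy model. The Python sketch below compares two hypothetical portfolios of 20 insureds, each with a 1% chance per period that its AI deployment causes a loss. In one, failures are independent; in the other, every insured relies on the same foundation model, so a single model failure hits all of them at once. The portfolio size, probability and function names are illustrative assumptions, not figures from the report. Expected losses are identical (0.2 per period) in both cases; only the shape of the tail differs.

```python
from math import comb

# Hypothetical portfolio parameters (illustrative only, not from the report).
N_INSUREDS = 20    # insureds in the portfolio
P_FAILURE = 0.01   # per-period probability that an insured's AI deployment causes a loss


def prob_at_least_independent(k: int, n: int, p: float) -> float:
    """P(at least k of n insureds suffer a loss) when failures are independent.

    This is the upper tail of a binomial(n, p) distribution.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(max(k, 0), n + 1))


def prob_at_least_shared(k: int, n: int, p: float) -> float:
    """P(at least k of n insureds suffer a loss) when all rely on one shared model.

    The shared model fails with probability p; when it does, all n insureds are hit.
    """
    if k <= 0:
        return 1.0
    return p if k <= n else 0.0


# Expected number of losses is n * p under both structures.
print(f"Expected losses per period: {N_INSUREDS * P_FAILURE:.1f}")

for k in (1, 5, 10, 20):
    indep = prob_at_least_independent(k, N_INSUREDS, P_FAILURE)
    shared = prob_at_least_shared(k, N_INSUREDS, P_FAILURE)
    print(f"P(at least {k:2d} simultaneous losses): "
          f"independent = {indep:.1e}, shared model = {shared:.1e}")
```

The independent portfolio is actually more likely to see at least one loss, but the chance of 10 or more simultaneous losses is vanishingly small. In the shared-model portfolio, that portfolio-wide event carries the full 1% probability, which is the kind of correlated exposure the report says current evaluation methods do not measure.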

Early Signs of Progress, but Structural Risks Remain

Gallagher Re points to emerging evaluation approaches that begin to address some of these gaps. Epoch AI combines internal and external benchmarks to reduce contamination risk. Artificial Analysis’s Omniscience Index scores hallucination and knowledge calibration as well as correctness.

The report notes that running Artificial Analysis’s Intelligence Index on Claude Opus 4.6 cost just under $5,000, signaling that comprehensive behavioral testing carries real costs but also real value for underwriters.

If restricted-access models become the norm and independent evaluators are excluded from testing them alongside proprietary models, the report concludes, insurers will be left pricing uncertainty rather than risk. That outcome, Gallagher Re argues, “almost always leads to higher premiums, narrower coverage or both.”

Obtain the full report here.

The R&I Editorial Team can be reached at [email protected].

How Energy Sector Volatility Is Reshaping D&O Risk

Geopolitical shocks, federal policy whiplash and surging demand from data centers are accelerating exposures for directors and officers in the energy sector. Choosing the right risk transfer partner should be top of mind for many.

By: Berkshire Hathaway Specialty Insurance | July 1, 2026

The energy sector has rarely faced a more dynamic stretch. In just the past five to six years, oil prices went negative during COVID, the Russia-Ukraine war elevated global energy prices and instability in Venezuela portended major changes to that country’s oil output.

Then a blockade of the Strait of Hormuz sent commodity prices soaring within weeks. Layer on top of that an unprecedented buildout of AI-driven data centers and the race to power them, and what used to be considered “shock” events now look more like the baseline.

“Geopolitical and regulatory shocks are the base case going forward,” said Dan Sanford, Vice President, Energy Practice Leader – Executive and Professional Lines at Berkshire Hathaway Specialty Insurance (BHSI).

“They’ve always been there, but the frequency has accelerated.”

For directors and officers across exploration and production, oilfield services, midstream, downstream, utilities, renewables and emerging technologies like modular nuclear reactors, that volatility translates directly into heightened litigation risk.

A Sector Transformed by War, AI and Capital

The traditional commodity-exposed side of energy continues to wrestle with pricing it cannot control. “Even the best, most sophisticated operators in any part of the energy sector that are exposed to commodities are price takers, not price setters,” Sanford said. “You cannot allocate capital long term based on any sort of pricing thesis of elevated pricing.”

That tension is on full display today. With West Texas Intermediate crude trading at elevated levels, public exploration and production companies are being pushed to grow even as shareholders demand continued capital discipline. New production takes six-plus months to bring online, takeaway constraints in basins like the Permian limit how much oil can be produced, and no one knows where prices will land by the time that oil hits the market.

“What really matters at the end of the day is how these companies, their executives, and their boards executed compared to what they promised along the way,” Sanford said.

And when companies fail to deliver on their promises, shareholder litigation can quickly follow.

Meanwhile, the power generation and transmission side of the sector has been transformed almost overnight. Utilities and solar and wind developers that once operated on long-term investment horizons, along with more recent entrants such as geothermal and modular nuclear startups, are now racing to meet the massive surge in demand driven by AI hyperscalers and industrial users.

“Funding is everywhere now,” Sanford said. “It seems to just be a race to get power generation contracted and established as quickly as possible.”

Federal policy has added another layer of complexity over the past five years. Tax subsidy programs that historically covered 20% to 30% of project costs jumped to 50% under the Inflation Reduction Act, with a heavy tilt toward solar and wind.

Recent legislation has reversed course, requiring qualifying projects to be initiated by July 5, 2026, or completed by December 31, 2027, to receive credits.

“From a D&O and shareholder litigation standpoint, this represents a significant risk because everyone is scrambling to meet those deadlines,” Sanford said. “The outcome is binary. You either get the tax credit or you don’t.” Add utility interconnection backlogs for front-of-the-meter projects, and the runway to deliver shrinks even further.
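The all-or-nothing structure Sanford describes can be written as a simple eligibility check. The Python sketch below encodes the two dates as this article describes them and treats "by" as inclusive, which is an assumption; the function and parameter names are hypothetical, and actual qualification rules involve considerably more than a date comparison.

```python
from datetime import date
from typing import Optional

# Deadlines as described in the article; "by" is treated as inclusive (an assumption).
START_DEADLINE = date(2026, 7, 5)
COMPLETION_DEADLINE = date(2027, 12, 31)


def qualifies_for_credit(start_date: Optional[date], completion_date: Optional[date]) -> bool:
    """Hypothetical check: a project qualifies if it was initiated by the start
    deadline OR completed by the completion deadline. Either date may be None
    if that milestone has not (yet) been reached."""
    started_in_time = start_date is not None and start_date <= START_DEADLINE
    completed_in_time = completion_date is not None and completion_date <= COMPLETION_DEADLINE
    return started_in_time or completed_in_time


# A project started after the cutoff qualifies only if it is finished by year-end 2027.
print(qualifies_for_credit(date(2026, 8, 1), date(2027, 11, 30)))  # True
print(qualifies_for_credit(date(2026, 8, 1), date(2028, 2, 1)))    # False
print(qualifies_for_credit(date(2026, 6, 1), None))                # True
```

The boolean return value mirrors the binary outcome: a single slipped milestone flips the answer from True to False, with no partial credit, which is why deadline pressure translates so directly into disclosure and litigation risk.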

BHSI has already seen claims tied to tax subsidies and the accounting behind whether projects ultimately qualified. “It’s challenging to have full confidence that all the subsidies and credits promised to shareholders will actually be realized,” Sanford said.

Best Practices for D&O Risk Mitigation

In an environment where shareholder litigation is more material and more expensive than ever, Sanford emphasized that even strong, well-managed companies get sued. The goal is to make the defense as strong as possible.

“We think a measured approach where thoughtful documentation is key makes companies more defensible for any future litigation,” Sanford said. He pointed to several priorities:

Be thoughtful about disclosure. Materiality always matters, but claims have arisen from both disclosing too much and disclosing too little. Guidance, 10-Ks, 10-Qs and interim communications all warrant careful review.

Use third-party advisors – and document appropriately. Engaging law firms, consultants and industry specialists is valuable, but properly documenting what companies did with that advice is critical to strengthening a legal defense.

Right-size your limits. Capacity in the market is not the issue; pricing and adequacy are. Valuations for both private and public energy and energy-adjacent companies have multiplied rapidly, and settlement damages are calculated as a percentage of shareholder value lost.

“As market caps increase, so do the potential amounts for settlement damages, which means companies should be purchasing higher limits,” Sanford said. “There are a lot of companies not buying adequate limits based on where their market cap has gone because it’s happened so fast.”

He credited brokers with industry specialization, particularly those who pair strong energy and technology practices, for helping customers understand and address that need as the two sectors increasingly converge.
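Sanford’s point about limits comes down to straightforward arithmetic. The Python sketch below assumes, purely for illustration, that a settlement equals a fixed percentage of the shareholder value lost in a stock drop; the stock-drop size, settlement rate, tower size and market caps are hypothetical values, not figures from BHSI or this article. Holding the D&O limit constant while market cap grows shows how a tower that once looked adequate can fall short.

```python
# Hypothetical illustration: settlement exposure scales with market cap,
# so a fixed D&O tower covers a shrinking share of it.
# All inputs are illustrative assumptions.

STOCK_DROP = 0.20        # assumed share-price decline triggering a suit
SETTLEMENT_RATE = 0.05   # assumed settlement as a share of value lost
DO_LIMIT = 50_000_000    # assumed total D&O tower, held constant


def estimated_settlement(market_cap: float) -> float:
    """Settlement modeled as a percentage of shareholder value lost."""
    value_lost = market_cap * STOCK_DROP
    return value_lost * SETTLEMENT_RATE


for market_cap in (1e9, 5e9, 10e9):
    settlement = estimated_settlement(market_cap)
    coverage = min(DO_LIMIT / settlement, 1.0)
    print(f"Market cap ${market_cap / 1e9:.0f}B: settlement ~${settlement / 1e6:.0f}M, "
          f"limit covers {coverage:.0%}")
```

Because the modeled settlement is linear in market cap, doubling a company’s valuation doubles the exposure; a limit set when the company was worth $5 billion covers only half of the modeled settlement once it reaches $10 billion.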

A Risk Transfer Partner Built for This Moment

BHSI has leaned into industry specialization as the energy and technology sectors blur together. Sanford and his team work in lockstep with their technology industry counterparts as well as with a dedicated energy claims specialist, ensuring that historical knowledge of energy-specific claims informs every customer conversation.

“Claims is our product. We are in the business of paying claims,” Sanford said. Underwriters are notified and involved throughout the claim process, and claims professionals attend customer meetings, dinners and industry events alongside underwriters.

To help customers stay ahead of emerging exposures, BHSI offers a preferred counsel endorsement that gives primary D&O and Side A customers four hours of annual consultation time with partners at a select group of preeminent securities defense law firms. Topics can range from shareholder derivative claim primers and indemnification provisions to D&O defense strategy and considerations around redomiciling from Delaware to states like Texas.

“That time is theirs,” Sanford said. “We think that makes our customers a better D&O risk, and improved corporate governance will benefit them outside of the D&O context as well.”

BHSI also hosts regionalized events that bring these law firm partners together with general counsels, risk professionals and finance leaders to provide their thought leadership on a range of current developments. And for smaller, rapidly scaling private energy companies – particularly tech-enabled ones – BHSI has built a roadmap designed to provide steady pricing, limits and retention paths from early funding rounds through IPO.

“There is now a much greater pipeline of developmental stage energy companies, and we’re seeing and expect to continue to see an increased pace of IPOs and consolidation,” Sanford said. “Things are moving much faster in this space.”

Looking Ahead

With the July 2026 and December 2027 tax credit deadlines looming, M&A consolidation accelerating, and contract risk growing around front-of-the-meter and behind-the-meter power generation, directors and officers in the energy sector have more to navigate than ever.

“It’s critical to recognize that we’re operating in a very volatile environment,” Sanford said. “Cross your t’s and dot your i’s. Be educated and maintain appropriate documentation. Stay in touch with your carrier and broker, and take advantage of any legal help you can access right now.”

To learn more, visit https://www.bhspecialty.com/.

Berkshire Hathaway Specialty Insurance (www.bhspecialty.com) provides commercial property, casualty, healthcare professional liability, executive and professional lines, transactional liability, surety, marine, travel, programs, accident and health, employer stop loss, homeowners, and multinational insurance. The actual and final terms of coverage for all product lines may vary. It underwrites on the paper of Berkshire Hathaway’s National Indemnity group of insurance companies, which hold financial strength ratings of A++ from AM Best and AA+ from Standard & Poor’s. Based in Boston, Berkshire Hathaway Specialty Insurance has offices in Atlanta, Boston, Chicago, Columbia, Dallas, Houston, Indianapolis, Irvine, Los Angeles, New York, Plymouth Meeting, San Francisco, San Ramon, Seattle, Stevens Point, Adelaide, Auckland, Barcelona, Brisbane, Brussels, Calgary, Cologne, Dubai, Dublin, Frankfurt, Hamburg, Hong Kong, Kuala Lumpur, London, Lyon, Macau, Madrid, Manchester, Melbourne, Milan, Munich, Paris, Perth, Singapore, Stockholm, Sydney, Toronto, and Zurich.

For more information, contact [email protected].

The information contained herein is for general informational purposes only and does not constitute an offer to sell or a solicitation of an offer to buy any product or service. Any description set forth herein must not be relied upon as coverage and does not include all policy terms, conditions, and exclusions. Please refer to the actual policy for complete details of coverage and exclusions.

This article was produced by the R&I Brand Studio, a unit of the advertising department of Risk & Insurance, in collaboration with Berkshire Hathaway Specialty Insurance. The editorial staff of Risk & Insurance had no role in its preparation.
