@simonw · Created July 1, 2025 03:46
LLM digest for May 2025. I send an email like this out once a month to anyone who sponsors me for $10/month or more on GitHub Sponsors: https://github.com/sponsors/simonw/ - this is an example of an out-of-date monthly newsletter.

LLM digest: May 2025

Thank you very much for sponsoring my work. Here's the first monthly update from me - I want to highlight the most important new developments relevant to LLMs and my other work in a format that only takes a few minutes to read.

I'd love to hear your feedback on whether this is useful and how I can improve it!

For May 2025:

  • AI-assisted search is genuinely useful now
  • MCP support added by three big players within a week of each other
  • Gemini 2.5 Flash and Veo 3 at Google I/O
  • Anthropic launched Claude 4
  • I shipped tool support for my LLM tool

AI-assisted search is genuinely useful now

I've wanted an AI research assistant for a couple of years now. Until recently the products on the market just weren't very good - they had generally bad taste in search queries and hallucinated too often to be useful.

That's changed in the past few months, first with the various Deep Research products but more recently with OpenAI's o3 and o4-mini and then Claude 4.

I wrote more about this in AI assisted search-based research actually works now.

The key trick here is that o3, o4-mini and the Claude 4 models can all now run searches as part of their "reasoning" process. This means they can fetch search results and then consider if those results were relevant and useful and issue further searches until they find what they need.

This trick - running tools in the reasoning phase - works incredibly well. Anthropic have made the same capability (intermixing tools and reasoning) available via their API. Hopefully OpenAI will follow.
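Here's a rough sketch of what that looks like with Anthropic's Python SDK, pairing extended thinking with their server-side web search tool. The model ID, tool type and beta header reflect the documentation as I understood it at the time - treat them as assumptions and check the current docs:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    # Turn on the "reasoning" phase with a token budget for thinking
    thinking={"type": "enabled", "budget_tokens": 1024},
    # Server-side web search: Claude can issue searches mid-reasoning
    tools=[{"type": "web_search_20250305", "name": "web_search", "max_uses": 5}],
    # Beta header allowing tool calls to interleave with thinking blocks
    extra_headers={"anthropic-beta": "interleaved-thinking-2025-05-14"},
    messages=[{"role": "user", "content": "What shipped in Gemini 2.5 Flash?"}],
)

# The response mixes thinking, tool use and text blocks
for block in response.content:
    print(block.type)
```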

Model Context Protocol is suddenly everywhere

MCP, Model Context Protocol, is Anthropic's open standard for exposing tools (and other things, but most people just use it for tools) to LLMs.

There's been a lot of buzz around it for a few months now, but it's mostly been quite difficult for people to use - you have to configure client software like Claude Desktop or Cursor.

In the last two weeks, three major LLM API platforms - OpenAI (May 21st), Anthropic (May 22nd) and now Mistral (May 27th) - have shipped MCP as a feature of their core APIs.

This means you can stand up an MCP server online and then call those vendor APIs passing in the MCP details as an available tool, and the vendors will call your server as part of responding to your prompt.
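In OpenAI's case that looks something like the following sketch against their Responses API - the DeepWiki server here is the demo from their launch announcement:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",                # a remote MCP server, exposed as a tool
        "server_label": "deepwiki",
        "server_url": "https://mcp.deepwiki.com/mcp",
        "require_approval": "never",  # skip the per-call approval step
    }],
    input="What transport protocols does the MCP spec support?",
)
print(response.output_text)
```

The vendor's servers connect to the MCP server, discover its tools and handle the round-trips on your behalf.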

This makes MCP far easier to integrate with as a developer building apps on top of LLMs. It also makes the dream of MCP as a standard for interoperability between vendors suddenly feel a lot more credible.

Also close to my heart: all three of OpenAI, Mistral and Anthropic now offer a server-side sandboxed Python Code Interpreter tool available via their APIs! It's interesting to see them converging on the same set of key features, which I noted in my review of Mistral's latest API update.
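Here's a sketch of what the Code Interpreter side looks like with OpenAI's Responses API - the container option is from their May announcement, so treat the details as assumptions:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    # Ask for a fresh server-side Python sandbox to be spun up automatically
    tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
    input="Use Python to check whether 2025 is divisible by 27.",
)
print(response.output_text)
```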

Gemini 2.5 Flash and Veo 3 at Google I/O

This year's Google I/O was almost entirely about AI. They launched a whole bunch of things, but the two that stood out for me were the new Gemini 2.5 Flash and their generative video model, Veo 3.

(I'm delighted to report they also featured a pelican riding a bicycle for a split second in the keynote!)

gemini-2.5-flash-preview-05-20 is one of my new favorite models. It's very cheap - just 15 cents to fill its entire 1 million token input - and appears extremely capable. I've been running huge prompts through it - entire codebases - and getting excellent answers to complex queries for less than ten cents a time. I wrote about one example here, where I dumped a 200,000 token Python API library through it to reverse-engineer and document the underlying HTTP API (results here).
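Here's roughly how I run that kind of long-context prompt using my LLM Python library with the llm-gemini plugin - the dump file name below is just a placeholder for whatever files-to-prompt or similar produces:

```python
import llm  # pip install llm llm-gemini

# Placeholder: a single-file dump of the codebase, e.g. from files-to-prompt
code = open("api_library_dump.txt").read()

model = llm.get_model("gemini-2.5-flash-preview-05-20")
response = model.prompt(
    code + "\n\nWrite detailed documentation for the HTTP API this library calls."
)
print(response.text())
```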

Veo 3 is an extraordinary video generation model - by far the best I've seen. It can produce audio as well - it's the first video model I've tried that does that. The internet is flooded with examples of what it can do.

The other release at I/O that caught my eye is Gemini Diffusion, an experimental new architecture which provides an impressive performance boost - it wrote me a working web app chat interface in just a few seconds. Here's my video of Gemini Diffusion in action.

Anthropic launched Claude 4

Anthropic's latest two models are Claude Opus 4 and Claude Sonnet 4. They launched them at their Code with Claude event in San Francisco - I live blogged the keynote from the event.

I followed that up with two deep dives, first into the Claude 4 system card which is a fascinating piece of real-world science fiction describing some wonderfully weird testing scenarios, including one where Claude 4 threatened to blackmail an engineer in order to avoid being replaced by a new model!

I also wrote about the Claude 4 system prompt, effectively the missing manual for Claude 4, which explains its personality and tooling capabilities in detail and provides an intriguing example of what prompt engineering looks like from the best practitioners.

I shipped tool support for my LLM tool

This is a feature I've been planning for over a year now: my LLM command-line tool and Python library finally have support for tools!

I previewed this as an alpha a few weeks ago at the annual PyCon US conference, where I presented a workshop on Building software on top of Large Language Models. I've shared the full handout from that workshop, which should be enough to rerun the tutorial on your own.

That alpha is now a stable release, which I described in detail in Large Language Models can run tools in your terminal with LLM 0.26.

I'm really pleased with how this turned out. I think it's the quickest way to get started hacking around with tools across multiple models - far fewer moving parts than MCP!
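Here's the flavor of the new Python API, adapted from the release notes - the upper() function is a deliberately trivial tool:

```python
import llm

def upper(text: str) -> str:
    """Convert text to uppercase."""
    return text.upper()

model = llm.get_model("gpt-4.1-mini")
# chain() keeps prompting until the model has finished calling tools
response = model.chain("Convert 'pelican' to uppercase", tools=[upper])
print(response.text())
```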

Thanks to a flurry of plugin upgrades LLM now supports tool calls across OpenAI, Anthropic, Gemini, Mistral, Ollama, llama-server and GitHub Models.

That's it for May!

Thanks for reading! Please reply with any feedback on how I can do this better. If this newsletter was useful, please feel free to forward it to friends who might find it interesting, especially if they might be convinced to sign up to sponsor me for the next one!

Thanks for your support,

Simon Willison https://simonwillison.net/

(I'm also now available for consulting calls over Zoom or similar, you can contact me at contact@simonwillison.net)
