(2025-08-29) Webb The Destination For Ai Interfaces Is Do What I Mean

Matt Webb: The destination for AI interfaces is Do What I Mean. David Galbraith has a smart + straightforward way to frame how AI will change the user interface.

First he imagines taking prompts and wrapping them up as buttons:
The best prompts now are quite simple, leaving AI to handle how to answer a question. Meanwhile AI chat suffers from the same problem as everything from command lines to Alexa - how can I remember what to ask?

Which honestly would be amazing on its own: I have a few prompts I use regularly including Diane, my transcription assistant, and I have nowhere to keep them or run them or share them except for text files and my terminal history.
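
A minimal sketch of what "prompts as buttons" could look like in practice. The button names, prompt text, and the call_llm helper are all illustrative stand-ins (not Webb's actual prompts or any particular model API); the point is just that a saved prompt becomes a named, reusable, shareable thing:

```python
# Sketch: saved prompts wrapped up as named "buttons" you can keep, run, and share.

PROMPT_BUTTONS = {
    # Each button is a description of the desired outcome, not a sequence of steps.
    "tidy-transcript": "Clean up this raw voice transcript into clear prose, keeping my wording.",
    "summarise": "Summarise the following into three short bullet points.",
}

def call_llm(system_prompt: str, user_input: str) -> str:
    """Placeholder: swap in whichever model client you actually use."""
    raise NotImplementedError

def press(button: str, user_input: str) -> str:
    """Run a saved prompt 'button' against some input."""
    return call_llm(PROMPT_BUTTONS[button], user_input)

# Usage: press("tidy-transcript", raw_dictation)
```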

And then he uses the concept of buttons to explain how a full AI interface can be truly different:
AI buttons are different from, say Photoshop menu commands in that they can just be a description of the desired outcome rather than a sequence of steps

The buttons concept is not essential for this insight (though it’s necessary for affordances); the final insight is what matters.
I would perhaps say “intent” rather than Galbraith’s “semantic.”

there are some intents which are easy to say but can’t be simply met using the bureaucracy of interface elements like buttons, drop-downs, swipes and lists. There are cognitive ergonomic limits to the human interface with software

So removing the interface bureaucracy is not about simplicity but about increasing expressiveness and capability.
What does it look like if we travel down the road of intent-maxing?

There’s a philosophy from the dawn of computing, DWIM a.k.a. Do What I Mean.

Coined by computer scientist Warren Teitelman in 1966 and here explained by Larry Masinter in 1981: DWIM embodies a pervasive philosophy of user interface design.
DWIM is an embodiment of the idea that the user is interacting with an agent who attempts to interpret the user’s request from contextual information.

Now, arguably it should come back and ask for clarifications more often, and in particular DWIM (and AI) interfaces are more successful the more access they have to the user’s context (current situation, history, environment, etc.).
But it’s a starting point.

A DWIM AI-powered UI needs maximum access to context (to interpret the user and also for training) and to get as close as possible to the point of intent.
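
A rough sketch of that shape, with hypothetical names rather than any real product's API: the request travels to the model together with as much user context as is available, and the interface falls back to a clarifying question when it can't resolve intent confidently (the heuristic stands in for a model call, just so the sketch runs end to end):

```python
from dataclasses import dataclass, field

@dataclass
class UserContext:
    """The context a DWIM interface draws on: current situation, history, environment."""
    current_document: str = ""
    recent_actions: list[str] = field(default_factory=list)
    location: str = ""

@dataclass
class Interpretation:
    action: str                           # what the system thinks you meant
    confidence: float                     # how sure it is
    clarifying_question: str | None = None

def interpret(request: str, ctx: UserContext) -> Interpretation:
    """Stand-in for a model call that would see the request plus all available context."""
    if "this" in request.split() and not ctx.current_document:
        return Interpretation("", 0.2, "Which document do you mean by 'this'?")
    return Interpretation(f"{request} -> applied to {ctx.current_document or 'workspace'}", 0.9)

def do_what_i_mean(request: str, ctx: UserContext) -> str:
    interpretation = interpret(request, ctx)
    if interpretation.confidence < 0.7:   # arbitrary threshold for the sketch
        # Arguably this branch should fire more often than historical DWIM let it.
        return interpretation.clarifying_question or "What do you mean?"
    return f"Doing: {interpretation.action}"

print(do_what_i_mean("tidy this up", UserContext()))                              # asks to clarify
print(do_what_i_mean("tidy this up", UserContext(current_document="notes.md")))   # acts
```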

It’s interesting to consider what a philosophy of Do What I Mean might lead to in a physical environment rather than just phones and PCs, say with consumer hardware.

Freed from interface bureaucracy, you want to optimise for capturing user intent with ease, expressiveness, and resolution.

I’ve talked before about... voice, gesture, and gaze for everything.

But honestly as a vision you can’t do better than Put-That-There (1982!!!!) by the Architecture Machine Group at MIT.

One observation is that I don’t think this necessarily leads to a DynamicLand-style programmable environment; Put-That-There works as a multimodal intent interface even without end-user programming.
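
A toy sketch of why that is, under illustrative assumptions about the input: the multimodal part of Put-That-There largely reduces to resolving deictic words (“that”, “there”) against whatever gesture or gaze indicated at the moment each word was spoken, no end-user programming required. The structures and names here are mine, not the original system's:

```python
from dataclasses import dataclass

@dataclass
class PointingEvent:
    """A gesture or gaze fix recorded at the moment a word was spoken."""
    word_index: int      # which word in the utterance it coincided with
    target: str          # what was indicated, e.g. an object id or a location

def resolve_deixis(utterance: str, pointing: list[PointingEvent]) -> str:
    """Replace 'that'/'there'/'this' with whatever gesture or gaze indicated at that moment."""
    words = utterance.split()
    by_index = {p.word_index: p.target for p in pointing}
    resolved = [
        by_index.get(i, w) if w.lower() in {"that", "there", "this"} else w
        for i, w in enumerate(words)
    ]
    return " ".join(resolved)

# "put that there", pointing first at the yellow circle and then at a map location,
# resolves to "put yellow_circle top_left".
print(resolve_deixis(
    "put that there",
    [PointingEvent(1, "yellow_circle"), PointingEvent(2, "top_left")],
))
```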

