This article was written with ideas and help from my friend Faye Li.

At their Dev Day conference in Singapore, nearly every presentation from the OpenAI team demonstrated projects that featured GPT manipulating user interfaces. Depending on the conversation, it could place pins on a map, show search result cards, or create graphs from database queries.

OpenAI has yet to release these demos, but a team has partially recreated the maps demo here.

How Were They Manipulating the UI?

In short: Function calling. It’s not a sexy agentic framework, but it doesn’t have to be. This meme from Hamel Husain of Parlance Labs sums it up nicely:

Bell curve meme; in the middle "We need agents! Let's use such-and-such framework" while the ends say "Do the simple thing. LLM + Function calls"

The frontend app had elements that could change in pre-determined ways. The OpenAI team liked using React and React components for these, but there is no reason other frameworks or UI kits wouldn’t work just as well. These manipulations were wrapped as tools and shared with the LLM on every API call. Both OpenAI’s Chat Completions and Realtime APIs can return tool call commands alongside text messages. A tool call might look like this:

tool: {
    name: "Drop Pin",
    params: [
        {arg: "lat", value: "-41.056"},
        {arg: "lon", value: "78.094"}
    ]
}
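
For context, here is a minimal sketch of how a tool like this might be declared when calling the Chat Completions API with the official Node SDK. The drop_pin name, parameter schema, model choice, and prompt are illustrative assumptions, not code from the demos; API tool names can’t contain spaces, so the friendlier “Drop Pin” label is only applied on the frontend side:

import OpenAI from "openai";

const client = new OpenAI();

// Tool definition shared with the model on every request
const tools = [
  {
    type: "function",
    function: {
      name: "drop_pin", // API tool names can't contain spaces
      description: "Drop a pin on the map at the given coordinates",
      parameters: {
        type: "object",
        properties: {
          lat: { type: "number", description: "Latitude of the pin" },
          lon: { type: "number", description: "Longitude of the pin" },
        },
        required: ["lat", "lon"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "gpt-4o-mini", // illustrative model choice
  messages: [{ role: "user", content: "Drop a pin on the nearest coffee shop" }],
  tools,
});

Whatever the model sends back gets translated into the simpler event payload shown above before it reaches the UI.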

The frontend then has an event listener that listens for these tool calls and triggers the corresponding UI change:

// Inside a React component
  useEffect(() => {
    const handleToolCall = (event) => {
      const { tool } = event.detail;

      switch (tool.name) {
        case 'Drop Pin':
          handleDropPin(tool.params);
          break;
        case 'Change UI Color':
          handleChangeColor(tool.params);
          break;
        default:
          console.log('Unhandled tool call:', tool);
      }
    };

    // Register the listener on mount and clean it up on unmount
    window.addEventListener('ai-tool-call', handleToolCall);
    return () => window.removeEventListener('ai-tool-call', handleToolCall);
  }, []);

// Plain browser JavaScript for non-React UIs
  window.addEventListener('ai-tool-call', (event) => {
    const { tool } = event.detail;

    switch (tool.name) {
      case 'Drop Pin':
        handleDropPin(tool.params);
        break;
      case 'Highlight Element':
        handleHighlightElement(tool.params);
        break;
      default:
        console.log('Unhandled tool call:', tool);
    }
  });
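
The glue between the two sides is a small piece of code that turns the API response into that ai-tool-call event. Continuing the sketch above, and assuming the response (or at least its tool calls) has been relayed to the browser, the bridge might look like this; the name mapping and the params shape are illustrative:

// Forward each tool call from the model's response to the UI as a DOM event
const toolCalls = response.choices[0].message.tool_calls ?? [];

for (const call of toolCalls) {
  const args = JSON.parse(call.function.arguments); // e.g. { lat: -41.056, lon: 78.094 }

  window.dispatchEvent(new CustomEvent('ai-tool-call', {
    detail: {
      tool: {
        name: call.function.name === 'drop_pin' ? 'Drop Pin' : call.function.name,
        params: Object.entries(args).map(([arg, value]) => ({ arg, value })),
      },
    },
  }));
}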

These features power magical demos: a user asks for nearby coffee shops and a map with pins appears. They ask which of those shops also serve sandwiches and have more than 4 stars, and the map of pins turns into a list of cards. The LLM can present information in novel and exciting ways.

After everybody picked their jaws up off the floor at demo day, the first question they asked was “How can I get my app to do this?” And the answer, as always, is “It depends.” Not every use case can benefit from LLMs manipulating the UI, and not every UI can be manipulated to the degree we see in demos. To help break down the possibilities, I’ve divided the spectrum into three levels:

Level 1: Repainting a Lego Block

The lowest UI manipulation tier involves small configuration changes:

* Highlight a UI element
* Change background color
* Play a sound
* Drop a map pin
* Open a new tab

These manipulations are small yet effective. They’re easy to add to any product without building a dedicated LLM app. Most clients can leverage these features without much planning or changes to their existing product.
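
As a concrete illustration, here is a minimal sketch of what the handlers behind two Level 1 tools might look like. The function names match the listener earlier in the article; the parameter names and DOM details are assumptions, not code from the demos:

// Level 1 handlers: tiny, self-contained DOM tweaks
function handleChangeColor(params) {
  // params uses the [{ arg, value }] shape from the tool call payload
  const color = params.find((p) => p.arg === 'color')?.value ?? 'white';
  document.body.style.backgroundColor = color;
}

function handleHighlightElement(params) {
  const selector = params.find((p) => p.arg === 'selector')?.value;
  const el = selector ? document.querySelector(selector) : null;
  if (el) {
    el.style.outline = '3px solid gold';
    setTimeout(() => { el.style.outline = ''; }, 2000); // remove the highlight after a moment
  }
}

Each handler is a few lines of plain DOM code, which is exactly why this tier is so cheap to adopt.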

Level 2: Building a Lego Set

This tier configures and deploys pre-built UI elements using contextual data:

* Create restaurant cards
* Generate bar charts of restaurant ratings

Most clients aim for this level. It requires significant preparation:

* Deciding which UI elements to expose to the LLM
* Designing interfaces to accept any UI element combination
* Engineering prompts to help the LLM use available elements

The results can transform user interaction. At their Singapore event, OpenAI showed an applicant tracking system built on GPT-4o mini. It parsed resumes into a NoSQL database and created charts showing candidates’ experience and skills. Users could ask, “Show me the top 10 most experienced candidates” or “What are the 5 most common undergrad degrees?” The LLM would generate the correct query, select an appropriate chart, and display it.
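
A Level 2 tool looks much like a Level 1 tool, except the model’s output fills an entire pre-built component. Here is a hedged sketch of what that might look like for a chart: the render_bar_chart name, the ResultsPanel component, and the text-bar rendering are illustrative assumptions, not OpenAI’s demo code.

import { useEffect, useState } from "react";

// Tool schema the model fills in with contextual data
export const renderBarChartTool = {
  type: "function",
  function: {
    name: "render_bar_chart",
    description: "Render a bar chart of labeled numeric values in the results panel",
    parameters: {
      type: "object",
      properties: {
        title: { type: "string" },
        labels: { type: "array", items: { type: "string" } },
        values: { type: "array", items: { type: "number" } },
      },
      required: ["title", "labels", "values"],
    },
  },
};

// The pre-built "Lego set": the model only picks the data, this component does the rendering
export function ResultsPanel() {
  const [spec, setSpec] = useState(null);

  useEffect(() => {
    const onToolCall = (event) => {
      const { tool } = event.detail;
      if (tool.name !== 'render_bar_chart') return;
      // Rebuild an object from the [{ arg, value }] payload shape used earlier
      setSpec(Object.fromEntries(tool.params.map((p) => [p.arg, p.value])));
    };
    window.addEventListener('ai-tool-call', onToolCall);
    return () => window.removeEventListener('ai-tool-call', onToolCall);
  }, []);

  if (!spec) return null;
  return (
    <div>
      <h3>{spec.title}</h3>
      {spec.labels.map((label, i) => (
        // A plain text bar per label; a real app would swap in its chart library here
        <div key={label}>{label}: {'█'.repeat(Math.round(spec.values[i]))}</div>
      ))}
    </div>
  );
}

The key design choice is that the model never draws anything itself: it only decides which pre-built element to use and what data to pour into it.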

Another good example is the bot showcased in the talk below by Hamel Husain and Emil Sedgh, where they describe a chatbot built for a real estate agency client. The bot could take photos and information about a property and create Instagram and other social media posts highlighting the best parts.

Level 3: Building Lego Blocks

The highest tier lets LLMs create and configure custom UI elements:

* Vercel’s v0
* Bolt.new
* Claude and ChatGPT Artifacts

Unlike Level 2, which requires developers to scope UI manipulation tightly, this level lets LLMs write HTML or JavaScript to create 100% custom UI elements. Current uses focus on app development agents. Future applications might include creating personalized interfaces for each unique user.

These systems use more than function calling. LLMs call system-level tools like “read file” and “write file” and execute framework tools such as vercel build to compile projects.
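
To give a flavor of what a system-level tool might look like, here is a minimal sketch of a write_file tool and its handler, assuming a Node.js backend with a sandboxed project directory. The names and shape are illustrative, not taken from v0, Bolt.new, or Artifacts:

import { writeFile } from "node:fs/promises";
import path from "node:path";

// A Level 3 tool: the model writes arbitrary source files into a sandboxed project
const writeFileTool = {
  type: "function",
  function: {
    name: "write_file",
    description: "Create or overwrite a file inside the project workspace",
    parameters: {
      type: "object",
      properties: {
        filePath: { type: "string", description: "Path relative to the project root" },
        contents: { type: "string", description: "Full contents of the file" },
      },
      required: ["filePath", "contents"],
    },
  },
};

async function handleWriteFile(projectRoot, args) {
  // Resolve inside the sandbox and refuse anything that escapes it
  const target = path.resolve(projectRoot, args.filePath);
  if (!target.startsWith(path.resolve(projectRoot) + path.sep)) {
    throw new Error(`Refusing to write outside the project root: ${args.filePath}`);
  }
  await writeFile(target, args.contents, "utf8");
}

A companion read_file tool and a build step such as vercel build round out the loop.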

If Level 2 seems magical, these are miraculous. Ask v0 to “make a breakout clone game” and receive a working app instantly.

Screenshot from the breakout copy game I asked v0 to make

Most Chatbots Remain at Level 0

Despite OpenAI’s UI manipulation demos, these features haven’t spread widely. Many chatbots sit on top of websites without integration, keeping UI manipulations out of reach. Teams that can move fast and implement even the most basic UI manipulations will gain an advantage over slower-moving products.

❤️

Gordy