With all the hype about “AI automating your job away,” there is a lot of talk about the “what” but not the “how.” Agents are today’s best shot at getting an AI to do useful work for you, and in this series I want to go over the latest advances in the open-source AI agents space, domain by domain.

What are AI Agents?

Simply put, an AI agent is a foundational model (like GPT-4 or Llama 2) that has been prompted with a task to complete and a set of human-designed tools to complete it with, and that can execute prompts multiple times to work towards a solution.
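To make that concrete, here is a minimal sketch of an agent loop. The `call_llm` placeholder, the tool names, and the `FINAL:` stopping convention are all illustrative, not taken from any particular framework:

```python
# Illustrative agent loop, not any specific framework's API.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a foundational model (e.g. GPT-4 or Llama 2)."""
    raise NotImplementedError

# Human-designed tools the model is allowed to use (toy stand-ins).
TOOLS = {
    "read_file": lambda path: open(path).read(),
    "run_tests": lambda _: "2 passed, 0 failed",
}

def run_agent(task: str, max_steps: int = 10) -> str:
    history = f"Task: {task}\n"
    for _ in range(max_steps):                     # execute prompts multiple times
        reply = call_llm(history)
        if reply.startswith("FINAL:"):             # the model says it is done
            return reply.removeprefix("FINAL:").strip()
        tool_name, _, tool_args = reply.partition(" ")
        observation = TOOLS[tool_name](tool_args)  # let the model use a tool
        history += f"{reply}\nObservation: {observation}\n"
    return "Stopped without a final answer."
```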

A diagram showing the difference between direct prompting and ReAct prompting

This diagram, taken from ReAct: Synergizing Reasoning and Acting in Language Models by researchers at Google and Princeton University, shows the difference between direct prompting and the ReAct prompting framework that many agents use.
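As a rough illustration of the difference (the question and tool names here are my own toy example, not the paper's benchmarks), a direct prompt asks for the answer in one shot, while a ReAct prompt interleaves Thought, Action, and Observation steps:

```python
# Direct prompting: one shot, no intermediate reasoning or tool use.
direct_prompt = "Question: Which language is the pandas library mostly written in?\nAnswer:"

# ReAct prompting: the model alternates reasoning ("Thought") with tool calls
# ("Action"), and each Observation is fed back in before the next step.
# The Thought/Action/Observation labels follow the paper; the tools are toy examples.
react_prompt = """Question: Which language is the pandas library mostly written in?
Thought: I should look up the pandas repository and check its language breakdown.
Action: search[pandas GitHub repository languages]
Observation: The repository is mostly Python, with performance-critical parts in C and Cython.
Thought: I now have enough information to answer.
Action: finish[Mostly Python, with C/Cython extensions]
"""
```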

How are Agents different from ChatGPT?

1. Agents are not conversational.

Unlike ChatGPT, you can’t hold a conversation with an AI agent. Agents typically use models that are fine-tuned to follow instructions rather than to hold a conversation, and they are often initialized without any conversational memory.
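As a loose illustration (the message format follows the common chat-completion convention, but the content is made up), a chat assistant accumulates a message history across turns, while an agent is typically started fresh with only its instructions and task:

```python
# Chat assistant: conversational memory accumulates across turns.
chat_history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's a good name for a bakery?"},
    {"role": "assistant", "content": "How about 'Rise and Shine'?"},
    {"role": "user", "content": "Something shorter?"},  # relies on the earlier turns
]

# Agent: each run starts from a fresh, task-specific prompt with no chat history.
agent_prompt = (
    "You are a software-writing agent. Follow the instructions exactly.\n"
    "Task: add input validation to parse_config() in config.py.\n"
    "Tools available: read_file, write_file, run_tests.\n"
)
```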

2. Agents are task-specific

The most successful agents are closed-domain tools, while chat models tend to be generalists. An agent designed to write software will write a very poor news article, and vice versa.

3. Agents take their time

While chat products go to great lengths to reduce the time it takes to reply to a user, agents are not bound by such goals. An agent can take much longer, sometimes up to an hour, to complete its task and return an answer.

4. Agents automate, not inform

When designing chatbots, I talk a lot about information vs. automation features: information features are tasks the chatbot can complete just by telling the user something (like giving them the opening hours for a store), while automation features are tasks the chatbot can complete by interacting with the real world (like making a reservation at a restaurant). Agents are designed to take on challenging automation features and are capable of much more than just telling the user something.
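As a toy illustration (both functions and their data are made up), an information feature only has to return text, while an automation feature has to change something outside the conversation:

```python
import datetime

# Information feature: the bot only has to tell the user something.
def get_opening_hours(store_id: str) -> str:
    hours = {"downtown": "9am-9pm", "airport": "6am-11pm"}  # toy data
    return f"The {store_id} store is open {hours.get(store_id, '9am-5pm')} today."

# Automation feature: the bot has to act on the real world. In practice this
# would call a reservation service; here it just fakes a confirmation code.
def make_reservation(restaurant: str, party_size: int, when: datetime.datetime) -> str:
    confirmation = f"{restaurant[:3].upper()}-{when:%Y%m%d%H%M}"
    return f"Booked a table for {party_size} at {restaurant} ({confirmation})."
```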

Why Software Development?

Software development has seen some of the most advanced agents pop up in the past months, and I think there are a few key reasons for this:

1. Fine-tuned foundational models already exist

OpenAI released its Codex model in August 2021, a full year before ChatGPT. Code was one of the first domains to have a current-gen LLM fine-tuned for it, and the models keep getting better. StarCoder, released in May 2023, is currently the best open coding LLM and can write code in over 80 programming languages.
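As a hedged sketch of using it, the `bigcode/starcoder` checkpoint can be loaded through the Hugging Face transformers library; the model is large and gated behind a license agreement, so treat this as illustrative rather than something to run on a laptop:

```python
# Generate a code completion with StarCoder via Hugging Face transformers.
# The checkpoint is gated and large (~15B parameters), so this is illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="bigcode/starcoder")

prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```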

2. Established project pipelines

Development teams, especially in business settings, have been practicing Continuous Integration / Continuous Delivery, or CI/CD, for years. The process is essentially a way to standardize how code from an individual contributor is systematically checked, tested, and then automatically deployed. A great thing about the CI/CD process is that it does not care who wrote the code, AI or human; it only cares that the code is correct and passes its tests.
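For example, a minimal author-agnostic quality gate might look like the sketch below; the specific commands (ruff for linting, pytest for tests) are assumptions about a Python project's tooling, not part of any standard:

```python
import subprocess
import sys

# A toy CI gate: none of these checks know or care who authored the change.
# The commands assume a Python project that uses ruff and pytest.
CHECKS = [
    ["ruff", "check", "."],  # static analysis / lint
    ["pytest", "--quiet"],   # run the test suite
]

def main() -> int:
    for command in CHECKS:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"Gate failed on: {' '.join(command)}")
            return result.returncode
    print("All checks passed; ready to deploy.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```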

3. Automation-ready industry

Developers have been automating their jobs away since Vim macros. While other industries might be reluctant to give automation a try, software developers have embraced it with passion.

4. Agent builders are already domain experts

The adage “write about what you know” applies to AI agent developers as well as creative authors. Because AI Agents need to be programmed by developers, and developers are familiar with their own problems and shortcomings, a happy alignment forms where the ones building the tool are also the ones most knowledgeable about its potential use cases.

For each of the agents, we’ll start off with the following info:

  • Developer: who was the lead developer or company behind this agent?
  • Base LLM: which foundational model powers this agent?
  • Languages: what programming languages does the agent output code in?
  • Developed in: what programming languages did the developer use when building the agent?
  • Try it out: where you can try out a hosted demo of the agent.

Let’s get started!

prompt: a factory that makes clouds, oil painting by MC Escher by OpenAI's DALLE-2

Webapp Factory

prompt: GPT Engineer, clipart by OpenAI's DALLE-2

GPT Engineer

IX, from their GitHub

IX

Smol Developer

Aider

Further Work

cool AI art, full version of the cover image

While all of these agents have demonstrated sufficient code-generation capability, they all struggle to enforce quality throughout their generated content. By its nature, an LLM cannot effectively check its own work, so these systems seek other methods to review generated code, such as:

  • Asking the LLM to write tests for generated functions, and then running the tests (sketched in code after this list)
  • Using a different LLM to check the first LLM’s work
  • Waiting for a human to review its code and then incorporating their feedback
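Here is a rough sketch of the first approach; `generate_tests` stands in for a hypothetical LLM call, and the file layout is made up:

```python
import pathlib
import subprocess

def generate_tests(source_code: str) -> str:
    """Hypothetical LLM call that returns pytest tests for `source_code`."""
    raise NotImplementedError

def check_generated_function(source_code: str, workdir: str = "agent_output") -> bool:
    out = pathlib.Path(workdir)
    out.mkdir(exist_ok=True)
    (out / "generated.py").write_text(source_code)
    (out / "test_generated.py").write_text(generate_tests(source_code))
    # The tests are themselves LLM-written, so a pass is evidence of quality,
    # not a guarantee of it.
    result = subprocess.run(["pytest", str(out), "--quiet"])
    return result.returncode == 0
```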

Only the third option can guarantee that bad code is caught, but the same is true of code written by humans. A robust CI/CD pipeline will always include at least one code review before new code is deployed, so while this runs counter to agents’ promise of fully automated software development, it integrates with existing development processes. I look forward to AI junior developers that can automatically raise pull requests for the new code they generate, as this will free up human developers to take on more creative tasks.


❤️ Gordy