With all the hype about “AI automating your job away,” there is a lot of talk about the “what” but not the “how.” Agents are today’s best shot at getting an AI to do useful work for you, and in this series I want to go over the latest advances in the open-source AI agents space, domain by domain.
What are AI Agents?
Simply put, an AI agent is a foundational model (like GPT-4 or Llama 2) that has been specially prompted with a task to complete and a set of human-designed tools to complete it with, and that can execute prompts multiple times to generate a solution.
This diagram, taken from ReAct: Synergizing Reasoning and Acting in Language Models by researchers at Google and Princeton University, shows the difference between direct prompting and the ReAct prompting framework that many agents use.
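To make that loop concrete, here is a minimal sketch of a ReAct-style agent in Python. The `llm` callable and the two tools are hypothetical placeholders standing in for whichever foundational model and tool set an agent uses; this is not the API of any specific framework.

```python
# A minimal sketch of a ReAct-style loop: the model alternates Thought /
# Action steps, and the runtime feeds tool output back as Observations.
from typing import Callable, Dict


def search_docs(query: str) -> str:
    """Hypothetical tool: look up documentation for a query."""
    return f"(documentation snippet about {query!r})"


def run_code(snippet: str) -> str:
    """Hypothetical tool: execute a code snippet and return its output."""
    return "(captured stdout or error trace)"


TOOLS: Dict[str, Callable[[str], str]] = {
    "search_docs": search_docs,
    "run_code": run_code,
}


def react_agent(task: str, llm: Callable[[str], str], max_steps: int = 10) -> str:
    # The scratchpad accumulates the full Thought/Action/Observation trace,
    # so every new prompt contains the reasoning so far.
    scratchpad = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(scratchpad + "Thought:")
        scratchpad += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:
            # Expect something like "Action: search_docs[flask routing]"
            name, _, arg = step.split("Action:")[-1].strip().partition("[")
            tool = TOOLS.get(name.strip(), lambda _: "unknown tool")
            scratchpad += f"Observation: {tool(arg.rstrip(']'))}\n"
    return "Agent stopped without reaching a final answer."
```

The important detail is the loop itself: instead of answering in one shot, the model is re-prompted with everything it has thought, done, and observed so far until it declares a final answer or hits the step limit.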
How are Agents different from ChatGPT?
1. Agents are not conversational.
Unlike ChatGPT, you can’t hold a conversation with an AI agent. Agents typically use models that are fine-tuned to follow instructions rather than to hold a conversation, and they are often initialized without conversational memory.
2. Agents are task-specific
The most successful agents are closed-domain tools, while chat models tend to be generalists. An agent designed to write software will write a very poor news article, and vice-versa.
3. Agents take their time
While great effort goes into reducing the time it takes a chat model to reply to a user, agents are not bound by such goals. An agent can take much longer, sometimes up to an hour, to complete its task and return an answer.
4. Agents automate, not inform
When designing chatbots, I talk a lot about information vs automation features, where information features are tasks the chatbot can complete just by telling the user something (like giving them the opening hours for a store), and automation features are tasks the chatbot can complete by interacting with the real world (like making a reservation at a restaurant). Agents are designed to take on challenging automation features and are capable of much more than just telling the user something.
Why Software Development?
Software development has seen some of the most advanced agents pop up in the past months, and I think there are a few key reasons for this:
1. Fine-tuned foundational models already exist
OpenAI released its Codex model in August 2021, more than a year before ChatGPT. Code was one of the first domains to get a current-gen LLM fine-tuned for it, and the models keep getting better and better. StarCoder, released in May 2023, is currently one of the best open coding LLMs and can write code in over 80 programming languages.
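As a quick illustration of how approachable these models are, here is a minimal sketch of code completion with StarCoder via the Hugging Face transformers library. It assumes transformers is installed and that you have accepted the model's license on the Hub so the bigcode/starcoder weights can be downloaded.

```python
# A minimal sketch of code completion with an open coding LLM.
# Assumes the transformers library is installed and the bigcode/starcoder
# weights are accessible (the model is gated behind a license agreement).
from transformers import pipeline

generator = pipeline("text-generation", model="bigcode/starcoder")

prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
completion = generator(prompt, max_new_tokens=64)[0]["generated_text"]
print(completion)
```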
2. Established project pipelines
Development teams, especially in business, have been practicing Continuous Integration / Continuous Delivery (CI/CD) for years. The process is essentially a way to standardize how code from an individual contributor is systematically checked, tested, and then automatically deployed. A great thing about the CI/CD process is that it does not care who wrote the code, AI or human; it only cares that the code is correct and passes tests.
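As a rough illustration, the sketch below shows the core of such a quality gate for a Python project, assuming pytest for tests and ruff for linting; nothing in it depends on whether a human or an agent authored the change.

```python
# A minimal sketch of a CI quality gate. The specific tools (ruff, pytest)
# are assumptions for illustration; any project would substitute its own.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],  # static lint
    ["pytest", "-q"],        # unit tests
]


def main() -> int:
    for cmd in CHECKS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Gate failed on: {' '.join(cmd)}")
            return result.returncode
    print("All checks passed - ready for review and deployment.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```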
3. Automation-ready industry
Developers have been automating their jobs away since VIM macros. While other industries might be reluctant to give automation a try, software developers have been embracing it with passion.
4. Agent builders are already domain experts
The adage “write about what you know” applies to AI agent developers as well as creative authors. Because AI Agents need to be programmed by developers, and developers are familiar with their own problems and shortcomings, a happy alignment forms where the ones building the tool are also the ones most knowledgeable about its potential use cases.
For each of the agents, we’ll start off with the following info:
- Developer: who was the lead developer or company behind this agent?
- Base LLM: which foundational model powers this agent?
- Languages: what programming languages does the agent output code in?
- Developed in: what programming languages did the developer use when building the agent?
- Try it out: where you can try out a hosted demo of the agent.
Let’s get started!
Webapp Factory
- Developer: Julian Bilcke for HuggingFace
- Base LLM: WizardCoder-15B
- Languages: JavaScript, HTML, CSS, Node.js
- Developed in: TypeScript, Node.js
- Try it out here
GPT Engineer
- Developer: Anton Osika
- Base LLM: OpenAI GPT-3.5 or GPT-4
- Languages: Python, HTML, CSS, JavaScript, Node.js
- Developed in: Python
- Watch a demo
IX
- Developers: Peter Krenesky and Ikko Eltociear Ashimine
- Base LLM: OpenAI GPT-4
- Languages: Python, JavaScript, TypeScript, HTML, CSS
- Developed in: Python
- Watch a demo
Smol Developer
- Developers: Smol AI
- Base LLM: OpenAI GPT-4
- Languages: Python, JavaScript, HTML, CSS, Node.js
- Developed in: Python
- Watch a demo
Aider
- Developers: Paul Gauthier
- Base LLM: OpenAI GPT-3.5 or GPT-4
- Languages: Python, HTML, CSS, JavaScript, Node.js
- Developed in: Python
- Watch a Demo
Further Work
While all of these agents have demonstrated sufficient code-generation capability, they all struggle to enforce quality throughout their generated content. By their nature, LLMs cannot effectively check their own work, so these systems turn to other methods of reviewing generated code, such as:
- Asking the LLM to write tests for its generated functions and then running them (a minimal sketch of this approach follows the list)
- Using a different LLM to check the first LLM’s work
- Waiting for a human to review its code and then incorporating their feedback
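Here is that minimal sketch of the first strategy, assuming a Python agent that shells out to pytest; the call_llm function is a hypothetical placeholder for whichever base model the agent uses.

```python
# A minimal sketch of "ask the LLM to write tests, then run them".
# call_llm is a hypothetical placeholder for a request to the agent's base LLM.
import subprocess
import tempfile
from pathlib import Path


def call_llm(prompt: str) -> str:
    """Placeholder for a call to the agent's foundational model."""
    raise NotImplementedError


def review_with_tests(generated_code: str) -> bool:
    test_code = call_llm(
        "Write pytest unit tests for the code in solution.py below. "
        "Return only valid Python.\n\n" + generated_code
    )
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(generated_code)
        Path(tmp, "test_solution.py").write_text(test_code)
        result = subprocess.run(["pytest", "-q", tmp])
    # Passing tests are evidence of quality, not a guarantee: the same model
    # that wrote a bug may also write a test that misses it.
    return result.returncode == 0
```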
Only the third option can guarantee that only good code makes it through, but the same applies to human developers. A robust CI/CD pipeline will always include at least one code review before new code is deployed, so while this runs counter to agents’ promise of fully-automated software development, it integrates with existing development processes. I look forward to AI junior developers that can automatically raise pull requests for the code they generate, as this will free up human developers to take on more creative tasks.
❤️ Gordy