Why AI Agents Are Not Ready to Get Real Jobs Done

Introduction

Hello! I wrote this brief, straightforward article to share my real-world experience after spending 6 months developing solutions with AI agents. I hope it gives you a general overview if you intend to start a project with agents.

What are AI agents?

The term "AI agent" may sound strange if you are not familiar with AI development tools or haven’t had time to follow recent progress. AI agents are extended versions of large language models (LLMs) that are specialized for specific tasks. They can use tools, call predefined functions and APIs, or work together to complete a given task. They complement LLMs by providing better memory, iteration toward better results, and a more extended knowledge base.
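
To make the tool-use idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `fake_llm` is a stand-in that picks a tool by keyword instead of calling a real model, and the tool names `word_count` and `shout` are made up for this example. A real framework would plug in its own model client and tool registry.

```python
from typing import Callable, Dict

# Hypothetical tools. In a real agent these would wrap APIs or other functions.
def word_count(text: str) -> str:
    return str(len(text.split()))

def shout(text: str) -> str:
    return text.upper()

TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": word_count,
    "shout": shout,
}

def fake_llm(task: str) -> dict:
    """Stand-in for a model call (an assumption, not a real LLM).

    It returns which tool to use and what input to pass to it.
    """
    if "count" in task.lower():
        return {"tool": "word_count", "input": task}
    return {"tool": "shout", "input": task}

def run_agent(task: str) -> str:
    """Ask the 'model' for a tool choice, then dispatch to that tool."""
    decision = fake_llm(task)
    tool = TOOLS[decision["tool"]]
    return tool(decision["input"])
```

Calling `run_agent("count the words here")` routes the task to `word_count`, while any other task falls through to `shout`. The point is the shape of the loop: the model decides, the surrounding code executes.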

A Brief Overview of the AI Agent Ecosystem

Many different agentic AI tools have emerged in the last few months. I’m writing this article on 12/2/24, and the ecosystem described here reflects that date.

These tools can be categorized under Agent Frameworks & Vertical Agents. For instance:

- Enhance memory functionality with tools like Mem0.
- Build your marketing team with CrewAI.
- Test your agent’s capabilities using LangSmith.
- Work with PDFs using embeddings powered by Chroma.

Many agent frameworks and agent hosting services come with their own long-term memory, service connections (like Zapier), and storage solutions for Retrieval-Augmented Generation (RAG).

Secrets of How Big Companies Use AI Agents

Some big frameworks, such as AutoGen, belong to Microsoft. Big tech companies are supporting open-source AI agent projects financially. There are even a few reports that big tech companies like Microsoft and Nvidia have integrated AI agents into their own operations. However, I couldn’t find any detailed information online about how they use these technologies.

LLMs Don’t Fit Well with the Needs of Existing Frameworks

Most AI agent systems are not the most efficient solutions for real-world use cases, at least when you use them the same way their examples do. Let me explain the main problem I noticed in my AI agent development experience with an example.

Imagine we want to create a Copywriter Agent. Naturally, one of its skills would be writing blog posts. If we break down the process of writing a blog post, it could look something like this:

Agent: Copywriter Agent

Responsibilities:

- Starting from a given point (our topic)
- Creating an impressive headline
- Creating subtitles
- Creating a tag structure according to topic importance and relations (like subtitles)

To achieve a perfectly written blog post, we need to complete each step in a detailed and structured way. However, neither GPT-4o nor Claude Sonnet models are fully capable of defining these steps effectively. The results are often too generic and far from being practical for real-world applications.

Even though models like GPT-4o are designed with reasoning capabilities and the ability to break tasks into smaller steps, their performance declines significantly when multiple tasks are assigned simultaneously. This leads to a decrease in the quality of the output, making them less reliable for complex, multi-step processes. A narrower agent with a single responsibility, like the one below, is easier for the model to handle:

Agent: Blog post Headline Creator

Responsibilities:

- Creating engaging headlines
- Use some patterns like "5 Ways to", "Here is Why"
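
One way to read the contrast between the two agent definitions above is as a pipeline of narrow agents, each with a single short prompt, instead of one agent carrying every responsibility at once. The sketch below is my own illustration of that idea, not the API of any particular framework. `NarrowAgent`, `run_pipeline`, and the "Subtitle Creator" and "Tag Creator" agents are hypothetical names, and `call_llm` is whatever function you use to reach your model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class NarrowAgent:
    """An agent with exactly one responsibility (hypothetical structure)."""
    name: str
    instructions: str

    def build_prompt(self, context: str) -> str:
        # Each prompt carries only this agent's instructions plus the context.
        return f"{self.instructions}\n\nInput:\n{context}"

HEADLINE = NarrowAgent(
    "Blog post Headline Creator",
    'Write one engaging headline. Use a pattern such as "5 Ways to" or "Here is Why".',
)
SUBTITLES = NarrowAgent(
    "Subtitle Creator",  # hypothetical follow-up agent
    "Write subtitles for the blog post described in the input.",
)
TAGS = NarrowAgent(
    "Tag Creator",  # hypothetical follow-up agent
    "Propose tags ordered by topic importance.",
)

def run_pipeline(
    topic: str,
    agents: List[NarrowAgent],
    call_llm: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run the agents one after another, feeding each result forward."""
    context = topic
    outputs: List[Tuple[str, str]] = []
    for agent in agents:
        result = call_llm(agent.build_prompt(context))
        outputs.append((agent.name, result))
        # Later agents see the topic plus everything produced so far.
        context = f"{context}\n{result}"
    return outputs
```

With a real model behind `call_llm`, `run_pipeline("AI agents", [HEADLINE, SUBTITLES, TAGS], call_llm)` would return one `(agent name, output)` pair per step. The trade-off is more model calls in exchange for simpler prompts.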

In brief, since current LLMs cannot effectively process highly complex prompts behind the scenes, most agentic framework examples fail to produce optimal results. These frameworks should improve as LLMs become better at handling and reasoning through more intricate prompts.

Proven Results: Real Use Cases Where I Succeeded (ish)

I have worked with several companies that wanted to automate some of their processes, and we did research and development together to implement AI agent solutions. Before sharing the list of use cases, I want to remind you that almost every agent task should go through an approval process before any action is taken, to prevent AI chaos. Here are the 3 most common use cases we implemented for these companies:

- Researching potential customers
- Using specific coding tools
- Personalized offer templates
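
The approval process mentioned above can be sketched as a small gate between what an agent proposes and what actually runs. This is a minimal illustration under assumptions, not how any specific framework implements it: `ProposedAction`, `run_with_approval`, and `console_approver` are hypothetical names, and in practice the approver could be a chat message, a ticket, or a dashboard button.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ProposedAction:
    """Something the agent wants to do, held back until a human approves it."""
    description: str
    execute: Callable[[], str]

def run_with_approval(
    actions: List[ProposedAction],
    approve: Callable[[ProposedAction], bool],
) -> List[str]:
    """Execute only the actions that the approver accepts."""
    results: List[str] = []
    for action in actions:
        if approve(action):
            results.append(action.execute())
        else:
            # Nothing is executed for rejected actions; we only record the skip.
            results.append(f"SKIPPED: {action.description}")
    return results

def console_approver(action: ProposedAction) -> bool:
    """Hypothetical approver that asks a person in the terminal."""
    answer = input(f"Approve '{action.description}'? [y/N] ")
    return answer.strip().lower() == "y"
```

For example, an offer-sending agent would wrap each email in a `ProposedAction` and pass `console_approver` (or any other approver) to `run_with_approval`, so nothing goes out without a human saying yes.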

Summary

As a contributor to several agent frameworks, I’m genuinely excited about their potential. These frameworks can still be useful for generating generic results, but for more in-depth work you often need fine-tuning, Retrieval-Augmented Generation (RAG), or similar improvements to your model. For truly deep and nuanced tasks, however, the human brain remains indispensable.

I have to say that this is fully human-crafted and does not use any AI agent—or maybe it’s written by a well-designed AI agent to test if it works or not, who knows.

Stay tuned for my next post if you are an AI enthusiast. I will be writing about how to get deeper and more nuanced results while working with AI agents.

Are you using AI agents in your projects? If so, what kind of real-world use cases have you implemented? Feel free to share your ideas in the comments below.