Our secret framework for building AI agents

June 5, 2024
5 minute read

Hey,

Most people fail to build AI agents not because they can't use the no-code builder tools, but because their thinking is wrong.

They don't know what to build.

They don't know what it should do.

So they feel overwhelmed and anxious instead of "being in the zone" and building them in hours.

Hungarian-American psychologist Mihály Csíkszentmihályi called this the flow state: a person is fully immersed in an activity, loses their sense of time and self-consciousness, and creates effortlessly.

The flow state happens when the current skills match or are slightly below the challenge at hand.

If the challenge is bigger than the skills, the person feels anxious and frustrated.

If the challenge is less than the skills, the person feels boredom.

So in this email, I'd like to get you past that initial frustration and explain the mental model we use to build these agents for ourselves and our clients.

It's called the ITO framework, and it's really simple.

ITO stands for Input, Task, Output.

Every single task can be described with this framework, and we do this before even opening Make or Zapier to build things.

It's part of the Automation Roadmap that the client confirms before we start building.

1/ Inputs

Every task has some initial input. Describe these in as much detail as possible.

For example, if this email were written by AI, its initial input would be an idea, maybe a few bullet points:

ITO framework email:
- explain importance
- flow state vs. frustration
- input
- task
- output
- wrap up/summary

If you are automating tasks in a CRM, then the data from the CRM is the first input:

Input: Aid application request details from CRM.

Then as the task goes on, the subsequent steps might use the outputs of previous steps as inputs, or they might bring in new information.

2/ Tasks

Now this is the hardest one for most people to define. This is the black box: most people think "yeah, just put AI in there" and it's done, but it doesn't work that way.

You need to describe the task in at least one sentence. We're not writing AI prompts here, just simple, plain English.

For example, if the input is the second example from above, the task might be:

Input: Aid application request details from CRM.

Task: Analysis and categorisation of the type of application.

We're doing this because we want the AI agent to do different things based on the type of application received.

Defining the task might not be easy at first. If you're struggling to clearly define what the task is, you either:

  1. Have to break it down into smaller tasks, because it's not yet an elementary task but a complex one made up of multiple tasks, or
  2. Have to get on a call with the client/team and get them to explain what they really want.

3/ Outputs

This is basically the 5th key element of a prompt: "What is considered good".

Again, it might be a simple few-word description, or it might be longer. In our example above, this is how it would look:

Input: Aid application request details from CRM.

Task: Analysis and categorisation of the type of application.

Output: Categorised request details.

As you can see, this was a simple Text-Categorization AI task from the Task-Modality Matrix.

In this case, we know that what we want as an output is simply categorised request details.
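If it helps to see the framework written down, here's a minimal sketch of an ITO description as a tiny Python record. The class and field names are my own illustration, not part of any tool:

```python
from dataclasses import dataclass

# Illustrative only: a plain record for an ITO description,
# written down before any automation is built.
@dataclass
class ITOSpec:
    input: str   # where the data comes from
    task: str    # one plain-English sentence describing the work
    output: str  # what a good result looks like

# The CRM example from above as an ITO record:
crm_step = ITOSpec(
    input="Aid application request details from CRM",
    task="Analysis and categorisation of the type of application",
    output="Categorised request details",
)
```

Writing the three fields out like this makes it obvious when a "task" is actually several tasks in disguise.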

You might need to provide a few examples, especially if humans did this task previously, so the AI understands what to do.

If you have these ideal outputs, you will then be able to put a few of them at the end of the AI prompt when you build the workflow.

For example, in one of the automations I built for a PR agency, one of the tasks was to get the name of the publishing portal into a Sheet cell. This is how I defined it and provided the ideal output:

medium: Which site published this? It’s usually the main domain, like if the URL is “https://index.hu/article-title”, the correct answer is “Index.hu”

This is a clear-cut example of one-shot prompting, where I provide ONE correct question-answer pair to the model.
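In code terms, one-shot prompting just means the prompt text carries a single worked example. Here's a minimal sketch, assuming you assemble the prompt string yourself before sending it to a model (`build_one_shot_prompt` is a hypothetical helper; the wording mirrors the Sheet-cell definition above):

```python
def build_one_shot_prompt(url: str) -> str:
    # One worked question-answer pair, followed by the real question.
    example = (
        "Which site published this? It's usually the main domain. "
        'For example, if the URL is "https://index.hu/article-title", '
        'the correct answer is "Index.hu".'
    )
    return f"{example}\nURL: {url}\nAnswer:"
```

The model completes the text after "Answer:", and the single example nudges it toward the exact format you want ("Index.hu", not "index.hu" or "https://index.hu").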

LLMs like GPT-4 are really good at zero-shot prompting, where you don't provide any correct question-answer pair. In my case, I provided one to ensure consistency in the responses.

Some models, especially smaller language models, might require more than one correct example to perform well; that's called few-shot prompting.
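Few-shot is the same idea with several worked pairs in the prompt. A sketch along the same lines as before (the extra example URLs and answers are made up for illustration):

```python
# Made-up example pairs for illustration only.
EXAMPLES = [
    ("https://index.hu/article-title", "Index.hu"),
    ("https://techcrunch.com/2024/some-story", "TechCrunch.com"),
    ("https://www.bbc.com/news/item", "BBC.com"),
]

def build_few_shot_prompt(url: str) -> str:
    # Several worked pairs, then the real question in the same format.
    lines = ["Which site published this article? Answer with the main domain."]
    for example_url, answer in EXAMPLES:
        lines.append(f'URL: "{example_url}" -> {answer}')
    lines.append(f'URL: "{url}" ->')
    return "\n".join(lines)
```

The repeated pattern is what a smaller model latches onto: it sees three completed pairs and completes the fourth the same way.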

And if you are fine-tuning a model for your use case, you move beyond prompting altogether and actually train the model on a bigger pool of examples.

More on these later.

With the ITO framework, you can now describe any knowledge-based task in a way that you can begin to automate it.

Keep in mind that not all tasks can be automated. Some might require human approval, and that's totally fine.

The point is to save time so you can leave work a bit earlier or spend your time on higher value tasks.

If you want to become an expert at prompting AI models and building no-code automations, we have a fully self-paced online course for you that's updated for 2024.

We call it the Prompt Master AI course, and once you finish it, you'll be able to build AI automations without writing a single line of code. For more details and student testimonials, click here and become a Prompt Master today!

Hope to see you in the community!

Best,

Dave Talas

P.S.: If you buy the Prompt Master AI course, you're covered by our 90-day money-back guarantee, so if you don't like it, or it doesn't work for you, we'll refund your entire purchase.