Getting AI to complete tasks

AI’s ability to generate text or images now feels commonplace. It comes up in every discussion, and people have started to ask: “Did you use AI to write that?” No surprise here, frankly. Incumbents like Microsoft and Notion are moving fast and have integrated AI natively into their products.

Here’s the thing though: we are just scratching the surface. I’m excited about the prospect of AI executing complex tasks. To do so, it needs to analyse a request, determine the right action and then execute it. Imagine being able to say “Conduct due diligence on the customer service market”, or “Help me prepare for my next sales meeting”.

My excitement is in part driven by the extent to which AI democratises technology. The language of LLMs is natural language, not code. To get AI to execute complex actions, you need to learn how to communicate with it. Interestingly, many of the techniques in this essay are based on how humans execute tasks.

Today’s essay is about techniques that help LLMs break down and execute complex tasks.

How we execute tasks

It’s worth thinking about how we execute tasks before jumping into how AI might do them. Consider the task of preparing for your next sales meeting. Roughly, it involves the following steps:

  1. Check your calendar to figure out who your next meeting is with
  2. Understand the previous interactions with this prospect by looking through your CRM
  3. Run a quick Google search to understand any recent news about the prospect
  4. Prepare a sales presentation for the prospect based on their needs, size and the value you think you can add

We complete tasks by breaking them into sub-tasks. Some sub-tasks can be done in parallel (e.g. the CRM lookup and the Google search). Others are sequential: you can’t take on tasks 2–4 without knowing who your next meeting is with.
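To make the ordering concrete, here is a minimal sketch in Python. The helper functions (check_calendar, search_crm, search_news, build_deck) are hypothetical stand-ins for whatever calendar, CRM, search and slides integrations you would actually use; the point is only that step 1 gates everything else, while steps 2 and 3 can run in parallel.

```python
import asyncio

# Hypothetical stand-ins for real integrations (calendar, CRM, search, slides).
async def check_calendar() -> str:
    return "Acme Corp"  # who the next meeting is with

async def search_crm(prospect: str) -> str:
    return f"CRM notes for {prospect}"

async def search_news(prospect: str) -> str:
    return f"Recent news about {prospect}"

async def build_deck(prospect: str, history: str, news: str) -> str:
    return f"Deck for {prospect} based on: {history}; {news}"

async def prepare_for_meeting() -> str:
    # Step 1 is sequential: nothing else can start until you know the prospect.
    prospect = await check_calendar()

    # Steps 2 and 3 are independent of each other, so they can run in parallel.
    history, news = await asyncio.gather(
        search_crm(prospect),
        search_news(prospect),
    )

    # Step 4 depends on everything above.
    return await build_deck(prospect, history, news)

print(asyncio.run(prepare_for_meeting()))
```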

As you will see, many of the techniques borrow heavily from how humans execute tasks.

Chain of thought prompting

Chain of thought prompting instructs AI to think through a series of steps before answering the question. It’s almost as if you are explaining logic and reasoning to a person so that they can solve a riddle themselves.

You could do this in one of two ways:

Zero-shot

In this case, you indicate within the prompt that the LLM should think step by step.

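For instance, the prompt can be as simple as appending “Let’s think step by step” to the question. The call_llm function below is just a placeholder for whichever model or API you actually use.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: swap in whichever LLM API / client you actually use."""
    return "<model response>"

question = (
    "A cafe sells coffee for 3.50 and pastries for 2.75. "
    "How much do 2 coffees and 3 pastries cost?"
)

# Zero-shot chain of thought: no examples, just an instruction to reason step by step.
prompt = f"{question}\n\nLet's think step by step."
print(call_llm(prompt))
```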

Interestingly, with the latest version of ChatGPT, this instruction seems to have no effect, meaning the model has learnt to think step by step without being told to. The instruction is still useful if you want the model to reveal how it reasons through the question.

Few-shot

The second method to implement chain of thought prompting is to provide examples. The examples help the LLM figure out how to reason through the problem. Remember, under the hood LLMs like ChatGPT are predicting the next token (a gross oversimplification, but close enough).
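A sketch of what that looks like: the prompt contains one worked example of the reasoning, then the new question in the same format. Again, call_llm is a placeholder for your model of choice.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: swap in whichever LLM API / client you actually use."""
    return "<model response>"

# Few-shot chain of thought: show a worked example of the reasoning,
# then ask the new question in the same format.
prompt = """\
Q: A shop sells pens at 2 each. How much do 4 pens and a 3 notebook cost?
A: 4 pens cost 4 x 2 = 8. Adding the notebook, 8 + 3 = 11. The answer is 11.

Q: A cafe sells coffee for 3.50 and pastries for 2.75. How much do 2 coffees and 3 pastries cost?
A:"""

print(call_llm(prompt))
```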

As you will see with the other techniques, the “zero-shot” and “few-shot” methods are the two most common ways to instruct an LLM to behave in a certain way.

Self ask

Self ask (paper here) instructs the LLM to ask follow-up questions if required. Again, think through a situation where you are solving a problem. Asking follow-up questions is one of the best ways to get more details about the problem before jumping into a solution.

Because we interact with LLMs using chat-like interfaces, self-ask works well. I’ve been using ChatGPT a lot while programming. It helps me discover new techniques and pick from a range of solutions for each problem. One of the most effective methods I’ve found is to chat and ask follow-up questions about the problem. It seems even more impactful for AI to do this proactively before recommending a solution for you.
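Here is a minimal sketch of a self-ask prompt, loosely following the format in the paper. The worked example shows the model how to decompose the question into follow-up questions before committing to a final answer; call_llm is again a placeholder for whichever model you use.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: swap in whichever LLM API / client you actually use."""
    return "<model response>"

# Self-ask: a worked example teaches the model to break the question into
# follow-up questions, answer them, and only then give a final answer.
prompt = """\
Question: Who was president of the US when the first iPhone was released?
Are follow up questions needed here: Yes.
Follow up: When was the first iPhone released?
Intermediate answer: The first iPhone was released in 2007.
Follow up: Who was president of the US in 2007?
Intermediate answer: George W. Bush was president of the US in 2007.
So the final answer is: George W. Bush

Question: Who was the UK prime minister when the first Harry Potter book was published?
Are follow up questions needed here:"""

print(call_llm(prompt))
```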

ReAct

ReAct is a prompting technique for complicated tasks that asks the model to interleave reasoning and actions (paper here). According to the paper, “reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with and gather additional information from external sources such as knowledge bases or environments.”

This framework essentially asks the model to think and take actions. Based on the output of those actions, it updates its view of the world and its plan. Again, this mimics how we humans break down tasks. If you pay close attention, you will notice that you update your “plan” to complete a task based on new information. Consider preparing for your next sales meeting: if you discovered that the prospect had recently acquired another company, you might change your sales pitch.

ReAct attempts to do the same.
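Here is a stripped-down sketch of what a ReAct loop can look like. The Thought / Action / Observation structure follows the paper; call_llm and run_tool are placeholders for your model API and whatever tools (search, a CRM lookup, etc.) you choose to expose.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: swap in whichever LLM API / client you actually use."""
    return "Thought: I have enough information.\nAction: finish[<answer>]"

def run_tool(action: str) -> str:
    """Placeholder: dispatch actions like search[query] to real tools."""
    return "<observation>"

question = "Has the prospect for my next sales meeting acquired any companies recently?"

prompt = (
    "Answer the question by interleaving Thought, Action and Observation steps.\n"
    "Available actions: search[query], finish[answer].\n\n"
    f"Question: {question}\n"
)

for _ in range(5):                             # cap the number of reasoning steps
    step = call_llm(prompt)                    # model emits a Thought and an Action
    prompt += step + "\n"
    action = step.split("Action:")[-1].strip()
    if action.startswith("finish["):           # the model decided it is done
        print(action)
        break
    observation = run_tool(action)             # execute the action with a real tool
    prompt += f"Observation: {observation}\n"  # feed the result back into the prompt
```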


To close

To give you some sense of how quickly the AI landscape is changing, chain of thought prompting in the form above is already out of date.

One of the reasons AI is moving so fast is that natural language, and not code, is becoming the way we talk to these models. Some of my friends who I consider to be experts at NLP have been blown away by the power of off-the-shelf LLMs like GPT-4.

We are on the cusp of a Cambrian explosion. If you’re curious, I recommend two things:

a) Using ChatGPT as much as you can

b) Reading up on the latest techniques such as the above

The latter has been really useful for me because you quickly realise that you’re probably using these techniques in some form without explicitly choosing to do so. And if you’re wondering why this makes sense, it’s because these machines are built to mimic the way we think.