Conversational interfaces

Is a conversational interface the user interface of the future?

Those resistant to change will say no. It’s too slow, and it places a lot of cognitive load on the user. They’re right, it does. The optimists will say chat is the way to go because ChatGPT is the fastest-growing application of all time.

I’ve been spending a lot of time on this question, like everyone else. Here’s what I think:

Our history with Chat

Conversational user interfaces aren’t new. There was a period during 2015 when they were all the rage. If you’re interested in digging deeper, I recommend reading the links in the tweet below. Revisiting these designs is a really interesting exercise.

Here are a few snippets from the articles that caught my eye:

  1. “Text is the most socially useful communication technology. It works well in 1:1, 1:N, and M:N modes. It can be indexed and searched efficiently, even by hand.”
  2. “The other obvious problem is that this interaction is as strict as the command line: If I type “23& 8ht 23M” it might not work. Natural Language Processing still isn’t good enough, or ubiquitous enough, to power an app that primarily interacts via messaging.”
  3. “One of the problems with text interfaces is text entry. Keyboards suck, especially on mobile devices. Typing also introduces a discoverability question: How do you know what words are valid right now, or the right grammar to use? How do you make complex statements?”

The real question

The real question is not whether chat is the interface for the future; it is: what interface suits this customer and use case?

A couple of examples:

If you work at a remote company, talking to colleagues on Zoom or Google Meet is no substitute for real-life interaction. This is an opportunity to create value for users. If you could speak to your colleagues as you would in an office, Zoom wouldn’t feel so rough. The answer here is certainly not chat, but it may not be video either. It may be some form of augmented or virtual reality.

If you’ve encountered an issue with something you’ve bought online, you have 3 options: email, chat and phone. Email is asynchronous, but it will take a while to have your problem resolved. I typically avoid it because I forget to follow up, and it usually takes 3-4 turns before resolution. Chat is frustrating because it takes a couple of minutes to resolve and I can’t do anything else in the meantime. If I do, I tend to forget and never come back to it. Phone is the most annoying because I know I will be put on hold.

Notice the pattern above — my frustration is not with the medium, it’s with the quality of interaction. If I could have my query resolved in under a minute, I’d choose email when I’m working, chat when I’m on public transport (resolve without disturbing anyone else) and phone when I’m walking.

Steps matter

The number of steps a user needs to take to complete an action is important. Imagine I want to buy a pair of black running shoes in size 41.


Here are the steps I need to take before I can pay (I tried this):

  1. Go to the website
  2. Type in running shoes
  3. Choose a colour
  4. Choose the size

If you were presented with exactly what you wanted, would you want to go through these steps?
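To make the contrast concrete, here’s a minimal sketch (hypothetical function and field names, and a deliberately naive keyword matcher standing in for real natural language processing) of how a conversational interface could collapse those four steps into one utterance, by extracting the same filters the website collects click by click:

```python
import re

def parse_shopping_request(utterance: str) -> dict:
    """Extract product, colour, and size from a free-text shopping request.

    A toy stand-in for a real NLP model: the point is that one sentence
    carries all the filters the four GUI steps gather separately.
    """
    colours = {"black", "white", "red", "blue"}
    query = {"product": None, "colour": None, "size": None}

    # Step 4 of the GUI flow: the size.
    size = re.search(r"\bsize\s+(\d+)\b", utterance.lower())
    if size:
        query["size"] = int(size.group(1))

    # Step 3: the colour.
    words = utterance.lower().split()
    for colour in colours:
        if colour in words:
            query["colour"] = colour
            break

    # Step 2: the product search term.
    if "running" in words and ("shoe" in words or "shoes" in words):
        query["product"] = "running shoes"

    return query

print(parse_shopping_request("I want a black running shoe in size 41"))
# {'product': 'running shoes', 'colour': 'black', 'size': 41}
```

One sentence in, a complete query out — the interesting design question is which of the remaining steps (browsing photos, confirming payment) still deserve an explicit interaction.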

Of course, you will care about certain elements of this journey. For example, looking at several pictures of a shoe before buying. Or, explicitly clicking a button to confirm payment (vs. saying “Buy”). But this doesn’t mean you need a graphical user interface, it means you need a user interface that incorporates those specific elements.

Accessibility

Changing the medium with which you interact opens up new possibilities.

If you’re on a walk and AI is good enough for you to order groceries via voice (it probably is already), that opens up something you couldn’t do before. I have some of my best ideas when I’m walking my dog, and I’d love the ability to jot down notes while doing so.

This is important because it forces people to try new modalities. They might quickly find that they actually like them. Anyone else walk around the house when they’re in deep thought?

A whole new experience

We’re going to see interfaces we’ve never been exposed to before.

You can listen to an article in someone’s voice. You might have a shopping assistant help you find stuff online. Some day, you might have a physical assistant that helps you clean your house.

Hologram assistant at Heathrow Airport

I don’t really know how I feel about these. Some people will prefer them, and others won’t. Humans are resistant to change. “It does not work” today does not mean “it won’t work” 5 years from now.

Ask yourself this: do you feel as comfortable on FaceTime or Google Meet as you did 10 years ago? When there’s a new form of interaction, we need time to get used to it.

One of the best examples of this is people sending voice notes vs. texts via WhatsApp. The feature has been available for years but has started to gain traction in the US only recently. For many people, sending voice messages feels like less of a burden: it’s less anxiety-inducing and helps them communicate nuance better than text.

There will also be cultural differences. For example, in parts of Europe, Asia and South America voice notes are extremely common.

A question of when

As an extension of the above, you should really be asking when rather than whether. New modalities unlock new opportunities.

This applies whether you are an AI company or not. Frankly, there will be no such thing as an “AI company” 2 years from now. Every company will incorporate AI in one way or another.

The pace of change will be very use-case dependent. I’m likely to change my shopping habits faster than I’m likely to change how I see a doctor or send money.

What’s possible now?

And finally, the most important consideration for what the user interface of the future will look like: what’s possible today that was not possible before?

Here’s a list based on the snippets I quoted earlier from the 2015 chatbot craze:

  1. Switching modalities (text, audio, video) is easier and better than it has ever been before.
  2. The ability for machines to understand semantics and process natural language is at least 10x better. Probably more.
  3. Writing software is easier than it was 10 years ago.

All of these point to flexible, dynamic user interfaces that do the best job for the user and the use case. Today, we have UIs based on a product. What if the UI was based on the user? What if the UI was based on the specific interaction?

If you’re thinking about unlocking new experiences for your product or business, and have ideas on how to do so, I’d love to chat. Drop me an email at krishna@kili.so.