At this year’s GALA WorldReady conference, the keynote speaker, Dutch tech innovator Deborah Nas, talked about language as the core competence in the world of apps and AI. We don’t need to type commands into a machine anymore; we just tell it what we want to know or what we want it to do.
I had just come back home to Berlin from taking care of my very old parents for a week. Less than 12 hours before I listened to the keynote, I had been helping my dad get into bed for the night. This takes about 45 minutes. My dad, for context, will turn 90 this summer. He has very severe Parkinson’s, which in his case means that his movements are painfully slow, that his limbs are stiff and that the nerves in his fingertips have lost their sensation. He has also lost most of his vision. He can eat on his own but that is about the extent of things that he doesn’t need assistance with.
However, most of the time his brain works just fine and he is still curious about the world. He understands more about artificial intelligence than most of my colleagues and he still believes that a letter to the editor can fix things in the world.
To be honest, he does mumble. And – like all Parkinson’s patients – he depends a lot on established routines. It’s not easy for him to learn something new. Everything has to be in the right place, and if the carers that rotate in and out of the house like the figures on a Bavarian cuckoo clock all had the same name, that would be perfect.
Now, I would think that a good dose of AI and voice-controlled machinery would make his life a lot easier. He would have a machine to which he can speak and which can tell him when the next Champions League game starts and on what TV channel it will be shown. It won’t be able to propel his wheelchair forward, but it could turn on the TV for him, and tell him who the players are on the pitch.
We did set him up with an iPhone. And he does know how to activate Siri. But have you ever tried to teach an 89-year-old Parkinson’s patient how to “talk” to Siri? How to say “Hey Siri” first and then wait for her to say her slightly passive-aggressive “hm-hm?” Then to ask the question – if he hasn’t forgotten it between thinking of it, finding his phone, and then activating Siri? To ask the question not too fast, because then Siri isn’t finished saying “hm-hm?” and not too slow because if he’s waited too long, Siri has gone to sleep again? Then ask the question several more times, more loudly, or more carefully each time, because Siri hasn’t understood what he wants? Then get Siri’s answer, pointing him to a bunch of websites, from which he cannot select because he can’t see what they are?
It took my dad so long to communicate his questions to Siri, because Siri doesn’t take a guess at what he COULD mean. At least not the kind of guess that I can take, having known my dad for almost sixty years now. And because Siri isn’t able to understand my dad when he mumbles, or when he has to think about his question for a moment because he can’t remember a name or a word. Siri doesn’t adapt to my dad. He had to adapt to her.
My dad has slowly given up trying to get more out of Siri than the temperature outside (not that he ever goes there anymore …), or the current state of his battery charge.
I recently installed ChatGPT on his phone, thinking that this would be a more conversational approach. Of course, my dad cannot SEE the app symbol on his phone, and even if he could, his fingers can’t be relied on to hit the symbol and open the app. And I wasn’t able to figure out how to voice-activate it without it getting activated by my dad calling out to the carer that he needs to go to the bathroom. Or how to teach my dad what to say without saying it myself first, and then ChatGPT reacting to ME and not to HIM.
But I didn’t need to, because it became clear very quickly that ChatGPT was, well, too chatty. His answers were endless. My dad didn’t know how to interrupt him. And ChatGPT didn’t know how to stop and look at my dad and see that he wasn’t understanding what the machine was saying, like I can do, and then start again. Or ask him whether he was even listening or whether his mind had drifted. Mental overload for my dad in a matter of minutes. Plus, ChatGPT had no idea whether my dad’s phone was fully charged.
Neither Siri nor ChatGPT (granted, we didn’t give the latter a whole lot of time) was of much help in giving my dad access to the world out there. An otherwise able person, someone younger, someone who was blind from birth, or someone whose fingers find their target reliably, might have managed. But if you rely on voice control alone, you have to learn how to talk like the machine, and not the other way around.
Knowing that you need to say “Hey Whoever” to talk to them isn’t an automatic thing. When my dad wants to talk to my mom and she’s sitting right next to him, he just starts talking. No “Hey Brigitte” necessary unless she’s forgotten to put her hearing aids in.
And this is the difference that we, who have grown up with voice-activated and voice-controlled technology, or we, who are learning to use it while we are still young and have an agile mind and control of most of our senses and faculties, have learned: the difference between a normal conversation and a conversation with a machine. A machine that cannot read body language, that cannot see an ironic twinkle in your eye or an exasperated rolling of the eyes. A machine that will not understand, unless you’re a really skilled voice actor, whether you mean that you’ll “pass ON” a note or “PASS on” a note. A machine that will not ignore the fact that – like my dad – someone might say “kinetic energy” when he actually means “artificial intelligence”, or understand that sometimes when he says my name, he actually means my mom.
People make mistakes all the time when they speak. I hate to think what a voice-controlled car would have made out of my friend in high school, who had a serious issue with telling left and right apart, and it wasn’t even consistent. Often, when he said “right”, he meant “left”, but just when you thought you’d understood the trick, he would say “right” and actually mean it.
People say things they don’t REALLY mean. Just think how often you’ve said something like “if X says this one more time, I’ll strangle him” or “shit!”. I wouldn’t be happy with any device that takes me at my – literal – word.
With my voice alone, I have killed and maimed, I have invaded countries and robbed banks. I have said many unkind things about many people. I have yelled extremely nationalistic things while watching soccer games (sorry, Italy!). I have said stuff like, “oh, I love banana bread” just to make someone happy – please, Siri, stop recommending banana bread to me every time I ask you for a recipe!!!
If my dad asked ChatGPT when the next Champions League game is on TV, ChatGPT might tell him that it’s not shown on free TV but on Sky. Helpful as it is, ChatGPT might also tell my dad that he’d need a subscription to watch the game. My dad might say to himself “man, that would be nice”. Let’s just hope that ChatGPT understands wishful thinking and refrains from setting up my dad with one of Sky’s outrageously expensive monthly sports super-duper plus packages …
No matter what we do to adjust our language to make the machine understand: spoken, lived language, together with the context in which it’s spoken and the body that speaks it, might not be the right input tool when it comes to matters more important than “is my iPhone fully charged?”
