
So, where are the humans supposed to be, exactly …

I am a dubbing scriptwriter. In February 2024, someone from an AI dubbing company told me I’d be out of my job in a year. It’s now May 2025 and I’m still working. Maybe he meant, in two years? In a decade? I don’t know.

What I do know is that the rhetoric around AI in the creative industries has changed in that year.

The idea that was floating around in ’23 and ’24 was that “content” could be “dubbed” into every language in the world at the push of a button. There would be automatic translation, those lines would be read by a synthetic voice, and bingo. Job done. As I’ve said in other places, this idea stemmed from the fact that people who produced “dubs”, who had no idea about dubbing and the dubbing workflow, were talking to people who produced “content”, who also had no idea about dubbing and the dubbing workflow.

I put these words in quotation marks, because from my perspective as a dubbing scriptwriter – someone to whom words matter – they need to be put into context. From the perspective of a creative, they show exactly what is wrong with the idea of AI Dubbing as a concept. I do not translate content. I recreate a story. Voice actors do not read lines. They recreate a character. Dubbing is not post-production. It’s the creation of a new original. What many of the early samples of AI Dubbing showed wasn’t even dubbing. It was voice-over.

Yeah, I know. Time to get out of lala-land. Not everything is Wes Anderson and Christopher Nolan (why are those two directors, famous for a handmade, low-tech approach to filmmaking, the first ones that pop into my head as great cinematic artists of our time? Never mind …). I have worked on plenty of mediocre, mass-produced films and series that I myself would have no problem calling “content.” If the people who want to sell AI applications to me, telling me that AI will relieve me of routine, mundane, boring stuff and free my mind for the important creative work, mean that avalanche of mindless, zombiefying, soul-crushing crap that is flooding the streamers and the YouTubes, I tend to agree. Great. AI Dubs are here and they are here to stay, and if you ask me, go ride that wave. I don’t even care if you call it dubbing or not. You can have it. I don’t want to be the human in the loop in that operation.

But even for this kind of content, which is what AI Dubbing is being advertised with these days (plastic kids’ programming, low-budget “made in someone’s basement” stuff), the 2024 claim that a full AI dub could be done within a year, which means NOW, has turned out to be false advertising.

Yes, synthetic voices sound very good, but the fact remains that they don’t come from the body and soul of an actor. AI dubbing for theatrical release has massive problems, as the dub of “Black Dog” has shown. It wasn’t lip-sync, and part of it wasn’t dubbed at all (the desperate yelling and shouting of an old lady trying to prevent the dogcatchers from taking her dog was left in Mandarin). The rendition of emotion was atrocious (in the same scene, a girl’s heart-wrenching wailing over the dog being carted away sounded like that announcement in a parking garage that tells you what level you are on). The sound mix was random: close-ups sounded like they were coming from inside a trash can, while voices from far away suddenly sounded so close that I was startled in my cinema seat.

And even here, the dubbing script was not produced by an AI program, but by a human.

Is this what the “human in the loop” is all about? That mysterious being that was being touted throughout 2024 as the new creative job that AI dubbing has invented? Yes, the rhetoric around AI in the dubbing industry had shifted. Although to people who work with language, whose daily business it is to deal with the images that phrases invoke, “human in the loop” is dangerously close to the idea of the hamster wheel, it does sound better than “humans clean up the mess”. Because it was very obvious that – contrary to the big promises – the people that were supposed to be out of work in a year are still very much needed.

Automatic transcription needs to be checked, especially when it comes to names and accents. Automatic translation for spoken dialogue that has a relation to an image is still atrociously bad (try putting an excerpt from an as-rec dialogue of a movie into DeepL …). Translated scripts can be adjusted for length, but forget about the complicated gymnastics that are necessary for them to be really lip-sync. And if you go the other way and adjust not the text but the image, the results are simply eerie. Voice production merely emulates the cadence of the original, lacking native-level prosody. The voices are just sound-alikes; they don’t recreate the vocal performance of the original or adjust it to the target culture.

So, humans are needed at every step of the workflow to produce something that comes close to what’s increasingly being called “traditional dubbing” (a terminology shift that has something of the renaming of “Star Wars” into “A New Hope”). And if you need someone, you’re nice to them.

Which is why, throughout 2024, the rhetoric around AI in the creative industries and in the media also changed. Whether due to the need for humans, or due to the fact that humans resist being made redundant, it stopped being “AI will replace you” (with the more diplomatic proponents of this idea adding, “and you can go on doing something nice with your life”). It became “AI will help you.”

By 2025, it sounds like this: “AI enhances and supports – it doesn’t replace human judgement. We only use AI if it serves the idea, the story or the team, not just for the sake of it” (British TV Channel 4’s “AI Principles”). The CEO of Crunchyroll, Rahul Purini, recently said in an interview with Forbes, “We are not considering AI in the creative process, including our voice actors.”

If AI is used, it’s as a tool. No more human in the loop. It’s human in the cockpit. Creatives have been promoted from hamster to co-pilot.

In my industry, this would mean a shift from “AI dubbing” to “AI in dubbing.” And that sounds great. It sounds exactly like what I’ve been hoping for. Except for one thing. This idea assumes that creatives in dubbing have agency. That we are in control of our tools. That we can choose them.

Instead, and even before AI has truly gained a foothold in the dubbing process, those tools are being thrown at us. I feel like a tennis player facing a ball machine that’s gone hyperspeed. With every project that I work on, I am being forced to use a different tool, a new cloud dubbing platform, a new matrix. Over many years as a professional, I have honed my own tools – and I’m not talking about my brain. I have a carefully calibrated setup of monitors and keyboards. I have a set of hotkeys and macros. I have a color-coded system for sentences that I’m leaving for now but will be revising during proofreading, separating lip-sync issues from content questions.

But within just the last year, I have had to throw almost all of this out the window (well, not the monitors …). I have had to work with “tools” that were not only not developed for me or my particular workflow, but not even for dubbing scriptwriters at all. Most of these “tools” were originally developed for subtitling. They don’t let me scroll the video smoothly. Or they don’t let me decouple the image from the text that I am writing while I am writing it. They don’t let me annotate or leave notes for my co-creators. They don’t allow for any meaningful QC. They overwrite things that should be preserved as information (like forcing me to overwrite the translation with my script – where is the director or the QCer going to be able to check my script against the translation and the original?). I could go on.

And not only do all those “tools” work just about as well as giving a conductor a baseball bat instead of a baton, every single project that I work on has different ones. It’s like switching from stick-shift to automatic, from left-hand to right-hand driving, and back, every day. Everyone who has ever driven a car knows how many things have to be committed to the automatic part of your brain for the drive to go smoothly, for you to be able to sing along to the radio while not crashing into a truck.

It’s not language that is the barrier in my work. It’s the “tools” that are a barrier to my creativity.

In order for a tool to work for me, I have to have agency over the tool. The Channel 4 guidelines quoted above go on to say, “We want to use AI to remove inefficiencies, free up time, and protect the space where real creative thinking happens. We want to use AI to reduce friction and speed up the tasks we all find mundane, because we see it as a tool to support our greatest assets – our people and our creativity.” That sounds awesome, but who is the “we”?

As long as I am not able to define what I deem to be “mundane”, I am not part of the “we”. As long as I do not define the space where “real creative thinking happens”, I am not part of the “we”. There is a reason why creatives have, in the last year or two, called for active participation in AI development and governance. Very few of us oppose innovation. All of us welcome things that help us be more creative. But until we truly are part of the conversation, phrases like “humans in the driver’s seat” are empty rhetoric.

Notes:

Channel 4 AI guidelines: https://assets-corporate.channel4.com/_flysystem/s3/2025-05/Channel%204%20AI%20principles.pdf

Interview with Crunchyroll CEO Rahul Purini: https://www.forbes.com/sites/robsalkowitz/2025/04/08/inside-crunchyrolls-latest-plans-to-expand-the-anime-universe/
