More than twenty years ago, I wrote an essay that I called “Translating Hollywood”. My need to clarify some things about dubbing was triggered by a brilliant keynote on “American Pop Culture Hegemony” at the German Association of American Studies by Reinhold Wagnleitner in 2001, and by the call of a German academic, Bernd Ostendorf, a few years earlier, to “look at the role that the dubbing industry plays in the process of Americanization in a less dismissive fashion.”
At that point, I had worked as a translator in the dubbing industry for several years, and was just moving into dubbing scriptwriting. But it had irked me for some time that academia ignored the fact that the German public watches films in the dubbed version, and that whatever idea they might get from this particular form of American pop culture was maybe not the same thing as what America thought it was exporting. So I put pen to paper and righteously explained a few things.
I had not thought about my essay for quite some time, until I came across a recent study called “Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing”, and was struck by a sentence toward the end: “Human dubbers display less respect for isochrony and especially lip sync than is suggested by qualitative literature, while being surprisingly unwilling to (…) sacrifice translation quality to hit other constraints.”
Wait a minute, I thought. That sounds familiar. And I dug out my old essay, in which, twenty years before, I had written, “Pisek, Pruys, and Whitman-Linsen [authors of academic volumes on dubbing] might be surprised that the dubbing scriptwriters I know spend significantly more conscious effort in coming up with the idiomatic sentence, while the creation of lip-synchrony takes part largely subconsciously, (…) (because) the audience is willing to suspend its disbelief as long as the German dialogue sounds natural and idiomatic.”
So basically from just a little knowledge of the industry, I said something for which the authors of the newer study had to analyze “319.57 hours of video from 54 professionally produced titles” on Amazon, amounting to “201,246 dialogue lines, from 688 episodes.”
Mind you, I have no problem with having my gut feeling proven by such extensive research.
But interestingly, the study contains several more duh-moments. For example, the authors “observe that on-screen dubs are more isochronic than off-screen, but to a surprisingly small degree” (my emphasis). I’ve seen the same assumption elsewhere in the literature: that as soon as a character’s lips are off-screen, or further away, every concern for synchronicity goes out the window. Isochrony is about the timing of an utterance, its length and where it starts and stops, which is directly related to things like pauses between sentences, or parts of a sentence. That’s part of the script and the performance and the directing, and that’s why dubbers respect it, off-screen or not.
I like how everything I’ve been doing – and teaching – is confirmed by a study of hundreds of hours of dubbed material, but it’s also frustrating that it takes that much effort to be able to say something that any dubber could have told you if you’d just asked.
The study also finds evidence for “emotion and/or emphasis transfer from source to target.” Duh. What else is dubbing than the effort to transport the original emotions to the dubbed film?
By no means would I want to deny the study the respect any research effort of this size deserves. It’s a fascinating read. But it’s a shame that a study that proves all of my hunches about what dubbers do in real life appears in the context of developing automated dubbing. The study wants to identify where humans are different from machines – or, as the authors phrase it, “address weaknesses in current automatic dubbing approaches” – and ultimately to improve the machines, rather than help the humans become better.
As the authors say in their abstract, the influence of “channels other than the words of the translation” seems to be what causes this reluctance to sacrifice quality on the altar of lip-sync (context is key, as everyone knows, another duh-moment). But what is perceived as lip-sync is extremely subjective. If I understand them correctly, the authors seem to be aware of that and try to get around this problem by, “rather than relying on the video tracks, (using) the notion of a viseme.” But ignoring the video in a study of a mode of audio-visual transfer seems somewhat counter-intuitive.
It should be emphasized that viewers have an endlessly debated and highly subjective tolerance for asynchrony between text and performance. That’s where taste and individuality come into play, rogue elements that are hard to grasp by any rational, quantitative means.
And if this play between a very subjective perception of lip-sync and decisions of taste – based on an awareness of the unwritten – is what differentiates a human dub from a machine dub, and if we know that many people all over the world appreciate dubbing as their main form of consuming films, then the industry doesn’t need robot scriptwriters. It already has what it needs. They are called “humans”.
PS – my essay doesn’t seem to be available online. But I’ll send you a PDF if you make human contact and send me a PM.
