There’s a voice on the internet that sounds just like me. In a way, it is me.
ElevenLabs, a software startup, uses artificial intelligence to generate natural-sounding speech. If you need a realistic voiceover for a television commercial, a short film, or an audiobook, ElevenLabs lets you use one of the human-ish voices in its library. The technology isn’t ready to put voice actors out of business just yet (there are some definite hiccups), but it is surprisingly effective.
In January, New Hampshire voters received a robocall from Joe Biden. Except it wasn’t Joe Biden. It was an AI-generated fake created by a political ally of then-Democratic challenger Dean Phillips. It was made with ElevenLabs.
In response, ElevenLabs banned the account responsible and, in February, changed its policy on impersonating public figures. The company already prohibited impersonating private individuals without their consent, as well as anything intended to harm others; now it has added a list of “no-go” voices, specifically outlawing impersonation of presidential and prime-ministerial candidates in the US and UK, with plans to cover additional “languages and election cycles.”
With elections in 64 countries this year, and OpenAI planning an ElevenLabs-like tool of its own, the threat of mass confusion around elections feels palpable. AI has helped former Pakistani Prime Minister Imran Khan deliver a campaign address from prison, but it has also caused problems for Indonesian presidential candidate Anies Baswedan, the target of fake audio in which a supposed political backer chastises him. Beyond spreading disinformation, these tools could also be used to break the voice-authentication measures used by banks and other financial institutions.
Naturally, I had to give it a try.
I paid $22 for the mid-tier version of ElevenLabs, which got me “professional voice cloning,” about two hours of downloadable AI-generated text-to-speech per month, and high-quality audio. To generate my voice clone, I had to upload 10-plus minutes of myself speaking, though it recommended 30 for the best results. After 12 minutes of gabbing into my USB microphone about the results of the NFL Draft, I hit upload.
The software took about two hours to work its magic before I was alerted that my cloned voice was ready to chat. Even just typing in, “Hey, this is Scott Nover,” I was amazed by how much it sounded like me. I typed some more. The more text I included, the more the software struggled to sound human. There were unnatural pauses, odd diction, and my detached voice seemed upset about something.
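For the technically curious: ElevenLabs exposes the same text-to-speech features through a public API, so the typing-into-a-box step can be scripted. The sketch below is a rough illustration rather than my exact workflow; the API key, voice ID, and filename are placeholders, and the endpoint and field names reflect the v1 API as publicly documented, which may have changed since.

```python
# Rough sketch: generating speech from a cloned ElevenLabs voice via its public API.
# The API key, voice ID, and filename are placeholders; endpoint and field names
# follow the publicly documented v1 API and may have changed since this was written.
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"   # found in your account settings
VOICE_ID = "YOUR_CLONED_VOICE_ID"     # the ID assigned to the voice built from the uploaded recording

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
payload = {
    "text": "Hey, this is Scott Nover.",
    "model_id": "eleven_multilingual_v2",  # one of several available models
}
headers = {"xi-api-key": API_KEY, "Content-Type": "application/json"}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()

# The endpoint returns raw audio bytes (MP3 by default); write them to disk.
with open("scottbot.mp3", "wb") as f:
    f.write(response.content)
```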
ElevenLabs lets you choose from a few different models, and it offers sliders to adjust the voice’s tone, similarity to the uploaded recording, and sassiness. I tinkered with the settings until I found the best ratios and walked away pretty impressed with what I created. It even let me change the language — my polyglot editor Matthew Kendrick says the Spanish sounds good, with a mild Mexican accent and a few odd word choices. The Mandarin is a little less convincing — mostly because white dudes usually can’t hit their tones so accurately.
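Those sliders map roughly onto the voice_settings fields in the same API. The snippet below is a hedged sketch of how you might dial them in from code; the field names (stability, similarity_boost, style) come from the public documentation, mapping “sassiness” to the style knob is my assumption, and the numbers are illustrative rather than the best ratios I actually landed on.

```python
# Rough sketch: the web UI's sliders expressed as API "voice_settings".
# Field names follow the documented v1 API; values are illustrative only,
# and "style" is an assumption about which knob the sassiness slider maps to.
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"
VOICE_ID = "YOUR_CLONED_VOICE_ID"

payload = {
    "text": "Hola, soy Scott Nover.",        # the multilingual model handles other languages
    "model_id": "eleven_multilingual_v2",    # needed for non-English output
    "voice_settings": {
        "stability": 0.5,          # lower = more expressive, higher = more monotone
        "similarity_boost": 0.8,   # how closely to track the uploaded recording
        "style": 0.3,              # style exaggeration, roughly the "sassiness" knob
    },
}

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    json=payload,
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
)
response.raise_for_status()

with open("scottbot_es.mp3", "wb") as f:
    f.write(response.content)
```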
If you’ve been following this newsletter, you’ll know we’ve been covering the rise of generative AI and have warned, in particular, about the dangers of AI-generated audio.
Soon, AI voices will be everywhere. Some, like my homemade ScottBot, won’t matter much. But others (say, a clone of Ayatollah Khamenei issuing a fatwa, or of Vladimir Putin inciting violence in Ukraine) could cause major problems on the world stage. And every government should be prepared.