That robot sounds just like you

First, OpenAI tackled text with ChatGPT, then images with DALL-E. Next, it announced Sora, its text-to-video platform. But perhaps the most pernicious technology is what might come next: text-to-voice. Not just audio — but specific voices.

According to the New York Times, a group of OpenAI clients is testing a new tool called Voice Engine, which can mimic a person’s voice from a 15-second recording. From there, it can make that voice speak other languages.

The report outlined a series of potential abuses: spreading disinformation, allowing criminals to impersonate people online or over phone calls, or even breaking voice-based authenticators used by banks.

In a blog post, OpenAI appears all too aware of the potential for misuse. Its usage policies require anyone using Voice Engine to obtain consent from the person whose voice they are replicating and to disclose that the voices are AI-generated. OpenAI also says it is watermarking all generated audio so third parties can detect it and trace it back to its source.

But the company is also using this opportunity to warn everyone else that this technology is coming, including urging financial institutions to phase out voice-based authentication.

AI voices have already wreaked havoc in American politics. In January, thousands of New Hampshire residents received a robocall in which an AI-generated voice impersonating President Joe Biden urged them not to vote in the state’s Democratic primary. The call was made with simple AI tools and paid for by an ally of Dean Phillips, who was challenging Biden for the nomination and has since dropped out of the race.

In response, the Federal Communications Commission clarified that AI-generated robocalls are illegal, and New Hampshire’s legislature passed a law on March 28 that requires disclosures for any political ads using AI.

So what makes this more dangerous than other AI-generated media? For one, the imitations are convincing. The Voice Engine demonstrations shared with the public so far sound indistinguishable from the human originals, even in foreign languages. And even the Biden robocall, which its maker said was produced for just $150 using technology from the company ElevenLabs, was a good enough imitation.

The real danger, though, lies in the absence of other signs that the audio is fake. Every other kind of AI-generated media offers clues to the discerning viewer or reader. AI text can feel clumsily written, hyper-organized, and chronically unsure of itself, often refusing to give real recommendations. AI images often have a cartoonish or sci-fi sheen, depending on the model that made them, and are notorious for getting human features wrong: extra teeth, extra fingers, ears without lobes. AI video, still relatively primitive, remains glitchy.

It’s conceivable that each of these forms of generative AI will improve to the point where it is indistinguishable from the real thing. For now, though, voice is the only one that seems capable of becoming utterly undetectable without proper safeguards. And even if OpenAI, often first to market, acts responsibly, that doesn’t mean every other developer will.

Voice Engine has no set release date, so its announcement feels less like a product launch and more like a warning shot.