
Too scruffy for Zoom? Send in the AI

Midjourney

Have you ever had to get in front of a camera, but you really, really didn’t want to? Maybe you were too tired, too lazy, too disheveled to film something that day. What if a proxy could handle that for you? Well, now that’s possible.

Using Synthesia, an AI-powered video tool, I created a virtual avatar of myself. It’s essentially a digital puppet constructed from my skin, with invisible strings that carefully lift my eyelids and eyebrows and open and close my mouth to align with the words I want it to say. My ventriloquism is commanded by a text prompt: a string of words I have written for this virtual Scott to say aloud.

Synthesia is a British startup founded in 2017 by a global cohort of researchers from Stanford, University College London, Technical University of Munich, and Cambridge. The company has raised $156 million in venture capital. It’s a pricey tool: plans start at $22 a month, a $67-a-month tier gets you more features and hours of video, and enterprise use is custom-priced. But the kind people at Synthesia allowed me to test it out for free.

Alright, my avatar will take it from here:

Synthesia

There’s a common term in science fiction and tech criticism called the “uncanny valley,” a phenomenon that occurs when humans see something that seems nearly human. It evokes an eerie feeling, one I felt watching the fake version of myself speak on screen.

Everything with Synthesia seems nearly right. My voice sounds nearly right, and my face nearly moves like it should when mouthing the words I wrote. But it’s not quite there yet, and that disparity could mean the difference between success and failure. Having an avatar you can effectively deploy for a sales presentation is great — but one that simply creeps out your clients is a waste. (The company also offers hundreds of premade avatars you can use if you don’t want to appear, in any form, “on camera.”)

But this is the simple, at-home version. It takes 10 minutes to film (I followed a script and recorded it at my kitchen table), and Synthesia had the avatar ready for me a day later. Once the avatar exists, each video you make with it generates in mere minutes.

There’s a studio version too that costs $1,000 per year on top of a subscription. You can go to one of the company’s partner studios in Europe or North America and get an improved expressive avatar with a transparent background that you can drop into any presentation. It uses AI to read your text prompt and match your avatar’s face and voice to the emotion it thinks you want to convey.

On a Zoom call, Alexandru Voica, Synthesia’s head of corporate affairs and policy, walked me through the product’s many features and showed me a preview of where the technology is going. He said the company is almost exclusively focused on enterprise solutions for businesses, intending for the technology to be used for training videos, sales pitches, and marketing material. That said, he’s seen some consumer uses too, including a social media account that used the avatars to make history-focused videos.

To prevent deception and misinformation, Synthesia has strict content standards. It doesn’t allow profanity, hate speech, or misinformation. “We’re not a marketplace of ideas. We don’t pretend to be a social media company. We’re pretty much an enterprise-focused video solution platform, therefore we don’t need to necessarily have these philosophical debates about harmful content and what’s misinformation and what’s not misinformation. We’ve set very robust rules in place,” Voica said. It doesn’t even allow you to record news content unless you’re a news organization with an enterprise subscription. And to prevent nonconsensual deepfakes, it checks that every avatar is filmed by the person it claims to be. That way, content moderation happens at the point of creation, rather than at the point of distribution.

Synthesia, Voica maintains, is for work rather than personal use. That’s a different tone from that of many generative AI companies, which are trying to prove their worth to consumers. Later this year, Voica said, Synthesia is releasing a choose-your-own-adventure platform for video creation that allows viewers to personalize the content they receive.

But crossing that uncanny valley, for the at-home avatars at least, will be key to the company’s success. Readers of this newsletter will recall that a few months ago I tested out the ElevenLabs voice cloning technology, which I gave high marks.

Synthesia performs nearly as well for audio: it’s slightly more robotic and unnatural, but still very good. But the person you see on the screen needs to seem either fully human or fully AI, and while the technology may improve, nearly human might not be good enough.

GZEROMEDIA
