Thursday, June 27, 2024

The Mona Lisa Rapping? New Microsoft AI Animates Faces From Photos


Move over, Mona Lisa's smile. Microsoft's AI brings faces in photos to life with speech and emotion! The new technology has entertainment and educational potential, but it also raises concerns about misuse.

The Mona Lisa can now do more than smile, thanks to new artificial intelligence technology from Microsoft.

Last week, Microsoft researchers detailed a new AI model they’ve developed that can take a still image of a face and an audio clip of someone speaking and automatically create a realistic-looking video of that person speaking. The videos—which can be made from photorealistic faces, cartoons, or artwork—are complete with compelling lip syncing and natural face and head movements.

In one demo video, researchers showed how they animated the Mona Lisa to recite a comedic rap performed by actor Anne Hathaway.

Outputs from the AI model, called VASA-1, are both entertaining and a bit jarring in their realness. Microsoft said the technology could be used for education, “improving accessibility for individuals with communication challenges,” or potentially creating virtual companions for humans. But it’s also easy to see how the tool could be abused and used to impersonate real people.

It’s a concern beyond Microsoft. As more tools to create convincing AI-generated images, videos and audio emerge, experts worry that their misuse could lead to new forms of misinformation. Some also worry the technology could further disrupt creative industries from film to advertising.

For now, Microsoft said it doesn’t plan to release the VASA-1 model to the public immediately. The move is similar to how Microsoft partner OpenAI is handling concerns around its AI-generated video tool, Sora: OpenAI teased Sora in February, but has so far only made it available to some professional users and cybersecurity professors for testing purposes.

“We are opposed to any behavior that creates misleading or harmful content about real persons,” Microsoft researchers said in a blog post. They added that the company has “no plans to release” the product publicly “until we are certain that the technology will be used responsibly and in accordance with proper regulations.”

Making faces move

Microsoft’s new AI model was trained on numerous videos of people’s faces while speaking. It’s designed to recognize natural face and head movements, including “lip motion, (non-lip) expression, eye gaze and blinking, among others,” researchers said. When VASA-1 animates a still photo, the result is a more lifelike video.

For example, in one demo video set to a clip of someone who sounds agitated, apparently while playing video games, the speaking face has furrowed brows and pursed lips.


The AI tool can also be directed to produce a video where the subject is looking in a certain direction or expressing a specific emotion.

On close inspection, the videos still show signs of being machine-generated, such as infrequent blinking and exaggerated eyebrow movements. However, Microsoft said it believes its model “significantly outperforms” other similar tools and “paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”
