OpenAI Unveils 'Voice Engine': Simulates Human Speech with Just 15 Seconds of Audio Samples

Spread the love


OpenAI, known for innovative advancements in AI technology with creations like Sora, its video generator, now introduces 'Voice Engine,' a pioneering voice cloning tool. This amazing audio model accurately reproduces the nuances of human speech, including intonation and unique speech patterns, using a brief 15-second sample of the original voice. Despite eager anticipation, OpenAI chose to keep this new feature under wraps, citing concerns over potential abuse and the proliferation of fake content online.

Remarkable efficiency and accuracy

“Incredibly, our voice engine can generate emotive and biological voices using just a 15-second sample,” the company said in a recent blog post.

Also Read: Microsoft and OpenAI to launch $100 billion AI data center project with 'Stargate' supercomputer

OpenAI's voice engine vs. industry standards

In contrast, existing AI voice platforms such as ElevenLabs typically require longer samples, with their instant voice cloning tool requiring at least one minute of audio for operation. About 10 minutes of continuous talk is recommended for optimal results, especially for professional-grade services.

OpenAI demonstrated the capabilities of the voice engine through various demonstrations, including a poignant example of a voice from a young patient who had lost the ability to speak due to a brain tumor, replicated using an old recording from a school project. The technology allowed her to communicate using her voice, a feat made possible by LifeSpan, a nonprofit affiliated with Brown University's medical school.

Also Read: iOS 18 at WWDC 2024: Features, AI upgrades, launch date, supported devices and more

Additionally, OpenAI revealed partnerships with companies like HeyGen, demonstrating how the voice engine facilitates natural-sounding translations from one language to another.

Also Read: Apple May Soon Offer 'Topographic Maps' On iPhone, Macbook: What It Is And All The Details

According to OpenAI, the voice engine was initially developed in late 2022 and is integrated into ChatGPT's Voice and Read Aloud feature, along with preset voices already available in OpenAI's text-to-speech API. With these latest developments, the company is treading carefully before a wider release.



Source link

Leave a Comment