Voice Cloning AI

Mohamed Kamal

CEO and Cofounder of Unbias

August 18, 2023
Insights

It’s fascinating how fast technology evolves, and today I’d like to discuss an intriguing trend that has caught the attention of music enthusiasts worldwide: AI-generated music impersonations. This phenomenon has taken social media by storm, and it all revolves around voice-cloning models that generate celebrity-soundalike vocals. This article might take you 3 minutes to read.

At the core of this process is a specific AI model called SoftVC VITS Singing Voice Conversion (So-VITS-SVC), or Sovits for short. The model’s task is to understand and replicate the unique elements of an artist’s voice to create a new vocal track that closely resembles the original.

The SoftVC VITS model uses deep neural networks to analyze the minute nuances of an individual’s voice. It’s not merely the pitch or rhythm that’s being mimicked; the model learns the spectral characteristics, the dynamic variations, and even the emotional expressions embedded within the voice, creating a digital replica that can be transferred into new performances.
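
To make "spectral characteristics" a little more concrete, here is a minimal sketch in plain Python. It is not part of SoftVC VITS; it only shows that two tones with the same pitch can differ in how their energy is spread across frequencies, which is roughly what we hear as timbre. The `tone` and `spectral_centroid` helpers and the harmonic weights are illustrative assumptions.

```python
import cmath
import math


def tone(f0, harmonic_weights, sample_rate=8000, n=400):
    """Build a tone as a sum of harmonics of f0.

    The weights are a crude stand-in for a voice's timbre (an assumption
    made for illustration, not how a neural model represents a voice).
    """
    return [
        sum(w * math.sin(2 * math.pi * f0 * (k + 1) * t / sample_rate)
            for k, w in enumerate(harmonic_weights))
        for t in range(n)
    ]


def spectral_centroid(samples, sample_rate=8000):
    """Magnitude-weighted mean frequency in Hz, using a naive DFT."""
    n = len(samples)
    total = 0.0
    weighted = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        magnitude = abs(coeff)
        total += magnitude
        weighted += magnitude * (k * sample_rate / n)
    return weighted / total if total else 0.0


# Same pitch (200 Hz), different harmonic balance.
dark = tone(200, [1.0, 0.2, 0.05])
bright = tone(200, [1.0, 0.8, 0.6])

print(round(spectral_centroid(dark)), round(spectral_centroid(bright)))
```

Both tones play the same note, yet the second has a noticeably higher spectral centroid because more of its energy sits in the upper harmonics. A voice cloning model learns far richer descriptions than this single number, but the idea is similar: identity lives in the spectrum, not just in the melody.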

The process begins with collecting audio samples of the artist’s voice. These samples are essential because they carry the unique characteristics the model will learn from. Once they are gathered, the voice cloning model is trained to understand and replicate the tone color (timbre) of the original voice. The AI delves into the intricacies of the voice, learning how to reproduce its signature timbre along with the artist’s characteristic handling of pitch and rhythm.
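
As a rough illustration of the sample-gathering step, the sketch below cuts a long recording into fixed-length clips and drops the ones that are nearly silent. The clip length, the silence threshold, and the `slice_clips` helper are assumptions chosen for the example, not settings taken from the SoftVC VITS project.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def slice_clips(samples, sample_rate, clip_seconds=10.0, silence_rms=0.01):
    """Split a mono recording into fixed-length clips.

    Clips whose RMS level is below `silence_rms` are skipped, and any
    leftover tail shorter than one clip is discarded.
    """
    clip_len = int(clip_seconds * sample_rate)
    clips = []
    for start in range(0, len(samples) - clip_len + 1, clip_len):
        clip = samples[start:start + clip_len]
        if rms(clip) >= silence_rms:
            clips.append(clip)
    return clips


# Tiny synthetic "recording": voiced, silent, voiced, then a short tail.
rate = 100
voiced = [0.5 * math.sin(2 * math.pi * 5 * t / rate) for t in range(rate)]
recording = voiced + [0.0] * rate + voiced + voiced[:50]

clips = slice_clips(recording, rate, clip_seconds=1.0)
print(len(clips))
```

In practice the clips would come from real audio files rather than a synthetic list, but the idea is the same: the model trains on many short, clean snippets of the target voice.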

After this learning phase, a new vocal performance is recorded, and the AI model applies what it has learned to map the cloned artist’s timbre onto that performance. The result is a vocal track that sounds strikingly similar to the artist’s voice, yet is entirely AI-generated.
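
The key idea in this step is that the melody comes from the new performance while the tone color comes from the cloned artist. The toy below shows that separation with simple additive synthesis. It is only an analogy: the real model works with learned neural features, and the `resynthesize` function, the pitch values, and the harmonic weights here are all hypothetical.

```python
import math


def resynthesize(f0_contour, harmonic_weights, sample_rate=8000, frame_len=80):
    """Render a pitch contour with a given harmonic profile.

    f0_contour: one pitch value in Hz per frame (the "performance").
    harmonic_weights: amplitude of each harmonic (a stand-in for timbre).
    """
    samples = []
    phase = 0.0
    for f0 in f0_contour:
        step = 2 * math.pi * f0 / sample_rate
        for _ in range(frame_len):
            phase += step  # keep phase continuous across frames
            samples.append(
                sum(w * math.sin((k + 1) * phase)
                    for k, w in enumerate(harmonic_weights))
            )
    return samples


# Hypothetical melody taken from the new performance (three held notes).
performance_f0 = [220.0] * 50 + [247.0] * 50 + [262.0] * 50
# Hypothetical harmonic profile standing in for the target artist's timbre.
target_timbre = [1.0, 0.6, 0.3, 0.1]

converted = resynthesize(performance_f0, target_timbre)
print(len(converted))
```

Swapping `target_timbre` for a different profile keeps the same notes and timing but changes how the result sounds, which is the intuition behind voice conversion.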

This synthesized voice is then mixed with a backing track in any Digital Audio Workstation (DAW), resulting in a complete musical piece that echoes the voice of the cloned artist. Listeners are captivated by the uncanny resemblance, and some of these tracks reach millions of listens before possibly being taken down. If you are technically inclined, you can try the open source model here.
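
A DAW does a great deal more than this, but at its simplest, mixing means adding the two tracks together sample by sample and keeping the result from clipping. Here is a minimal sketch of that idea; the `mix` function and its default gain values are assumptions for illustration.

```python
def mix(vocal, backing, vocal_gain=1.0, backing_gain=0.7, peak=0.95):
    """Sum two mono tracks, then scale the result down if it would clip.

    The shorter track is padded with silence so both cover the same length.
    """
    n = max(len(vocal), len(backing))
    mixed = []
    for i in range(n):
        v = vocal[i] if i < len(vocal) else 0.0
        b = backing[i] if i < len(backing) else 0.0
        mixed.append(vocal_gain * v + backing_gain * b)

    loudest = max((abs(x) for x in mixed), default=0.0)
    if loudest > peak:
        # Scale everything so the loudest sample sits exactly at `peak`.
        mixed = [x * peak / loudest for x in mixed]
    return mixed


song = mix([0.5, -0.5], [1.0, 1.0, 1.0])
print(len(song))
```

Lowering `backing_gain` pushes the vocal forward in the mix, which is the same trade-off a producer makes with faders in a DAW.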

However, this innovation has sparked debates and controversies within the music industry. Protective of their treasures, recording industry giants have reacted with measures to block or control the use of AI in music, leading to an ongoing discussion about ethics, ownership, and authenticity. While most of the open source models are being taken down, there is still no clarity on what the proper use cases for this technology are. For now, it’s a fun signal processor (until it’s your voice that is cloned).

AI’s role in music is not an isolated development but part of a broader evolution that continues to challenge the recording industry’s control over distribution. From audio cassettes in the 1970s to AI in our present time, technology continues to shape our relationship with music.

Traditional musicianship is finding in AI an opportunity to explore new creative avenues, much like past technological advances inspired new artistic paths. AI’s impact on music is yet to be fully understood, but what’s clear is that it opens a new frontier in sound and creativity.

Unbias is building an AI product with marketing capabilities. Join our waitlist here to get early access.
