Tips for the best voice clone quality
Recording environment, speech style, file requirements and fixes for common voice cloning problems.
Written By Umakhan Magomedov
Last updated 4 days ago
Voice clone quality depends on the recording you provide. These tips help you get a natural-sounding result that closely matches your voice.
Recording environment
Record in a quiet room with no background noise, echo or music
Avoid rooms with hard surfaces that create echo (bathrooms, empty rooms)
A small room with soft furnishings (carpet, curtains, sofa) works well
Keep your phone or microphone at a consistent distance from your mouth (15-30 cm)
How to speak
Speak at your natural pace, as you would in normal conversation
Pronounce words clearly without exaggerating
Read the reference text provided in the app. It is designed to capture a wide range of your voice characteristics
Do not whisper or speak unusually slowly: the model should learn your natural voice
Aim for 30-60 seconds or more of clean, uninterrupted speech
ℹ️ The reference text in the app is specifically chosen to include a variety of sounds, intonations and sentence structures. Reading it fully gives the model more to work with.
If you upload a file instead of recording
Use a file with a single speaker and no background music or effects
Minimum: 10 seconds. Recommended: 30-60 seconds or more
Maximum file size: 20 MB
Supported formats: MP3, WAV, AAC, OGG
Avoid phone call recordings, heavily compressed audio or recordings with multiple speakers
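If you want to check a file against these limits before uploading it, the rules above can be sketched as a small script. This is only an illustration, not the check the app itself runs: the function name `check_upload` and the constants are hypothetical, and 1 MB is assumed to mean 1024 × 1024 bytes. Duration is measured only for WAV files, because Python's standard library cannot decode MP3, AAC or OGG. The script cannot tell whether a recording has a single speaker or background noise.

```python
import os
import wave

# Limits taken from the list above. The names are hypothetical.
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MIN_DURATION_SECONDS = 10
RECOMMENDED_DURATION_SECONDS = 30
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".aac", ".ogg"}


def check_upload(path):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Format is judged by file extension only, not by file contents.
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")

    if os.path.getsize(path) > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 20 MB")

    # Duration is measured for WAV only, using the standard-library wave module.
    if extension == ".wav":
        with wave.open(path, "rb") as wav_file:
            duration = wav_file.getnframes() / wav_file.getframerate()
        if duration < MIN_DURATION_SECONDS:
            problems.append(f"too short: {duration:.1f} s (minimum 10 s)")
        elif duration < RECOMMENDED_DURATION_SECONDS:
            problems.append(f"short: {duration:.1f} s (30-60 s recommended)")

    return problems
```

Calling `check_upload("my_voice.wav")` returns an empty list when the file passes every check, or one message per problem otherwise.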