Audio-only mode and Enhanced Cloning
Learn when to use Audio only mode and Enhanced Cloning in Translate Video, and how each setting affects quality and cost.
Written By Umakhan Magomedov
Last updated 4 days ago
Translate Video has two optional settings that change how the translation is processed: Audio only and Enhanced Cloning. Both are available in the settings panel before you tap Translate.
Audio only
In Audio only mode, the tool translates the spoken audio and overlays it on the original video. The video visuals are not modified and no lip sync is applied.
Use Audio only when:
The video does not show a face or the speaker is not visible
Lip sync accuracy is not important for your use case
You want a faster result at the same cost as standard mode
The video has multiple speakers at once (lip sync would be inaccurate anyway)
ℹ️ Audio only does not reduce the token cost. Both Audio only and standard mode cost 5 tokens/second. The difference is whether lip sync is applied, not price.
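As a rough illustration of the arithmetic, the sketch below estimates the token cost of a clip from its length using the 5 tokens/second rate stated above. The function name and the constant are hypothetical and not part of the product, and the app may round or bill differently, so treat the estimate shown in the app as the authoritative figure. No rate for Enhanced Cloning is given here, so the sketch leaves it out.

```python
# Rate stated in this article for both standard mode and Audio only.
TOKENS_PER_SECOND = 5


def estimate_tokens(duration_seconds):
    """Return an approximate token cost for a clip of the given length.

    Hypothetical helper for illustration only; the app's own rounding
    and billing rules are not documented here.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds * TOKENS_PER_SECOND


# A 90-second clip costs the same in either mode.
standard_cost = estimate_tokens(90)
audio_only_cost = estimate_tokens(90)
print(standard_cost, audio_only_cost)  # 450 450
```

Under these assumptions, a 90-second video comes to about 450 tokens whether or not Audio only is on.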
Enhanced Cloning
Enhanced Cloning uses a more accurate voice model to better match the original speaker. The dubbed voice is closer to the original in tone and character.
Use Enhanced Cloning when:
Voice authenticity matters (interviews, personal content, documentary)
The speaker has a distinctive voice you want to preserve in the translation
You are translating content where the speaker is on camera and viewers know the original voice
How to enable
Open Translate Video from the Tools tab.
Choose your file or paste a link.
Tap the Settings (gear) icon in the top right.
Toggle Audio only or Enhanced Cloning.
Tap Translate. The updated token estimate reflects your choice.