How Translate Video works
Learn how Translate Video translates the audio in a video file, syncs the dubbed voice and delivers the result in 5 to 15 minutes.
Written By Umakhan Magomedov
Last updated 4 days ago
Translate Video translates the spoken audio in a video and produces a new video file where the dubbed voice matches the on-screen speaker. Processing runs on the server and typically takes 5 to 15 minutes.
When to use
Translate a TikTok, Instagram Reel or YouTube video into your language
Localize an educational or training video for a different audience
Watch a foreign-language documentary or interview without subtitles
Share a video with someone who speaks a different language
What you can upload
Formats: MP4, MOV, AVI, MKV, WebM, M4V
Max file size: 100 MB
Sources:
File from your device or gallery
Link to YouTube, Instagram or TikTok
Direct video URL (.mp4, .mov and others)
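If you prepare files in bulk, you can check them against the limits above before uploading. The following is a minimal sketch, not part of the app: the function name check_upload is hypothetical, and it assumes "100 MB" means 100 × 1024 × 1024 bytes. The app compresses the video before upload, so the limit it actually enforces may differ.

```python
from pathlib import Path

# Formats listed in "What you can upload".
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".m4v"}

# Assumption: "100 MB" is read as binary megabytes; the app may count differently.
MAX_SIZE_BYTES = 100 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 100 MB")
    return problems
```

For example, check_upload("lesson.MOV", 40_000_000) returns an empty list, because the extension is matched case-insensitively and the size is under the limit.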
How to run
Open the Translate Video tool from the Tools tab.
Tap Choose to pick a file, or tap Paste link to import from a URL.
Select the target language.
Tap Translate. The app compresses and uploads the video, then sends it for server processing.
Once the upload finishes, you can close the app. When the video is ready, it appears in History and you receive an email notification.
ℹ️ Average processing time is 5 to 15 minutes, depending on video length. Keep the app open until the upload finishes; after that you can safely minimize or close it.
What you get
The result is a video file with the dubbed audio in the target language. It appears in History with one of these statuses:
From the completed result you can play the video, download it to your device, save it to your gallery or share it.
How much it costs
The cost depends on the video duration and the quality mode selected:
The estimated token cost for your video is shown before you tap Translate. For the full pricing reference, see How tokens work.
Settings
Tap the settings icon before translating to adjust the output:
Audio only: translates the audio track only, without lip sync. Use this when lip sync is not important or the face is not clearly visible.
Enhanced Cloning: doubles the cost but produces a more natural-sounding result with better voice similarity.
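The effect of Enhanced Cloning on the price can be sketched as below. This is an illustration only: estimated_tokens and base_tokens are hypothetical names, the base figure is whatever estimate the app shows for your video, and the sketch does not model any effect the Audio only setting may have on cost.

```python
def estimated_tokens(base_tokens: int, enhanced_cloning: bool = False) -> int:
    """Apply the Enhanced Cloning multiplier to a base token estimate.

    base_tokens is a placeholder for the estimate shown in the app;
    the real per-minute rates are not reproduced here.
    """
    # Enhanced Cloning doubles the cost; otherwise the base estimate is unchanged.
    return base_tokens * 2 if enhanced_cloning else base_tokens
```

With a base estimate of 30 tokens, turning on Enhanced Cloning gives 60.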