How Translate Audio works
Learn how Translate Audio recognizes speech, translates it, and generates voiced audio in your chosen language.
Written By Umakhan Magomedov
Last updated 4 days ago
Translate Audio takes audio from your device, a link, or a messenger app, recognizes the speech, translates it, and generates voiced audio in the target language.

When to use
Understand a voice message from Telegram, WhatsApp or another messenger
Get a translation of a podcast, interview or audio recording
Transcribe and translate a lecture or meeting recording
Listen to the translated content as audio, not just read it
What you can upload
Formats: MP3, M4A, WAV, OGG, FLAC, AAC, OPUS, WebM. Video files are also accepted: the audio track is extracted automatically.
Max file size: 10 MB
Sources:
File from your device or gallery
Audio shared from a messenger: Telegram, WhatsApp, Viber, iMessage
Link to YouTube, Instagram or TikTok, or a direct audio/video URL
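If you prepare files in bulk before uploading, a quick local pre-check against the limits above can save a failed upload. The sketch below is illustrative only and is not part of the app: the function name precheck_upload is made up, and it assumes "10 MB" means 10 × 1024 × 1024 bytes, which the app may count differently.

```python
from pathlib import Path

# Audio extensions taken from the format list in this article.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".flac", ".aac", ".opus", ".webm"}

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 10 * 1024 * 1024


def precheck_upload(filename: str, size_bytes: int) -> list[str]:
    """Return warnings for a file; an empty list means nothing was flagged."""
    warnings = []
    ext = Path(filename).suffix.lower()
    if ext not in AUDIO_EXTENSIONS:
        # Only a warning: video files are accepted too, but the article
        # does not list their extensions, so they cannot be checked here.
        warnings.append(f"{ext or 'no extension'} is not a listed audio format")
    if size_bytes > MAX_SIZE_BYTES:
        warnings.append("file is larger than 10 MB")
    return warnings
```

An unlisted extension is only a warning here, not a rejection, because video files are accepted even though their formats are not enumerated.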
How to run
Open the Translate Audio tool from the Tools tab.
Choose a source: tap Choose to pick a file from your device, tap Paste link to import from a URL, or share audio directly from a messenger app.
Recognition starts automatically. The source text appears in the Source text tab as the audio is processed.
The app switches to the Translation tab once recognition is done. Select the target language at the top if needed.
The translated text and a voiced audio file appear in the Translation tab. Tap play to listen.
ℹ️ To share a voice message from Telegram or WhatsApp, open the message, tap Share and select VocaLingo. See How to share audio from messengers for step-by-step instructions per app.

What you get
After processing, results are organized into three tabs:
Source text: the original speech transcribed as text, with a mini audio player for the uploaded file.
Translation: the translated text and a playable audio file in the target language. You can edit the translation and regenerate the audio if needed.
Summary: a short summary of the translated content, generated on demand.
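For readers who think in code, the three tabs can be pictured as one result record that is filled in stage by stage. This is a conceptual sketch, not the app's actual implementation: every name in it (TranslationResult, run_pipeline, request_summary, and the recognize, translate, synthesize and summarize callables) is hypothetical, and the stages are passed in as plain functions so that no real service API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranslationResult:
    source_text: str                # shown in the Source text tab
    translated_text: str            # shown in the Translation tab
    translated_audio: bytes         # playable audio in the Translation tab
    summary: Optional[str] = None   # Summary tab, empty until requested


def run_pipeline(
    audio: bytes,
    target_language: str,
    recognize: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> TranslationResult:
    """Run the three stages in order: recognition, translation, audio generation."""
    source_text = recognize(audio)
    translated_text = translate(source_text, target_language)
    translated_audio = synthesize(translated_text, target_language)
    return TranslationResult(source_text, translated_text, translated_audio)


def request_summary(result: TranslationResult, summarize: Callable[[str], str]) -> str:
    """Generate the summary only when asked, then reuse the stored text."""
    if result.summary is None:
        result.summary = summarize(result.translated_text)
    return result.summary
```

Keeping the summary empty until it is requested mirrors the "generated on demand" behavior of the Summary tab.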

Results are saved to History automatically after translation. You can reopen any past result from the history icon without uploading the file again.
How much it costs
The tool uses tokens for three steps: speech recognition, text translation, and audio generation. The cost of audio generation depends on the voice provider selected in Settings. For the full pricing breakdown, see Token pricing for each tool.
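As a rough way to reason about the total, the cost is the sum of the three steps, with only the audio step varying by voice provider. The sketch below assumes you already know the per-step token amounts from the pricing page; the function name estimate_tokens and all numbers in the usage lines are invented placeholders, not real prices.

```python
def estimate_tokens(
    recognition: int,
    translation: int,
    audio_by_provider: dict[str, int],
    provider: str,
) -> int:
    """Sum the three steps; only the audio step depends on the provider."""
    if provider not in audio_by_provider:
        raise ValueError(f"no audio price known for provider {provider!r}")
    return recognition + translation + audio_by_provider[provider]


# Placeholder numbers and provider names for illustration only.
audio_prices = {"provider-a": 5, "provider-b": 9}
total = estimate_tokens(recognition=3, translation=2,
                        audio_by_provider=audio_prices, provider="provider-b")
```

Switching the provider changes only the last term of the sum, which is why the same recording can cost a different number of tokens under different voice settings.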
Voice provider settings
Tap the settings icon in the top-right corner to choose how the translated audio is generated:

MiniMax and Heygen clone the voice from the original audio, so the translation sounds closer to the original speaker. Heygen generally produces the most natural result but takes longer and costs more.