How Translate Audio works

Automatic speech recognition, translation and summary in Translate Audio.

Written By Umakhan Magomedov

Last updated 3 days ago

Translate Audio takes audio from your device, a link or a messenger, then automatically runs speech recognition, translates the text and generates a short summary. You get the source text, the translation and the summary without tapping a Translate button.

When to use

Understand a voice message from Telegram, WhatsApp or another messenger
Get a translation of a podcast, interview or audio recording
Transcribe and translate a lecture or meeting recording
Listen to the translated content as audio, not just read it

What you can upload

Formats: MP3, M4A, WAV, OGG, FLAC, AAC, OPUS, WebM. Video files are also accepted: the audio track is extracted automatically.

Size limits:

Device upload (audio)	100 MB
Paste link	2 GB (server download from YouTube, Instagram, TikTok or a direct URL)

Full limits for all tools: Supported file formats, sizes and sources.
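
If you prepare files with a script before uploading, the format list and size limits above can be sketched as a simple pre-upload check. This is only an illustration, not how Translate Audio validates files: the function name check_upload, the source labels "device" and "link", and the choice of 1 MB = 1024 * 1024 bytes are all assumptions. The sketch covers device audio only, since the table above gives the device limit for audio and not for video.

```python
from pathlib import Path

# Audio extensions taken from the format list above.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".flac", ".aac", ".opus", ".webm"}

# Assumption: 1 MB = 1024 * 1024 bytes; the article does not say which unit is used.
MB = 1024 * 1024
DEVICE_AUDIO_LIMIT = 100 * MB   # "Device upload (audio)" row
LINK_LIMIT = 2 * 1024 * MB      # "Paste link" row


def check_upload(filename: str, size_bytes: int, source: str) -> tuple[bool, str]:
    """Return (accepted, reason) for a hypothetical pre-upload check."""
    if source == "link":
        # Links are downloaded server-side, so only the size is checked here.
        if size_bytes > LINK_LIMIT:
            return False, "link download exceeds 2 GB"
        return True, "ok"
    if source == "device":
        if Path(filename).suffix.lower() not in AUDIO_EXTENSIONS:
            return False, "not a listed audio format"
        if size_bytes > DEVICE_AUDIO_LIMIT:
            return False, "device audio exceeds 100 MB"
        return True, "ok"
    return False, "unknown source"
```

For example, check_upload("voice.ogg", 5_000_000, "device") is accepted, while a 150 MB MP3 from the device is rejected by the size check.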

Sources:

File from your device or gallery (web, iOS, Android)
Audio shared from a messenger: Telegram, WhatsApp, Viber, iMessage (iOS and Android)
Link to YouTube, Instagram or TikTok, or a direct audio or video URL

How to run

Open Translate Audio from the Tools tab on web, iOS or Android.
Choose a source: tap Choose for a local file, Paste link for a URL, or share audio from a messenger.
Processing starts automatically: speech recognition, then translation, then summary. There is no separate Translate button.
Watch progress on three tabs: Source text, Translation and Summary.
On the Translation tab, tap play to hear the voiced translation. Edit the text or change the target language if needed.
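
The order of the automatic stages in the steps above (speech recognition, then translation, then summary) can be pictured with a short sketch. It is a conceptual model only, not the actual implementation: translate_audio, AudioResult and the three callables are hypothetical names, and summarizing the translated text rather than the source text is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AudioResult:
    """One field per result tab: Source text, Translation, Summary."""
    source_text: str
    translation: str
    summary: str


def translate_audio(
    audio: bytes,
    target_language: str,
    recognize: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    summarize: Callable[[str], str],
) -> AudioResult:
    """Run the three stages in order, with no separate 'Translate' step."""
    source_text = recognize(audio)                          # stage 1: speech recognition
    translation = translate(source_text, target_language)   # stage 2: translation
    summary = summarize(translation)                        # stage 3: summary (input is an assumption)
    return AudioResult(source_text, translation, summary)
```

Each stage feeds the next, which matches the note above that settings changes to recognition and translation only take effect on a new run.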

ℹ️ To share a voice message from Telegram or WhatsApp, open the message, tap Share and select VocaLingo. Step-by-step guides: How to share audio from messengers.

What you get

Results are organized into three tabs:

Source text: transcribed original speech with a mini player for the uploaded file.
Translation: translated text and playable audio in the target language. You can edit the translation and regenerate audio.
Summary: a short structured summary of the content, generated as part of the automatic pipeline.

Results save to History automatically. Reopen any past result from the history icon without uploading the file again. Free accounts keep up to 3 entries per tool; Premium has unlimited history. See History: saving, restoring and deleting results.

Recognition, translation and voiceover settings

Tap the Settings icon to choose the speech recognition provider, translation model and voiceover provider. Changes to recognition and translation models apply on the next upload. Voiceover settings affect the next audio generation.

Full details on every option: Translate Audio settings.

How much it costs

The tool charges tokens for speech recognition, text translation, summary and optional voiceover. Costs depend on audio duration, text length and the providers you select. See Token pricing for each tool for tables and examples.

Frequently asked questions

Which languages can I translate into?

Why does the first translation always use Gemini 3 even if I picked another model?

Can I upload a video file?

Can I change the target language after upload?

Does editing the translation affect the audio?

How many history entries can I keep?

Where do recognition and voiceover prices come from?
