How Speech to Text works

Upload audio, record in the app or paste a link to get a full transcript and a short summary.

Written By Umakhan Magomedov

Last updated 3 days ago

Speech to Text transcribes spoken audio into text. Upload a file, record directly in the app or import from a link, and get the full transcript in seconds.

When to use

Transcribe a meeting, lecture or interview recording
Convert a voice note into text for editing or searching
Get a transcript of a YouTube video or podcast
Prepare a text version of audio content for further analysis

What you can upload

Formats: MP3, M4A, WAV, OGG, FLAC, AAC, OPUS, WebM. Video files are also accepted: the audio track is extracted automatically.

Max file size: 10 MB

Sources:

File from your device
Record audio directly in the app
Link to YouTube, Instagram or TikTok
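
The limits above can be checked before uploading. The following is a minimal sketch, not the app's own validation: the function names, the reading of "10 MB" as 10 * 1024 * 1024 bytes, and the list of host names are assumptions made for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse

# Audio extensions listed in this article (lowercase, with leading dot).
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".flac", ".aac", ".opus", ".webm"}

# Assumption: "10 MB" means 10 * 1024 * 1024 bytes; the article does not specify.
MAX_FILE_BYTES = 10 * 1024 * 1024

# Assumption: these host names stand in for the three supported link sources.
LINK_HOSTS = {"youtube.com", "youtu.be", "instagram.com", "tiktok.com"}


def check_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in AUDIO_EXTENSIONS:
        # Video files are also accepted, but the article names no video
        # formats, so this sketch only recognizes the listed audio ones.
        problems.append(f"{ext or 'no extension'} is not a listed audio format")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the {MAX_FILE_BYTES}-byte limit")
    return problems


def is_supported_link(url: str) -> bool:
    """True if the URL's host is one of the assumed link sources or a subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in LINK_HOSTS)
```

For example, `check_file("talk.MP3", 5_000_000)` returns an empty list, while a file larger than the assumed limit or with an unlisted extension returns one message per problem.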

How to run

Open Speech to Text from the Tools tab.
Add audio: tap Choose to upload a file, Record audio to record, or Paste link to import from a URL.
Processing starts automatically. The transcript appears in the Text tab as it is recognized.

What you get

Results are organized in two tabs:

Text: the full transcript. You can copy the entire text or select parts of it.
Essence: a short summary of the audio content, generated on demand. Useful when you need the key points without reading the full transcript.

How much it costs

Speech to Text spends tokens on speech recognition. Generating the Essence summary spends additional tokens. For the full pricing breakdown, see Token pricing for each tool.

Frequently asked questions

Does it detect the language automatically?

Can I record directly in the app?

What is the Essence tab?

Where are the results saved?
