How Speech to Text works

Upload audio, record in the app or paste a link to get a full transcript and a short summary.

Written By Umakhan Magomedov

Last updated 4 days ago

Speech to Text transcribes spoken audio into text. Upload a file, record directly in the app or import from a link, and get the full transcript in seconds.

When to use

  • Transcribe a meeting, lecture or interview recording

  • Convert a voice note into text for editing or searching

  • Get a transcript of a YouTube video or podcast

  • Prepare a text version of audio content for further analysis


What you can upload

Formats: MP3, M4A, WAV, OGG, FLAC, AAC, OPUS, WebM. Video files are also accepted: the audio track is extracted automatically.

Max file size: 10 MB

Sources:

  • File from your device

  • Record audio directly in the app

  • Link to YouTube, Instagram or TikTok
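If you prepare many files at once, the format and size limits above can be checked locally before uploading. The sketch below is only an illustration and not part of the app: the name `check_upload` is hypothetical, and it assumes "10 MB" means 10 × 1024 × 1024 bytes, which the app may count differently.

```python
from pathlib import Path

# Audio extensions listed in this article. Video files are also accepted by
# the tool, but the article does not list their formats, so this sketch
# would flag them as unlisted.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".flac", ".aac", ".opus", ".webm"}

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024


def check_upload(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(name).suffix.lower()  # compare extensions case-insensitively
    if ext not in AUDIO_EXTENSIONS:
        problems.append(f"unlisted format: {ext or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"too large: {size_bytes} bytes (limit {MAX_BYTES})")
    return problems
```

For example, `check_upload("interview.wav", 4_000_000)` returns an empty list, while a 12 MB recording would be reported as too large and is worth compressing or splitting first.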


How to run

  1. Open Speech to Text from the Tools tab.

  2. Add audio: tap Choose to upload a file, Record audio to record, or Paste link to import from a URL.

  3. Processing starts automatically. The transcript appears in the Text tab as it is recognized.


What you get

Results are organized in two tabs:

  • Text: the full transcript. You can copy the entire text or select parts of it.

  • Essence: a short summary of the audio content, generated on demand. Useful when you need the key points without reading the full transcript.


How much it costs

Speech to Text spends tokens on speech recognition. Generating the Essence summary spends additional tokens. For the full pricing breakdown, see "Token pricing for each tool".

