How Speech to Text works
Upload audio, record in the app or paste a link to get a full transcript and, on demand, a short summary.
Written By Umakhan Magomedov
Last updated 4 days ago
Speech to Text transcribes spoken audio into text. Upload a file, record directly in the app or import from a link, and get the full transcript in seconds.
When to use
Transcribe a meeting, lecture or interview recording
Convert a voice note into text for editing or searching
Get a transcript of a YouTube video or podcast
Prepare a text version of audio content for further analysis
What you can upload
Formats: MP3, M4A, WAV, OGG, FLAC, AAC, OPUS, WebM. Video files are also accepted: the audio track is extracted automatically.
Max file size: 10 MB
Sources:
File from your device
Audio recorded directly in the app
Link to YouTube, Instagram or TikTok
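If you prepare files in bulk, the format and size limits above can be checked before uploading. The following is a minimal sketch in Python, not part of the app: the function name check_upload is hypothetical, "10 MB" is assumed to mean 10 * 1024 * 1024 bytes, and only the audio extensions listed above are checked, since this article does not list the accepted video extensions.

```python
from pathlib import Path

# Audio extensions taken from the Formats line above.
# Video files are also accepted by the tool, but their extensions
# are not listed in this article, so this sketch does not cover them.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".flac", ".aac", ".opus", ".webm"}

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 10 * 1024 * 1024


def check_upload(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes both checks."""
    problems = []
    if Path(name).suffix.lower() not in AUDIO_EXTENSIONS:
        problems.append("unlisted format")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("larger than 10 MB")
    return problems
```

For a file on disk, the size can be read with Path("interview.mp3").stat().st_size and passed as size_bytes. A file that fails the size check would need to be trimmed or compressed before upload.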
How to run
Open Speech to Text from the Tools tab.
Add audio: tap Choose to upload a file, Record audio to record, or Paste link to import from a URL.
Processing starts automatically. The transcript appears in the Text tab as it is recognized.
What you get
Results are organized in two tabs:
Text: the full transcript. You can copy the entire text or select parts of it.
Essence: a short summary of the audio content, generated on demand. Useful when you need the key points without reading the full transcript.
How much it costs
Speech recognition in Speech to Text costs tokens. Generating the Essence summary costs additional tokens. For the full pricing breakdown, see Token pricing for each tool.