How Text to Speech works

Type or paste text, choose a voice and generate an audio file in seconds.

Written by Umakhan Magomedov

Last updated 4 days ago

Text to Speech converts written text into a spoken audio file. Type or paste your text, pick a voice and tap Generate to get the result in seconds.

When to use

  • Create a voiceover for a presentation, video or podcast

  • Listen to translated text as audio instead of reading it

  • Practice pronunciation by generating audio of a foreign-language phrase

  • Generate a narration track with your own cloned voice


How to run

  1. Open Text to Speech from the Tools tab.

  2. Go to the Text tab and type or paste your text. You can also tap Paste to insert text from the clipboard, or dictate it with voice input.

  3. Go to the Voice tab and select a voice. You can choose from OpenAI voices, standard MiniMax voices or your own Custom Voices.

  4. Tap Generate. The audio file appears in the Result tab.

ℹ️ For OpenAI voices, you can add voice instructions on the Voice tab to adjust tone, pace and style. For example: "Speak slowly and warmly" or "Use a formal news presenter style."


What you get

The Result tab shows an audio player with the generated file. From there you can:

  • Play the audio directly

  • Download it to your device

  • Share it through the system share sheet

  • Generate again with different settings if needed

Each result is saved to History automatically.


How much it costs

The cost depends on the length of the text and the type of voice you choose. OpenAI voices are billed per minute of estimated audio. Standard and Custom MiniMax voices are billed per character. The first time a Custom Voice is activated, a one-time setup fee of 150 tokens is added. For the full pricing breakdown, see Token pricing for each tool.
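
The billing rules above can be sketched as a rough estimate. This is an illustration only, not the app's actual calculation: the per-minute rate, the per-character rate and the characters-per-minute figure used to estimate audio length are hypothetical placeholders, and the function and voice type names are made up for this example. Only the 150-token setup fee comes from this article. Real rates are listed in Token pricing for each tool.

```python
from dataclasses import dataclass

# One-time setup fee for a Custom Voice, in tokens (stated in this article).
CUSTOM_VOICE_SETUP_FEE = 150


@dataclass(frozen=True)
class Rates:
    """Hypothetical placeholder rates, NOT the real prices."""
    openai_tokens_per_minute: float = 10.0
    minimax_tokens_per_character: float = 0.01
    # Assumed speaking speed, used only to estimate audio length from text.
    assumed_chars_per_minute: float = 900.0


def estimate_cost(text, voice_type, rates=Rates(), first_activation=False):
    """Return a rough token estimate for one generation.

    voice_type is one of the hypothetical labels "openai",
    "minimax_standard" or "minimax_custom".
    """
    if voice_type == "openai":
        # Billed per minute of estimated audio.
        estimated_minutes = len(text) / rates.assumed_chars_per_minute
        return estimated_minutes * rates.openai_tokens_per_minute

    if voice_type in ("minimax_standard", "minimax_custom"):
        # Billed per character.
        cost = len(text) * rates.minimax_tokens_per_character
        if voice_type == "minimax_custom" and first_activation:
            # Setup fee applies only the first time a Custom Voice is activated.
            cost += CUSTOM_VOICE_SETUP_FEE
        return cost

    raise ValueError(f"Unknown voice type: {voice_type}")
```

For example, estimate_cost(my_text, "minimax_custom", first_activation=True) adds the setup fee on top of the per-character charge, while later generations with the same Custom Voice would pass first_activation=False.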

