Speech to Text

The IBM Watson Speech to Text service uses speech recognition capabilities to convert Arabic, English, Spanish, French, Brazilian Portuguese, Japanese, Korean, German, and Mandarin speech into text.

This system is for demonstration purposes only and is not intended to process Personal Data. No Personal Data is to be entered into this system, as it may not have the necessary controls in place to meet the requirements of the General Data Protection Regulation (EU) 2016/679.

Watson Speech to Text supports .mp3, .mpeg, .wav, .opus, and .flac files up to 200 MB.
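The format and size limits above can be checked locally before an upload is attempted. The following is a minimal sketch, not part of the service: the function name `check_audio_file` is hypothetical, and it assumes the 200 MB limit means 200,000,000 bytes, which the page does not specify.

```python
from pathlib import Path
from typing import List

# Extensions listed on this page as supported by the demo.
SUPPORTED_EXTENSIONS = {".mp3", ".mpeg", ".wav", ".opus", ".flac"}

# Assumption: "200 MB" is interpreted as decimal megabytes.
MAX_SIZE_BYTES = 200 * 1000 * 1000


def check_audio_file(name: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_SIZE_BYTES}")
    return problems


if __name__ == "__main__":
    print(check_audio_file("interview.WAV", 12_000_000))
    print(check_audio_file("notes.txt", 2_048))
```

The check looks only at the file name and size; it does not verify that the audio inside the file is actually encoded in the format its extension suggests.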

*Both US English broadband sample audio files are covered under the Creative Commons license.
The returned result includes the recognized text, word alternatives, and spotted keywords. Some models can detect multiple speakers; enabling speaker detection may slow down transcription.
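To make the parts of a result concrete, here is a minimal sketch that pulls the recognized text, word alternatives, spotted keywords, and speaker numbers out of a response. The `SAMPLE` dictionary is hand-written for illustration and is not real service output; its field names (`results`, `alternatives`, `word_alternatives`, `keywords_result`, `speaker_labels`) follow the shape of the service's JSON response as I understand it and should be checked against the current API reference. The helper `summarize_result` is hypothetical.

```python
from typing import Any, Dict

# Hand-written sample in the assumed response shape; not real service output.
SAMPLE = {
    "results": [
        {
            "final": True,
            "alternatives": [{"transcript": "hello world ", "confidence": 0.94}],
            "word_alternatives": [
                {
                    "start_time": 0.5,
                    "end_time": 0.9,
                    "alternatives": [
                        {"word": "world", "confidence": 0.91},
                        {"word": "word", "confidence": 0.07},
                    ],
                }
            ],
            "keywords_result": {
                "world": [
                    {"normalized_text": "world", "start_time": 0.5,
                     "end_time": 0.9, "confidence": 0.91}
                ]
            },
        }
    ],
    "speaker_labels": [
        {"from": 0.1, "to": 0.4, "speaker": 0, "confidence": 0.6},
        {"from": 0.5, "to": 0.9, "speaker": 0, "confidence": 0.6},
    ],
}


def summarize_result(response: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the transcript, word alternatives, keywords, and speakers."""
    transcript_parts = []
    word_alternatives = []
    keywords: Dict[str, list] = {}
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is treated as the best hypothesis.
            transcript_parts.append(alternatives[0]["transcript"].strip())
        for slot in result.get("word_alternatives", []):
            word_alternatives.append(
                [(alt["word"], alt["confidence"]) for alt in slot["alternatives"]]
            )
        for keyword, hits in result.get("keywords_result", {}).items():
            keywords.setdefault(keyword, []).extend(
                (hit["start_time"], hit["end_time"]) for hit in hits
            )
    # Speaker labels are optional and appear only when speaker detection is on.
    speakers = sorted({label["speaker"] for label in response.get("speaker_labels", [])})
    return {
        "transcript": " ".join(transcript_parts),
        "word_alternatives": word_alternatives,
        "keywords": keywords,
        "speakers": speakers,
    }


if __name__ == "__main__":
    print(summarize_result(SAMPLE))
```

Every field is read with a default, so a response without keywords or speaker labels yields empty collections instead of raising an error.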
