IBM Watson QA + Speech Recognition + Speech Synthesis = A Conversation With Your Computer

Back in November I released a demo application here on my blog that connected the IBM Watson QA Service for cognitive/natural language computing to the Web Speech API in Google Chrome, enabling real conversational interaction with a web application. It’s a nice demo, but it always drove me nuts that it only worked in Chrome. Last month the IBM Watson team released 5 new services, and guess what… Speech Recognition and Speech Synthesis are included!

These two services enable you to quickly add Text-To-Speech or Speech-To-Text capability to any application.  What’s a better way to show them off than by updating my existing app to leverage the new speech services?

So here it is!

By leveraging the Watson services, it can now run in any browser that supports getUserMedia (for speech recognition) and HTML5 <Audio> (for speech playback).

(Full source code available at the bottom of this post)

You can check out a video of it in action below:

If your browser doesn’t support the getUserMedia API or HTML5 <Audio>, then your mileage may vary. You can check which browsers support these features here: <Audio>, getUserMedia.

Warning: This is targeting desktop browsers – HTML5 Audio is a mess on mobile devices due to limited codec support and immature APIs.
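If you want the page to degrade gracefully instead of failing silently, you can feature-detect both capabilities up front. Here’s a minimal sketch; the function name detectSpeechSupport is hypothetical, and it checks the standard navigator.mediaDevices.getUserMedia as well as the older prefixed variants browsers shipped before it. It only tests that the APIs exist, not that a microphone is present or that a particular codec plays.

```javascript
// Sketch: detect the two browser capabilities this demo depends on.
// Hypothetical helper; pass in `navigator` and an <audio> element in the browser.
function detectSpeechSupport(nav, audioElement) {
  // Speech input needs getUserMedia (standard or one of the older prefixed forms).
  var hasGetUserMedia = !!(
    (nav.mediaDevices && nav.mediaDevices.getUserMedia) ||
    nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia
  );
  // Speech output needs an HTML5 <audio> element (canPlayType is part of its API).
  var hasAudio = !!(audioElement && typeof audioElement.canPlayType === 'function');
  return { speechInput: hasGetUserMedia, speechOutput: hasAudio };
}

// In the browser:
// var support = detectSpeechSupport(navigator, document.createElement('audio'));
```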

So how does this all work?

Just like the QA service, the new Text To Speech and Speech To Text services are now available in IBM Bluemix, so you can create a new application that leverages any of these services, or you can add them to any existing application.

I simply added the Text To Speech and Speech To Text services to my existing Healthcare QA application that runs on Bluemix:

IBM Bluemix Dashboard


These services are available via a REST API. Once you’ve added them to your application, you can consume them easily within any of your applications.

I updated the code from my previous example in two ways: 1) to take advantage of the Watson Node.js Wrapper, which makes interacting with Watson a lot easier, and 2) to take advantage of these new services.

Watson Node.js Wrapper

Using the Watson Node.js Wrapper, you can now easily instantiate Watson services in a single line of code.  For example:

[js]var watson = require('watson-developer-cloud');
var question_and_answer_healthcare = watson.question_and_answer(QA_CREDENTIALS);
var speechToText = watson.speech_to_text(STT_CREDENTIALS);[/js]

The credentials come from your environment configuration; then you just create instances of whichever services you want to consume.
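On Bluemix (Cloud Foundry), bound service credentials are exposed to your app as JSON in the VCAP_SERVICES environment variable, keyed by service label. Here’s a minimal sketch of pulling one service’s credentials out of it. The helper name getServiceCredentials is hypothetical, and the label strings are an assumption: check your own VCAP_SERVICES output (or the Bluemix dashboard) for the exact labels your services use.

```javascript
// Sketch: read a bound service's credentials from Cloud Foundry's VCAP_SERVICES.
// ASSUMPTION: labels such as 'speech_to_text' match what your app is bound to;
// verify against your own environment before relying on them.
function getServiceCredentials(label, env) {
  env = env || process.env;
  if (!env.VCAP_SERVICES) {
    throw new Error('VCAP_SERVICES is not set; is the app running on Bluemix?');
  }
  var services = JSON.parse(env.VCAP_SERVICES);
  var instances = services[label];
  if (!instances || instances.length === 0) {
    throw new Error('No bound service with label: ' + label);
  }
  // Use the first bound instance of this service type.
  return instances[0].credentials;
}

// Hypothetical usage, feeding the *_CREDENTIALS placeholders used above:
// var STT_CREDENTIALS = getServiceCredentials('speech_to_text');
```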

QA Service

The code for consuming a service is now much simpler than in the previous version. To submit a question to the Watson QA service, you can now simply call the “ask” method on the QA service instance.

Below is my server-side code from app.js that accepts a POST submission from the browser, delegates the question to Watson, and takes the result and renders HTML using a Jade template. See the Getting Started Guide for the Watson QA Service to learn more about the wrappers for Node or Java.

[js]// Handle the form POST containing the question
app.post('/ask', function(req, res) {

  // delegate to Watson
  question_and_answer_healthcare.ask({ text: req.body.questionText }, function (err, response) {
    if (err) {
      console.log('error:', err);
      return res.status(500).send('Error contacting Watson');
    }

    // merge the first answer set with the submitted form fields
    var model = Object.assign({ 'answers': response[0] }, req.body);

    // render the template to HTML and send it to the browser
    return res.render('response', model);
  });
});[/js]

Compare this to the previous version, and you’ll quickly see that it is much simpler.
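If you prefer Promises over nested callbacks, the callback-style ask(params, callback) shape used above can be wrapped with Node’s util.promisify. This is a sketch under stated assumptions: askWatson is a hypothetical helper, and fakeQaService is a hypothetical stand-in that only mimics the call shape (its response object is made up for illustration, not the real QA payload).

```javascript
var util = require('util');

// HYPOTHETICAL stand-in for the QA service instance; it only mimics the
// ask(params, callback) shape used in the route above.
var fakeQaService = {
  ask: function (params, callback) {
    setImmediate(function () {
      callback(null, [{ question: { questionText: params.text } }]);
    });
  }
};

// Wrap ask() in a Promise and build the same template model as the route above.
function askWatson(qaService, body) {
  var ask = util.promisify(qaService.ask.bind(qaService));
  return ask({ text: body.questionText }).then(function (response) {
    return Object.assign({ answers: response[0] }, body);
  });
}

// Usage inside an Express route (errors become a rejected Promise):
// askWatson(question_and_answer_healthcare, req.body)
//   .then(function (model) { res.render('response', model); })
//   .catch(function (err) { res.status(500).send('Error contacting Watson'); });
```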

Speech Synthesis

At this point, we already have a functional service that can take natural language text, submit it to Watson,  and return a search result as text.  The next logical step for me was to add speech synthesis using the Watson Text To Speech Service (TTS).  Again, the Watson Node Wrapper and Watson’s REST services make this task very simple.  On the client side you just need to set the src of an <audio> instance to the URL for the TTS service:

[html]<audio controls="" autoplay="" src="/synthesize?text=The text that should generate the audio goes here"></audio>[/html]
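Watson’s answers often contain punctuation such as “?”, “&” or “#”, which would break the query string if dropped into the src attribute as-is. A minimal sketch of building the URL safely; synthesizeUrl is a hypothetical helper name, and the /synthesize path matches the route shown in this post.

```javascript
// Sketch: build the /synthesize URL for an <audio> element.
// encodeURIComponent escapes spaces and reserved characters like '?', '&', '#'.
function synthesizeUrl(text) {
  return '/synthesize?text=' + encodeURIComponent(text);
}

// In the browser (ASSUMPTION: an <audio> element with id "answer-audio" exists):
// document.getElementById('answer-audio').src = synthesizeUrl(answerText);
```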

On the server you just need to synthesize the audio from the data in the URL query string. Here’s an example of how to invoke the Text To Speech service directly, taken from the Watson TTS sample app:

[js]var textToSpeech = new watson.text_to_speech(credentials);

// handle GET requests
app.get('/synthesize', function(req, res) {

  // make the request to Watson to synthesize the audio file from the query text
  var transcript = textToSpeech.synthesize(req.query);

  // set content-disposition header if downloading the
  // file instead of playing directly in the browser
  transcript.on('response', function(response) {
    if (req.query.download) {
      response.headers['content-disposition'] = 'attachment; filename=transcript.ogg';
    }
  });

  // avoid crashing the process on a failed request to Watson
  transcript.on('error', function(err) {
    console.log('error:', err);
    res.status(500).end();
  });

  // pipe results back to the browser as they come in from Watson
  transcript.pipe(res);
});[/js]

The Watson TTS service supports the .ogg and .wav file formats. I set up this sample to use only .ogg files, which the client plays using the HTML5 <audio> tag.
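Because not every browser plays Ogg audio, a client could ask the browser which of the two formats it can handle before requesting speech. A minimal sketch using HTMLMediaElement.canPlayType, which returns "", "maybe" or "probably". pickAudioFormat is a hypothetical helper, and the codec string is an assumption (that the service’s Ogg output uses the Opus codec). How you pass the choice to the server is up to you; the sample in this post only requests .ogg.

```javascript
// Sketch: choose 'ogg' or 'wav' based on what the browser says it can play.
// ASSUMPTION: the TTS service's Ogg output is Opus-encoded.
function pickAudioFormat(audioElement) {
  if (audioElement.canPlayType('audio/ogg; codecs="opus"') !== '') {
    return 'ogg';
  }
  if (audioElement.canPlayType('audio/wav') !== '') {
    return 'wav';
  }
  return null; // neither format is playable; fall back to showing text only
}

// In the browser: pickAudioFormat(document.createElement('audio'))
```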

Speech Recognition

Now that we’re able to process natural language and generate speech, the last part of the solution is to recognize spoken input and turn it into text. The Watson Speech To Text (STT) service handles this for us. Just like the TTS service, the Speech To Text service also has a sample app, complete with source code to help you get started.

This service uses the browser’s getUserMedia (streaming) API, together with code on the Node server, to stream the audio data back to the server with minimal latency. The best part is that you don’t have to set up any of this on your own. Just leverage the code from the sample app. Note: the getUserMedia API isn’t supported everywhere, so be advised.
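If you reuse the sample app you won’t need to write this yourself, but it helps to know what the audio/l16 content type used on the server means: raw signed 16-bit linear PCM. Web Audio hands the browser float samples in the range -1 to 1, so a client has to convert them before streaming. This is a sketch of that kind of conversion, not the sample app’s code; floatTo16BitPCM is a hypothetical name, and it assumes mono input captured at the same rate you declare in content_type (e.g. rate=44100).

```javascript
// Sketch: convert Web Audio float samples (-1..1) to signed 16-bit PCM.
// ASSUMPTIONS: mono input; the capture sample rate matches the declared rate.
// Note: Int16Array uses the platform's native byte order; check which byte
// order the service expects before sending the raw bytes.
function floatTo16BitPCM(float32Samples) {
  var pcm = new Int16Array(float32Samples.length);
  for (var i = 0; i < float32Samples.length; i++) {
    // Clamp first so out-of-range samples don't wrap around.
    var s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Scale negatives by 0x8000 and positives by 0x7FFF to use the full 16-bit range.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return pcm;
}
```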

On the client side you just need to create an instance of the SpeechRecognizer class in JavaScript and handle the result:

[js]var recognizer = new SpeechRecognizer({
  ws: '',
  model: 'WatsonModel'
});

recognizer.onresult = function(data) {

  // get the transcript from the service result data
  var result = data.results[data.results.length-1];
  var transcript = result.alternatives[0].transcript;

  // do something with the transcript
  search( transcript, );
};[/js]
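The handler above assumes there is always at least one result with at least one alternative. A slightly more defensive extraction might look like this sketch. latestTranscript is a hypothetical helper, and the result shape ({ results: [ { alternatives: [ { transcript } ], final } ] }) is taken from the handler above plus an assumed final flag marking results the service will no longer revise.

```javascript
// Sketch: safely pull the newest transcript out of a recognition result.
// ASSUMPTION: each result may carry a boolean `final` flag; interim results lack it.
function latestTranscript(data, finalOnly) {
  if (!data || !Array.isArray(data.results) || data.results.length === 0) {
    return null;
  }
  var result = data.results[data.results.length - 1];
  if (finalOnly && !result.final) {
    return null; // interim hypothesis; wait for the final one
  }
  var best = result.alternatives && result.alternatives[0];
  return best && typeof best.transcript === 'string' ? best.transcript.trim() : null;
}
```

Searching only on final results avoids firing a Watson QA request for every partial hypothesis while the user is still speaking.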

On the server, you need to create an instance of the Watson Speech To Text service and set up a handler for the POST request that receives the audio stream.

[js]var fs = require('fs');

// create an instance of the speech to text service
var speechToText = watson.speech_to_text(STT_CREDENTIALS);

// Handle audio stream processing for speech recognition
app.post('/', function(req, res) {
  var audio;

  if (req.body.url && req.body.url.indexOf('audio/') === 0) {
    // sample audio stream
    audio = fs.createReadStream(__dirname + '/../public/' + req.body.url);
  } else {
    // malformed url
    return res.status(400).json({ error: 'Malformed URL' });
  }

  // use Watson to generate a text transcript from the audio stream
  speechToText.recognize({ audio: audio, content_type: 'audio/l16; rate=44100' }, function(err, transcript) {
    if (err)
      return res.status(500).json({ error: err });
    return res.json(transcript);
  });
});[/js]
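One caveat with the handler above: the indexOf('audio/') check still lets a request like audio/../../app.js read files outside the audio folder. Here’s a minimal sketch of a stricter check using Node’s path module; resolveSampleAudio is a hypothetical helper, and the public/audio layout is assumed from the route above.

```javascript
var path = require('path');

// Sketch: resolve a client-supplied sample path such as 'audio/sample1.wav',
// refusing anything that escapes the public/audio directory.
function resolveSampleAudio(publicDir, url) {
  if (typeof url !== 'string') {
    return null;
  }
  var audioDir = path.resolve(publicDir, 'audio');
  var candidate = path.resolve(publicDir, url);
  // The resolved file must live strictly inside public/audio.
  if (candidate.indexOf(audioDir + path.sep) !== 0) {
    return null;
  }
  return candidate;
}

// Hypothetical usage inside the route, replacing the indexOf check:
// var file = resolveSampleAudio(__dirname + '/../public', req.body.url);
// if (!file) return res.status(400).json({ error: 'Malformed URL' });
// audio = fs.createReadStream(file);
```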

Source Code

You can interact with a live instance of this application at, and complete client and server side code is available at

Just set up your Bluemix app, clone the sample code, run npm install, and deploy your app to Bluemix using the Cloud Foundry CLI.

Helpful Links

  • rabimba

    I still have trouble running the demo successfully in any browser (I tried both Chrome and Firefox updated versions).

    I do get the microphone notification icon but nothing gets transcribed once I speak….

    • Andrew Trice

      Hi Rabimba, can you elaborate? What OS are you using, and do you get any error messages? Can you check the JavaScript console for any error messages? I just tested again, and it appears to be running for me OK in both Chrome and Firefox on OS X. Thanks

      • rabimba


        I tried using it both in Ubuntu (12.10) and Windows 8.1 with both Chrome (V42 64bit) and Firefox.

        One of the console logs I got from chrome is here

        • Andrew Trice

          OK, thanks. I’ll have to test it in other OSes. I just tried again, and no problems here on OS X. No errors shown in that log either, just normal debug messages. Any chance your microphone is muted at the OS level?

          • rabimba

            That really should not be the case!
            But I just tried it in my iMac and it worked like a charm :O (in chrome)

          • Andrew Trice

            Yeah, it should be identical across platforms. Not sure why it’s doing that, whether it’s a browser/platform issue, or issue in the code.

          • rabimba

            I just checked it again in Windows. Not working.

          • Andrew Trice

            New version posted. Should work in any browser supporting getUserMedia and .ogg for HTML5 Audio (Chrome and Firefox).

          • rabimba

            Is it the same url?

            I still can’t get it to work in chrome/windows.

    • Andrew Trice

      I am running this without issue in Chrome on Windows. You sure your mic is working?

  • Daniel & Janet Kimura

    I’m looking for a partner to develop some tools using Watson. Do you know anyone interested in joining a startup?

  • Peter

    Thanks Andrew – I really like this example, and was myself trying to combine these three fundamental Watson services into a similar demo. I have built apps leveraging the Q&A, and then I played with the Speech services. You have combined all three very well into a really good app. But I have two issues. 1. I demonstrated the Q&A healthcare service to a Doctor and had the Doctor ask questions and the results were apparently less than usable. How can the results be improved and how can a user give feedback to Watson to improve the accuracy? Cognitive systems need some feedback so how is Watson getting this? 2. In my browser the mic does not record the speech very well. It is not capturing and converting what I say into the same question in text. It seems to be quite unreliable and inaccurate. Is this a fault of my browser, laptop mic or other? I am using Chrome on my Windows Lenovo laptop.
    But many thanks for pulling this demo together – I really like it. Just a couple of things to build on.
    Thanks very much.

    • Andrew Trice

      Hi Peter, Thanks for the feedback.

      1: The data corpus used in this service is designed to answer only very specific kinds of questions. You can read about the types of questions it is trained to answer in detail at:

      So, if a physician or patient trying to “test” the system asks “why does my back hurt?”, Watson won’t be able to answer that b/c it is not trained to answer that kind of situation. This free/demo service pulls from the open/online data sets, and is trained only for relatively simple questions. In order to have more meaningful interaction, the QA service needs to be trained on a more complex data set, and trained to understand different kinds of questions. This is all very possible, but would require engagement with IBM. Watson is being used in the medical field with much success, but those applications are with much larger data sets, with a more highly tuned/specialized algorithm. You can see more examples of this online, go to the “Healthcare” section at:

      2: I think it’s a laptop/mic issue. I use an external mic, which seems to work pretty well, but the internal mic on my laptop does not work nearly as well. I’m sure the team is working on ways to improve extraction and to deal with low signal levels and background noise, but I can’t say with any certainty b/c I am not involved with that team.

  • Brian L Donaldson

    Hi Andrew, I would like to start working on two app ideas using Watson with Real Estate data. Right now Watson Q&A on Bluemix is only for Travel and Healthcare data. What steps can I take to get Real Estate data into Watson for my apps?

  • Jannis Busch

    Hey Andrew, unfortunately your application doesn’t give me any output. When I click “Ask” nothing happens. Could you fix that? (using Chrome browser on a Windows 7 device)

  • Daniel Comp

    Andrew – I thought you might appreciate an ATTA-BOY of recognition for the superb work and your (even better) attitude of collaboration. Thanks for creating the videos and sharing the tools. I’m a veteran web developer (since ’97), but on my first efforts with Bluemix. Thanks for heading me on this journey with some enthusiasm and naive confidence!