Say What? Live video chat between iOS & WebRTC with Twilio & IBM Watson Cognitive Computing in Real Time

What I’m about to show you might seem like science fiction from the future, but I can assure you it is not. Actually, every piece of this is available for you to use as a service.  Today.

Yesterday Twilio, an IBM partner whose services are available via IBM Bluemix, announced several new SDKs, including live video chat as a service. This makes live video easy to integrate into your native mobile or web-based applications, and opens the door to some very cool things. For example, what if you could add video chat between your mobile and web clients? Now, what if you could take things a step further and add IBM Watson cognitive computing for real-time transcription and analysis?

Check out this video from yesterday's Twilio Signal conference keynote, where fellow IBMers Damion Heredia and Jeff Sloyer demonstrate exactly this scenario: the new Twilio video SDK connecting a native iOS client and a WebRTC client, with IBM Watson cognitive computing services providing real-time transcription and sentiment analysis.

If it doesn’t automatically jump to the IBM Bluemix Demo, skip ahead to 2 hours, 15 min, and 20 seconds.

Jeff and Damion did an awesome job showing off both the new video service and the power of IBM Watson. I can also say first-hand that the new Twilio video services are pretty easy to integrate into your own projects; I helped them integrate these services into the native iOS client (the physician's app) shown in the demo. You just pull in the SDK, add your app tokens, and instantiate a video chat. Jeff pulls the audio stream from the WebRTC client and pushes it up to Watson in real time for the transcription and sentiment analysis services.
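For a rough sense of what "instantiate a video chat" looks like in code, here is a minimal sketch on the web side using the twilio-video JavaScript SDK. The '/token' endpoint and the room name are hypothetical placeholders, and the SDK's API surface has changed across versions, so treat this as illustrative rather than the demo's actual code:

// assumes twilio-video.js has been loaded via a script tag, exposing Twilio.Video
// ('/token' and 'exam-room' below are hypothetical placeholders)
fetch('/token')  // your server mints a Twilio access token
  .then(function (res) { return res.json(); })
  .then(function (data) {
    // join (or create) a room with local audio and video
    return Twilio.Video.connect(data.token, { name: 'exam-room', audio: true, video: true });
  })
  .then(function (room) {
    console.log('Connected to room:', room.name);
    room.on('participantConnected', function (participant) {
      console.log(participant.identity + ' joined the chat');
    });
  });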

IBM Watson QA + Speech Recognition + Speech Synthesis = A Conversation With Your Computer

Back in November I released a demo application here on my blog showing the IBM Watson QA Service for cognitive/natural language computing connected to the Web Speech API in Google Chrome to have real conversational interaction with a web application.  It’s a nice demo, but it always drove me nuts that it only worked in Chrome.  Last month the IBM Watson team released 5 new services, and guess what… Speech Recognition and Speech Synthesis are included!

These two services enable you to quickly add Text-To-Speech or Speech-To-Text capability to any application.  What better way to show them off than to update my existing app to leverage the new speech services?

So here it is: watsonhealthqa.mybluemix.net!

By leveraging the Watson services it can now run in any browser that supports getUserMedia (for speech recognition) and HTML5 <Audio> (for speech playback).

(Full source code available at the bottom of this post)

You can check out a video of it in action below:

If your browser doesn't support the getUserMedia API or HTML5 <Audio>, then your mileage may vary.  You can check where these features are supported on caniuse.com: <Audio>, getUserMedia.

Warning: This is targeting desktop browsers – HTML5 Audio is a mess on mobile devices due to limited codec support and immature APIs.
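If you'd rather detect these capabilities in code than consult a support table, a quick check looks something like this (getUserMedia was still vendor-prefixed in several browsers at the time of writing):

// quick capability check; getUserMedia may be vendor-prefixed
var hasGetUserMedia = !!(navigator.getUserMedia ||
                         navigator.webkitGetUserMedia ||
                         navigator.mozGetUserMedia);
var hasHtml5Audio = !!document.createElement('audio').canPlayType;

if (!hasGetUserMedia || !hasHtml5Audio) {
  console.warn('Speech input/output is not fully supported in this browser.');
}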

So how does this all work?

Just like the QA service, the new Text To Speech and Speech To Text services are now available in IBM Bluemix, so you can create a new application that leverages any of these services, or you can add them to any existing application.

I simply added the Text To Speech and Speech To Text services to my existing Healthcare QA application that runs on Bluemix:

[Screenshot: IBM Bluemix Dashboard]

These services are available via a REST API. Once you’ve added them to your application, you can consume them easily within any of your applications.

I updated the code from my previous example in two ways: 1) to take advantage of the Watson Node.js Wrapper, which makes interacting with Watson a lot easier, and 2) to use the new speech services.

Watson Node.js Wrapper

Using the Watson Node.js Wrapper, you can now easily instantiate Watson services in a single line of code.  For example:

// the watson-developer-cloud module wraps the Watson REST services
var watson = require('watson-developer-cloud');
var question_and_answer_healthcare = watson.question_and_answer(QA_CREDENTIALS);
var speechToText = watson.speech_to_text(STT_CREDENTIALS);

The credentials come from your environment configuration; you then create instances of whichever services you want to consume.
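On Bluemix (which is built on Cloud Foundry), bound service credentials are exposed through the VCAP_SERVICES environment variable. Extracting them looks roughly like this; note that the service key names below are assumptions, so match them against your own bound services:

// parse bound service credentials from Cloud Foundry's VCAP_SERVICES
// (the 'question_and_answer' and 'speech_to_text' keys are assumptions;
// check the VCAP_SERVICES of your own app)
var services = JSON.parse(process.env.VCAP_SERVICES || '{}');

var QA_CREDENTIALS = services.question_and_answer ?
    services.question_and_answer[0].credentials : null;
var STT_CREDENTIALS = services.speech_to_text ?
    services.speech_to_text[0].credentials : null;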

QA Service

The code for consuming a service is now much simpler than in the previous version.  When you want to submit a question to the Watson QA service, you simply call the "ask" method on the QA service instance.

Below is my server-side code from app.js that accepts a POST submission from the browser, delegates the question to Watson, and takes the result and renders HTML using a Jade template. See the Getting Started Guide for the Watson QA Service to learn more about the wrappers for Node or Java.

// Handle the form POST containing the question
app.post('/ask', function(req, res){

    // delegate to Watson
    question_and_answer_healthcare.ask({ text: req.body.questionText }, function (err, response) {
        if (err)
            console.log('error:', err);
        else {
            // merge the first answer set from Watson with the request body
            var result = extend({ 'answers': response[0] }, req.body);

            // render the template to HTML and send it to the browser
            return res.render('response', result);
        }
    });
});

Compare this to the previous version, and you’ll quickly see that it is much simpler.

Speech Synthesis

At this point, we already have a functional service that can take natural language text, submit it to Watson,  and return a search result as text.  The next logical step for me was to add speech synthesis using the Watson Text To Speech Service (TTS).  Again, the Watson Node Wrapper and Watson’s REST services make this task very simple.  On the client side you just need to set the src of an <audio> instance to the URL for the TTS service:

<audio controls="" autoplay="" src="/synthesize?text=The text that should generate the audio goes here"></audio>

On the server, you just need to synthesize the audio from the data in the URL query string.  Here's an example of how to invoke the Text To Speech service directly, taken from the Watson TTS sample app:

var textToSpeech = watson.text_to_speech(credentials);

// handle get requests
app.get('/synthesize', function(req, res) {

  // make the request to Watson to synthesize the audio file from the query text
  var transcript = textToSpeech.synthesize(req.query);

  // set content-disposition header if downloading the
  // file instead of playing directly in the browser
  transcript.on('response', function(response) {
    console.log(response.headers);
    if (req.query.download) {
      response.headers['content-disposition'] = 'attachment; filename=transcript.ogg';
    }
  });

  // pipe results back to the browser as they come in from Watson
  transcript.pipe(res);
});

The Watson TTS service supports .ogg and .wav file formats.  I modified this sample so that it is set up to use only .ogg files.  On the client side, these are played using the HTML5 <audio> tag.
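If you wanted .wav output instead, the format is selected via the accept parameter on the synthesize call. A minimal sketch, assuming the watson-developer-cloud wrapper's parameter naming (verify against your version):

// request a specific output format via the 'accept' parameter
// (assumption: parameter name per the watson-developer-cloud wrapper)
app.get('/synthesize-wav', function(req, res) {
  var params = {
    text: req.query.text,
    accept: 'audio/wav'
  };

  // pipe the synthesized audio straight back to the browser
  textToSpeech.synthesize(params).pipe(res);
});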

Speech Recognition

Now that we're able to process natural language and generate speech, the last part of the solution is to recognize spoken input and turn it into text.  The Watson Speech To Text (STT) service handles this for us.  Just like the TTS service, the Speech To Text service also has a sample app, complete with source code, to help you get started.

This service uses the browser's getUserMedia (streaming) API with socket.io on Node to stream the audio data back to the server with minimal latency. The best part is that you don't have to set any of this up on your own; just leverage the code from the sample app. Note: the getUserMedia API isn't supported everywhere, so be advised.
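For the curious, the underlying pattern looks roughly like the sketch below: capture microphone audio via getUserMedia and the Web Audio API, then emit raw samples to the server over socket.io. The 'audio-chunk' event name is made up for illustration; the sample app's actual wiring differs:

// hypothetical sketch of the streaming pattern; 'audio-chunk' is an invented
// event name, and AudioContext may need a webkit prefix in older browsers
navigator.getUserMedia({ audio: true }, function (stream) {
  var socket = io.connect();
  var audioContext = new AudioContext();
  var source = audioContext.createMediaStreamSource(stream);
  var processor = audioContext.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = function (e) {
    // forward raw PCM samples to the server as they arrive
    socket.emit('audio-chunk', e.inputBuffer.getChannelData(0));
  };

  source.connect(processor);
  processor.connect(audioContext.destination);
}, function (err) {
  console.error('getUserMedia error:', err);
});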

On the client side you just need to create an instance of the SpeechRecognizer class in JavaScript and handle the result:

var recognizer = new SpeechRecognizer({
  ws: '',
  model: 'WatsonModel'
});

recognizer.onresult = function(data) {

    // get the transcript from the service result data
    var result = data.results[data.results.length - 1];
    var transcript = result.alternatives[0].transcript;

    // do something with the transcript
    search(transcript, result.final);
};

On the server, you need to create an instance of the Watson Speech To Text service and set up a handler for the POST request that receives the audio stream.

var fs = require('fs');

// create an instance of the speech to text service
var speechToText = watson.speech_to_text(STT_CREDENTIALS);

// Handle audio stream processing for speech recognition
app.post('/', function(req, res) {
    var audio;

    if(req.body.url && req.body.url.indexOf('audio/') === 0) {
        // sample audio stream
        audio = fs.createReadStream(__dirname + '/../public/' + req.body.url);
    } else {
        // malformed url
        return res.status(500).json({ error: 'Malformed URL' });
    }

    // use Watson to generate a text transcript from the audio stream
    speechToText.recognize({audio: audio, content_type: 'audio/l16; rate=44100'}, function(err, transcript) {
        if (err)
            return res.status(500).json({ error: err });
        else
            return res.json(transcript);
    });
});

Source Code

You can interact with a live instance of this application at watsonhealthqa.mybluemix.net, and complete client and server side code is available at github.com/triceam/IBMWatson-QA-Speech.

Just set up your Bluemix app, clone the sample code, run npm install, and deploy your app to Bluemix using the Cloud Foundry CLI (cf push).


IBM Watson, Cognitive Computing & Speech APIs

IBM Watson is a cognitive computing platform that you can use to add intelligence and natural language analysis to your own applications.  Watson employs natural language processing, hypothesis generation, and dynamic learning to deliver solutions for natural language question and answer services, sentiment analysis, relationship extraction, concept expansion, and language/translation services… and it is available for you to check out with IBM Bluemix cloud services.

Watson won Jeopardy, tackles genetics,  creates recipes, and so much more.  It is breaking new ground on a daily basis.

The IBM Watson™ Question Answer (QA) service provides an API that gives you the power of the IBM Watson cognitive computing system. With this service, you can connect to Watson, pose questions in natural language, and receive responses that you can use within your application.

In this post, I've hooked the Watson QA Node.js starter project up to the Web Speech API's speech recognition and speech synthesis capabilities. Using these APIs, you can now have a conversation with Watson: ask any question about healthcare, and see what Watson has to say. Check out the video below to see it in action.

You can check out a live demo at watsonhealthqa.mybluemix.net.

Just click on the microphone button, allow access to the system mic, and start talking.  Just a warning, lots of background noise might interfere with the API’s ability to recognize & generate a meaningful transcript.

This demo supports Google Chrome only at the time of writing. You can check out where Web Speech is supported at caniuse.com.

You can check out the full source code for this sample on IBM Jazz Hub (git).

I basically just took the Watson QA Sample Application for Node.js and started playing around with it to see what I could do…

This demo uses the Watson For Healthcare data set, which contains information from HealthFinder.gov, the CDC, the National Heart, Lung, and Blood Institute, the National Institute of Arthritis and Musculoskeletal and Skin Diseases, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and Cancer.gov.  Just know that this is a beta service/data set; implementing Watson for your own enterprise solutions requires system training and algorithm development for Watson to be able to understand your data.

Using Watson with this dataset, you can ask condition-related questions, like:

  • What is X?
  • What causes X?
  • What is the treatment for X?
  • What are the symptoms of X?
  • Am I at risk of X?

Procedure questions, like:

  • What should I expect before X?
  • What should I expect after X?

General health questions, like:

  • What are the benefits of taking aspirin daily?
  • Why do I need to get shots?
  • How do I know if I have food poisoning?

Or, action-related questions, like:

  • How can I quit smoking?
  • What should I do if my child is obese?
  • What can I do to get more calcium?

Watson services are exposed through a RESTful API, and can easily be integrated into an existing application.  For example, here’s a snippet demonstrating how you can consume the Watson QA service inside of a Node.js app:

// assumes: var url = require('url'), https = require('https'), and an
// extend() helper (e.g. the 'extend' npm module) are already in scope
var parts = url.parse(service_url + '/v1/question/healthcare');
var options = {
  host: parts.hostname,
  port: parts.port,
  path: parts.pathname,
  method: 'POST',
  headers: {
    'Content-Type'  : 'application/json',
    'Accept'        : 'application/json',
    'X-synctimeout' : '30',
    'Authorization' : auth
  }
};

// Create a request to POST to Watson
var watson_req = https.request(options, function(result) {
  result.setEncoding('utf-8');
  var response_string = '';

  result.on('data', function(chunk) {
    response_string += chunk;
  });

  result.on('end', function() {
    var answers = JSON.parse(response_string)[0];
    var response = extend({ 'answers': answers }, req.body);
    return res.render('response', response);
  });
});

// send the question as the POST body, per the QA v1 request format
watson_req.write(JSON.stringify({ question: { questionText: req.body.questionText } }));
watson_req.end();

Hooking into the Web Speech API is just as easy (assuming you're using a browser that implements it; I built this demo using Chrome on OS X). On the client side, you just need to create a SpeechRecognition instance and add the appropriate event handlers.

var recognition = new webkitSpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;

recognition.onstart = function() { ... };
recognition.onresult = function(event) {

  var result = event.results[event.results.length - 1];
  var transcript = result[0].transcript;

  // then do something with the transcript
  search( transcript );
};
recognition.onerror = function(event) { ... };
recognition.onend = function() { ... };

To make your app talk back to you (synthesize speech), you just need to create a new SpeechSynthesisUtterance object, and pass it into the window.speechSynthesis.speak() function. You can add event listeners to handle speech events, if needed.

// tokens[i] is the next chunk of the response text to speak
var msg = new SpeechSynthesisUtterance( tokens[i] );

msg.onstart = function (event) {
    console.log('started speaking');
};

msg.onend = function (event) {
    console.log('stopped speaking');
};

window.speechSynthesis.speak(msg);

Check out these articles on HTML5Rocks.com for more detail on Speech Recognition and Speech Synthesis.

You can get started with Watson services for Bluemix at https://console.ng.bluemix.net/#/store/cloudOEPaneId=store