IBM Watson QA + Speech Recognition + Speech Synthesis = A Conversation With Your Computer

Back in November I released a demo application here on my blog showing the IBM Watson QA Service for cognitive/natural language computing connected to the Web Speech API in Google Chrome to have real conversational interaction with a web application.  It’s a nice demo, but it always drove me nuts that it only worked in Chrome.  Last month the IBM Watson team released 5 new services, and guess what… Speech Recognition and Speech Synthesis are included!

These two services enable you to quickly add Text-To-Speech or Speech-To-Text capability to any application.  What’s a better way to show them off than by updating my existing app to leverage the new speech services?

So here it is: watsonhealthqa.mybluemix.net!

By leveraging the Watson services it can now run in any browser that supports getUserMedia (for speech recognition) and HTML5 <Audio> (for speech playback).

(Full source code available at the bottom of this post)

You can check out a video of it in action below:

If your browser doesn’t support the getUserMedia API or HTML5 <Audio>, then your mileage may vary.  You can check where these features are supported with these links: <Audio>, getUserMedia

Warning: This is targeting desktop browsers – HTML5 Audio is a mess on mobile devices due to limited codec support and immature APIs.
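
If you'd rather check at runtime than consult a support table, a small feature test works too. This is a minimal sketch, not code from the app: detectSpeechSupport is a hypothetical helper name, and it takes a window-like object as a parameter so it can also be exercised outside a browser. It looks for the newer navigator.mediaDevices.getUserMedia as well as the older prefixed variants.

```javascript
// Hypothetical helper: reports which of the two browser features are present.
// `win` is a window-like object (pass the real `window` in a browser).
function detectSpeechSupport(win) {
  var nav = win.navigator || {};

  // getUserMedia has lived in several places over the years
  var hasGetUserMedia = !!(
    (nav.mediaDevices && nav.mediaDevices.getUserMedia) ||
    nav.getUserMedia ||
    nav.webkitGetUserMedia ||
    nav.mozGetUserMedia
  );

  // HTML5 audio playback is available when the element type is defined
  var hasAudio = typeof win.HTMLAudioElement !== 'undefined';

  return { getUserMedia: hasGetUserMedia, audio: hasAudio };
}

// In a browser you would call it with the real window object
if (typeof window !== 'undefined') {
  var support = detectSpeechSupport(window);
  if (!support.getUserMedia || !support.audio) {
    console.log('Speech features unavailable in this browser:', support);
  }
}
```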

So how does this all work?

Just like the QA service, the new Text To Speech and Speech To Text services are now available in IBM Bluemix, so you can create a new application that leverages any of these services, or you can add them to any existing application.

I simply added the Text To Speech and Speech To Text services to my existing Healthcare QA application that runs on Bluemix:

IBM Bluemix Dashboard

These services are available via a REST API. Once you’ve added them to your application, you can consume them easily within any of your applications.

I updated the code from my previous example in two ways: 1) to take advantage of the Watson Node.js Wrapper, which makes interacting with Watson a lot easier, and 2) to take advantage of these new services.

Watson Node.js Wrapper

Using the Watson Node.js Wrapper, you can now easily instantiate Watson services in a single line of code.  For example:

var watson = require('watson-developer-cloud');
var question_and_answer_healthcare = watson.question_and_answer(QA_CREDENTIALS);
var speechToText = watson.speech_to_text(STT_CREDENTIALS);

The credentials come from your environment configuration. From there, you just create instances of whichever services you want to consume.
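
On Bluemix, which runs on Cloud Foundry, credentials for bound services are exposed to the app as JSON in the VCAP_SERVICES environment variable. Here is a rough sketch of pulling them out. getServiceCredentials is a hypothetical helper, and the 'speech_to_text' label and the placeholder fallback values are assumptions, so check them against your own environment.

```javascript
// Hypothetical helper: returns the credentials of the first bound instance
// of a service, or the supplied fallback when nothing is bound.
function getServiceCredentials(vcapServicesJson, label, fallback) {
  var services;
  try {
    services = JSON.parse(vcapServicesJson || '{}');
  } catch (e) {
    services = {}; // not valid JSON, treat as "no services bound"
  }
  if (!services || typeof services !== 'object') {
    services = {};
  }

  var instances = services[label];
  if (Array.isArray(instances) && instances.length > 0 && instances[0].credentials) {
    return instances[0].credentials;
  }
  return fallback || {};
}

// Assumed service label; the fallback is only a placeholder for local runs
var STT_CREDENTIALS = getServiceCredentials(process.env.VCAP_SERVICES, 'speech_to_text', {
  url: '<service url>',
  username: '<username>',
  password: '<password>'
});
```

Keeping a fallback makes it possible to run the app locally, where VCAP_SERVICES is usually not set.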

QA Service

The code for consuming a service is now much simpler than the previous version.  When you want to submit a question to the Watson QA service, you can now simply call the “ask” method on the QA service instance.

Below is my server-side code from app.js that accepts a POST submission from the browser, delegates the question to Watson, and takes the result and renders HTML using a Jade template. See the Getting Started Guide for the Watson QA Service to learn more about the wrappers for Node or Java.

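// Handle the form POST containing the question
app.post('/ask', function(req, res){

    // delegate to Watson
    question_and_answer_healthcare.ask({ text: req.body.questionText}, function (err, response) {
        if (err) {
            // log the failure and answer the browser so the request doesn't hang
            console.log('error:', err);
            return res.status(500).send('Error communicating with Watson');
        }

        // merge the first Watson result with the submitted form fields
        // (extend copies the properties of req.body onto the new object)
        var viewData = extend({ 'answers': response[0] }, req.body);

        // render the template to HTML and send it to the browser
        return res.render('response', viewData);
    });
});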

Compare this to the previous version, and you’ll quickly see that it is much simpler.

Speech Synthesis

At this point, we already have a functional service that can take natural language text, submit it to Watson,  and return a search result as text.  The next logical step for me was to add speech synthesis using the Watson Text To Speech Service (TTS).  Again, the Watson Node Wrapper and Watson’s REST services make this task very simple.  On the client side you just need to set the src of an <audio> instance to the URL for the TTS service:

<audio controls="" autoplay="" src="/synthesize?text=The text that should generate the audio goes here"></audio>
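
When you set the src from script, the text should be URL-encoded so that characters like "?" and "&" in an answer don't break the query string. A minimal sketch, where buildSynthesizeUrl and the 'speech-output' element id are hypothetical names of my own choosing:

```javascript
// Hypothetical helper: builds the URL for the /synthesize route,
// encoding the text so it is safe to place in a query string.
function buildSynthesizeUrl(text) {
  return '/synthesize?text=' + encodeURIComponent(text);
}

// Browser-only usage: point an existing <audio> element at the new URL
if (typeof document !== 'undefined') {
  var audioElement = document.getElementById('speech-output'); // assumed id
  if (audioElement) {
    audioElement.src = buildSynthesizeUrl('What is diabetes?');
    audioElement.play();
  }
}
```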

On the server you just need to synthesize the audio from the data in the URL query string.  Here’s an example of how to invoke the text to speech service, taken directly from the Watson TTS sample app:

var textToSpeech = new watson.text_to_speech(credentials);

// handle get requests
app.get('/synthesize', function(req, res) {

  // make the request to Watson to synthesize the audio file from the query text
  var transcript = textToSpeech.synthesize(req.query);

  // set content-disposition header if downloading the
  // file instead of playing directly in the browser
  transcript.on('response', function(response) {
    console.log(response.headers);
    if (req.query.download) {
      response.headers['content-disposition'] = 'attachment; filename=transcript.ogg';
    }
  });

  // pipe results back to the browser as they come in from Watson
  transcript.pipe(res);
});

The Watson TTS service supports .ogg and .wav file formats.  I modified this sample slightly to return .ogg files for Chrome and Firefox, and .wav files for other browsers.  On the client side, these are played using the HTML5 <audio> tag. You can see my modifications in the git repository.
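
To give a feel for that kind of check, here is a rough sketch. It is not the exact code from the repository: chooseAudioFormat is a hypothetical helper, the user-agent test is deliberately simplistic, and the two MIME strings are my assumption about the values to pass as the service's accept parameter.

```javascript
// Hypothetical helper: picks an audio format based on the User-Agent header.
// Chrome and Firefox get Ogg; everything else falls back to WAV.
function chooseAudioFormat(userAgent) {
  var ua = (userAgent || '').toLowerCase();
  var prefersOgg = ua.indexOf('chrome') !== -1 || ua.indexOf('firefox') !== -1;
  return prefersOgg ? 'audio/ogg; codecs=opus' : 'audio/wav';
}

// Possible use inside the /synthesize handler (assumes an Express request):
//   var accept = chooseAudioFormat(req.headers['user-agent']);
console.log(chooseAudioFormat('Mozilla/5.0 Gecko/20100101 Firefox/36.0'));
```

User-agent sniffing is fragile, so asking the browser directly with the audio element's canPlayType method on the client side may be a sturdier choice.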

Speech Recognition

Now that we’re able to process natural language and generate speech, the last part of the solution is to recognize spoken input and turn it into text.  The Watson Speech To Text (STT) service handles this for us.  Just like the TTS service, the Speech To Text service also has a sample app, complete with source code to help you get started.

This service uses the browser’s getUserMedia (streaming) API with socket.io on Node to stream the data back to the server with minimal latency. The best part is that you don’t have to set up any of this on your own. Just leverage the code from the sample app. Note: the getUserMedia API isn’t supported everywhere, so be advised.

On the client side you just need to create an instance of the SpeechRecognizer class in JavaScript and handle the result:

var recognizer = new SpeechRecognizer({
  ws: '',
  model: 'WatsonModel'
});

recognizer.onresult = function(data) {

    //get the transcript from the service result data
    var result = data.results[data.results.length-1];
    var transcript = result.alternatives[0].transcript;

    // do something with the transcript
    search( transcript, result.final );
}
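
The handler above assumes every message contains at least one result with at least one alternative. If you want to be defensive about that, the lookup can be pulled into a small function. This is only a sketch based on the data shape used above; extractTranscript is a hypothetical name.

```javascript
// Hypothetical helper: pulls the latest transcript out of a result message.
// Returns null when the message has no usable result.
function extractTranscript(data) {
  if (!data || !Array.isArray(data.results) || data.results.length === 0) {
    return null;
  }

  // the most recent result is the last one in the list
  var result = data.results[data.results.length - 1];
  if (!result || !Array.isArray(result.alternatives) || result.alternatives.length === 0) {
    return null;
  }

  return {
    text: String(result.alternatives[0].transcript || '').trim(),
    isFinal: !!result.final
  };
}

// Example message in the shape used by the handler above
var latest = extractTranscript({
  results: [{ alternatives: [{ transcript: 'what is diabetes ' }], final: true }]
});
console.log(latest);
```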

On the server, you need to create an instance of the Watson Speech To Text service, and set up handlers for the post request to receive the audio stream.

// fs is needed to read the sample audio files from disk
var fs = require('fs');

// create an instance of the speech to text service
var speechToText = watson.speech_to_text(STT_CREDENTIALS);

// Handle audio stream processing for speech recognition
app.post('/', function(req, res) {
    var audio;

    if(req.body.url && req.body.url.indexOf('audio/') === 0) {
        // sample audio stream
        audio = fs.createReadStream(__dirname + '/../public/' + req.body.url);
    } else {
        // malformed url: a client error, so respond with 400 rather than 500
        return res.status(400).json({ error: 'Malformed URL' });
    }

    // use Watson to generate a text transcript from the audio stream
    speechToText.recognize({audio: audio, content_type: 'audio/l16; rate=44100'}, function(err, transcript) {
        if (err)
            return res.status(500).json({ error: err });
        else
            return res.json(transcript);
    });
});
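
One thing to watch in a handler like this: a prefix check on 'audio/' alone still lets a value such as 'audio/../../app.js' through. A small sketch of a stricter check using Node's path module follows. resolveSampleAudio is a hypothetical helper, and the directory layout is assumed to match the one above.

```javascript
var path = require('path');

// Hypothetical helper: resolves a requested sample file and returns its
// absolute path only if it really lives inside <publicDir>/audio.
function resolveSampleAudio(publicDir, requestedUrl) {
  if (typeof requestedUrl !== 'string' || requestedUrl.indexOf('audio/') !== 0) {
    return null;
  }

  var audioRoot = path.resolve(publicDir, 'audio');
  var resolved = path.resolve(publicDir, requestedUrl);

  // after resolving ".." segments, the file must still sit under audioRoot
  if (resolved.indexOf(audioRoot + path.sep) !== 0) {
    return null;
  }
  return resolved;
}

console.log(resolveSampleAudio('/srv/public', 'audio/sample1.wav'));
console.log(resolveSampleAudio('/srv/public', 'audio/../../app.js')); // null
```

In the route, a null result would map to the "Malformed URL" response, and anything else could be handed to fs.createReadStream.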

Source Code

You can interact with a live instance of this application at watsonhealthqa.mybluemix.net, and complete client and server side code is available at github.com/triceam/IBMWatson-QA-Speech.

Just set up your Bluemix app, clone the sample code, run npm install, and deploy your app to Bluemix using the Cloud Foundry CLI.

Video: Data Visualization With Web Standards

Last week I had the opportunity to present “Data Visualization With Web Standards” to the Data Visualization New York Meetup group.  There was a great turnout, and thanks to everyone who attended.  I’d like to especially thank Christian Lilley and Paul Trowbridge for organizing the event.

My presentation focused on the fundamental techniques of visualizing data within HTML/JS experiences.  You can view my presentation in its entirety below.  Slides and bullet points are below the fold…

Entire meetup video available here.

My slides are available below.  Just press the space bar to advance to the next “slide”.

Key Points

Basically, there are 5 general ways to visualize data using web-standards techniques – here is a brief overview with pros & cons:


<img>

You can embed server-rendered data visualizations as images using the HTML <img> tag. This is nothing new… They are very basic, but will certainly work.

  • Not interactive
  • Requires online & round-trip to server
  • No “WOW” factor – let’s face it, they are boring
  • Example: Google Image Charts

HTML5 <canvas>

You can use the HTML5 <canvas> element to programmatically render content based upon data in-memory using JavaScript. The HTML5 Canvas provides you with an API for rendering graphical content via moveTo or lineTo instructions, or by setting individual pixel values manually.  Learn more about the HTML5 canvas from the MDN tutorials.

  • Can be interactive
  • Dynamic – client side rendering with JavaScript
  • Hardware accelerated on some platforms
  • Can work offline
  • Works in newer browsers: http://caniuse.com/#search=canvas
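
To make the canvas approach concrete, here is a minimal sketch of a bar chart drawn with the 2D context. drawBars and the 'chart' element id are hypothetical names, the color is arbitrary, and the values are assumed to be non-negative numbers.

```javascript
// Hypothetical helper: draws one bar per value, scaled so the largest
// value fills the full height. `ctx` is a canvas 2D rendering context.
function drawBars(ctx, values, width, height) {
  var max = Math.max.apply(null, values);
  if (!values.length || max <= 0) {
    return; // nothing to draw
  }

  var gap = 2; // pixels of space between bars
  var barWidth = width / values.length;

  ctx.clearRect(0, 0, width, height);
  ctx.fillStyle = '#3b7dd8';

  values.forEach(function (value, i) {
    var barHeight = (value / max) * height;
    // canvas y grows downward, so bars start at (height - barHeight)
    ctx.fillRect(i * barWidth, height - barHeight, barWidth - gap, barHeight);
  });
}

// Browser-only usage with an existing <canvas id="chart"> element
if (typeof document !== 'undefined') {
  var canvas = document.getElementById('chart'); // assumed id
  if (canvas) {
    drawBars(canvas.getContext('2d'), [3, 7, 5], canvas.width, canvas.height);
  }
}
```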


Scalable Vector Graphics (SVG)

SVG is a declarative XML-based markup language that is used to create vector graphics content, and can be used to create visual content inside of web experiences.

  • Client or Server-side rendering
  • Can be static or dynamic
  • Can be scripted with JS
  • Can be manipulated via HTML DOM
  • Works in newer browsers (but not on Android 2.x and earlier): http://caniuse.com/#search=SVG
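
Because SVG is just markup, the same kind of chart can be produced as a string, on the client or on the server. A minimal sketch follows. barChartSvg is a hypothetical helper, the color is arbitrary, and the values are assumed to be non-negative numbers.

```javascript
// Hypothetical helper: returns SVG markup with one <rect> per value,
// scaled so the largest value fills the full height.
function barChartSvg(values, width, height) {
  var max = values.length ? Math.max.apply(null, values) : 0;
  var barWidth = values.length ? width / values.length : 0;

  var rects = values.map(function (value, i) {
    var barHeight = max > 0 ? (value / max) * height : 0;
    return '<rect x="' + (i * barWidth) + '" y="' + (height - barHeight) +
      '" width="' + (barWidth - 2) + '" height="' + barHeight +
      '" fill="#3b7dd8"/>';
  });

  return '<svg xmlns="http://www.w3.org/2000/svg" width="' + width +
    '" height="' + height + '">' + rects.join('') + '</svg>';
}

console.log(barChartSvg([3, 7, 5], 300, 100));
```

The resulting string can be written into the page, or returned from a server route with an image/svg+xml content type.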


HTML DOM Elements

Visualizations like interactive maps, or simple charts can be created purely with HTML structures and creative use of CSS styles to control position, visual presentation, etc… You can use CSS positioning to control x/y placement, and percentage-based width/height to display relative values based upon a range of data.   For example, the following bar chart/table is created purely using HTML DIV containers with CSS styles.
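
As a rough illustration of the percentage-width idea, this sketch turns a list of labeled values into DIV markup where each bar's width is its share of the largest value. barChartHtml and the CSS class names are hypothetical, and the bars still need a height and background color from your stylesheet to be visible.

```javascript
// Escape text so labels can be placed safely inside HTML
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: one row per data point, bar width as a percentage
// of the largest value in the set.
function barChartHtml(rows) {
  var max = rows.reduce(function (m, row) { return Math.max(m, row.value); }, 0);

  return rows.map(function (row) {
    var percent = max > 0 ? Math.round((row.value / max) * 100) : 0;
    return '<div class="bar-row">' +
      '<span class="bar-label">' + escapeHtml(row.label) + '</span>' +
      '<div class="bar" style="width:' + percent + '%"></div>' +
      '</div>';
  }).join('\n');
}

console.log(barChartHtml([
  { label: 'Canvas', value: 40 },
  { label: 'SVG', value: 80 }
]));
```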


WebGL

WebGL is on the “bleeding edge” of interactive graphics & data visualization across the web. WebGL enables hardware-accelerated 3D graphics inside the browser experience. Technically, it is not a W3C standard (the specification is maintained by the Khronos Group), and there is varied and/or incomplete support across different browsers (http://caniuse.com/#search=webgl).  There is also considerable debate whether it ever will be a W3C standard; however, there are some incredible samples out on the web worth mentioning:

Feel free to leave a comment with any questions.
Enjoy!