Tag Archives: Watson

Mobile Apps with Language & Translation Services using IBM Watson & IBM MobileFirst

UPDATE 12/22/15: IBM recently released a new iOS SDK for Watson that makes integration with Watson services even easier. You can read more about it here.


I recently gave a presentation at IBM Insight on Cognitive Computing in mobile apps. I showed two apps: one that uses Watson natural language processing to perform search queries, and another that uses the Watson translation and text to speech services to take text in one language, translate it into another language, and then even have the app play back the spoken audio in the translated language. It’s this second app that I want to highlight today.

In fact, it gets much cooler than that.  I had an idea: “What if we hook up an OCR (optical character recognition) engine to the translation services?” That way, you can take a picture of something, extract the text, and translate it.  It turns out, it’s not that hard, and I was able to put together this sample app in just under two days.  Check out the video below to see it in action.

To be clear, I ended up using a version of the open source Tesseract OCR engine targeting iOS. This is not based on any of the work IBM research is doing with OCR or natural scene OCR, and should not be confused with any IBM OCR work.  This is basic OCR and works best with dark text on a light background.

The Tesseract engine lets you pass in an image, then handles the OCR operations, returning a collection of words that it was able to extract from that image. Once you have the text, you can do whatever you want with it.

So, here’s where Watson Developer Cloud Services come into play. First, I used the Watson Language Translation Service to perform the translation.  When using this service, I make a request to my Node.js app running on IBM Bluemix (IBM’s cloud platform).  The Node.js app acts as a facade and delegates to the Watson service for the actual translation.


You can check out a sample on the web here.

On the mobile client, you just make a request to your service and do something with the response. The example below uses the IMFResourceRequest API to make a request to the server (this can be done in either Objective C or Swift). IMFResourceRequest is the MobileFirst wrapper for networking requests that enables the MobileFirst/Mobile Client Access service to capture operational analytics for every request made by the app.

NSDictionary *params = @{
  @"text":text,
  @"source":@"en",
  @"target":language
};

IMFResourceRequest * imfRequest =
  [IMFResourceRequest requestWithPath:@"https://translator.mybluemix.net/translate"
                      method:@"GET" parameters:params];

[imfRequest sendWithCompletionHandler:^(IMFResponse *response, NSError *error) {
  NSDictionary* json = response.responseJson;
  NSArray *translations = [json objectForKey:@"translations"];
  NSDictionary *translationObj = [translations objectAtIndex:0];
  self.lastTranslation = [translationObj objectForKey:@"translation"];
  // now do something with the result - like update the UI
}];
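
For reference, the JSON payload returned by the /translate endpoint (and parsed in the completion handler above) looks roughly like the following. The values are illustrative, and the count fields reflect the Language Translation v2 response format of this era:

{
  "translations": [
    { "translation": "Hola y bienvenido!" }
  ],
  "word_count": 3,
  "character_count": 18
}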

On the Node.js server, the route simply takes the request and brokers it to the Watson Translation service (using the Watson Node.js SDK):

app.get('/translate', function(req, res){
  language_translation.translate(req.query, function(err, translation) {
    if (err) {
      console.log(err)
      res.send( err );
    } else {
      console.log(translation);
      res.send( translation );
    }
  });
});
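
For completeness, the language_translation instance used in that route is created with the Watson Node.js SDK in just a couple of lines. The snippet below is a minimal sketch; the credential values are placeholders, not the sample’s actual configuration:

// Sketch: instantiating the Language Translation service with the
// watson-developer-cloud Node.js SDK. Replace the placeholder
// credentials with the values from your bound Bluemix service.
var watson = require('watson-developer-cloud');

var language_translation = watson.language_translation({
  username: '<service username>',
  password: '<service password>',
  version: 'v2'
});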

Once you receive the result from the server, you can update the UI, make a request to the text to speech service, or do pretty much anything else.

To generate audio using the Watson Text To Speech service, you can either use the Watson Speech SDK, or you can use the Node.js facade again to broker requests to the Watson Text To Speech service. In this sample I used the Node.js facade to generate FLAC audio, which I played in the native iOS app using the open source Origami Engine library, which supports the FLAC audio format.

You can preview audio generated by the Watson Text To Speech service in the embedded audio below. Note: In this sample I’m using the OGG file format; it will only work in browsers that support OGG.

English: Hello and welcome! Please share this article with your friends!

Spanish: Hola y bienvenido! Comparta este artículo con sus amigos!

app.get('/synthesize', function(req, res) {
  var transcript = textToSpeech.synthesize(req.query);
  transcript.on('response', function(response) {
    if (req.query.download) {
      response.headers['content-disposition'] = 'attachment; filename=transcript.flac';
    }
  });
  transcript.on('error', function(error) {
    console.log('Synthesize error: ', error)
  });
  transcript.pipe(res);
});

On the native iOS client, I download the audio file and play it using the Origami Engine player. This could also be done with the Watson iOS SDK (much easier), but I wrote this sample before the SDK was available.

//format the URL
NSString *urlString = [NSString stringWithFormat:@"https://translator.mybluemix.net/synthesize?text=%@&voice=%@&accept=audio/flac&download=1", phrase, voice ];
NSString* webStringURL = [urlString stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSURL *flacURL = [NSURL URLWithString:webStringURL];

//download the contents of the audio file
NSData *audioData = [NSData dataWithContentsOfURL:flacURL];
NSString *docDirPath = NSTemporaryDirectory() ;
NSString *filePath = [NSString stringWithFormat:@"%@transcript.flac", docDirPath ];
[audioData writeToFile:filePath atomically:YES];

//pass the file url to the origami player and play the audio
NSURL* fileUrl = [NSURL fileURLWithPath:filePath];
[self.orgmPlayer playUrl:fileUrl];

Cognitive computing is all about augmenting the experience of the user, and enabling users to perform their duties more efficiently and more effectively. The Watson language services enable any app to better facilitate communication and broaden the reach of content across diverse user bases. You should definitely check them out to see how Watson services can benefit you.

MobileFirst

So, I mentioned that this app uses IBM MobileFirst offerings on Bluemix. In particular I am using the Mobile Client Access service to collect logs and operational analytics from the app. This lets you capture logs and usage metrics for apps that are live “out in the wild”, providing insight into what people are using, how they’re using it, and the health of the system at any point in time.

Analytics from the Mobile Client Access service

Be sure to check out the MobileFirst on Bluemix and MobileFirst Platform offerings for more detail.

Source

You can access the sample iOS client and Node.js code at https://github.com/triceam/Watson-Translator. Setup instructions are available in the readme document. I intend to update this app with some more translation use cases in the future, so be sure to check back!


Thoughts on Cognitive Computing

You may have heard a lot of buzz coming out of IBM lately about Cognitive Computing, and you might have also wondered “what the heck are they talking about?” You may have heard of services for data and predictive analytics, services for natural language text processing, services for sentiment analysis, and services that understand speech and translate languages, but it’s sometimes hard to see the forest for the trees.

I highly recommend taking a moment to watch this video that introduces Cognitive Computing from IBM:

Those services that I mentioned above are all examples of Cognitive Computing systems, and are all available for you to use today.

From IBM Research:

Cognitive computing systems learn and interact naturally with people to extend what either humans or machine could do on their own.

They help human experts make better decisions by penetrating the complexity of Big Data.

Cognitive systems are often based upon massive sets of data and powerful analytics algorithms that detect patterns and concepts that can be turned into actionable information for the end users. It’s not “artificial intelligence” in the sense that the services/machines act on their own; rather, it is a system that provides the user with tools or information that enable them to make better decisions.

The benefits of cognitive systems in a nutshell:

  1. They augment the user’s experience
  2. They provide the ability to process information faster
  3. They make complex information easier to understand
  4. They enable you to do things you might not otherwise be able to do

Curious where this will lead? Now take a moment and watch this video talking about the industry-transforming opportunities that Cognitive Computing is already beginning to bring to life:

So, why is the “mobile guy” talking about Cognitive Computing?

First, it’s because Cognitive Computing is big… I mean, really, really big. Cognitive systems are literally transforming industries and putting powerful analytics and insight into the hands of both experts and “normal people”. When I say “into the hands”, I again mean this literally; much of this cognitive ability is being delivered to those end users through their mobile devices.

It’s also because cognitive systems fit nicely with IBM’s MobileFirst product offerings. It doesn’t matter whether you’re using the MobileFirst Platform Foundation server on-premise or leveraging the MobileFirst offerings on IBM Bluemix; in both cases you can easily consume IBM Watson cognitive services to augment and enhance the interactions and data for your mobile applications. Check out the Bluemix catalog to see how you might start adding Watson cognitive or big data abilities to your apps today.

Last, and this is purely personal opinion, I see the MobileFirst offerings themselves as providing somewhat of a cognitive service for developing mobile apps. If you look at it from the operational analytics perspective, you have immediate insight and a snapshot into the health of your system that you would never have had otherwise. You can know what types of devices are hitting your system, what services are being used, how long things are taking, and detect issues, all without any additional development effort on your end. It’s not predictive analytics, but it sure is helpful and gets us moving in the right direction.

IBM Watson Speech Services Just Got A Whole Lot Easier

UPDATE 12/22/15: IBM recently released a new iOS SDK for Watson that makes integration with Watson services even easier. You can read more about it here.


IBM’s Watson Developer Cloud speech services just got a whole lot easier for mobile developers. I just learned about these two myself, and can’t wait to integrate them into my own mobile applications.

The Watson Speech to Text and Text to Speech services are now available in both native iOS and Android SDKs, making it even easier to integrate language services into your apps.

These native APIs now include audio streaming to the Watson Speech to Text service, for lower-latency responses to spoken input.

I can guarantee you that my “voice-driven iOS apps” demo will be updated soon, and I’ll be using this for all future language processing services.

Video – Smarter Apps with Cognitive Computing

UPDATE 12/22/15: IBM recently released a new iOS SDK for Watson that makes integration with Watson services even easier. You can read more about it here.


Last week I had the opportunity to present to a great audience at the MoDev DC meetup group on “Smarter Apps with Cognitive Computing”.   In this session I focused on how you can create a voice-driven experience in your mobile apps. I gave an introduction to IBM Bluemix and IBM Watson services (particularly the Watson language services), and demonstrated how you can integrate them into your native iOS apps. I also covered IBM MobileFirst for operational analytics and remote logging to provide insight into your app’s performance once it goes live.  Check out a recording of the complete presentation in the video below:

https://youtu.be/TGRMmf8e-6s

You can read more detail about how this example works and access source code for the sample application in the links below:

Just create an account on IBM Bluemix and you can get started for free!

This app uses three services available through IBM Bluemix, all of which are available for you to try out:

  1. Speech to Text – Convert spoken audio into text
  2. Question & Answer – Natural language search
  3. Advanced Mobile Access – Capture analytics and logs from mobile apps running on devices

App Architecture

Feel free to poke around the code to learn more!

Voice-Driven Native Mobile Apps with IBM Watson & IBM MobileFirst

Update: The IBM Watson team just announced a new native SDK for both iOS and Android that simplifies and streamlines integration with Speech To Text and Text To Speech services.  Check out more detail here: IBM Watson Speech Services Just Got A Whole Lot Easier.


Using your voice to drive interactions within your app is a powerful concept. It is the primary interaction driving Apple’s Siri, Microsoft’s Cortana, and Google’s Voice Actions. By analyzing spoken words, voice commands allow you to complete possibly complex actions with minimal interaction with the device. Or, they enable entirely different forms of interaction, for example, interacting with a remote system through the telephone.

Voice driven interactions are essentially a two part process:

  • Transcribe audible signal to text transcript
  • Perform a system action by parsing text transcript

If you think that voice-driven apps are too complicated, or out of your reach, then I have great news for you: They are not! Last week, IBM elevated several IBM Watson voice services from Beta to General Availability – that means you can use them reliably in your own systems too!

Let’s examine the two parts of the system, and see what solutions IBM has available right now for you to take advantage of…

Transcribe audible signal to text transcript

Part one of this equation is converting the audible signal into text that can be parsed and acted upon. The IBM Speech to Text service fits this bill perfectly, and can be called from any app platform that supports REST services… which means just about anything. It could be from the browser, it could be from the desktop, and it could be from a native mobile app. The Watson STT service is very easy to use: you simply post a request containing an audio file to the REST API, and the service returns a text transcript based upon what it is able to analyze from the audio file. With this API you don’t have to worry about handling any of the transcription on your own – no concern for accents, etc… Let Watson do the heavy lifting for you.
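
To give a sense of how little is involved, here is a minimal Node.js sketch that posts a WAV file directly to the Speech to Text recognize endpoint. The endpoint URL, credentials, and audio format shown are assumptions based on the service documentation from this era, so check the current docs before relying on them:

// Sketch: posting an audio file straight to the Watson Speech to Text
// REST API. Credentials and file name are placeholders.
var fs = require('fs');
var https = require('https');

var options = {
  hostname: 'stream.watsonplatform.net',
  path: '/speech-to-text/api/v1/recognize',
  method: 'POST',
  auth: '<service username>:<service password>',
  headers: { 'Content-Type': 'audio/wav' }
};

var request = https.request(options, function(response) {
  var body = '';
  response.on('data', function(chunk) { body += chunk; });
  response.on('end', function() {
    // the transcript lives in results[i].alternatives[j].transcript
    console.log(JSON.parse(body));
  });
});

// stream the audio file as the request body
fs.createReadStream('sample.wav').pipe(request);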

Perform a system action by parsing text transcript

This one is perhaps not quite as simple because it is entirely subjective, and depends upon what you/your app is trying to do. You can parse the text transcript on your own, searching for actionable keywords, or you can leverage something like the IBM Watson Q&A service, which enables natural language search queries to Watson data corpora.
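
If you go the keyword route, the parsing can be as simple as scanning the transcript for the terms your app cares about. A trivial sketch (the commands and keywords here are purely illustrative):

// Sketch: map a transcript to an app action by keyword matching.
function parseCommand(transcript) {
  var text = transcript.toLowerCase();
  if (text.indexOf('weather') !== -1) {
    return 'show-weather';
  } else if (text.indexOf('call') !== -1) {
    return 'start-call';
  }
  return 'unknown';
}

console.log(parseCommand('What is the weather like today?')); // "show-weather"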

Riding on the heels of the Watson language services promotion, I put together a sample application that enables a voice-driven app experience on the iPhone, powered by both the Speech To Text and Watson Question & Answer services, and have made the mobile app and Node.js middleware source code available on github.

Watson Speech QA for iOS

This native iOS app, which I’m calling “Watson Speech QA for iOS”, allows you to ask Watson questions in natural, spoken language, and receive textual responses based on the Watson QA Healthcare data set.

Check out the video below to see it in action:

https://youtu.be/0kedhwC3ikY

Bluemix Services Used

This app uses three services available through IBM Bluemix:

  1. Speech to Text – Convert spoken audio into text
  2. Question & Answer – Natural language search
  3. Advanced Mobile Access – Capture analytics and logs from mobile apps running on devices

IBM Watson Speech QA for iOS App Architecture

The app communicates with the Speech to Text and Question & Answer services through the Node.js middleware tier, and connects directly to the Advanced Mobile Access service to provide operational analytics (usage, devices, network utilization) and remote log collection from the client app on the mobile devices.

For the Speech To Text service, the app records audio on the local device and sends a WAV file to the Node.js tier in an HTTP POST request. The Node.js tier delegates to the Speech To Text service to provide transcription capabilities, then formats the response JSON object and returns the result to the mobile app.

For the QA service, the app makes an HTTP GET request (containing the query string) to the Node.js server, which delegates to the Watson QA natural language processing service to return search results. The Node.js tier then formats the response JSON object and returns the results to the mobile app.

The general flow between these systems is shown in the graphic below:

IBM Watson Speech QA for iOS – Logic Flow


Code Explained

Mobile app and Node.js middleware source code and setup instructions are available at: https://github.com/triceam/IBM-Watson-Speech-QA-iOS

The code for this example is really in two main areas: the client-side integration in the mobile app (Objective-C, but it could also be done in Swift), and the application server/middleware implemented in Node.js.

Node.js Middleware

The server-side JavaScript code uses the Watson Node.js Wrapper, which enables you to easily instantiate Watson services in just a few short lines of code:

var watson = require('watson-developer-cloud');
var question_and_answer_healthcare = watson.question_and_answer(QA_CREDENTIALS);
var speechToText = watson.speech_to_text(STT_CREDENTIALS);

The credentials come from your Bluemix environment configuration; then you just create instances of whichever services you want to consume.
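
When the app is running on Bluemix, those credential objects are typically read from the VCAP_SERVICES environment variable rather than hard-coded. A minimal sketch, assuming the services are bound to the app under their default service names:

// Sketch: reading bound service credentials from VCAP_SERVICES on Bluemix.
// The 'question_and_answer' and 'speech_to_text' keys are assumptions;
// match them to the services actually bound to your app.
var vcap = JSON.parse(process.env.VCAP_SERVICES || '{}');

var QA_CREDENTIALS = vcap.question_and_answer ?
  vcap.question_and_answer[0].credentials : {};
var STT_CREDENTIALS = vcap.speech_to_text ?
  vcap.speech_to_text[0].credentials : {};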

I implemented two methods in the Node.js application tier. The first accepts the audio input from the mobile client as an attachment to a HTTP POST request and returns a transcript from the Speech To Text service:

// Handle the form POST containing an audio file and return transcript (from mobile)
app.post('/transcribe', function(req, res){

  //grab the audio WAV file attachment and prepare to send to Watson
  var file = req.files.audio;
  var readStream = fs.createReadStream(file.path);
  console.log("opened stream for " + file.path);

  var params = {
    audio:readStream,
    content_type:'audio/l16; rate=16000; channels=1',
    continuous:"true"
  };

  //send the audio WAV file to the watson.recognize service
  speechToText.recognize(params, function(err, response) {
    readStream.close();

    if (err) {
      return res.status(err.code || 500).json(err);
    } else {
      //parse the results and return them to the client
      var result = {};
      if (response.results.length > 0) {
        var finalResults = response.results.filter( isFinalResult );
        if ( finalResults.length > 0 ) {
          result = finalResults[0].alternatives[0];
        }
      }
      return res.send( result );
    }
  });
});
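
The isFinalResult helper used in the filter above is not shown here; in spirit it is just a small predicate along these lines (a sketch, assuming the standard final flag on Speech to Text results):

// Sketch: keep only results the Speech to Text service marked as final
// (interim results carry final: false).
function isFinalResult(result) {
  return result.final === true;
}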

Once you have the text transcript on the client, you could do whatever you want with it. You could parse it to invoke local actions, or delegate to a natural language query service.

The second method does exactly this: it accepts a URL query parameter from a HTTP GET request and uses that parameter in a Watson QA natural language search:

//handle QA query and return json result (for mobile)
app.get('/ask', function(req, res){

  //get a copy of the search query text from the req.query object
  var query = req.query.query;

  if ( query != undefined ) {
    //perform a search using the QA "ask" method
    question_and_answer_healthcare.ask({ text: query}, function (err, response) {
      if (err) {
        return res.status(err.code || 500).json(err);
      } else {
        //format the results and return them to the mobile client
        if (response.length > 0) {
          var answers = [];

          for (var x=0; x<response[0].question.evidencelist.length; x++) {
            var item = {};
            item.text = response[0].question.evidencelist[x].text;
            item.value = response[0].question.evidencelist[x].value;
            answers.push(item);
          }

          var result = {
            answers:answers
          };
          return res.send( result );
        }
        return res.send({});
      }
    });
  }
  else {
    return res.status(500).send('Bad Query');
  }
});
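
A quick way to exercise this route while developing is to hit it with a test question. The sketch below assumes the Node.js app is running locally on port 3000; the host, port, and question are placeholders:

// Sketch: send a natural language question to the local /ask endpoint
// and print the formatted answers returned by the Watson QA service.
var http = require('http');

var question = encodeURIComponent('What is diabetes?');
http.get('http://localhost:3000/ask?query=' + question, function(res) {
  var body = '';
  res.on('data', function(chunk) { body += chunk; });
  res.on('end', function() {
    // expected shape: { answers: [ { text: '...', value: ... }, ... ] }
    console.log(JSON.parse(body));
  });
});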

Note: I am using the free/open Watson Healthcare data set. However, the Watson QA service can handle other data sets – these require an engagement with IBM to train the Watson service to understand the desired data sets.

Native iOS – Objective C

On the mobile side we’re working with a native iOS application. My code is written in Objective-C; however, you could also implement this using Swift. I won’t go through the code line by line here for the sake of brevity, but you can access the client-side code in the ViewController.m file. In particular, the relevant logic is within the postToServer and requestQA methods.

You can see the flow of the application within the image below:

App Flow: User speaks, transcript displayed, results displayed


The native mobile app first captures audio input from the device’s microphone. This is then sent to the Node.js server’s /transcribe method as an attachment to an HTTP POST request (postToServer method on line 191). On the server side this delegates to the Speech To Text service as described above. Once the result is received on the client, the transcribed text is displayed in the UI and a request is made to the QA service.

In the requestQA method, the mobile app makes an HTTP GET request to the Node.js app’s /ask method (as shown on line 257). The Node.js app delegates to the Watson QA service as shown above. Once the results are returned to the client, they are displayed within a standard UITableView in the native app.

MobileFirst – Advanced Mobile Access

A few other things you may notice if you decide to peruse the native Objective-C code:

  1. Within AppDelegate.m you will see calls to the IMFClient, IMFAnalytics, and OCLogger classes. These enable operational analytics and log collection within the Advanced Mobile Access service.
  2. All network requests inside of ViewController.m use the IMFResourceRequest class. Using IMFResourceRequest enables the collection of analytics for every request made within the application through this class.

Together these allow for the collection of device logs, automatic crash reporting, and operational analytics, which are among the strengths of the Advanced Mobile Access service, one of the mobile offerings on IBM Bluemix.

Source Code

Mobile app and Node.js middleware source code and setup instructions for this app are available at: https://github.com/triceam/IBM-Watson-Speech-QA-iOS

Just create an account on IBM Bluemix, and you have everything that you need to get started creating your own voice-driven apps.