voice-push.com

Pushing VoiceXML to the masses!




VoiceXML 3.0 Update!

There's been an update on VoiceXML 3.0! Check out this sneak preview at the W3C. As they are at pains to express, this isn't the final blueprint for VoiceXML 3.0 - it's more what the authors think should be in it. However, a lot of it isn't far off the topics that I've listed below.

VoiceXML 3.0

VoiceXML 3.0 hasn't even reached a working draft yet, but there are plenty of indications about what it will contain. (There's an older whitepaper about V3 at the VoiceXML Forum.) Here are at least three issues that will probably be addressed:

Speaker Verification/Identification

Yup, so far VoiceXML has left this one out. I don't know why, as it is pretty relevant to banking and call centre applications. Obviously you can work around it: record the caller's utterance and send it back to the server for processing. But that rather defeats the purpose of having ASR, and most likely speaker verification, on your VoiceXML browser. So hopefully there will be a simple tag to handle speaker verification/identification.

For those of you not in the know, the difference between speaker verification and speaker identification is this: in verification, you claim to be someone and the system confirms whether you are or not. In identification, the system works out who you are from a fixed group of known voices. So you would use verification in a banking application and identification to automatically subtitle a sitcom.
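To make the distinction concrete, here is a minimal sketch in Python. It is not how any real speaker recognition engine works: the three-number "voiceprints", the names, the threshold and the cosine similarity measure are all assumptions chosen purely for illustration. The point is only the shape of the two calls: verification takes a claimed identity and returns yes or no, while identification takes no claim and returns the best match from a closed set.

```python
import math

# Hypothetical enrolled voiceprints. Real systems use far richer features;
# these toy vectors are assumptions for illustration only.
VOICEPRINTS = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(claimed_id, sample, threshold=0.95):
    """Verification: the caller claims an identity, the system says yes or no."""
    enrolled = VOICEPRINTS.get(claimed_id)
    if enrolled is None:
        return False  # nobody enrolled under that name
    return cosine(enrolled, sample) >= threshold


def identify(sample):
    """Identification: pick the closest voice from the fixed enrolled group."""
    return max(VOICEPRINTS, key=lambda name: cosine(VOICEPRINTS[name], sample))


sample = [0.88, 0.12, 0.31]      # a made-up utterance from "alice"
print(verify("alice", sample))   # claim checked against one voiceprint
print(verify("bob", sample))     # same utterance, wrong claim
print(identify(sample))          # no claim at all, best match wins
```

Note that identification, as sketched here, always returns somebody from the group, which is exactly why you would not use it on its own to guard a bank account.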

Video

From a language point of view, including video as a "prompt" is fairly straightforward. Now, instead of just being able to play audio files, you'll be able to play video files as well. A video IVR pretty much runs the same way as a voice IVR. You play a video, the customer makes a choice (using either ASR or DTMF) and the next video is played. No big deal. However, from a platform point of view, the suppliers are going to have to make major changes in order to accommodate video. Video has both an audio and a video stream, which must be synchronised with each other. There are hundreds of different video codecs out there, and converting from one to another isn't really the job of a VoiceXML browser. The first details about using video are already out there - check out the VoiceXML Review.
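The play, choose, play-next loop described above can be sketched as a small state machine. This is an illustration in Python rather than VoiceXML, and everything in it is hypothetical: the menu layout, the state names and the .3gp file names are invented for the example, and a real platform would stream the video and collect ASR or DTMF input instead of reading from a list.

```python
# Hypothetical menu: each state names the video to play and maps the
# caller's choice (a DTMF digit or a recognised word) to the next state.
MENU = {
    "welcome": {"video": "welcome.3gp", "choices": {"1": "news", "2": "weather"}},
    "news": {"video": "news.3gp", "choices": {}},
    "weather": {"video": "weather.3gp", "choices": {}},
}


def run_call(inputs, start="welcome"):
    """Return the videos played, in order, for a sequence of caller inputs."""
    played = []
    state = start
    pending = iter(inputs)
    while True:
        node = MENU[state]
        played.append(node["video"])  # "play" the video prompt
        if not node["choices"]:
            break  # leaf state: nothing left to choose
        choice = next(pending, None)
        if choice is None:
            break  # caller hung up or gave no more input
        # An unrecognised choice keeps the same state, so the prompt replays.
        state = node["choices"].get(choice, state)
    return played


print(run_call(["2"]))        # welcome video, then the weather video
print(run_call(["9", "1"]))   # invalid key replays welcome, then news
```

Nothing in the loop cares whether the prompt is audio or video, which is why the language change is small compared with the platform work.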

One of the things that VoiceXML would have difficulty handling in video is the mixing of text and audio with existing video. For instance, let's say that you want to play an audio file or some TTS over a video. Or you may want to add some text to a video to personalise it for a particular caller - 'Hi Michael! You have 3 new video mails'. Now there's a pretty good case for arguing that the application itself should handle this, but new standards may be necessary (see, for example, Convedia's proposal for MSML/MOML).

Multimodality

At the moment there are two approaches to creating multimodal applications - SALT and X+V. SALT is the Microsoft offering and X+V (XHTML + Voice) is the IBM counterweight. Microsoft recently announced that they would start supporting VoiceXML in the .NET environment, which is an indication that they may be accepting that VoiceXML is the preferred IVR language. Either way, VoiceXML 3.0 will have to take account of multimodality - hopefully by incorporating the best of SALT and X+V.



If you have any comments, ideas or issues about this topic, why not try the voice-push forums?






© voice-push.com