voice-push.com

Pushing VoiceXML to the masses!



Recording Prompts

Record more rather than less

Make sure to record all possible variations of your prompts. If you're not sure about the particular wording of a prompt, then record all the variations that you can think of. If a particular product can be referred to in different ways, then record all of them. It's cheaper to do that than to book another recording session. Be sure to record all snippets as both open and closed. Remember, the customer may change their mind about the call flow and the menu order. If you've recorded flexible prompts using snippets, then you may be able to avoid a trip back to the recording studio.

Formats

Generally speaking, a good recording studio will record a voice in CD quality - i.e. 44.1 kHz, 16 bit stereo. However, there's not much point playing this kind of quality across a telephone line, particularly as the top frequency on the line is about 3.4 kHz, the sampling frequency is 8 kHz and the number of bits is 8. Having said that, always get your recordings both in the original quality and in your desired format. If the sound studio has got things wrong, then you can use the originals to create a better version. A program like CoolEdit (ok, it has a new name, Adobe Audition, but I still think of it as CoolEdit '96) should handle most of your needs. Don't let the studio run compressors and the like over the files - they aren't really necessary.
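
To put some numbers on that gap, here is a small back-of-the-envelope sketch. It assumes uncompressed PCM and takes "CD quality" to mean 44.1 kHz, 16 bit stereo; the function name is just made up for the illustration.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


# Studio master: 44.1 kHz, 16 bit, stereo (assumed meaning of "CD quality").
studio = pcm_bytes_per_second(44_100, 16, 2)

# Telephone-grade audio: 8 kHz, 8 bit, mono.
telephone = pcm_bytes_per_second(8_000, 8, 1)

print(studio)              # 176400
print(telephone)           # 8000
print(studio / telephone)  # 22.05
```

So the studio original carries roughly 22 times as much data per second as the telephone can use, which is why it makes a good master but a poor delivery format.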

Now the format you actually choose for your files will often depend on what your platform can play. For most telephony applications the sampling rate is going to be 8 kHz. That gives you a top frequency of 4 kHz (half the sampling rate), which is better than you will probably get across a telephone line: the channel characteristics attenuate the higher frequencies, hence a top frequency of about 3.4 kHz. As for mono/stereo, you won't be surprised when I say that it has to be mono. The number of bits can be 8 or 16 - though a 16 bit file will have to be converted to 8 bit by the platform for the telephone.
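
If you want to check what the studio has actually delivered, something along these lines may help. It is only a sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files only; the function names and the exact rules (8 kHz, mono, 8 or 16 bit) are my assumptions based on the paragraph above, so adjust them to whatever your platform really accepts.

```python
import wave


def describe_wav(path):
    """Read the basic format fields from an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def telephony_problems(info):
    """List the ways a file differs from 8 kHz, mono, 8 or 16 bit."""
    problems = []
    if info["sample_rate_hz"] != 8000:
        problems.append("sample rate is %d Hz, expected 8000" % info["sample_rate_hz"])
    if info["channels"] != 1:
        problems.append("%d channels, expected mono" % info["channels"])
    if info["bits_per_sample"] not in (8, 16):
        problems.append("%d bits per sample, expected 8 or 16" % info["bits_per_sample"])
    return problems


# Example use (the file name is hypothetical):
#   for problem in telephony_problems(describe_wav("welcome.wav")):
#       print(problem)
```

An untouched studio master would be flagged twice here (wrong sample rate, stereo), which is what you'd expect: it is the original to keep, not the file to deploy.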

In order to optimise the transmission of speech across a telephone channel, two codecs were developed which use companding (logarithmic compression) to get, in effect, 13 to 14 bit resolution out of 8 bits per sample. These are A-law, used in Europe, and mu-law, used in North America and Japan. They have been optimised for speech and provide better quality than having your 16 bit files downsized to 8 bit linear. As regards amplitude, you may need to experiment a little with the VXML browser that you're using, but a good rule of thumb is that an RMS level of -13 dB should be ok.
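
To see why the compression helps, here is a minimal sketch of mu-law encoding and decoding for 16 bit samples, next to a plain "keep the top 8 bits" conversion. It follows the widely published G.711 reference routine as far as I know it; it is an illustration, not the code of any particular platform, and the function names are my own.

```python
BIAS = 0x84   # 132, added before the segment (exponent) is found
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear_to_mulaw(sample):
    """Encode one signed 16 bit sample as an 8 bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1       # segment 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F    # 4 bits within segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF # bits are sent inverted


def mulaw_to_linear(code):
    """Decode an 8 bit mu-law code back to a signed linear sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def truncate_to_8_bit(sample):
    """Keep only the top 8 bits of a 16 bit sample (linear 8 bit)."""
    return (sample >> 8) << 8


for sample in (100, 1000, 20000):
    print(sample, mulaw_to_linear(linear_to_mulaw(sample)), truncate_to_8_bit(sample))
```

A quiet sample of 100 comes back as 104 via mu-law but as 0 via linear 8 bit, while a loud sample of 20000 comes back as 19836 versus 19968. In other words, the codec gives up a little accuracy on loud samples to keep the quiet parts of speech intact.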

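For the -13 dB rule of thumb, a quick way to measure a prompt is sketched below. The text doesn't say what the level is measured against, so this assumes RMS relative to 16 bit full scale (32768); treat both that reference and the function names as my assumptions.

```python
import math


def rms_db(samples, full_scale=32768):
    """RMS level of the samples in dB relative to full_scale (assumed reference)."""
    if not samples:
        raise ValueError("no samples given")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / full_scale ** 2)


def gain_for_target(samples, target_db=-13.0):
    """Linear factor to multiply the samples by to reach target_db RMS."""
    return 10 ** ((target_db - rms_db(samples)) / 20)


# A square wave at half of full scale sits at about -6.02 dB RMS.
prompt = [16384, -16384] * 100
print(round(rms_db(prompt), 2))           # -6.02
print(round(gain_for_target(prompt), 3))  # 0.448
```

Remember to check for clipping after applying a gain above 1, and let your ears and your VXML browser have the final say.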


© voice-push.com