
Speech Analytics – The Technology Behind It

9 min read
Author Business Systems UK
Date Jan 28, 2015
Category The Inner Circle Guide

ContactBabel recently published ‘The Inner Circle Guide to Customer Contact Analytics’, sponsored by Business Systems. For those with a hectic schedule, we have created a short series of blogs covering the guide’s key insights on speech analytics. The third in the series is below.
Speech analytics can be delivered in a variety of ways. The most commonly known are the phonetics approach and the speech-to-text or transcription based approach. Increasingly we are now seeing companies offering a hybrid of the two with the ability to phonetically search against a set vocabulary or dictionary and then transcribe a percentage of the results.
Nexidia, for example, no longer relies solely on phonetics; it also uses speech-to-text. Verint uses the transcription approach with accuracy based on the proximity of other phrases, so there is a cross-over into phonetics. NICE has historically taken the hybrid approach of phonetic search plus transcription, as outlined below.

The phonetics approach

Phonetics-based applications look for defined sounds or a string of sounds and attempt to match these sounds to target words or phrases in a phonetic index file. The phonetic search process uses an acoustic model tuned to a specific language, with the search terms converted into phonemes and results returned based on relevancy.
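The matching process described above can be sketched in a few lines. This is a minimal illustration only: the phoneme table is a toy stand-in for the language-specific acoustic model a real engine would use, and the index is assumed to be a flat phoneme sequence.

```python
# Toy grapheme-to-phoneme lookup (illustrative only; real systems derive
# phonemes from an acoustic model tuned to a specific language).
G2P = {
    "refund": ["R", "IH", "F", "AH", "N", "D"],
    "policy": ["P", "AA", "L", "AH", "S", "IY"],
}

def to_phonemes(term):
    """Convert a search term to its phoneme sequence, word by word."""
    phones = []
    for word in term.lower().split():
        phones.extend(G2P[word])
    return phones

def search(index, term):
    """Scan a phonetic index (a list of phonemes) for the term's phoneme
    sequence and return the start offset of every match."""
    target = to_phonemes(term)
    n = len(target)
    return [i for i in range(len(index) - n + 1) if index[i:i + n] == target]

# A call's phonetic index, as produced by the speech engine.
call_index = ["DH", "AH"] + G2P["refund"] + ["W", "AA", "Z"] + G2P["policy"]

print(search(call_index, "refund"))  # offsets where "refund" was spoken
```

A production engine would also score each match for relevancy rather than requiring an exact phoneme-by-phoneme hit.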

Dictionary vs non-Dictionary

The method of phonetic search differs by vendor and technology. A non-dictionary, ad-hoc search can find any phrase the user specifies. Alternatively, in the dictionary approach used by companies like NICE, searches are based on a glossary of pre-defined phrases.
The non-dictionary approach is useful where new phrases or terms frequently appear in conversations. In retail, for example, where new products may be launched all the time, a phonetics approach means the user can simply type in the name and search on it. With either method of phonetic search, however, there is still no guarantee that the keyword or phrase found will be used in the correct context.
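The contrast between the two styles can be sketched as follows. The `phonetic_search` function here is a hypothetical stand-in for a real phonetics engine; the point is only the glossary restriction.

```python
# Dictionary vs non-dictionary phonetic search (sketch).
GLOSSARY = {"cancel my contract", "speak to a manager"}

def phonetic_search(index_text, phrase):
    """Stand-in for a real engine: report whether the phrase is found."""
    return phrase in index_text

def dictionary_search(index_text, phrase):
    """Dictionary approach: only pre-defined glossary phrases are allowed."""
    if phrase not in GLOSSARY:
        raise ValueError(f"'{phrase}' is not in the pre-defined glossary")
    return phonetic_search(index_text, phrase)

def adhoc_search(index_text, phrase):
    """Non-dictionary approach: any phrase, e.g. a brand-new product name,
    can be searched the moment it is typed in."""
    return phonetic_search(index_text, phrase)

call = "hello i would like to ask about the new zephyr x2 headset"
print(adhoc_search(call, "zephyr x2"))  # no glossary entry required
```

The product name "zephyr x2" is invented for illustration: the ad-hoc search finds it immediately, while the dictionary search would reject it until the glossary was updated.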

False positives

Phrase recognition helps alleviate this issue and reduce false positives by putting words into context. The longer the phrase you enter, the better your chance of accurate, unique results. Searching on single words will return more results but risks many false positives, unless the word is distinctive, such as a competitor’s name.
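A quick sketch shows why a longer phrase narrows the results. The transcripts are invented for illustration, and plain substring matching stands in for phonetic matching.

```python
# Single-word vs phrase search: the word "card" matches unrelated contexts
# (including inside "scorecard"), while the longer phrase pins context down.
transcripts = [
    "i have lost my card and need a replacement",
    "the scorecard for this quarter looks good",
    "please post the card reader back to us",
]

def hits(term):
    """Return every transcript containing the search term."""
    return [t for t in transcripts if term in t]

print(len(hits("card")))          # all three transcripts match
print(len(hits("lost my card")))  # only the genuinely relevant call matches
```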

The transcription approach

Also known as Large Vocabulary Continuous Speech Recognition (LVCSR), the transcription approach converts a call into text so that analysis and keyword spotting can take place. It depends largely on a language model and dictionary to identify words correctly. It does not require search terms to be pre-defined, as the full content of the calls is available in the index.
When it comes to the actual indexing, which takes the outputs from the speech engine to make them searchable, transcription-based processing is considerably slower: usually in the region of 4–20x real-time, versus more than 1,000x real-time for some phonetics-based systems.
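A back-of-envelope calculation using the speeds quoted above makes the gap concrete; the 1,000 hours of audio is an illustrative figure.

```python
# Wall-clock hours to index a batch of audio on a single engine running at
# a given multiple of real-time (speeds as quoted above).
audio_hours = 1_000  # e.g. a month of call recordings

def indexing_hours(speed_multiple):
    """Hours of processing time at `speed_multiple` x real-time."""
    return audio_hours / speed_multiple

print(indexing_hours(4))      # slow end of transcription-based indexing
print(indexing_hours(20))     # fast end of transcription-based indexing
print(indexing_hours(1_000))  # a phonetics-based engine
```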

Using transcription-based analytics for root cause analysis

It is generally accepted that 60–70% word-recognition accuracy is average. Transcription-based analytics retains the entire content of calls, not just the keywords and phrases initially specified. As a result, it tends to be the better option for root cause analysis and for identifying clusters of terms that occur together, providing a starting point for deeper analysis.

Combining the two approaches

Where you have dual phonetic and transcription-based systems, customers can benefit from phonetics’ rapid identification of key words and phrases, while the transcription method allows in-depth discovery and root cause analysis. A typical example would be to analyse 100% of calls quickly with phonetic indexing, categorising calls and viewing trends, then transcribing only the calls identified as being of particular interest in order to conduct root cause analysis. Transcribing only the calls of interest removes the need to transcribe 100% of calls, putting less strain on your servers.
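The hybrid workflow above can be sketched as a two-stage pipeline. `phonetic_hit` and `transcribe` are hypothetical stand-ins for real engine calls; the structure, not the implementation, is the point.

```python
# Hybrid pipeline: fast phonetic pass over 100% of calls, then full
# transcription only for the calls flagged as being of interest.
def phonetic_hit(call, phrases):
    """Fast pass (stand-in): did any target phrase occur in this call?"""
    return any(p in call["audio_text"] for p in phrases)

def transcribe(call):
    """Slow, full transcription (4-20x real-time) - run only when needed."""
    return call["audio_text"]

calls = [
    {"id": 1, "audio_text": "i want to cancel my contract today"},
    {"id": 2, "audio_text": "thanks for your help goodbye"},
]
targets = ["cancel my contract"]

# Stage 1: phonetic pass over every call. Stage 2: transcribe only the hits.
of_interest = [c for c in calls if phonetic_hit(c, targets)]
full_text = {c["id"]: transcribe(c) for c in of_interest}

print(sorted(full_text))  # only the flagged call is fully transcribed
```

Only the calls that clear the cheap phonetic pass pay the transcription cost, which is exactly the server-load saving described above.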

Choosing the right approach

First and foremost, you need to think about the likely use of the technology and how you are going to apply it. If you are likely to search for information many times a day as part of a business intelligence or process improvement project, transcription may be preferred, as searching is quicker. If the organisation will process large amounts of audio but search it infrequently, for example for evidence production or proof of compliance, phonetics may be the more appropriate choice.
Ultimately, when considering which solution to implement, customers need to think about how they are going to use the technology, how important accuracy is, and how to work with a supplier who can fully embed the technology into the organisation in a way that adds value.

Keep an eye out for the next in our speech analytics blog series – where we’ll provide top tips for implementing speech analytics – not one to be missed!

Download the full ‘Inner Circle Guide to Contact Centre Analytics’ here >