Wednesday, February 3, 2016

Almost 20th anniversary of "Voice recognition is ready for primetime"

For as long as I can remember, the voice industry has been announcing that "Voice recognition is ready for primetime", e.g., in an article from 1999: http://www.ahcmedia.com/articles/117677-is-voice-recognition-ready-for-prime-time. For a long time, I had the impression that the NIST ASR benchmark results were not improving much.
[Figure: NIST ASR benchmark results]

All reported results seemed to be converging on some magic barrier that was still far from the human error rate. Recently, IBM reported some remarkable improvements on the Switchboard corpus http://arxiv.org/pdf/1505.05899v1.pdf. Although they also rely on DNNs, they outperform current systems (~12-14% WER) and claim to achieve a WER of ~8%. So we are getting closer to human performance.
It actually took some time until speech really took off. The biggest advances clearly came with the advent of deep learning.
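For readers who have not worked with the WER (word error rate) figures quoted above: WER is commonly defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The following is only a minimal sketch of that definition in Python. The function name `word_error_rate` and the example sentences are made up for illustration; this is not the scoring code used by NIST or in the IBM paper, and real scoring tools also normalize text before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Illustrative sketch only: words are split on whitespace, with no
    text normalization (casing, punctuation, compound words).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution plus one insertion over 6 words.
    wer = word_error_rate(
        "voice recognition is ready for primetime",
        "voice recognition is ready for prime time",
    )
    print(f"WER: {wer:.1%}")  # WER: 33.3%
```

Under this definition, a WER of ~8% means roughly one word error per twelve or thirteen reference words, compared with about one in seven or eight at ~12-14%.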

This does not really seem to hold for NLU. Manning argues in http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00239 that computational linguists should not worry, since NLU has not experienced a comparable breakthrough from the application of deep learning.

Nevertheless, people have started to notice the recent advances. Apple and Google in particular did a good job of making speech technology publicly available and usable. A recent survey from Parks Associates shows that speech products are used by more than 39% of smartphone users (http://www.parksassociates.com/360view/360-mobile-2015). About 50% of Apple users use voice, compared with only around 30% of Android phone users. The researchers state: "Among smartphone users ages 18-24, 48% use voice recognition software, and use of the “Siri” voice recognition software among iPhone users increased from 40% to 52% between 2013 and 2015. This translates into 15% of all U.S. broadband households using Siri." So the coming generation seems to appreciate voice control.
http://mobilemarketingwatch.com/voice-recognition-on-the-rise-parks-associates-report-shows-40-percent-of-u-s-smartphone-owners-use-it-65002/ 

Maybe the speech industry has been promising for too long that voice is ready for primetime. Now, the gain in performance seems to be reflected in actual usage. And it is increasing...
