Friday, 19 February 2016

Golden ages for NLU developers?

Currently, the landscape around NLU and AI is booming. Many startups are entering the market, trying to get a foot in a door that seems to be wide open right now. The following figure, taken from an article at http://venturebeat.com/2016/02/14/intelligent-assistance-the-slow-growth-space-that-will-eventually-wow-us/, shows a snapshot of the artificial assistants available in October 2015. And the number is still growing...
Intelligent Assistance landscape, taken from http://venturebeat.com/2016/02/14/intelligent-assistance-the-slow-growth-space-that-will-eventually-wow-us/
Technology is improving rapidly, and new features and functionality keep arriving. At the same time, users' expectations of speech technology are growing. However, some voices also point out potential drawbacks of the current evolution. This technology "could leave half of the world unemployed", as stated by Moshe Vardi in http://www.theguardian.com/technology/2016/feb/13/artificial-intelligence-ai-unemployment-jobs-moshe-vardi. He expects that AI could wipe out 50% of middle-class jobs in the next 30 years. He envisions a scenario similar to the one years ago, when automation hit the working class. Now, it could be the middle class. The key lies in cognitive computing. IBM defines it in their whitepaper about IBM Watson as follows: "Cognitive Computing refers to systems that learn at scale, reason with purpose and interact with humans naturally. Rather than being explicitly programmed, they learn and reason from their interactions with us and from their experiences with their environment."

AI technology will change our lives for sure, and it is already doing so. Vardi's scenario is not far-fetched, but it is only one scenario. It is up to us what we make of it.

For now, this change leaves developers who are interested in playing around with this technology with a multitude of frameworks that may be used for free. There are so many players in this field that startups need to gain momentum. A common strategy is to open their API to the public, requiring only a registration and no fee. I already mentioned some of them in my post about Nuance opening their NLU platform.

It is great to play around with speech technology, but relying on a single supplier also carries several risks. Will the startup still be there when my product is ready for the market? Will the supplier change the conditions once they have gained sufficient momentum? ...

The last point happened, e.g., with Maluuba. They used to give developers access to their system through open source code that they hosted on GitHub. Maluuba has since removed the repository from GitHub, but there are still some fragments of their napi on the Internet (who is in charge of cleaning the Internet?). It compiles, but it requires registration at the Maluuba developer site, which has been shut down.

Don't get me wrong. This is completely OK if you want to earn money. They have a great product and they made it from a startup to a global player in a very short time. It just showcases the risks for developers who want to build products based on these offerings.

It looks like golden ages for NLU developers who want to play around with this technology, but it may be safer to think twice when you are aiming for actual products. This makes the landscape much smaller.

Wednesday, 3 February 2016

Almost 20th anniversary of "Voice recognition is ready for primetime"

For as long as I can remember, the voice industry has announced that "Voice recognition is ready for primetime", e.g., in an article from 1999: http://www.ahcmedia.com/articles/117677-is-voice-recognition-ready-for-prime-time. For a long time, I had the impression that there was not much improvement in the NIST ASR benchmark results.
NIST ASR benchmark results

All reported results seemed to be converging to some magic barrier that was still far from the human error rate. Recently, IBM reported some remarkable improvements on the Switchboard corpus http://arxiv.org/pdf/1505.05899v1.pdf. Although they also rely on DNNs, they outperform current systems (~12-14% WER) and claim to achieve a word error rate (WER) of ~8%. So we are coming closer to human performance.
It actually took some time until speech really took off. The biggest advancements were clearly made with the advent of deep learning.
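
For readers unfamiliar with the metric: WER is usually computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming plain whitespace tokenization and no text normalization. The function name `word_error_rate` is my own choice for illustration; it is not taken from the IBM paper or from the NIST scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

In the example call, one of six reference words is missing, which gives a WER of about 17%. Note that WER can exceed 100% when the hypothesis contains many inserted words.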

This does not really seem to be true for NLU. Manning states in http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00239 that computational linguists should not worry, since NLU never experienced such a breakthrough when deep learning was applied.

Nevertheless, people have started to notice the recent advancements. Apple and Google in particular did a good job of making the technology publicly available and usable. A recent survey from Parks Associates shows that speech products are used by more than 39% of smartphone users (http://www.parksassociates.com/360view/360-mobile-2015). About 50% of Apple users use voice, while only around 30% of Android phone users do. The researchers state that "Among smartphone users ages 18-24, 48% use voice recognition software, and use of the “Siri” voice recognition software among iPhone users increased from 40% to 52% between 2013 and 2015. This translates into 15% of all U.S. broadband households using Siri." So, the coming generation seems to appreciate the use of voice control.
http://mobilemarketingwatch.com/voice-recognition-on-the-rise-parks-associates-report-shows-40-percent-of-u-s-smartphone-owners-use-it-65002/ 

Maybe the speech industry has been promising for too long that voice is ready for primetime. Now, the gain in performance seems to be reflected in actual usage. And it is increasing...