Schnelle-Walka: NLU


Friday, February 19, 2016

A golden age for NLU developers?

Currently, the landscape around NLU and AI is booming. Many startups are entering the market, trying to get a foot in a door that seems to be wide open right now. The following figure, taken from an article at http://venturebeat.com/2016/02/14/intelligent-assistance-the-slow-growth-space-that-will-eventually-wow-us/, shows a snapshot of the available artificial assistants in October 2015. And the field is still growing...

Intelligent Assistance landscape, taken from http://venturebeat.com/2016/02/14/intelligent-assistance-the-slow-growth-space-that-will-eventually-wow-us/

Technology is improving rapidly, and new features and functionality keep arriving. At the same time, users' expectations of speech technology are growing. However, some voices also point out potential drawbacks of the current evolution. This technology "could leave half of the world unemployed", as Moshe Vardi states in http://www.theguardian.com/technology/2016/feb/13/artificial-intelligence-ai-unemployment-jobs-moshe-vardi. He expects that AI could wipe out 50% of middle-class jobs in the next 30 years. He envisions a scenario similar to the one years ago when automation hit the working class. Now, it could be the middle class. The key lies in cognitive computing. IBM defines it in their whitepaper about IBM Watson as follows: "Cognitive Computing refers to systems that learn at scale, reason with purpose and interact with humans naturally. Rather than being explicitly programmed, they learn and reason from their interactions with us and from their experiences with their environment."

AI technology will change our lives for sure, and it is already doing so. Vardi's scenario is not far-fetched, but it is only one scenario. It is up to us what we make of it.

For now, this change leaves developers who are interested in playing around with this technology with a multitude of frameworks that may be used for free. There are so many players in this field that startups need to gain momentum. A common strategy is to open their API to the public, requiring only a registration and no fee. I already mentioned some of them in my post about Nuance opening their NLU platform.

It is great to play around with speech technology, but relying on a particular supplier also carries several risks. Will the startup still be there when my product is ready for the market? Will the supplier change the conditions once they have gained sufficient momentum? ...

The latter happened, e.g., with Maluuba. They used to give developers access to their system through open source code hosted on GitHub. Maluuba removed the repository from GitHub, but some fragments of their napi can still be found on the Internet (who is in charge of cleaning the Internet?). The code compiles, but it requires registration at the Maluuba developer site, which has been shut down.

Don't get me wrong. This is completely OK if you want to earn money. They have a great product, and they went from a startup to a global player in a very short time. It just showcases the risks for developers who want to build products based on these offerings.

It looks like a golden age for NLU developers who want to play around with this technology, but it may be safer to think twice when you are aiming for actual products. This makes the landscape much smaller.

Wednesday, February 3, 2016

Almost 20th anniversary of "Voice recognition is ready for primetime"

For as long as I can remember, the voice industry has been announcing "Voice recognition is ready for primetime", e.g., in an article from 1999: http://www.ahcmedia.com/articles/117677-is-voice-recognition-ready-for-prime-time. For a long time, I had the impression that there was not much improvement in the NIST ASR benchmark results.

NIST ASR benchmark results

All reported results seemed to be converging towards some magic barrier that was still far from the human error rate. Recently, IBM reported some remarkable improvements on the Switchboard corpus: http://arxiv.org/pdf/1505.05899v1.pdf. Although they also rely on DNNs, they outperform current systems (~12-14% word error rate, WER) and claim to achieve a WER of ~8%. So we are coming closer to human performance.
It actually took some time until speech really took off. The biggest advancements were clearly made with the advent of deep learning.
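
To make the WER figures above more tangible, here is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the example sentences are my own assumptions for illustration; official benchmark scoring additionally applies text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j]: edit distance between the reference words processed so far
    # (initially none) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Illustrative sentences (made up): one substitution and one deletion
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.1%}")
```

Because insertions are counted as well, a WER can exceed 100% when the recognizer outputs many more words than the reference contains.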

This does not really seem to be true for NLU. Manning states in http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00239 that computational linguists should not worry, since NLU has not seen a comparable breakthrough from applying deep learning.

Nevertheless, people have started to notice the recent advancements. Apple and Google in particular did a good job in making the technology publicly available and usable. A recent survey from Parks Associates shows that speech products are used by more than 39% of smartphone users (http://www.parksassociates.com/360view/360-mobile-2015). About 50% of Apple users use voice, while only around 30% of Android phone users do. The researchers state that "Among smartphone users ages 18-24, 48% use voice recognition software, and use of the “Siri” voice recognition software among iPhone users increased from 40% to 52% between 2013 and 2015. This translates into 15% of all U.S. broadband households using Siri." So, the coming generation seems to appreciate the use of voice control.

http://mobilemarketingwatch.com/voice-recognition-on-the-rise-parks-associates-report-shows-40-percent-of-u-s-smartphone-owners-use-it-65002/

Maybe the speech industry has been promising for too long that voice is ready for primetime. Now, the gain in performance seems to be reflected in actual usage. And it is increasing...

Friday, December 11, 2015

NLU vs Dialog Management

Recently, I stumbled across a blog post from api.ai announcing that their system now supports slot filling: https://api.ai/blog/2015/11/09/SlotFilling/. Note that my goal is not to blame their system.
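
For readers unfamiliar with the term, the following is a minimal sketch of the general idea behind slot filling: an intent declares which slots it needs, and the system keeps asking until all of them are filled. This is not the api.ai API; the slot names, the prompts and both functions are hypothetical and only illustrate the concept.

```python
from typing import Dict, Optional

# Hypothetical intent definition: required slots and the prompt used to ask for each.
REQUIRED_SLOTS: Dict[str, str] = {
    "destination": "Where do you want to go?",
    "date": "When do you want to travel?",
}


def merge_slots(filled: Dict[str, str], new_values: Dict[str, str]) -> Dict[str, str]:
    """Add newly understood values to the frame, ignoring slots the intent does not know."""
    merged = dict(filled)
    merged.update({k: v for k, v in new_values.items() if k in REQUIRED_SLOTS})
    return merged


def next_prompt(filled: Dict[str, str]) -> Optional[str]:
    """Return the prompt for the first missing slot, or None once the frame is complete."""
    for slot, prompt in REQUIRED_SLOTS.items():
        if slot not in filled:
            return prompt
    return None


# The NLU is assumed to deliver slot values as a plain dictionary per utterance.
frame = merge_slots({}, {"destination": "Berlin"})
print(next_prompt(frame))  # asks for the travel date
frame = merge_slots(frame, {"date": "tomorrow"})
print(next_prompt(frame))  # None: all required slots are filled
```

Even this tiny loop is already a decision about what to say next, which is why slot filling touches on dialog management and not only on understanding.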

Currently, I observe that efforts towards spoken interaction coming from cognitive computing are still not fully aware of what has been achieved in dialog management research over the past decades, and vice versa. The two parties start from different points in the processing chain of spoken dialog systems.

While the AI community usually focuses on natural language understanding (linguistic analysis), the spoken dialog community focuses on the dialog manager as the central point in this chain.

Both have good reasons for their attitude and are able to deliver convincing results.

Cognitive computing sees the central point in the semantics, which should also be grounded in previous utterances or external content. In this view, speech input and output are reduced to just some input into the system and some output from it. Dialog management can be really dumb in this case. The resulting user interfaces are currently more or less query-based.

The dialog-manager-focused view regards the NLU as just some input into the system, while the decision about the subsequent interaction is made in the dialog manager. The resulting user interfaces range from rigid state-based approaches through the information state update approach to statistically motivated dialog managers such as POMDPs.
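
The rigid state-based end of this range can be sketched in a few lines. The example below is a hypothetical finite-state dialog manager: the NLU result arrives only as an intent label, and a transition table decides what the system says next. The states, intent names and prompts are invented for illustration and do not come from any particular framework.

```python
from typing import Dict, Tuple

# Hypothetical transition table: (current state, NLU intent) -> (next state, system prompt)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("start", "greet"): ("ask_task", "How can I help you?"),
    ("ask_task", "book_flight"): ("ask_destination", "Where do you want to go?"),
    ("ask_destination", "inform_city"): ("done", "Booking your flight."),
}

FALLBACK_PROMPT = "Sorry, I did not understand that."


def step(state: str, intent: str) -> Tuple[str, str]:
    """Advance the dialog; input that is unexpected in this state leaves the state unchanged."""
    return TRANSITIONS.get((state, intent), (state, FALLBACK_PROMPT))


state = "start"
for intent in ["greet", "inform_city", "book_flight", "inform_city"]:
    state, prompt = step(state, intent)
    print(f"{intent:12} -> {state:16} {prompt}")
```

The second user turn shows the rigidity: a city named before the system asks for it is rejected. Information state update and statistical approaches such as POMDPs aim to handle exactly this kind of input more gracefully.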

My hope is that both communities start talking to each other in order to better incorporate the convincing results of "the other component" and arrive at a convincing user experience.

Saturday, December 5, 2015

Microsoft researchers expect human-like capabilities of spoken systems in a few years

Researchers at Microsoft believe that we are only a few years away from machines being able to understand spoken language as well as humans do. Although many advances have been made in the past years, there are still many challenges that need to be solved. This is especially true for distant speech recognition, which we need to cope with in everyday situations. Maybe their statement is still a bit too optimistic. However, as systems are already available and people are starting to use them, they are right in their assumption that these systems will make progress. We just have to make sure that voice-based assistants like Cortana are used at all. Currently, some of these systems seem to be more of a gimmick to play with until users become bored with them. Hence, they are actually doomed to improve fast in order to be helpful as well.

http://news.microsoft.com/features/speak-hear-talk-the-long-quest-for-technology-that-understands-speech-as-well-as-a-human/
