
Sunday, July 17, 2016

NLU is not a User Interface

Some time ago I already spoke about NLU vs Dialog Management. My hope was that people working in NLU and voice user interface design would start talking to each other. I expanded on these ideas in a paper submitted to the IUI Workshop on Interacting with Smart Objects: NLU vs. Dialog Management: To Whom am I Speaking? In essence, "Dialog-management-centered systems are principally constrained because they anticipate the user's input as plans to help them to achieve their goal. Depending on the implemented dialog strategy they allow for different degrees of flexibility. NLU-centered systems see the central point in the semantics of the utterance, which should also be grounded with previous utterances or external content. Thus, whether speech or not, NLU regards this as a stream of some input to produce some output. Since no dialog model is employed, resulting user interfaces currently do not handle much more than single queries".

Actual dialog systems must go beyond this and combine knowledge from both research domains to provide convincing user interfaces.

Now, I stumbled across a blog entry from Matthew Honnibal who bemoans the current hype around artificial intelligence and the ubiquitous promise of more natural user interfaces. He is right that voice is simply another user interface. He states: "My point here is that a linguistic user interface (LUI) is just an interface. Your application still needs a conceptual model, and you definitely still need to communicate that conceptual model to your users. So, ask yourself: if this application had a GUI, what would that GUI look like?"

He continues by mapping the spoken input to method calls along with their parameters. Then, he concludes: "The linguistic interface might be better, or it might be worse. It comes down to design, and your success will be intimately connected to the application you’re trying to build."
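To make this concrete, here is a minimal sketch of how such a mapping could look, assuming a hypothetical NLU component that returns an intent plus parameters; the intent names and application methods are invented for the example and do not reflect any particular framework.

```python
# Hypothetical sketch: a linguistic user interface as just another way to
# call into the application's conceptual model. NLU result format and
# application methods are illustrative only.

def set_reminder(task, time):
    return f"Reminder set: {task} at {time}"

def get_weather(city):
    return f"Here is the weather for {city}..."

# Mapping from detected intents to application methods.
HANDLERS = {
    "set_reminder": set_reminder,
    "get_weather": get_weather,
}

def handle_utterance(nlu_result):
    """Dispatch an NLU result (intent + parameters) to a method call."""
    handler = HANDLERS.get(nlu_result["intent"])
    if handler is None:
        return "Sorry, I did not understand that."
    return handler(**nlu_result["parameters"])

# "Remind me to buy milk at 6 pm" might be analyzed as:
print(handle_utterance({
    "intent": "set_reminder",
    "parameters": {"task": "buy milk", "time": "6 pm"},
}))
```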

This is exactly the point where voice user interface design comes into play. Each modality requires specific design knowledge for effective interfaces. Matthew Honnibal seems to be aware neither of the term VUI nor of the underlying approaches and concepts. Maybe it is time to rediscover them to build better voice-based interfaces employing state-of-the-art NLU technology.

Tuesday, July 5, 2016

AI and the Need for Human Values

Stuart Russell, professor at Berkeley and co-author, with Peter Norvig, of the standard textbook on artificial intelligence, speculates about the future of AI.

He has no doubts that AI will change the world. "In future, AI will increasingly help us live our lives", he said, "driving our cars and acting as smart virtual assistants that know our likes and dislikes and that will manage our day." The technology is already there to analyze and monitor a plethora of documents more accurately than we can, to forecast events, or to provide us with hints that make our lives easier. "Looking further ahead, it seems there are no serious obstacles to AI making progress until it reaches a point where it is better than human beings across a wide range of tasks."

In the best of all cases, "[W]e could reach a point, perhaps this century, where we're no longer constrained by our difficulties in feeding ourselves and stopping each other from killing people, and instead decide how we want the human race to be."

On the other side, he also sees a great danger. Autonomous weapons may turn out to be a great threat. "Five guys with enough money can launch 10 million weapons against a city," he said.

He demands serious plans for how to cope with that. "A system that's superintelligent is going to find ways to achieve objectives that you didn't think of. So it's very hard to anticipate the potential problems that can arise." Therefore, "there will be a need to equip AI with a common sense understanding of human values."

He suggests that the only absolute objective of autonomous robots should be maximising the values of humans as a species.

This is all well said. But has the human species already reached a point where it can agree upon a common set of values? Who will be the one to decide how we want the human race to be? How would we teach those values to an AI? I fear that this remains a nice vision, and that we will reach the point where such an integration of values is needed before we actually have it.

Monday, April 4, 2016

The Other Data Problem of Machine Learning

There is one big problem that machine learning usually faces: the acquisition of data. This has been one of the bigger hindrances to training speech recognizers for quite some time. A nice read in this context is a blog post from Arthur Chan from seven years ago, where he explains his thoughts on true open source dictation: http://arthur-chan.blogspot.de/2009/04/do-we-have-true-open-source-dictation.html

This problem increased when deep learning entered the scene of speech recognition. More and more data is needed to create convincing systems. The story continues with spoken dialog management. Apple seems to want to take a step forward in this direction with the acquisition of VocalIQ: http://www.patentlyapple.com/patently-apple/2015/10/apple-has-acquired-vocal-iq-a-company-with-amazing-focus-on-a-digital-assistant-for-the-autonomous-car-beyond.html
All news outlets tried to see this in the light of Apple's efforts towards integration into the automotive market. CarPlay http://www.apple.com/ios/carplay/, which displays apps on the dashboard, and what some people call the iCar http://www.pcadvisor.co.uk/new-product/apple/apple-car-rumours-what-on-earth-is-icar-3626110/ were recently in the news.
I am not sure if there really is such a relation. It might be useful for Siri as well. Adaptive dialogs have been a research topic for some years now. Maybe it is time for this technology to address a broader market.

So far, Apple seemed to be reluctant with regard to learned dialog behavior. In the end, these processes cannot guarantee a promised behavior. This is also one of the main reasons why this technology is not adopted as fast as in other fields where (deep) learning entered the scene. Pieraccini and Huerta describe this problem in "Where do we go from here? Research and commercial spoken dialog systems" as the VUI-completeness principle: "the behavior of an application needs to be completely specified with respect to every possible situation that may arise during the interaction. No unpredictable user input should ever lead to unforeseeable behavior. Only two outcomes are acceptable, the user task is completed, or a fallback strategy is activated..." This quality measure has been established over years and is not available with statistical learning of the dialog strategy. In essence, the fear can be described as follows: let's assume the user asks "Hey, what is the weather like in Germany?". In the (very unlikely) case that it is in the data, the system may have learned that a good answer to this could be "Applepie".
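As a rough illustration of the VUI-completeness idea, the hypothetical sketch below wraps a (possibly learned) answer policy in a guard so that every turn either completes the user's task or activates a predefined fallback; the topics, prompts, and the stand-in policy are invented for the example and are not how Siri or VocalIQ actually work.

```python
# Hypothetical sketch of VUI-completeness: every turn either serves a
# foreseen topic or triggers the fallback strategy.

FALLBACK_PROMPT = "Sorry, I cannot help you with that. Shall I connect you to an agent?"
ALLOWED_TOPICS = {"weather", "reminder"}

def learned_policy(utterance):
    """Stand-in for a statistically learned policy: returns (topic, answer)."""
    if "weather" in utterance.lower():
        return "weather", "It will be sunny in Germany tomorrow."
    # A nonsensical answer that could have been picked up from bad data.
    return "unknown", "Applepie"

def respond(utterance):
    topic, answer = learned_policy(utterance)
    # Guard: only answers for foreseen topics pass through; everything else
    # activates the fallback strategy, so the behavior stays predictable.
    if topic not in ALLOWED_TOPICS:
        return FALLBACK_PROMPT
    return answer

print(respond("Hey, what is the weather like in Germany?"))  # task completed
print(respond("Hey, what should I have for dinner?"))        # fallback
```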

Consequently, the data to train the system has to be selected and filtered. Sometimes, such a gap is discovered only while the system is running. Usually, this is the worst-case scenario. Recently, this happened to Apple's Siri. A question to Siri about where to hide a dead body became evidence in a murder trial. Siri actually came up with some answers.
Screenshot of Siri's answer on where to hide a body
By now, this has been corrected and Siri simply answers "I used to be able to answer this question."

Similarly, Microsoft was in the news with its artificial agent Tay. Tay was meant to learn while people were interacting with it. It took less than 24 hours to get from the statement "Humans are super cool" to "Hitler was right." The data was coming more or less unfiltered from hackers aiming to shape Tay's attitude.

Evolution of Tay on Twitter, from https://twitter.com/geraldmellor/status/712880710328139776/photo/1

Again, the base problem lies in the ethics of the data: selection and filtering. But what are the correct settings for that? Who is in charge of determining the playground? Usually, it is the engineer developing the system (and thus his or her ethical background).
This "other problem of machine learning" does not seem to be in the focus of those developing machine learning systems. Usually, they are busy enough coming up with any data at all to initially train their system.

However, this problem is not really new. Think of Isaac Asimov, who invented the laws of robotics. He already had the idea of guiding criteria for machine behavior. Maybe we need to develop something along these lines as we move down this road.

And this is also true for spoken dialog systems that actively learn their behavior from usage, such as adaptive dialogs. It will be awkward to see learning systems out there that change their behavior to something that was never intended by the developer. I am waiting for those headlines.

Friday, February 19, 2016

Golden ages for NLU developers?

Currently, the landscape around NLU and AI is booming. Many startups are entering the market, trying to get a foot in a door that seems to be wide open right now. The following figure, taken from an article at http://venturebeat.com/2016/02/14/intelligent-assistance-the-slow-growth-space-that-will-eventually-wow-us/, shows a snapshot of available artificial assistants in October 2015. And the list is still growing...
Intelligent Assistance landscape, taken from http://venturebeat.com/2016/02/14/intelligent-assistance-the-slow-growth-space-that-will-eventually-wow-us/
Technology is improving rapidly, and with it come new features and functionality. At the same time, users' expectations towards speech technology grow. However, there are also some voices pointing out potential drawbacks of the current evolution. This technology "could leave half of the world unemployed", as stated by Moshe Vardi in http://www.theguardian.com/technology/2016/feb/13/artificial-intelligence-ai-unemployment-jobs-moshe-vardi. He expects that AI could wipe out 50% of the middle-class jobs within the next 30 years. He envisions a scenario similar to years ago, when automation hit the working class. Now, it could be the middle class.

The key lies in cognitive computing. IBM defines it in their whitepaper on IBM Watson: "Cognitive Computing refers to systems that learn at scale, reason with purpose and interact with humans naturally. Rather than being explicitly programmed, they learn and reason from their interactions with us and from their experiences with their environment."

AI technology will change our lives for sure, and it is already doing so. Vardi's scenario is not out of this world, but it is only one scenario. It is up to us what we make of it.

For now, this change leaves developers who are interested in playing around with this technology with a multitude of frameworks that may be used for free. There are so many players in this field that startups need to gain momentum. A common strategy is to open their API to the public with just a registration. No fee. I already mentioned some of them in my post about Nuance opening their NLU platform.

It is great to play around with speech technology, but relying upon a certain supplier also carries several risks. Will the startup still be there when my product is ready for the market? Will the supplier change the conditions after they have gained sufficient momentum? ...

The last point happened, e.g., with Maluuba. They used to provide developers with access to their system via open source code that they had on GitHub. Maluuba removed the repository from GitHub, but there are still some fragments of their napi on the Internet (who is in charge of cleaning the Internet?). It compiles, but it requires registration at the Maluuba developer site, which has been shut down.

Don't get me wrong. This is completely OK if you want to earn money. They have a great product, and they made it from a startup to a global player in a very short time. It just showcases the risks for developers who want to build products based on these offerings.

It looks like golden ages for NLU developers who want to play around with this technology, but it may be safer to think twice when you are going for actual products. That makes the landscape much smaller.

Monday, December 28, 2015

Smart Interaction Beyond the Smartphone

Jennifer Winter stated an opinion similar to my last blog post, Smartphones as Explicit Devices do not meet Weiser's Vision of Ubiquitous Computing, in a blog for user testing. In her post Will AI Replace Your Smartphone? she bemoans the bad user experience of smartphones, which require their users to pick up the phone and start an application to actually get to what they really want.

She sees more user-friendliness in those interactions that disappear (similar to Weiser's vision). And there is some potential in it, despite the fact that users have become addicted to their smartphones. Jennifer Winter mentions a statement from the Mobile World Congress in Barcelona in March 2015 that 79 percent of smartphone users have their phones within arm's reach for all but three hours of the day.

However, users see alternatives to that, as a study by Ericsson's Consumer Lab revealed. The study states: "1 in 2 smartphone users now thinks that smartphones will be a thing of the past, and that this will happen in just 5 years." Users want smart interaction with objects, but it does not need to be mediated by the smartphone. The study reveals even more potential for the use of artificial intelligence in our lives:
  • 85 % think wearable electronic assistants will be common within 5 years
  • 50 % believe they will be able to talk to household appliances, as they do to people
  • ...
More statements can be found in the following figure from that study.
Consumers who think using artificial intelligence (AI) would be a good idea 


It is unfortunate that the study does not differentiate between AI and its interface. And neither does Jennifer Winter. The technology behind what we already have with smartphones today is for sure artificial intelligence, cognitive computing, and so on. The difference lies in the interface, which should (and will) disappear in the next few years.

Wednesday, December 16, 2015

Nuance opens their NLU to developers

Nuance just opened their NLU platform as a beta to developers: https://developer.nuance.com/public/index.php?task=mix

It is more than simply NLU: it is a full stack, including speech recognition, that can be used in your own applications, as shown in their promotion video.


Similar to the efforts of NLU startups residing under .ai, Nuance Mix is able to detect an intent and user-defined entities from entered sentences. The possibility to also employ Nuance ASR, however, makes it more complete than those efforts. Maybe this has to be seen as an attempt to strengthen Nuance's approach to a virtual assistant that they call Nina. Nina has been out for a while but did not receive much attention so far.
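To give an idea of what such a result looks like, here is a generic, hypothetical intent/entity structure for a single sentence; the field and entity names are invented for the example and do not reflect the actual Nuance Mix response format.

```python
# Generic illustration of an intent/entity result for one entered sentence.
# Field names are invented and not the actual Nuance Mix output.
nlu_result = {
    "sentence": "Book me a table for two at an Italian restaurant tonight",
    "intent": "book_restaurant",
    "entities": {
        "party_size": "two",      # user-defined entity
        "cuisine": "Italian",     # user-defined entity
        "time": "tonight",
    },
}
```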
The market of virtual assistants is already somewhat populated. Google Now and Apple's Siri are well known and established. Others, like Microsoft's Cortana, also try to gain traction. Recently, Microsoft opened Project Oxford as a cloud-based tool for the creation of smart (voice-centric) applications. A comparable, but maybe more advanced, offer is IBM Watson, which has been available for some time. Another one is Amazon Echo, which also opened its platform to developers.

It appears that spoken language technology is mature enough to be really useful. Good news for developers who want to play around with voice interaction to control applications in the Internet of Things. Currently, there is a plethora of SDKs available that can be used for free. The question is not if we will see more spoken interaction with everyday things in our lives, but who will win the race for a sufficient number of users and their data. Maybe Nuance is already too late with Nuance Mix to enter that market. Maybe they can step in nevertheless, relying on their long-standing dominance in speech recognition.

Friday, December 11, 2015

NLU vs Dialog Management

Recently, I stumbled across a blog post from api.ai announcing that their system now supports slot filling: https://api.ai/blog/2015/11/09/SlotFilling/. Note that my goal is not to blame their system.
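For readers unfamiliar with the term: slot filling means that the system tracks the parameters (slots) an intent requires and keeps prompting until all of them are filled. A minimal, hypothetical sketch with an invented flight-booking intent:

```python
# Hypothetical sketch of slot filling: collect the required parameters of an
# intent over several turns, then act once everything is filled.
REQUIRED_SLOTS = ["origin", "destination", "date"]   # e.g. for a "book_flight" intent
PROMPTS = {
    "origin": "Where do you want to fly from?",
    "destination": "Where do you want to fly to?",
    "date": "When do you want to fly?",
}

def next_action(filled_slots):
    """Return the next system prompt, or a confirmation once all slots are filled."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled_slots:
            return PROMPTS[slot]
    return "Booking a flight from {origin} to {destination} on {date}.".format(**filled_slots)

print(next_action({"destination": "Berlin"}))   # asks for the missing origin
print(next_action({"origin": "Munich", "destination": "Berlin", "date": "tomorrow"}))
```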

Currently, I observe that efforts towards spoken interaction coming from cognitive computing are still not fully aware of what has been done in dialog management research over the past decades, and vice versa. Both parties come from different centers in the chain of spoken dialog systems.
While the AI community usually focuses on natural language understanding (linguistic analysis), the spoken dialog community focuses on the dialog manager as the central point in this chain.
Both have good reasons for their attitude and are able to deliver convincing results.

Cognitive computing sees the central point in the semantics, which should also be grounded with previous utterances or external content. Speech input and output are, in this view, reduced to some input into the system and some output. Dialog management can be really dumb in this case. Resulting user interfaces are currently more or less based on single queries.

The dialog-manager-focused view regards the NLU as just some input into the system, while the decision upon the subsequent interaction is handled in this component. Resulting user interfaces range from rigid state-based approaches over the information state update approach up to statistically motivated dialog managers like POMDPs.
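The simplest of these, a rigid state-based dialog manager, can be pictured as a small state machine in which the NLU result merely selects a transition; the sketch below is a hypothetical illustration with invented states and user acts, not any particular system's design.

```python
# Hypothetical sketch of a rigid state-based dialog manager: the NLU result
# only selects a transition, the dialog manager decides what happens next.
TRANSITIONS = {
    ("ask_origin", "inform_origin"): "ask_destination",
    ("ask_destination", "inform_destination"): "confirm",
    ("confirm", "affirm"): "done",
    ("confirm", "deny"): "ask_origin",
}

def next_state(state, user_act):
    # Unforeseen input keeps the dialog in the current state (reprompt).
    return TRANSITIONS.get((state, user_act), state)

state = "ask_origin"
for act in ["inform_origin", "inform_destination", "affirm"]:
    state = next_state(state, act)
    print(state)   # ask_destination, confirm, done
```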

My hope is that both communities start talking to each other to better incorporate the convincing results of "the other component" and arrive at a convincing user experience.



Saturday, December 5, 2015

Microsoft researchers expect human-like capabilities of spoken systems in a few years

Researchers at Microsoft believe that we are only a few years away from machines understanding spoken language as well as humans do. Although many advances have been made in the past years, there are still many challenges that need to be solved. This is especially true for distant speech recognition, which we need to cope with in daily situations. Maybe their statement is still a bit too optimistic. However, as systems are already available and people start using them, they are right in their assumption that these systems will make progress. We just have to make sure that voice-based assistants like Cortana are used at all. Currently, some of these systems seem to be more of a gimmick to play with until users become bored of them. Hence, they are actually condemned to improve fast in order to also be helpful.