
Tuesday, 5 July 2016

AI and the Need for Human Values

Stuart Russell, a professor at Berkeley who wrote the standard textbook on artificial intelligence with Peter Norvig, speculates about the future of AI.

He has no doubts that AI will change the world. "In future, AI will increasingly help us live our lives", he said, "driving our cars and acting as smart virtual assistants that know our likes and dislikes and that will manage our day." The technology to analyze and monitor a plethora of documents, to forecast events, or to provide hints that make our lives easier is already there. "Looking further ahead, it seems there are no serious obstacles to AI making progress until it reaches a point where it is better than human beings across a wide range of tasks."

In the best of all cases, "[w]e could reach a point, perhaps this century, where we're no longer constrained by our difficulties in feeding ourselves and stopping each other from killing people, and instead decide how we want the human race to be."

On the other hand, he also sees a great danger: autonomous weapons may turn out to be a serious threat. "Five guys with enough money can launch 10 million weapons against a city," he said.

He demands serious plans for how to cope with that. "A system that's superintelligent is going to find ways to achieve objectives that you didn't think of. So it's very hard to anticipate the potential problems that can arise." Therefore, "there will be a need to equip AI with a common sense understanding of human values."

He suggests that the only absolute objective of autonomous robots should be to maximize the values of humans as a species.

This is all well said. But has the human species already reached a point where it can agree upon a common set of values? Who will be the one to decide how we want the human race to be? How would we teach those values to AI? I fear that this will remain a nice vision and that we will reach the point where such an integration of values is needed before we are able to provide it.

Monday, 4 April 2016

The Other Data Problem of Machine Learning

There is one big problem that machine learning usually faces: the acquisition of data. This has been one of the bigger hindrances to training speech recognizers for quite some time. A nice read in this context is a blog post by Arthur Chan from seven years ago, where he explains his thoughts on true open source dictation: http://arthur-chan.blogspot.de/2009/04/do-we-have-true-open-source-dictation.html

This problem increased when deep learning entered the scene of speech recognition: more and more data is needed to create convincing systems. The story continues with spoken dialog management. Apple seems to want to take a step forward in this direction with its acquisition of VocalIQ: http://www.patentlyapple.com/patently-apple/2015/10/apple-has-acquired-vocal-iq-a-company-with-amazing-focus-on-a-digital-assistant-for-the-autonomous-car-beyond.html
Most news coverage saw this in the light of Apple's efforts towards integration into the automotive market. CarPlay http://www.apple.com/ios/carplay/ to display apps on the dashboard and what some people call the iCar http://www.pcadvisor.co.uk/new-product/apple/apple-car-rumours-what-on-earth-is-icar-3626110/ were recently in the news.
I am not sure if there really is such a relation. It might be useful for Siri as well. Adaptive dialogs have been a research topic for some years now. Maybe it is time for this technology to address a broader market.

So far, Apple has seemed reluctant with regard to learned dialog behavior. In the end, these processes cannot guarantee a promised behavior. This is also one of the main reasons why this technology is not adopted as fast as in other fields where (deep) learning entered the scene. Pieraccini and Huerta describe this problem in "Where do we go from here? Research and commercial spoken dialog systems" as the VUI-completeness principle: "the behavior of an application needs to be completely specified with respect to every possible situation that may arise during the interaction. No unpredictable user input should ever lead to unforeseeable behavior. Only two outcomes are acceptable, the user task is completed, or a fallback strategy is activated..." This quality measure has been established over the years and cannot be guaranteed with a statistically learned dialog strategy. In essence, the fear can be described as follows: let's assume the user asks "Hey, what is the weather like in Germany?". In the (very unlikely) case that it is in the data, the system may have learned that a good answer to this could be "Applepie".
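
To make the principle concrete, here is a minimal sketch of a VUI-complete turn handler; the intents and answers are made up for illustration. Every utterance either completes a known task or lands in a predefined fallback, so no input can produce unforeseeable behavior like the "Applepie" answer above.

/** Minimal sketch of a VUI-complete dialog turn: every input either completes
 *  a known task or triggers an explicit, predefined fallback prompt.
 *  Intents and answers are hypothetical. */
public class VuiCompleteDialog {

    static String handleTurn(String intent) {
        switch (intent) {
            case "weather":
                return "It is sunny in Germany today.";   // task completed
            case "play_music":
                return "Playing some music.";              // task completed
            default:
                // Fallback strategy: a fixed prompt, never a learned guess.
                return "Sorry, I did not get that. You can ask about the weather or music.";
        }
    }

    public static void main(String[] args) {
        System.out.println(handleTurn("weather"));
        System.out.println(handleTurn("hide_a_body"));   // unforeseen input hits the fallback
    }
}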

Consequently, the data to train the system has to be selected and filtered. Sometimes, such a gap is discovered only while the system is running. Usually, this is the worst-case scenario. Recently, this happened to Apple's Siri: a question to Siri about where to hide a dead body became evidence in a murder trial. Siri actually came up with some answers.
Screenshot of Siri's answer on where to hide a body
Now, it has been corrected and Siri simply answers "I used to be able to answer this question."

Similarly, Microsoft was in the news with its artificial agent Tay. Tay was meant to learn while people were interacting with it. It took less than 24 hours to get from the statement "Humans are super cool" to "Hitler was right." The data came more or less unfiltered from hackers aiming to shape Tay's attitude.

Evolvement of Tay on Twitter, from https://twitter.com/geraldmellor/status/712880710328139776/photo/1

Again, the underlying problem lies in the ethics of the data: its selection and filtering. But what are the correct settings for that? Who is in charge of determining the playground? Usually, it is the engineer developing the system (and thus their ethical background).
This "other problem of machine learning" seems to be not in the focus of those developing machine learning systems. Usually, they are busy with coming up with some data at all to initially train their system at all.

However, this problem is not really new. Think of Isaac Asimov, who invented the laws of robotics. He already had the idea of guidance criteria for machine behavior. Maybe we need to develop something along these lines as we move down this road.

And this is also true for spoken dialog systems that actively learn their behavior from usage, such as adaptive dialogs. It will be awkward to see learning systems out there that change their behavior to something that was never intended by the developer. I am waiting for those headlines.

Wednesday, 16 March 2016

Google's Offline Personal Assistant

In June 2015, there were first rumours that a few commands for Google Now would be available even when you are offline. An APK teardown of the Google app reported at
http://www.androidpolice.com/2015/06/27/apk-teardown-google-app-v4-8-prepares-for-ok-google-offline-voice-commands-to-control-volume-and-brightness-and-much-more/ revealed some string resources that hinted at this.

<string name="offline_header_text">Offline voice tips</string>
<string name="offline_on_start_cue_cards_header_listening">Offline</string>
<string name="offline_on_start_cue_cards_header_timeout">Offline voice tips</string>
<string name="offline_on_start_cue_cards_second_header_listening">You can still say...</string>
<string name="offline_on_start_cue_cards_second_header_timeout">You can still say "Ok Google," then...</string>
<string name="offline_on_start_cue_cards_second_header_timeout_without_hotword">You can still touch the mic, then say...</string>
<string name="offline_options_start_hotword_disabled">You can still touch the mic, then say...</string>
<string name="offline_options_start_hotword_enabled">You can still say "Ok Google," then...</string>
<string name="offline_error_card_title_text">Something went wrong.</string>
<string name="error_offline_no_connectivity">Check your connection and try again.</string>

So far, Google Now has required an online connection to work. Since this is not always a given, it is beneficial to have a workaround in these cases. They found the following four options:
  • Make a call
  • Send a text
  • Play some music
  • Turn on Wi-Fi

Usually, such a teardown is more rumor than fact. In this case, the rumors proved to be true. In September 2015, this functionality was made available as reported, e.g., on http://www.androidpolice.com/2015/09/28/the-google-android-app-now-supports-limited-voice-commands-for-offline-use/. The way this is reflected in the UI is shown in the following picture.

Google Now in offline mode, taken from http://www.androidpolice.com/2015/09/28/the-google-android-app-now-supports-limited-voice-commands-for-offline-use/
So, some commands are also available when you are offline. By now, this list is larger than the original one, but still limited to
  • Play Music
  • Open Gmail (works with any app name on the device)
  • Turn on Wi-Fi
  • Turn up the volume
  • Turn on the flashlight
  • Turn on airplane mode
  • Turn on Bluetooth
  • Dim the screen
Unfortunately, this is only true for the English version. For instance, it refuses to work in German, even if the English offline recognition is downloaded to the device. Neither English nor German commands will work. Instead, the following screen is shown.
Google Now in offline mode for German on my Samsung Galaxy 5
Yet, it is unclear when this will work. But Google seems to be advancing their embedded technology, as reported in Personalized Speech Recognition on Mobile Devices. There, they describe a remarkable speed-up of their embedded speech recognizer. They state that their newest technology "...provides a 2× speed-up in evaluating our acoustic models as compared to the unquantized model, with only a small performance degradation". The word error rate (WER) for open-ended dictation in an open domain increased from 12.9% to 13.5%. Moreover, they report a decrease in footprint: their acoustic model "...is compressed to a tenth of its original size". Apart from that, they still feature language model personalization through a combination of vocabulary injection and on-the-fly language model biasing.
In the end, they built "a system which runs 7× faster than real-time on a Nexus 5, with a total system footprint of 20.3 MB".
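
The footprint reduction comes largely from quantizing the model weights. As a rough illustration only (a generic sketch of linear 8-bit quantization, not Google's exact scheme from the paper), storing 32-bit floats as 8-bit integers shrinks the parameters roughly fourfold at the cost of a small rounding error, which is one way such a small WER degradation can arise.

/** Generic sketch of linear 8-bit weight quantization; the weight values are made up. */
public class QuantizationSketch {
    public static void main(String[] args) {
        float[] weights = { -0.73f, 0.12f, 0.004f, 0.98f, -0.31f };

        // Find the symmetric range of the weights.
        float maxAbs = 0f;
        for (float w : weights) {
            maxAbs = Math.max(maxAbs, Math.abs(w));
        }
        float scale = maxAbs / 127f;   // map [-maxAbs, maxAbs] onto [-127, 127]

        // Quantize: each 32-bit float becomes an 8-bit integer, a 4x smaller footprint.
        byte[] quantized = new byte[weights.length];
        for (int i = 0; i < weights.length; i++) {
            quantized[i] = (byte) Math.round(weights[i] / scale);
        }

        // Dequantize to see the small rounding error that costs a little accuracy.
        for (int i = 0; i < weights.length; i++) {
            float restored = quantized[i] * scale;
            System.out.printf("%+.3f -> %+.3f%n", weights[i], restored);
        }
    }
}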

The latter work only aims at the recognition task as it is available in what they call "Voice Typing". It still needs an integration of NLU to turn the recognized text into actual commands for Google Now.
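
As an aside for app developers who want to try offline recognition in their own apps: the public Android RecognizerIntent API accepts a hint to prefer the on-device model. This is a minimal sketch of that platform API (for an Activity, API 23+), not a description of how Google Now itself works internally.

import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;

public class OfflineVoiceDemo extends Activity {
    private static final int REQUEST_SPEECH = 1;

    /** Ask the platform recognizer to prefer its offline model. */
    private void startOfflineRecognition() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        // Hint that recognition should work without a network connection;
        // this only helps if an offline language pack is installed on the device.
        intent.putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
        startActivityForResult(intent, REQUEST_SPEECH);
    }
}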

So, Google seems to be on the way to a personal assistant that can also be used when you are not connected to the internet. Some of the commands may not make much sense if you are offline, but some will work, sooner or later. English is supported first, and it is unclear when other languages will follow. But it is a start.




Friday, 19 February 2016

Golden ages for NLU developers?

Currently, the landscape around NLU and AI is booming. Many startups are entering the market, trying to get a foot in the door that seems to be wide open, right now. The following figure, taken from an article at http://venturebeat.com/2016/02/14/intelligent-assistance-the-slow-growth-space-that-will-eventually-wow-us/, shows a snapshot of available artificial assistants in October 2015. And it is still growing...
Intelligent Assistance landscape, taken from http://venturebeat.com/2016/02/14/intelligent-assistance-the-slow-growth-space-that-will-eventually-wow-us/
Technology is improving rapidly, and new features and functionality arrive with it. At the same time, users' expectations towards speech technology are growing. However, there are also voices pointing out potential drawbacks of the current evolution. This technology "could leave half of the world unemployed", as stated by Moshe Vardi in http://www.theguardian.com/technology/2016/feb/13/artificial-intelligence-ai-unemployment-jobs-moshe-vardi. He expects that AI could wipe out 50% of the middle-class jobs in the next 30 years. He envisions a scenario similar to the one years ago when automation hit the working class; now, it could be the middle class. The key lies in cognitive computing. IBM defines it in their whitepaper about IBM Watson: "Cognitive Computing refers to systems that learn at scale, reason with purpose and interact with humans naturally. Rather than being explicitly programmed, they learn and reason from their interactions with us and from their experiences with their environment."

AI technology will change our lives for sure, and it is already doing so. Vardi's scenario is not out of this world, but it is only one scenario. It is up to us what we make of it.

For now, this change leaves developers who are interested in playing around with this technology with a multitude of frameworks that may be used for free. There are so many players in this field that startups need to gain momentum. A common strategy is to open their API to the public with just a registration. No fee. I already mentioned some in my post about Nuance opening their NLU platform.

It is great to play around with speech technology, but relying upon a certain supplier also carries several risks. Will the startup still be there when my product is ready for the market? Will the supplier change the conditions after they have gained sufficient momentum? ...

The last point happened, e.g., with Maluuba. They used to provide developers access to their system with open source code that they hosted on GitHub. Maluuba removed the repository from GitHub, but there are still some fragments of their napi on the Internet (who is in charge of cleaning the internet?). It compiles, but it requires registration at the Maluuba developer site, which has been shut down.

Don't get me wrong. This is completely OK if you want to earn money. They have a great product and they made it from a startup to a global player in a very short time. It just showcases the risks for developers who want to build products based on these offerings.

It looks like golden ages for NLU developers who want to play around with this technology, but it may be safer to rethink when you are going for actual products. This makes the landscape much smaller.

Wednesday, 16 December 2015

Nuance opens their NLU to developers

Nuance just opened their NLU platform as a beta to developers: https://developer.nuance.com/public/index.php?task=mix

It is more than simply NLU: it is a full stack including speech recognition that can be used in your own applications, as shown in their promotion video.


Similar to the efforts of NLU startups residing under .ai, Nuance Mix is able to detect an intent and user-defined entities from entered sentences. The possibility to also employ Nuance ASR, however, makes it more complete than those efforts. Maybe this has to be seen as an attempt to strengthen Nuance's approach to a virtual assistant that they call Nina. Nina has been out for a while but has not received much attention so far.
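
To illustrate the kind of result such NLU services hand back to a developer, here is a purely hypothetical sketch of an intent plus user-defined entities. It mirrors the general idea of these offerings and does not reflect the actual Nuance Mix API.

import java.util.Map;

/** Hypothetical sketch of an NLU result: one intent plus user-defined entities.
 *  The intent name and entity slots are made up for illustration. */
public class NluResult {
    final String intent;
    final Map<String, String> entities;

    NluResult(String intent, Map<String, String> entities) {
        this.intent = intent;
        this.entities = entities;
    }

    public static void main(String[] args) {
        // "Book me a table for two in Hamburg tomorrow" might come back as:
        NluResult result = new NluResult("BOOK_TABLE",
                Map.of("guests", "2", "city", "Hamburg", "date", "tomorrow"));
        System.out.println(result.intent + " " + result.entities);
    }
}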
The market of virtual assistants is already somewhat populated. Google Now and Apple's Siri are well known and established. Others, like Microsoft's Cortana, also try to gain traction. Recently, Microsoft opened Project Oxford as a cloud-based tool for the creation of smart (voice-centric) applications. A comparable, but maybe more advanced, offering is IBM Watson, which has been available for some time. Another one is Amazon Echo, which also opened its platform to developers.

It appears that spoken language technology is mature enough to be really useful. Good news for developers who want to play around with voice interaction to control applications in the internet of things. Currently, there is a plethora of SDKs available that can be used for free. The question is not whether we will see more spoken interaction with everyday things in our lives, but who will win the race for a sufficient number of users and their data. Maybe Nuance is already too late with Nuance Mix to enter that market. Maybe they can step in nevertheless, relying on their year-long dominance in speech recognition.