Monday, January 25, 2016

Silent Speech Recognition



A somewhat interesting read on how we will talk to computers in the near future is here: http://www.wired.com/2015/09/future-will-talk-technology/

Still, smartphones are vision-centric devices, as already discussed in http://schnelle-walka.blogspot.com/2015/12/smartphones-as-explicit-devices-do-not.html . Currently, I see people holding their smartphones in front of their face to initiate, e.g., voice queries starting with "OK Google" or similar. So you do not even need to look at their phone to learn who manufactured it. Moreover, since voice carries, you will also overhear their plans. A more subtle way seems to come with subvocalization. It exploits the fact that people tend to form words without speaking them out loud. Avoiding subvocalization is also one of the tricks to speed up reading: http://marguspala.com/speed-reading-first-impressions-are-positive/
Subvocalization slows down your reading speed
It is still an ongoing research topic in HCI, but I wonder how mature it is. Will it be useful at all? Or will we get used to people talking to their phones, gadgets or whatever in the same way that we got used to people having a phone call while they are walking?

Another interesting alternative comes with silent speech recognition. Denby et al. define it in Silent Speech Interfaces as: Silent speech recognition systems (SSRSs) enable speech communication to take place when an audible acoustic signal is unavailable. Usually, these techniques employ brain computer interfaces (BCI) as a source of information. The following figure, taken from Yamaguchi et al.'s Decoding Silent Speech in Japanese from Single Trial EEGs: Preliminary Results, illustrates the scenario.
Experimental setup for SSI from Yamaguchi et al.


In their research they investigated, among other things, how to differentiate the Japanese words for spring, summer, autumn and winter. They were able to show that the setup works in principle, but the results are still far from practical use.
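To make the decoding task more concrete, here is a toy sketch of what "classifying which of four words a subject silently spoke from a single EEG trial" boils down to. Everything below is hypothetical: the features, the noise model and the nearest-centroid classifier are simple stand-ins, not the actual method of Yamaguchi et al.

```python
import numpy as np

# Toy single-trial decoder for four silently spoken words.
# Features and classifier are illustrative stand-ins, not the
# actual pipeline from Yamaguchi et al.

rng = np.random.default_rng(0)
WORDS = ["haru", "natsu", "aki", "fuyu"]  # spring, summer, autumn, winter
N_FEATURES = 16  # e.g. band-power features pooled across EEG channels

# Assume each word evokes an (unknown) mean EEG pattern, and a single
# trial is that pattern buried in noise.
true_patterns = rng.normal(size=(len(WORDS), N_FEATURES))

def simulate_trial(word_idx, noise=0.5):
    return true_patterns[word_idx] + rng.normal(scale=noise, size=N_FEATURES)

# Training set: 20 noisy trials per word.
train_X = np.array([simulate_trial(w)
                    for w in range(len(WORDS)) for _ in range(20)])
train_y = np.repeat(np.arange(len(WORDS)), 20)

# Nearest-centroid "decoder": average the training trials per word ...
centroids = np.array([train_X[train_y == w].mean(axis=0)
                      for w in range(len(WORDS))])

def decode(trial):
    # ... and pick the word whose centroid is closest to the new trial.
    dists = np.linalg.norm(centroids - trial, axis=1)
    return WORDS[int(np.argmin(dists))]

# Decode an unseen "summer" trial:
print(decode(simulate_trial(WORDS.index("natsu"))))
```

The hard part in practice is exactly what this sketch assumes away: with real EEG, the per-word patterns overlap heavily and the noise dominates, which is why single-trial accuracies in such preliminary studies stay far below anything usable.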

But does this kind of interface make sense at all? It might be useful in scenarios where noise is omnipresent and defeats traditional speech recognition. A car is one example. However, the apparatus needs to become far less intrusive for this.