Speech Recognition Is Now Nearing Conversational Accuracy As Security Researcher Uses Speech Recognition To Defeat reCAPTCHA
IBM recently announced that it broke industry records in speech recognition by extending deep learning technologies. The result is a system that recognizes spoken words with a word error rate of only 5.5 percent.
In normal human conversation, it is common to miss a word or two. Only when we cannot follow, or want confirmation of what we think we heard, do we ask the person we are speaking with to repeat the last phrase. Imagine what that would be like for a computer.
Last year, IBM reached a milestone when its AI achieved a word error rate of 6.9 percent on conversational speech. This year it lowered that error rate to 5.5 percent by combining long short-term memory (LSTM) and WaveNet language models with three strong acoustic models, Computer Business Review reported.
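For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only; it is not IBM's evaluation tooling, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace and
    compares words exactly, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one of six reference words was dropped.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{wer:.3f}")
```

By this definition, a 5.5 percent word error rate corresponds to roughly five or six wrong words per hundred spoken, and the drop from 6.9 to 5.5 percent is a relative reduction of about 20 percent.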
The process lets the AI learn not only from positive examples but also from negative ones. Like any living learner, the AI gets smarter as it goes, and it performs better when similar speech patterns are repeated.
Last December, the company added diarization to its Watson Speech to Text service, which means the AI can now distinguish individual speakers in a conversation. Reaching human parity, meaning an error rate on par with that of actual humans conversing, has always been an industry goal, according to IBM.
In reassessing the industry benchmark, IBM collaborated with speech and technology service provider Appen, which found that human parity is 5.1 percent, a lower error rate than anyone has yet achieved. IBM's current result is now very close, and it might not be long before we can talk to machines the way we talk with each other every day.
Meanwhile, a security researcher used Google's own speech recognition service to bypass its reCAPTCHA security feature. According to the researcher, who goes by the name "East-EE," there is a logic vulnerability in Google's reCAPTCHA field.
The audio challenge in reCAPTCHA can be bypassed. Apparently, this is not the first time reCAPTCHA has been defeated by security researchers. In 2012, it was defeated 70 percent of the time by using deep-learning technology.
The hacker needs to convert the reCAPTCHA audio to a WAV file and send it to Google's Speech Recognition API, which returns a written version (a string) of the audio challenge. The hacker then pastes this string into the text box and clicks 'Verify' on the reCAPTCHA widget.