Where Siri Has Trouble Hearing, a Crowd of Humans Could Help
A program called Scribe harnesses humans on the Internet to generate speech captions in under five seconds.
This rapid-fire crowd-computing experiment could be a big help for deaf and hearing-impaired people. It could also provide new ways to enhance voice recognition applications like Siri in areas where they struggle.
Scribe’s algorithms direct human workers to type out fragments of what they hear in a speech. By turning up the volume or slowing down the speed of slices of the audio, the program can direct different workers to unique but overlapping sections of a speech and then give them a few seconds to recover before asking them to type again.
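As a rough illustration of the idea of handing workers overlapping slices with rest in between, here is a minimal sketch in Python. It is not Scribe's actual scheduler: the function name `assign_segments`, the round-robin assignment, and the default segment and overlap lengths are all assumptions made for the example.

```python
def assign_segments(total_seconds, n_workers, segment_len=4.0, overlap=1.0):
    """Split audio into overlapping windows and hand them out round-robin.

    Hypothetical sketch: the window lengths and the round-robin policy are
    illustrative assumptions, not details reported for Scribe.
    Returns a list of (worker_index, start_seconds, end_seconds) tuples.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be non-negative and shorter than a segment")

    step = segment_len - overlap  # how far each new window advances
    schedule = []
    start = 0.0
    index = 0
    while start < total_seconds:
        end = min(start + segment_len, total_seconds)
        # Round-robin assignment gives each worker a pause between windows.
        schedule.append((index % n_workers, start, end))
        if end >= total_seconds:
            break
        start += step
        index += 1
    return schedule


if __name__ == "__main__":
    for worker, start, end in assign_segments(10.0, n_workers=3):
        print(f"worker {worker}: {start:.1f}s to {end:.1f}s")
```

With three workers and ten seconds of audio, each window starts one second before the previous one ends, so every boundary is heard by two people.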
Using natural-language processing algorithms, Scribe strings together the typed-out fragments into a complete transcript, and the redundant overlaps can help it weed out errors. (This shotgun computing technique is similar to the way many DNA sequencing machines work, points out Jeffrey Bigham, the University of Rochester researcher who created Scribe.) It can produce a transcript or caption with a delay as short as three seconds using just three to five workers.
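The stitching step can be pictured with a much simpler stand-in than whatever Scribe actually runs. The sketch below, in Python, assumes fragments arrive in time order and joins them wherever the end of the transcript so far exactly matches the start of the next fragment. The function name `merge_fragments` is hypothetical, and this exact-match approach does no error correction or voting across workers.

```python
def merge_fragments(fragments):
    """Join time-ordered text fragments by collapsing exact word overlaps.

    Hypothetical sketch: a real system would need fuzzy alignment and
    voting to tolerate typos; this only handles identical overlaps.
    """
    merged = []
    for fragment in fragments:
        words = fragment.lower().split()
        overlap = 0
        # Find the longest suffix of the transcript so far that equals
        # a prefix of the new fragment.
        for k in range(min(len(merged), len(words)), 0, -1):
            if merged[-k:] == words[:k]:
                overlap = k
                break
        # Append only the words that are not already in the transcript.
        merged.extend(words[overlap:])
    return " ".join(merged)


if __name__ == "__main__":
    pieces = [
        "the quick brown fox",
        "brown fox jumps over",
        "over the lazy dog",
    ]
    print(merge_fragments(pieces))
```

If two fragments share no words at their boundary, this version simply concatenates them, which is one reason a production system would need something more tolerant.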
The only requirement is that the workers can hear and type, so even as a group, they cost less than a stenographer and don’t need days of advance notice, he notes. That could be a big help for a deaf student who wants to, say, take a new online class that hasn’t been captioned.
Bigham (see “Innovators Under 35, 2009: Jeffrey Bigham”) and his University of Rochester colleague Walter Lasecki have tested Scribe with laborers they found through Amazon’s Mechanical Turk, where people sign up to perform simple tasks. Those workers were paid a minimum of $6 an hour by Bigham’s team. The team also hired undergraduate work-study students for $10 an hour. The crowdsourced work of people in both groups appears to be only slightly less accurate than that of a professional stenographer, Bigham says. And in some cases, the pooled workers more accurately transcribed jargon terms that a single professional typist might mishear.
“What Scribe is starting to show is the ability to work together as part of a crowd to do very difficult performance tasks better than a person can do alone,” he says.
Bigham is now developing Scribe into an app that he hopes could help deaf people crowdsource transcripts quickly. To support a large number of users, he is also considering licensing the technology or spinning off a startup.
It’s not the first time someone has thought of using cheap, computer-coordinated human labor to shore up the traditional weaknesses of artificial intelligence programs or other software. Twitter is hiring people on Mechanical Turk to help its search engine classify newsy topics that suddenly start trending. Bigham also has created a crowdsourced personal-assistance system called Chorus (see “Artificial Intelligence, Powered By Many Humans”) that could be smarter than Siri but cheaper than any individual hourly employee.
This is not to say that human labor will always outperform automated systems at transcribing speech. Aditya Parameswaran, a researcher at Stanford University who also works on human-assisted computation methods, says that as learning algorithms improve, crowdsourcing techniques like these will be useful mostly for augmenting the computers’ accuracy, rather than for having humans do the bulk of the work.