5 March 2008 - 12:01
Fight Spam While Digitizing Your Books!

Programmers from Carnegie Mellon University have created a new service that reduces spam while helping to digitize books.
The service is called ReCaptcha, a variation of the widely used Captcha technique for reducing spam sent by email or posted in blog comments. Users must pass a visual pattern-recognition test by reading words that have been obscured or distorted. ReCaptcha puts this effort to work: users transcribe words from scanned images that the computer cannot decipher.
This gives Captchas a productive purpose they have lacked until now. Ben Maurer, the project's chief architect and an undergraduate at Carnegie Mellon University, recently announced the project on his blog: “Not only can you solve your problems with spam, you can help preserve mankind’s written history into the digital age.”
Luis von Ahn, the “executive producer” of ReCaptcha and an assistant professor at Carnegie Mellon, described the program’s immediate success: “Since the project launched Tuesday, 150 web sites have begun using it. In just the first half of Thursday, the project had digitized 8,000 words.” ReCaptcha is a good example of how large numbers of individuals can pool their collective effort on the Internet; the news sites Slashdot and Digg, and iStockphoto, a company that sells stock photography, are others. Von Ahn estimates that people complete 60 million Captcha tests every day, so ReCaptcha has the potential to digitize a very large number of words. ReCaptcha can also protect email addresses from automated programs that harvest them to build spam mailing lists.
Here is how the service works: users view two words. One is a conventional Captcha word whose answer is known, while the other is an unknown word that optical character recognition (OCR) software could not read. When a user correctly identifies the Captcha word, the program assumes the user has also decoded the unknown word. Von Ahn adds that ReCaptcha requires three different people to transcribe the same word before the program considers the transcription correct.
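The two-word scheme can be illustrated with a short Python sketch. This is a toy model, not ReCaptcha's actual code: the class `WordDigitizer`, its `submit` method, the case-insensitive comparison, and the rule that three *different* users must type the *identical* reading are all assumptions made for illustration. Only the control word decides whether a user passes; a passing user's reading of the unknown word counts as one vote.

```python
from collections import defaultdict

# Assumption: a reading is accepted once this many different users
# have typed the same text for the same unknown word.
REQUIRED_AGREEMENT = 3


class WordDigitizer:
    """Toy model of the two-word scheme (hypothetical, not ReCaptcha's code)."""

    def __init__(self) -> None:
        # unknown_id -> normalized reading -> set of user ids who typed it
        self.votes = defaultdict(lambda: defaultdict(set))
        # unknown_id -> accepted reading
        self.accepted: dict[str, str] = {}

    def submit(self, user_id, control_answer, control_guess, unknown_id, unknown_guess):
        """Grade one response; return True if the user passes the Captcha."""
        # Only the control word has a known answer, so only it decides pass/fail.
        if control_guess.strip().lower() != control_answer.strip().lower():
            return False  # failed: the unknown-word guess is discarded too
        # The user passed, so their reading of the unknown word counts as one vote.
        if unknown_id not in self.accepted:
            reading = unknown_guess.strip().lower()
            voters = self.votes[unknown_id][reading]
            voters.add(user_id)  # a set, so one user cannot vote twice
            if len(voters) >= REQUIRED_AGREEMENT:
                self.accepted[unknown_id] = reading
        return True


d = WordDigitizer()
d.submit("alice", "morning", "morning", "scan-17", "Parish")
d.submit("bob", "upon", "upon", "scan-17", "parish")
d.submit("carol", "river", "rivr", "scan-17", "parish")  # fails control word: ignored
d.submit("dave", "house", "house", "scan-17", "parish")
print(d.accepted)  # {'scan-17': 'parish'}
```

Counting votes per distinct user, rather than per submission, reflects the article's point that three *different* people must agree; how the real service resolves disagreements is not described in the source.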
ReCaptcha is available through an application programming interface (API) that can be integrated into your website. Google Code hosts the plug-ins needed to use the API with popular open-source software packages.