Turing, With Audio

16 May 2004 • code | PHP • PermaLink

My article about Turing Protection generated lots of comments about how using image CAPTCHAs restricts access to the visually impaired.

So, I’ve played around a bit, and added an audio component. If you can’t read the CAPTCHA image, you can listen to a .WAV file of our lovely server, spelling out the characters to you.

Again, the code is very straightforward. You just need one new file to generate the sound file.

You will also need to install a text-to-speech program. The program needs to accept a string to read and output a sound file, ideally to stdout. I searched around for a bit, and flite seemed to mostly fit the bill. It can only output to a file, so we use PHP to generate a temporary file, stream it, then clean it up.
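A minimal sketch of that generate, stream, clean up cycle might look like the following. It is not the actual script: the function names, the temp-file prefix, and the output filename are made up for illustration, and it assumes flite lives at /usr/bin/flite and accepts -t for the text and -o for the output file.

```php
<?php
// Sketch only: function names, temp-file prefix and flite path are assumptions.

// Build the shell command that asks flite to speak $text into $wavPath.
function turing_flite_command($text, $wavPath, $flite = '/usr/bin/flite')
{
    // -t passes the text to speak, -o names the WAV file to write.
    return $flite . ' -t ' . escapeshellarg($text)
                  . ' -o ' . escapeshellarg($wavPath);
}

// Generate the WAV in a temporary file, stream it, then clean up.
function turing_stream_wav($text)
{
    $wavPath = tempnam(sys_get_temp_dir(), 'turing');
    if ($wavPath === false) {
        return false;
    }

    exec(turing_flite_command($text, $wavPath), $output, $status);
    clearstatcache();
    $ok = ($status === 0 && filesize($wavPath) > 0);

    if ($ok) {
        // Tell the browser this is audio, not PHP output.
        header('Content-Type: audio/x-wav');
        header('Content-Disposition: inline; filename="turing.wav"');
        header('Content-Length: ' . filesize($wavPath));
        readfile($wavPath);
    }

    // Remove the temporary file whether or not flite succeeded.
    unlink($wavPath);
    return $ok;
}
```

Calling turing_stream_wav('a, b, c.') from a script that has produced no other output would send the spoken text as a WAV. Sending explicit Content-Type and Content-Disposition headers is a design choice here, so that the browser does not have to guess the file type from the .php URL.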

I also made a slight change to the protection script. Since the sound file doesn’t distinguish between upper and lowercase, and the image is either in all uppercase or all lowercase, I just made the checking routine case-insensitive.
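A case-insensitive check can be as small as the sketch below. The helper name turing_matches() is hypothetical, and the real protection script may do this differently.

```php
<?php
// Sketch only: turing_matches() is a hypothetical helper, not the original script.

// Compare the stored challenge with the visitor's answer, ignoring case.
function turing_matches($expected, $submitted)
{
    // Reject an empty challenge so a missing session value never passes.
    if (!is_string($expected) || $expected === '') {
        return false;
    }
    // strcasecmp() returns 0 when the strings are equal ignoring case.
    return strcasecmp($expected, trim((string) $submitted)) === 0;
}
```

In a form handler this could be called as turing_matches($_SESSION['turing_test'], $_POST['code']), where the session key is the one mentioned in the comments and the form field name 'code' is an assumption.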

To see it in action, try clicking here. Again, please send your comments and suggestions.

Update: 17-May-2004

Several folks have noted that some of the letters aren’t pronounced very well, specifically “Q”. This is how flite is generating the sounds, so I suspect that with a small and cheap speech synthesizer, you get what you pay for, so to speak. (“So to speak” ... hah!)

If anyone can suggest a different speech engine, I’ll try that instead. Hopefully, there is a Debian package for it though, since the server I’m using is configured pretty much entirely through packages.

For the record, if your challenge string is “abc123fq”, then the command that gets passed to flite is basically:

/usr/bin/flite -t "a, b, c, 1, 2, 3, f, q."

The commas add a bit of a pause between letters. But it relies on how flite decides to pronounce “q”. I’ve tried spelling out the letters:

/usr/bin/flite -t "aye, bee, see, one, two, three, eff, kyoo."

That didn’t (to me) produce a better sound than before though. Plus, how do you spell “q”? Kyoo? Keeyou? Queue?
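For illustration, turning a challenge string into the comma-separated text shown above could be sketched like this. The function name and the optional $spellings map are hypothetical. The map is only there to make it easy to experiment with spelled-out letters such as “kyoo”.

```php
<?php
// Sketch only: turing_spoken_text() and its $spellings map are hypothetical.

// Turn "abc" into "a, b, c." so the synthesizer pauses between characters.
function turing_spoken_text($challenge, array $spellings = array())
{
    $parts = array();
    foreach (str_split($challenge) as $char) {
        // Use a spelled-out replacement if one was supplied for this character.
        $key = strtolower($char);
        $parts[] = isset($spellings[$key]) ? $spellings[$key] : $char;
    }
    return implode(', ', $parts) . '.';
}

echo turing_spoken_text('abc123fq'), "\n";
echo turing_spoken_text('fq', array('q' => 'kyoo')), "\n";
```

The first call prints the same text as the first flite command above, and the second swaps in an alternative spelling for “q” only.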

Comments

  1. The ‘Q’ sounds like a ‘U’. :)

    -orban
    orban
    16 May 2004, 04:27 • PermaLink
  2. Yes, the Q is very hard to understand – and I got “Q B” in the sequence without a long enough gap to make them distinct.

    The fact that it’s so simple to do amazes me, but it would be nice if the results were clearer :-)
    Peter Bowyer
    16 May 2004, 04:41 • PermaLink
  3. Thanks for the code, Colin. I made a few minor modifications to integrate it into my own application. Something I’d like to warn people about: the turing_test session variable may not get set by Colin’s program in time for you to know what it is during the page load that shows the image. But that shouldn’t be a problem, as it will be set by the time the user submits that page to the one that checks the code.
    Peter Belt
    31 July 2004, 05:17 • PermaLink
  4. Why don’t you record yourself saying each letter as an .au file, cat the files together (with an appropriate gap file in between, of course), and convert the result to a WAV afterwards?
    Jim
    21 October 2004, 09:37 • PermaLink
  5. Has anything been done in the direction of the last comment?
    Pepino
    13 December 2004, 16:28 • PermaLink
  6. Just to follow up on the course of the industry:

    Google, MSN, and Yahoo have recently united in the battle against comment spammers

    http://it.slashdot.org/it/05/01/19/0516246.shtml?tid=111&tid=217
    Calin Uioreanu
    19 January 2005, 08:18 • PermaLink
  7. The turing-sound file arrives in my browser with a filetype of PHP, which won’t play in any audio program. Shouldn’t it arrive as a .wav file?

    Browser is Firefox 1.04.
    bob easton
    28 May 2005, 04:59 • PermaLink