Thursday, November 13, 2008

Why CAPTCHAs alone are not enough

I was looking at different CAPTCHAs today and reading about their various failures. The problem, as it exists, is not that we cannot find a problem that requires a human to solve. That is only the first piece, because CAPTCHAs as a rule defeat only automation, and even then only part of the time. That leaves computers working in isolation (once a program or hacking tool has been written) able to succeed only some of the time. Nothing stops such a program from doing all the busy work itself and showing a human only the CAPTCHA to solve before it continues.

Imagine the very best-case scenario of this: a single person who has written a program to automate spamming 100 times doesn't need to fill in all 100 complete entries, only to solve the 100 CAPTCHAs. Now imagine the worst-case scenario: a programmer has built a program that solves CAPTCHAs automatically by farming them out as CAPTCHAs on his porn site, where willing patrons fill them out and verify them (similar in a sense to the work of the reCAPTCHA project).

This leads us back to square one, because we are now battling either a potentially less intelligent human or the person who built the means to break the CAPTCHA in the first place.

As I see it, several solutions to this problem exist:

1) Variety - Add a fresh dose of variety to CAPTCHA creation. To keep CAPTCHAs from being broken so easily, they need to do things that even people are not expecting...at least not all the time. CAPTCHAs can include differing instructions for how to solve them, such as "Enter all but the third letter" or "Enter only the vowels from above". They can rely on pieces that are not always in the same place on the page, or not in the same place from one page to the next. They can be moving (Flash-based) CAPTCHAs or other items that are dynamic in nature.
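
To make the first idea concrete, here is a minimal Python sketch of instruction-based variety. It assumes that some other part of the system would still render the letters as a distorted image. The names `INSTRUCTIONS`, `make_challenge`, and `check_answer` are hypothetical. The first two rules come from the examples above, and the third (reversal) is made up for illustration.

```python
import random
import string

# Hypothetical instruction templates. Each pairs the text shown to the user
# with a rule that derives the expected answer from the displayed letters.
INSTRUCTIONS = [
    ("Enter all but the third letter", lambda s: s[:2] + s[3:]),
    ("Enter only the vowels from above", lambda s: "".join(c for c in s if c in "AEIOU")),
    ("Enter the letters in reverse order", lambda s: s[::-1]),
]

def make_challenge(length=6, rng=random):
    """Return (letters, instruction, expected_answer) for one challenge.

    In practice the letters would also be drawn as a distorted image;
    this sketch covers only the varying-instruction part.
    """
    letters = "".join(rng.choice(string.ascii_uppercase) for _ in range(length))
    # Skip any rule that would produce an empty answer (e.g. no vowels drawn).
    usable = [(text, rule) for text, rule in INSTRUCTIONS if rule(letters)]
    instruction, rule = rng.choice(usable)
    return letters, instruction, rule(letters)

def check_answer(expected, submitted):
    """Compare case-insensitively, ignoring surrounding whitespace."""
    return submitted.strip().upper() == expected
```

A bot that has learned to read the letters still has to parse and follow an instruction that changes from one challenge to the next. A person simply reads it.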

2) Statistical trustworthiness - People are most trusted when they act the way we expect normal people to act. They post up to a certain limit per day. They take a certain amount of time to get things done. They do certain things and avoid others. By tracking the actions and inactions of trustworthy people and machines, along with their related information, we can know enough about them to decide whether they should be allowed to create another account or post that message. We do not have to mine data everywhere to acquire this kind of information, but we do have to assume that anyone who starts with a blank slate could be as wretched as the worst spammer.
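
A rough Python sketch of the statistical approach might look like the following. Everything in it is an assumption for illustration: the `Account` record, the `allow_post`, `record_post`, and `flag_abuse` helpers, and especially the thresholds, which would have to come from observing real users of a particular site. New accounts start untrusted and get a lower daily limit. Submissions made faster than a person could plausibly type are rejected.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds: real values would come from observing how
# ordinary users of a particular site actually behave.
NEW_ACCOUNT_POSTS_PER_DAY = 3
TRUSTED_POSTS_PER_DAY = 20
MIN_SECONDS_TO_COMPOSE = 5.0
GOOD_POSTS_BEFORE_TRUST = 10
DAY = 24 * 60 * 60  # seconds

@dataclass
class Account:
    post_times: list = field(default_factory=list)  # timestamps of accepted posts
    good_posts: int = 0  # accepted posts not (yet) flagged as abuse

    def is_trusted(self) -> bool:
        # A blank slate is treated as if it could be the worst spammer.
        return self.good_posts >= GOOD_POSTS_BEFORE_TRUST

def allow_post(acct: Account, now: float, form_opened_at: float) -> bool:
    """Return True if this post fits the profile of a normal person."""
    limit = TRUSTED_POSTS_PER_DAY if acct.is_trusted() else NEW_ACCOUNT_POSTS_PER_DAY
    posts_last_day = sum(1 for t in acct.post_times if now - t < DAY)
    if posts_last_day >= limit:
        return False  # posting more often than expected for this trust level
    if now - form_opened_at < MIN_SECONDS_TO_COMPOSE:
        return False  # submitted faster than a person plausibly could
    return True

def record_post(acct: Account, now: float) -> None:
    """Remember an accepted post and count it toward earning trust."""
    acct.post_times.append(now)
    acct.good_posts += 1

def flag_abuse(acct: Account) -> None:
    # Any confirmed abuse sends the account back to a blank slate.
    acct.good_posts = 0
```

Timestamps here are plain seconds, such as the values returned by `time.time()`. A rejected post need not be discarded outright. It could instead trigger a CAPTCHA or a moderation queue.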

3) Private/Public Keys - Implement something like PGP (Pretty Good Privacy), which uses a private/public key system to sign documents: require registration, and use the system to authenticate each user as a real person. This eliminates privacy, just as Statistical Trustworthiness begins to do.
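
The third idea rests on a registration-plus-challenge exchange. At registration, the site stores each user's public key. At login, it sends a random challenge that only the holder of the matching private key can sign. The Python sketch below illustrates that flow with textbook RSA and deliberately tiny keys. It is not PGP and it is not secure. The helper names (`make_keypair`, `register`, `issue_challenge`, `authenticate`, and so on) are hypothetical. A real deployment would use an established OpenPGP implementation such as GnuPG instead.

```python
import hashlib
import math
import secrets

def _is_prime(n: int) -> bool:
    """Trial division; adequate only for the tiny toy primes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def _next_prime(n: int) -> int:
    while not _is_prime(n):
        n += 1
    return n

def make_keypair(seed_p=1_000_000, seed_q=2_000_000):
    """Textbook RSA key pair with toy sizes: for illustration, NOT secure."""
    p, q = _next_prime(seed_p), _next_prime(seed_q)
    n, phi = p * q, (p - 1) * (q - 1)
    e = 65537
    while math.gcd(e, phi) != 1:
        e += 2
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+)
    return (n, e), (n, d)  # (public key, private key)

def _digest(message: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(private_key, message: bytes) -> int:
    n, d = private_key
    return pow(_digest(message, n), d, n)

def verify(public_key, message: bytes, signature: int) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _digest(message, n)

# Hypothetical site-side registry: username -> registered public key.
registry = {}

def register(username: str, public_key) -> None:
    registry[username] = public_key

def issue_challenge() -> bytes:
    return secrets.token_bytes(16)  # fresh random nonce per login attempt

def authenticate(username: str, challenge: bytes, signature: int) -> bool:
    key = registry.get(username)
    return key is not None and verify(key, challenge, signature)
```

Usage: the user runs `public, private = make_keypair()` once and calls `register("alice", public)`. On each login, the site sends `challenge = issue_challenge()`, the user returns `sign(private, challenge)`, and the site checks it with `authenticate("alice", challenge, signature)`. Because every login is tied to a registered key, the site always knows which identity is acting. That is exactly the privacy cost described above.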

Looking at the kinds of solutions that are possible, I come to a somber conclusion: Application Service Providers are either going to have to work much harder, or the internet is going to have to become more regulated to keep spam and misuse from becoming a larger threat.
