Free Software Shop

Project Honey Pot - The Good, The Bad And The Bots


Topic: Internet
Date: Sat, Apr 04 @ 10:09 PM

Project Honey Pot is a distributed network of decoy web pages built to gather information about malicious robots, crawlers and spiders. It collects data on harvesters, dictionary attackers, and comment or email spammers. This information is made available to help website administrators keep bad bots off their sites and detect spam activity. And you can help them.

One way to help, if you're a website administrator, is to install a honey pot on your own site. A honey pot as used by this project is a site (or, more usually, a part of one) filled with spam trap email addresses and other traps to attract and detect bad bots. These pages are linked from internal and external sources in a way that is not visible to human visitors (via a 1x1 pixel image, for instance), to prevent false positives. Bots with good intentions are kept away from these pages via robots.txt, a small file in the root directory of a website containing rules about where spiders, bots and crawlers are and aren't allowed to go. Additionally, meta tags on the pages indicate that the content should not be indexed and that links should not be followed.
99% of the bots that play by the rules won't end up on these pages. Those that do obviously have a hard time respecting the rules set out for them and can be marked as bad bots with no good intentions. The IP addresses these bots operate from are gathered by all the honey pot project participants.
The fake email addresses, specially crafted and published for trapping spam bots, make it possible to track where and when a harvester was active collecting addresses to add to its database of addresses to be spammed.
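
To make the robots.txt part concrete, here is a minimal sketch in Python of the check a well-behaved bot performs before fetching a page, and of how a site could flag a visitor that ignores it. The /trap/ path, the example.com URLs and the helper names is_allowed and classify_visit are assumptions made up for this illustration; they are not taken from Project Honey Pot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site with a honey pot under /trap/.
# The path is an assumption for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def classify_visit(robots_txt: str, user_agent: str, url: str) -> str:
    """Label a request 'suspect' when it hits a path robots.txt disallows."""
    return "ok" if is_allowed(robots_txt, user_agent, url) else "suspect"


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/trap/contact.html"):
        print(url, classify_visit(ROBOTS_TXT, "ExampleBot", url))
```

A polite crawler runs the equivalent of is_allowed and never requests the disallowed page, so any request that does arrive there is a reasonable signal that the visitor ignores the rules.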

If you're not a website administrator you can still help by adding QuickLinks. QuickLinks are links that allow bad spiders to find the honey pot pages on other websites. If you can post these links, you help trap them. The more inbound links to a honey pot, the more effective it will be.

The data gathered by the honey pot project is used to maintain a blacklist, which helps website owners deny the bad guys access or give them special treatment on their website. APIs and plugins exist for integration with DNS or web server software (Apache).
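
As a rough idea of what a DNS-based blacklist check looks like, here is a small Python sketch. DNS blacklists are commonly queried by reversing the octets of the visitor's IPv4 address and prepending them to the blacklist's zone name; an answer means "listed", a failed lookup means "not listed". The zone dnsbl.example.org, the optional access_key label and the function names are assumptions for illustration. The project's actual zone, key handling and response encoding should be taken from its own documentation.

```python
import ipaddress
import socket
from typing import Optional


def build_query_name(ip: str, zone: str, access_key: str = "") -> str:
    """Build a DNSBL-style query name: [key.]<reversed octets>.<zone>."""
    # IPv4Address validates the input and raises ValueError on bad addresses.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    reversed_ip = ".".join(reversed(octets))
    parts = [reversed_ip, zone]
    if access_key:  # hypothetical: some services want a key as first label
        parts.insert(0, access_key)
    return ".".join(parts)


def lookup(ip: str, zone: str, access_key: str = "") -> Optional[str]:
    """Return the answer address if ip is listed, or None if it is not."""
    try:
        return socket.gethostbyname(build_query_name(ip, zone, access_key))
    except OSError:  # name did not resolve: treat as "not listed"
        return None


if __name__ == "__main__":
    # dnsbl.example.org is a placeholder zone, not a real blacklist.
    print(build_query_name("192.0.2.10", "dnsbl.example.org"))
```

A web application could call lookup for each new visitor and, depending on the answer, deny access or serve a restricted page. Caching the results avoids a DNS round trip on every request.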

I should note that I haven't looked in depth at how they handle false positives. They use whitelisting and blacklisting rules to counter the possibility of falsely blacklisting people who use dynamic IPs or an anonymous proxy server. I'm pretty sure the bad guys use these as well. The length of time an address stays blacklisted varies with the number of reports received for any particular bad behaviour, starting from 5 seconds and up.

More info: Project Honey Pot


Source: http://freesoftwareshop.org/forum/article.php?sid=507