If you write a blog or run a forum, you need to know about the corporate snoop/spy bots that crawl your site. These bots scrape your content, strip off your link and anything else they don't consider relevant, and sell what's left to companies that track their keywords through social CRM platforms. According to the sellers of these platforms, they allow companies to listen, engage, and act on their customers’ conversations across the entire social web.
This stripped and packaged info IS NOT available to the public unless they pay to get it. This unauthorized use of your website content also violates your copyright, since they are using your work product without your permission. These bots also eat up huge amounts of your server's bandwidth as they crawl your site night after night looking for changes or new content.
Two of the bigger snoop bots are Radian 6 out of Canada (www.radian6.com/crawler) and Scout Lab. Using robots.txt to block the Radian 6 bot DOES NOT work; you have to email them to be taken off their crawl list. You can also block the IPs the Radian 6 bots use, which are 184.108.40.206/16 and 220.127.116.11/16. You can block them with the IP blocking tool in your server admin panel or with an .htaccess file. If you're not sure how to block IPs, contact your website host and in most cases they can do it for you. If you write on a hosted blog you may not have control over this. I believe WordPress has a plugin for blocking IPs and stopping scraping.
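If you want to see what blocking those two ranges actually involves, here is a rough Python sketch. It takes the two ranges exactly as quoted above, normalizes them (a /16 written with a full address still just means the whole enclosing network), checks whether a visitor's IP falls inside them, and prints rules you could paste into .htaccess. The function names is_blocked and htaccess_rules are made up for this example, and the output assumes Apache 2.4 syntax; older Apache 2.2 servers use "Deny from" lines instead, so check with your host before pasting anything in.

```python
import ipaddress

# Ranges as quoted in the post. They are written with host bits set,
# so strict=False is used to normalize each one to its enclosing /16 network.
BLOCKED_RANGES = ["184.108.40.206/16", "220.127.116.11/16"]
BLOCKED_NETWORKS = [ipaddress.ip_network(r, strict=False) for r in BLOCKED_RANGES]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


def htaccess_rules() -> str:
    """Build an Apache 2.4 style block (assumed syntax) denying the networks."""
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {net}" for net in BLOCKED_NETWORKS]
    lines.append("</RequireAll>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(htaccess_rules())
    print(is_blocked("184.108.1.1"))  # an address inside the first range
    print(is_blocked("8.8.8.8"))      # an unrelated address
```

Keep in mind a /16 covers about 65,000 addresses, so a block this wide may also turn away visitors who have nothing to do with the bot.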
More and more of these scraper sites are popping up around the world. Google is reportedly penalizing the public scraper sites in its search rankings for having duplicate content, but I wouldn't leave it to Google to watch my back. Sites like Radian 6 that charge for your content should not be allowed to make money off your hard work either.