Web Crawlers

The following IP addresses belong to crawlers that have been banned by the IT Skeptic:

65.214.44.29 crawler.bloglines.com - a controversial banning, but they leave error messages everywhere and eat huge amounts of resources
128.194.135.94 A hungry crawler from irldotcsdottamudotedu, doing "research". Piss off.
195.175.37.6 out of Turkey
210.150.10.74, 211.125.157.48 out of Japan.
209.128.76.39 USA
218.124.48.22 Japanese. Big resource eaters. Looks like spammers.
220.181.19.163 Chinese Sogou spider (see corp(dot)sohu(dot)com(slash)20051130(slash)n240842344(dot)shtml).
220.181.34.161 Dunno who it is, but it comes out of China and it's a very hungry spider.
222.46.18.34 China Railway Corporation! Dear me. Ate 50% more than Google, and right after the Chinese Govt featured in a spoof on the IT Skeptic. Chinese spooks, I reckon. So they can combine sex and travel, and ...
203.160.1.42 out of Vietnam
192.69.234.122 Dunno who dwhl.de are, but they provide an anonymous FTP server too
87.99.76.% from Latvia (87.99.64.0 - 87.99.95.255)
89.34.173.% from Romania
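Entries such as `87.99.76.%` use `%` as a wildcard for the rest of the address. The Python sketch below shows one way a mixed list of exact addresses and wildcard patterns could be checked. It assumes `%` behaves like the wildcard in an SQL `LIKE` pattern (any run of characters) and ignores `LIKE`'s single-character `_` wildcard; the names `BANNED_PATTERNS`, `like_to_regex` and `is_banned` are hypothetical, not part of this site's actual setup.

```python
import re

# Hypothetical ban list: exact addresses plus patterns in which "%" stands for
# any run of characters, as in an SQL LIKE pattern (an assumption about how
# the access rules are evaluated; "_" wildcards are not handled here).
BANNED_PATTERNS = [
    "65.214.44.29",
    "87.99.76.%",
    "89.34.173.%",
]


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern ('%' = any characters) into an anchored regex."""
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile("^" + ".*".join(parts) + "$")


COMPILED = [like_to_regex(p) for p in BANNED_PATTERNS]


def is_banned(ip: str) -> bool:
    """Return True if the address matches any exact entry or wildcard pattern."""
    return any(rx.match(ip) for rx in COMPILED)


print(is_banned("87.99.76.12"))  # True
print(is_banned("87.99.77.12"))  # False
```

Note that `87.99.76.%` matches only 87.99.76.0 to 87.99.76.255, a small slice of the 87.99.64.0 - 87.99.95.255 block quoted next to it; covering the whole block would need a range check rather than a single pattern.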

These crawlers are permitted to access this site:

64.78.155.100 NewsGator. High hits but low resource usage - nice people.
66.249.65.38, 72.14.192.0 - 72.14.255.255 Google is by far the biggest remaining muncher of resources, but who can say no to Google? But man, Googlebot 66.249.65.176 is hungry!!
66.150.96.109, 66.150.96.121 "Burning Door", a.k.a. FeedBurner. Better let them in, they serve my RSS feeds!
65.214.44.151 Nice light crawler
65.55.212.138 A Mickeysoft bot, average hunger but hits lightly
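Google's block is given as a first and last address rather than as a wildcard. Here is a minimal Python sketch of testing an address against such an inclusive range with the standard `ipaddress` module. The helper `in_range` and the constants `FIRST` and `LAST` are hypothetical, and this is not a description of how the site itself applies the list.

```python
import ipaddress

# The allowed Google block from the list above, as an inclusive first/last pair.
FIRST = ipaddress.IPv4Address("72.14.192.0")
LAST = ipaddress.IPv4Address("72.14.255.255")


def in_range(ip: str) -> bool:
    """True if ip falls inside the inclusive FIRST..LAST range."""
    return FIRST <= ipaddress.IPv4Address(ip) <= LAST


# The same range expressed in CIDR notation.
cidr = list(ipaddress.summarize_address_range(FIRST, LAST))
print(cidr)                     # [IPv4Network('72.14.192.0/18')]
print(in_range("72.14.200.1"))  # True
print(in_range("72.15.0.1"))    # False
```

The range works out to a single /18 network, so an access rule that understands CIDR notation could express it as one entry.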

This list does not include known spammers who have been blocked. I count an address as a "crawler" because of its high hit rate and/or high resource consumption; its intent may or may not be legitimate.

Many thanks to domaintools.com, googleprr.com and, of course, standard Google search: all powerful tools for establishing the bona fides (or not) of an address.
