Crazy Spiders: The Sogou Edition


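A client ran into the following problem:
[Image: Crazy Spiders: The Sogou Edition (1)]
After checking the logs, we found a large number of IPs from the same few IP ranges requesting attachment images, as shown in the screenshot below: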

[Image: Crazy Spiders: The Sogou Edition (2)]
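The log check described above can be approximated with a short script. This is a minimal sketch, assuming an access log in which each line starts with the client IPv4 address (as in the common nginx/Apache formats); the function name and the sample lines are hypothetical and not taken from the client's server.

```python
import re
from collections import Counter

# Matches the client IPv4 address at the start of a log line.
IP_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def count_by_c_segment(lines):
    """Count requests per /24 (C segment), keyed like '106.120.173.0/24'."""
    counts = Counter()
    for line in lines:
        match = IP_RE.match(line)
        if not match:
            continue  # skip lines that do not start with an IPv4 address
        prefix = match.group(1).rsplit(".", 1)[0]  # drop the last octet
        counts[prefix + ".0/24"] += 1
    return counts


# Hypothetical sample lines; real input would be read from the log file.
sample = [
    '106.120.173.100 - - [01/Jan/2015:00:00:01 +0800] "GET /attachment/a.jpg HTTP/1.1" 200 512000',
    '106.120.173.103 - - [01/Jan/2015:00:00:01 +0800] "GET /attachment/b.jpg HTTP/1.1" 200 498000',
    '218.30.103.19 - - [01/Jan/2015:00:00:02 +0800] "GET /attachment/c.jpg HTTP/1.1" 200 530000',
    "not a log line",
]

for segment, hits in count_by_c_segment(sample).most_common():
    print(segment, hits)
```

Sorting by hit count makes a handful of dominant C segments stand out immediately, which is the pattern this incident showed.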
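If the IPs had been scattered, this could have been an attack. But since all of them were concentrated in a few C segments (/24 ranges), there was really only one explanation: a spider. (A later check found that some of the IPs had reverse DNS (RDNS) records set.)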
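The RDNS observation can be turned into a forward-confirmed reverse DNS check: look up the hostname for the IP, then resolve that hostname and confirm it maps back to the same IP. The sketch below is an illustration only. The function name is hypothetical, and the hostname suffix is an assumption to be replaced with whatever the PTR records actually show; the article does not list the hostnames it found.

```python
import socket


def verify_spider(ip, suffix,
                  reverse=socket.gethostbyaddr,
                  forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a hostname ending in suffix
    and that hostname resolves forward to the same ip."""
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return False  # no PTR record, or the lookup failed
    if not hostname.lower().endswith(suffix):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses


# Example call (performs real DNS lookups; the suffix is an assumption):
# verify_spider("106.120.173.100", ".sogou.com")
```

The resolver functions are passed in as parameters so the logic can be exercised without network access. A PTR record on its own proves little, because whoever controls the IP range controls its reverse zone; the forward lookup is what ties the hostname to the claimed domain.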

In the end we counted three IP ranges in total. All of the IPs are listed below:
106.120.173.100
106.120.173.103
106.120.173.106
106.120.173.109
106.120.173.112
106.120.173.115
106.120.173.118
106.120.173.121
106.120.173.124
106.120.173.127
106.120.173.130
106.120.173.133
106.120.173.136
106.120.173.139
106.120.173.142
106.120.173.145
106.120.173.148
106.120.173.154
106.120.173.157
106.120.173.64
106.120.173.67
106.120.173.70
106.120.173.73
106.120.173.76
106.120.173.79
106.120.173.82
106.120.173.85
106.120.173.88
106.120.173.91
106.120.173.94
106.120.173.97
218.30.103.100
218.30.103.102
218.30.103.104
218.30.103.106
218.30.103.108
218.30.103.110
218.30.103.112
218.30.103.116
218.30.103.118
218.30.103.120
218.30.103.122
218.30.103.124
218.30.103.126
218.30.103.128
218.30.103.130
218.30.103.19
218.30.103.250
218.30.103.30
218.30.103.34
218.30.103.36
218.30.103.38
218.30.103.42
218.30.103.43
218.30.103.44
218.30.103.45
218.30.103.46
218.30.103.47
218.30.103.48
218.30.103.49
218.30.103.50
218.30.103.51
218.30.103.52
218.30.103.53
218.30.103.54
218.30.103.55
218.30.103.56
218.30.103.57
218.30.103.96
218.30.103.98
61.135.189.126
61.135.189.127
61.135.189.128
61.135.189.129
61.135.189.130
61.135.189.131
61.135.189.132
61.135.189.133
61.135.189.134
61.135.189.135
61.135.189.136
61.135.189.137
61.135.189.138
61.135.189.139
61.135.189.140
61.135.189.141
61.135.189.142
61.135.189.143
61.135.189.144
61.135.189.145
61.135.189.146
61.135.189.147
61.135.189.148
61.135.189.149
61.135.189.150
61.135.189.151
61.135.189.152
61.135.189.153
61.135.189.154
61.135.189.155
61.135.189.156
61.135.189.157
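The list above collapses into three /24 networks. As a small sketch, Python's standard `ipaddress` module can do that reduction; the function name is hypothetical and the sample is just a few addresses copied from the list.

```python
import ipaddress


def c_segments(ips):
    """Return the sorted set of /24 networks that contain the given IPs."""
    networks = {
        # strict=False lets a host address stand in for its network.
        ipaddress.ip_network(ip + "/24", strict=False)
        for ip in ips
    }
    return sorted(networks)


# A few addresses taken from the list above.
sample = ["106.120.173.100", "106.120.173.64", "218.30.103.19", "61.135.189.157"]

for network in c_segments(sample):
    print(network)
```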

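Because the requests were for images, each file was fairly large. Most incredible of all, the crawl rate was so high that it saturated the bandwidth completely (this was an Aliyun host with 2M of bandwidth), which severely slowed down access to the site.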
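To see how little traffic it takes to fill such a link, here is a rough calculation. It assumes that "2M" means 2 Mbit/s with decimal units, and it uses a hypothetical image size of 500 KB; the article does not give the actual file sizes.

```python
def images_per_second(link_mbps, image_kb):
    """How many images of image_kb kilobytes fit through the link per second."""
    bytes_per_second = link_mbps * 1_000_000 / 8  # Mbit/s -> bytes/s
    return bytes_per_second / (image_kb * 1000)


# 2 Mbit/s is 250,000 bytes per second, so one 500 KB image takes two seconds.
print(images_per_second(2, 500))  # 0.5
```

Under these assumptions, a crawler fetching even one image per second already asks for about twice what the link can deliver.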

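What's more, the user agent these spiders left behind was not a normal spider user agent. Could it be that even Sogou knows you don't sign your name when you're up to no good?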
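One way to check which user agents a suspect range actually sent is to tally the last quoted field of each matching log line. This is a sketch under the assumption that the log uses the combined format, where the user agent is the final double-quoted field; the function name and sample lines are hypothetical.

```python
import re
from collections import Counter

# In the combined log format the user agent is the last double-quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def user_agents_for_prefix(lines, prefix):
    """Tally user agents for log lines whose client IP starts with prefix."""
    counts = Counter()
    for line in lines:
        if not line.startswith(prefix):
            continue
        match = UA_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, not taken from the client's logs.
sample = [
    '106.120.173.100 - - [01/Jan/2015:00:00:01 +0800] "GET /a.jpg HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '106.120.173.103 - - [01/Jan/2015:00:00:02 +0800] "GET /b.jpg HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jan/2015:00:00:03 +0800] "GET / HTTP/1.1" 200 512 "-" "curl/7.0"',
]

print(user_agents_for_prefix(sample, "106.120.173."))
```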

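After the IP ranges were blocked, bandwidth returned to normal, as shown in the screenshot below:
[Image: Crazy Spiders: The Sogou Edition (3)]

Spiders crawling so aggressively that they exhaust a host's resources (CPU, bandwidth) is a common occurrence, and the culprits are mostly second-rate spiders such as Yisou (since renamed Shenma), Sogou, and Soso (since merged into Sogou). The Baidu and Google spiders rarely do anything this stupid.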
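The article does not say which mechanism was used to block the ranges. As one possible approach, assuming the site sits behind nginx, the three C segments could be turned into `deny` directives for its access module. The function name below is hypothetical, and the output would still need to be placed in the appropriate `http`, `server`, or `location` block by hand.

```python
import ipaddress

# The three C segments that contain the IPs listed in this article.
C_SEGMENTS = ["106.120.173.0/24", "218.30.103.0/24", "61.135.189.0/24"]


def nginx_deny_rules(networks):
    """Render one nginx 'deny' directive per network, validating each CIDR."""
    rules = []
    for cidr in networks:
        network = ipaddress.ip_network(cidr)  # raises ValueError on bad input
        rules.append(f"deny {network};")
    return "\n".join(rules)


print(nginx_deny_rules(C_SEGMENTS))
```

Blocking whole /24 ranges is a blunt tool: it also drops any legitimate crawling from those addresses, which is a trade-off the site owner has to accept knowingly.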

This article was collected and compiled by http://www.ositren.com ; its address is: http://www.ositren.com/htmls/68071.html