A bot is often also called a spider. Normally you would block a bot or spider using the following robots.txt entry:
User-agent: Baiduspider
Disallow: /
or
User-agent: *
Disallow: /
Unfortunately, this doesn't work for blocking Baidu, because its spider doesn't always honor robots.txt... :(
You can use the following IIS URL Rewrite Module rule to block the Baiduspider user agent on your website. The only access allowed is to robots.txt; all other requests are blocked with a 403 Access Denied.
Expand the pattern attribute with multiple user agent strings, separated by a pipe (|), to block more bots, for example pattern="Baiduspider|Bing" or pattern="Googlebot|Bing".
Hint: search for IIS URL Rewrite related posts on Saotn.org!
<!-- Block Baidu spider -->
<rule name="block_BaiduSpider" stopProcessing="true">
  <match url="(.*)" />
  <conditions trackAllCaptures="true">
    <!-- Match requests whose User-Agent contains "Baiduspider"... -->
    <add input="{HTTP_USER_AGENT}" pattern="Baiduspider" negate="false" ignoreCase="true" />
    <!-- ...except requests for /robots.txt -->
    <add input="{URL}" pattern="^/robots\.txt" negate="true" ignoreCase="true" />
  </conditions>
  <!-- Everything else gets a 403 response -->
  <action type="CustomResponse"
          statusCode="403"
          statusReason="Forbidden: Access is denied."
          statusDescription="Access is denied!" />
</rule>
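For example, to block Baidu and Bing together, only the user agent condition in the rule above needs to change (because ignoreCase="true" is set, the Bing part also matches Bing's bingbot user agent):

<add input="{HTTP_USER_AGENT}" pattern="Baiduspider|Bing" negate="false" ignoreCase="true" />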
Verifying the rewrite rule to block Baidu
Using Fiddler's Composer option to compose an HTTP request, you can easily verify the rewrite rule, as shown in the next two images.
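If you'd rather script the check than use Fiddler, the following minimal Python sketch does the same thing. The https://www.example.com host is a placeholder for your own site, and the shortened Baiduspider user agent string is only meant to trigger the rule:

# Send two requests with a Baiduspider user agent and print the status codes.
# Replace https://www.example.com with your own site.
import urllib.request
import urllib.error

UA = "Mozilla/5.0 (compatible; Baiduspider/2.0)"  # contains "Baiduspider", enough to match the rule
SITE = "https://www.example.com"

def status(path):
    req = urllib.request.Request(SITE + path, headers={"User-Agent": UA})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

print("/:", status("/"))                      # expect 403: blocked by the rule
print("/robots.txt:", status("/robots.txt"))  # expect 200: still allowed

A request with a normal browser user agent should of course still return 200 on both URLs.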
That's it!