Tuesday, February 02, 2010

Finding Googlebot IP Addresses In IIS Server Logs

I’ve put together a command that finds all references to GoogleBot (case insensitive) in our server logs, extracts the source IP addresses, and summarizes the hit count per address. To do this on Windows with your IIS log files, you will need the GNU CoreUtils tools (for cut, sort, and uniq) and GNU grep.

First, try this from the command prompt:

grep -i "GoogleBot" c:\temp\logs\*.log | grep " / " | more

You should see a list of log records, each one a request whose line mentions GoogleBot.

Note:

· My log files are located in c:\temp\logs

· I’m only looking for requests to the root (" / "); this second grep is optional

Next, extract the column containing the requester’s IP address. In my log files it is column number 9; yours may differ, depending on which fields IIS is configured to log:

grep -i "GoogleBot" c:\temp\logs\*.log | grep " / " | cut -f 9 -d " "
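If you are not sure which column holds the client IP, the "#Fields:" header that W3C-format IIS logs write near the top of each file lists the field names in order. Here is a minimal Python sketch (not part of the original pipeline) that turns that header into the number to pass to cut -f. The function name field_column and the sample header are made up for illustration; your own header will list whatever fields your server logs.

```python
def field_column(log_lines, field_name="c-ip"):
    """Return the 1-based column of field_name, counted the way `cut -f` counts.

    Returns None if there is no #Fields: header or the field is not listed.
    """
    for line in log_lines:
        if line.startswith("#Fields:"):
            names = line.split()[1:]  # drop the "#Fields:" token itself
            if field_name in names:
                return names.index(field_name) + 1
            return None
    return None


# Illustrative header only; real logs list the fields enabled in IIS.
sample = [
    "#Software: Microsoft Internet Information Services",
    "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query "
    "s-port cs-username c-ip cs(User-Agent) sc-status",
]
print(field_column(sample))
```

When grep searches several files it prefixes each line with the file name and a colon. That prefix is glued to the first field, so the column numbers stay the same as long as the path contains no spaces.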

Lastly, sort, summarize, and store the results in a local file:

grep -i "GoogleBot" c:\temp\logs\*.log | grep " / " | cut -f 9 -d " " | sort | uniq -c > c:\temp\googlebot-class-c.txt
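If you would rather not install the GNU tools, roughly the same summary can be produced with a short Python script. This is only a sketch under the same assumptions as the command above: space-delimited log lines, the client IP in column 9, and logs under c:\temp\logs. The function names and parameters are my own, not anything IIS or Google provides.

```python
import glob
from collections import Counter


def count_googlebot_ips(log_lines, ip_column=9, root_only=True):
    """Count hits per client IP for lines that mention Googlebot.

    ip_column is 1-based, matching `cut -f`. root_only mimics the
    optional `grep " / "` filter for requests to the site root.
    """
    counts = Counter()
    for line in log_lines:
        if line.startswith("#"):  # skip W3C header/directive lines
            continue
        if "googlebot" not in line.lower():  # like grep -i
            continue
        if root_only and " / " not in line:
            continue
        fields = line.rstrip("\r\n").split(" ")
        if len(fields) >= ip_column:
            counts[fields[ip_column - 1]] += 1
    return counts


def iter_log_lines(pattern):
    """Yield every line from every file matching the glob pattern."""
    for path in glob.glob(pattern):
        with open(path, errors="replace") as handle:
            yield from handle


if __name__ == "__main__":
    # Assumed location, taken from the commands in this post.
    totals = count_googlebot_ips(iter_log_lines(r"c:\temp\logs\*.log"))
    for ip, hits in totals.most_common():
        print(f"{hits:7d} {ip}")
```

Unlike the sort | uniq -c version, this prints the busiest addresses first. Redirect the output to a file if you want to keep it.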

Note: It’s possible, even probable, that some of these requests carry a fabricated User-Agent header and do not actually come from Google.
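One way to weed out impostors is the reverse-then-forward DNS check that Google describes for verifying Googlebot: look up the host name for the IP, confirm it falls under googlebot.com or google.com, then resolve that name and confirm it maps back to the same IP. The Python sketch below is my own illustration of that idea, not an official tool. The function names are hypothetical, the suffix list is an assumption you should compare against Google's current guidance, and the forward lookup used here only handles IPv4.

```python
import socket

# Assumed suffix list; check Google's current documentation before relying on it.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname):
    """True if hostname is a subdomain of one of the assumed Google domains."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def verify_googlebot(ip):
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm.

    Returns False on any DNS failure. Requires network access.
    """
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        if not is_google_hostname(hostname):
            return False
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    return ip in addresses
```

Run verify_googlebot on each address from the summary file and keep only those that return True. The suffix match requires a leading dot, so a name like fakegooglebot.com does not pass.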
