Finding Googlebot IP Addresses In IIS Server Logs

I’ve put together a command pipeline that finds all references to GoogleBot (case-insensitive) in our server logs, extracts their source IP addresses, and summarizes the hit count per address. To do this on Windows with your IIS log files, you will need the GNU command-line tools: grep, plus cut, sort, and uniq from GNU CoreUtils.

First, try this from a command prompt:

grep -i "GoogleBot" c:\temp\logs\*.log | grep " / " | more

You should see a list of log records, each containing a GoogleBot request.

Notice:

· My log files are located in c:\temp\logs

· I’m looking only for requests to the root (" / "); this second grep is optional

Next, extract the column containing the requester’s IP address; in my log file, it is column number 9:

grep -i "GoogleBot" c:\temp\logs\*.log | grep " / " | cut -f 9 -d " "
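The column number depends on which fields your IIS site is configured to log. IIS W3C logs start with a `#Fields:` header naming each column, so the position of `c-ip` (the client IP) can be looked up rather than counted by hand. The following is a minimal sketch, assuming a POSIX-style shell (for example the one that ships with Cygwin or Git Bash) and a sample header line that I made up to resemble a typical field set; substitute the header from your own log.

```shell
# Sample header for illustration only; copy the real "#Fields:" line from your log.
fields_line='#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken'

# Drop the leading "#Fields:" token, put one field name per line,
# then let grep -n report the line number of the exact match "c-ip".
column=$(printf '%s\n' "$fields_line" | cut -d ' ' -f 2- | tr ' ' '\n' | grep -n -x 'c-ip' | cut -d : -f 1)

echo "c-ip is column $column"
```

The resulting number is the value to pass to `cut -f`.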

Lastly, sort, summarize and store the results to a local file:

grep -i "GoogleBot" c:\temp\logs\*.log | grep " / " | cut -f 9 -d " " | sort | uniq -c > c:\temp\googlebot-class-c.txt
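To see what the whole pipeline does without touching real logs, here is a small self-contained sketch, again assuming a POSIX-style shell. The log records and IP addresses are invented for illustration (they use reserved documentation address ranges), and the trailing `sort -rn` is my own addition to rank addresses by hit count; it is not part of the command above.

```shell
# Create a throwaway directory holding a few made-up IIS W3C log records.
logdir=$(mktemp -d)
cat > "$logdir/sample.log" <<'EOF'
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
2010-01-01 00:00:01 10.0.0.1 GET / - 80 - 192.0.2.10 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0 15
2010-01-01 00:00:02 10.0.0.1 GET / - 80 - 192.0.2.10 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0 12
2010-01-01 00:00:03 10.0.0.1 GET / - 80 - 192.0.2.20 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0 11
2010-01-01 00:00:04 10.0.0.1 GET /about.htm - 80 - 192.0.2.10 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0 9
2010-01-01 00:00:05 10.0.0.1 GET / - 80 - 198.51.100.7 Mozilla/4.0+(compatible;+MSIE+7.0) 200 0 0 8
EOF

# Same steps as above: keep GoogleBot hits on the root, take column 9,
# count hits per address, then (extra step) list the busiest address first.
summary=$(grep -i "GoogleBot" "$logdir"/*.log | grep " / " | cut -f 9 -d " " | sort | uniq -c | sort -rn)

echo "$summary"

# Clean up the temporary directory.
rm -r "$logdir"
```

In this sample, only the three GoogleBot requests for the root are counted: the request for /about.htm and the non-GoogleBot request are filtered out, leaving two hits for 192.0.2.10 and one for 192.0.2.20.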

Note: It’s possible, even probable, that some of these requests carry a fabricated User-Agent header and do not actually come from Google.
