Monday, April 12, 2010

Google's Automated Search Query Capture

It's known that Google takes preventive measures to reduce automated use of its search engine. In fact, Google's terms of service restrict the use of automated queries. Normally, human users with real browsers are not suspected of such use and should not trigger the firewall rules that detect queries that appear to be automated.

However, I found myself in just that position. After running several varied queries, I went back and repeated a past query (through the browser's drop-down query history) and received the following:

Google Automated blocking of Searches


HTML Source

It is interesting to note that the page is returned with an HTTP 503 status code.
Google Suspicious Searches HTTP Header


I suspect this was triggered by a combination of my complex query, retrieving multiple pages of results, and repeated usage in a short period. Google's knowledge base on this topic suggests that users who have this problem may also have a virus or other spyware on their computer or on another computer in the network.

Wednesday, February 17, 2010

DNS Settings for Linux

The file that controls client DNS resolution, as well as the default domain search, is: /etc/resolv.conf

Modify this file:
vi /etc/resolv.conf

to appear like:
search relevantads.com
nameserver 4.2.2.1
nameserver 4.2.2.4


No restart should be required.

You can replace "relevantads.com" with your local domain name. That allows you to ping a machine by its short name, without the fully qualified domain name.

The nameserver entries should ideally be the resolvers provided by your ISP. The 4.2.2.x addresses above are public DNS resolvers operated by Level 3, which may block abusive traffic.
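To double-check what a resolv.conf-style file actually declares, a short Python sketch can read the search and nameserver entries back out. The function name parse_resolv_conf and the SAMPLE text are my own for illustration; the parser only understands the two keywords used in this post and is more lenient about comments than the real resolver.

```python
def parse_resolv_conf(text):
    """Return (search_domains, nameservers) from resolv.conf-style text."""
    search, nameservers = [], []
    for raw in text.splitlines():
        # Drop anything after '#' or ';' (lenient comment handling).
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "search":
            search = values  # a later "search" line replaces an earlier one
        elif keyword == "nameserver" and values:
            nameservers.append(values[0])
    return search, nameservers


# Same content as the example file in this post.
SAMPLE = """\
search relevantads.com
nameserver 4.2.2.1
nameserver 4.2.2.4
"""
```

Calling parse_resolv_conf(SAMPLE) returns the search list and the nameservers in the order they appear in the file, which is also the order in which they are tried.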

Tuesday, February 02, 2010

Finding Googlebot IP Addresses In IIS Server Logs

I’ve made a parser command to find all references to Googlebot (case-insensitive) in our server logs, extract the source IP addresses, and summarize the hit count. To do this on Windows with your IIS log files, you will need GNU grep and the GNU CoreUtils tools (cut, sort, and uniq).

First, try this from the command prompt:

grep -i "GoogleBot" c:\temp\logs\*.log | grep " / " | more

You should get an output of text log records, each containing a GoogleBot request.

Notice:

· My log files are located in c:\temp\logs

· I’m looking for requests to the root (" / "); this filter is optional

Next, extract the column containing the requester's IP address; in my log file, it is column number 9:

grep -i "GoogleBot" c:\temp\logs\*.log | grep " / " | cut -f 9 -d " "

Lastly, sort, summarize and store the results to a local file:

grep -i "GoogleBot" c:\temp\logs\*.log | grep " / " | cut -f 9 -d " " | sort | uniq -c > c:\temp\googlebot-class-c.txt
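For machines without the GNU tools, the same filter, cut, and count steps can be sketched in Python. This is a rough equivalent rather than the command I actually ran: the function name count_googlebot_ips and the SAMPLE_LINES records are made up for illustration, and column 9 matches my log layout only.

```python
from collections import Counter


def count_googlebot_ips(lines, ip_column=9, path=" / "):
    """Count requester IPs on log lines that mention Googlebot.

    ip_column is 1-based, like `cut -f`. Lines with fewer fields are skipped.
    """
    counts = Counter()
    for line in lines:
        if "googlebot" not in line.lower():    # grep -i "GoogleBot"
            continue
        if path not in line:                    # grep " / "
            continue
        fields = line.rstrip("\n").split(" ")   # cut -d " "
        if len(fields) >= ip_column:
            counts[fields[ip_column - 1]] += 1  # cut -f 9, then count
    return counts


# Hypothetical log records, with the client IP in column 9.
SAMPLE_LINES = [
    "2010-02-02 00:00:01 10.0.0.5 GET / - 80 - 66.249.65.1 Mozilla/5.0+(compatible;+Googlebot/2.1) 200",
    "2010-02-02 00:00:07 10.0.0.5 GET / - 80 - 66.249.65.1 Mozilla/5.0+(compatible;+Googlebot/2.1) 200",
    "2010-02-02 00:01:12 10.0.0.5 GET / - 80 - 66.249.65.2 Mozilla/5.0+(compatible;+Googlebot/2.1) 200",
    "2010-02-02 00:02:30 10.0.0.5 GET /about - 80 - 66.249.65.3 Mozilla/5.0+(compatible;+Googlebot/2.1) 200",
    "2010-02-02 00:03:44 10.0.0.5 GET / - 80 - 192.0.2.10 Mozilla/5.0+(Windows) 200",
]
```

To run it over real files, pass an open log file (or several chained together) as the lines argument; count_googlebot_ips(SAMPLE_LINES).most_common() lists the addresses from most to least hits.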

Note: It’s possible, even probable, that some of the request headers are fabricated and the requests are not actually coming from Google.
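One way to weed out fabricated requests is the check Google itself describes: do a reverse DNS lookup on the IP, confirm the host name is under googlebot.com or google.com, then do a forward lookup and confirm it returns the same IP. The sketch below assumes that approach; the function name is_verified_googlebot is mine, and the two resolver parameters exist only so the logic can be exercised without network access.

```python
import socket

# Domains Google documents for its crawler host names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip."""
    try:
        hostname = reverse(ip)[0].rstrip(".").lower()
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # Forward-confirm: the host name must map back to the same address.
        return ip in forward(hostname)[2]
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False
```

Run it over the addresses in googlebot-class-c.txt and treat anything that returns False as a request that only claimed to be Googlebot.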

Friday, January 29, 2010

How to Test a Website Through Command Prompt

The information exchanged through our internet browsers travels over standard communication protocols. Most of these protocols are plain-text (ASCII) based and can be accessed in their raw format with a few keystrokes in the command prompt.

1) Prepare a Notepad window with the HTTP request lines that you want to send;


GET / HTTP/1.1
Host: www.google.com

2) Copy the commands to the clipboard;
3) Open a command prompt and start telnet;

telnet www.google.com 80


4) Paste the HTTP request by right-clicking, then press Enter a couple of times;

Note that you may get a blank screen and not see the keystrokes that you enter (unless local echo is enabled in telnet).

At this point you will see a stream of text fill the screen. You should be able to scroll back in the command prompt window; you may need to change the command prompt properties to scroll all the way back.
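The telnet steps above can also be sketched with a raw socket in Python, which helps on machines where telnet is not installed. The function names build_request and fetch_raw are my own, and the "Connection: close" header is an addition that is not in the Notepad request: it asks the server to close the connection so the read loop knows when the response has ended.

```python
import socket


def build_request(host, path="/"):
    """Build the same GET request as the Notepad text, plus Connection: close."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: added so the server ends the stream
        "",
        "",                   # blank line that terminates the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host, port=80, path="/", timeout=10.0):
    """Send the request over a plain TCP socket and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, fetch_raw("www.google.com").decode("latin-1") gives the same stream of status line, headers, and HTML that scrolls past in the telnet window.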



Reference

Debug Browser Requests


Your browser passes many headers in each request it makes. In the past, we had to run HTTP monitoring tools to see what the browser was doing, but now Firefox has a great add-on called Live HTTP Headers that makes it much easier.

Protocols


Several protocols can be used over telnet. Here is a list of the ones I have used:







Service                  Standard Port
HTTP (websites)          80
SMTP (email sending)     25
FTP (file transfer)      21
POP3 (fetch messages)    110
IMAP (email server)      143
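Before typing a protocol by hand, it can be handy to confirm that the port accepts connections at all. The sketch below is a minimal stand-in for "telnet host port" that only tests whether a TCP connection can be opened; the names STANDARD_PORTS and port_is_open are my own.

```python
import socket

# The ports from the list above.
STANDARD_PORTS = {"HTTP": 80, "SMTP": 25, "FTP": 21, "POP3": 110, "IMAP": 143}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the host name did not resolve.
        return False
```

For example, port_is_open("www.google.com", STANDARD_PORTS["HTTP"]) checks the same endpoint used in the telnet example. A True result only means something is listening, not that it speaks the expected protocol.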

Other online HTTP testing tools


Pingdom is great as it shows time to first/last byte
Uptrends - several geographical sources to test from.
Site-perf
LinkVendor

Monday, January 11, 2010

Windows 7 God Mode

God Mode in Windows 7 gives you a single place to configure most Windows settings:

Make a folder called: GodMode.{ED7BA470-8E54-465E-825C-99712043E01C}
Or from cmd (command prompt) run:


cd \
mkdir "GodMode.{ED7BA470-8E54-465E-825C-99712043E01C}"
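If you want to script this, for example as part of a setup routine, the same folder can be created from Python. This is only a sketch: the function name create_god_mode_folder is my own, and the special folder behaviour only appears when the directory is created on Windows.

```python
import os

# Same folder name as above; the GUID part is what Windows recognises.
GOD_MODE_NAME = "GodMode.{ED7BA470-8E54-465E-825C-99712043E01C}"


def create_god_mode_folder(parent):
    """Create the God Mode folder under parent and return its full path."""
    path = os.path.join(parent, GOD_MODE_NAME)
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path
```

Calling create_god_mode_folder("C:\\") mirrors the two cmd lines above.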
