Recently I was asked by management to get some statistics on who downloaded a particular file from our web server. Of course, figuring out such a thing is pretty much impossible unless you require each person to register prior to the download, and we choose not to do that on our server because it is a major inconvenience to the person trying to download. I told management that the best I could do was to tell them what network, and possibly what geographic region, each download came from, which is not too hard to do with dig/nslookup and whois (or a variety of geo-location services on the web; just google for “locate IP”). But after I promised management that I’d get this information, I realized that I couldn’t figure out an easy way to look up a whole list of IP addresses (every tool I knew of did a single lookup at a time, which would make for a long day of copying and pasting IPs). So I wrote a script to do it, and now I’m sharing that script with you.
So here is how it works: you pass this script an IP address or a list of addresses, and it will output the IP, the domain name for the IP (the PTR record), the city, the state (if in the US), and the country code for each IP address as a comma-separated list.
Here are some examples of usage:
To look up a single IP address:
jed@jed-daniels-computer:~$ ./lookup_ip.sh 64.233.167.99
64.233.167.99,py-in-f99.google.com.,Mountain View,CA,US
To look up a list of addresses in a file that has one IP address per line:
jed@jed-daniels-computer:~$ ./lookup_ip.sh `cat file`
66.94.234.13,w2.rc.vip.scd.yahoo.com.,Sunnyvale,CA,US
64.233.167.99,py-in-f99.google.com.,Mountain View,CA,US
205.158.104.181,networkphysics.com.,San Mateo,CA,US
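One caveat with the backtick form: the shell expands every address onto a single command line, so a very large list can exceed the system's argument-length limit. A minimal sketch of an alternative, assuming the script accepts several addresses as arguments (as the example above suggests), is to let xargs batch the list. The name run_on_list is hypothetical and not part of lookup_ip.sh.

```shell
# Hypothetical helper: run a command over a file of IPs (one per line).
# xargs splits the list into batches that fit the argument-length limit.
run_on_list() {
    # $1 = command to run (e.g. ./lookup_ip.sh), $2 = file with one IP per line
    xargs "$1" < "$2"
}

# Usage (assumes lookup_ip.sh is in the current directory):
#   run_on_list ./lookup_ip.sh file
```

For a few hundred addresses the backtick form is fine; this only starts to matter with very long lists.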
To pull the IP addresses from a log file for parsing:
Determine the field to get the IP from. In this case, it is the first field, and the obvious delimiter is the space following the IP address. Here is an example from my apache log file:
205.158.104.176 - - [30/May/2007:19:03:37 -0700] "GET /category/unix/ HTTP/1.1" 200 51716 "http://www.itsnotthenetwork.com/category/networking-basics/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11"
Use cut to snarf the IPs from the file, then sort them and use uniq to remove duplicates:
jed@jed-daniels-computer:~$ cut -d" " -f 1 access_log | sort | uniq > file
Now run the script again, this time piping the output through tee so that it is written to a file called visitors.csv and also shown on the screen, where you can watch the action:
jed@jed-daniels-computer:~$ ./lookup_ip.sh `cat file` |tee visitors.csv
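The extraction step can also be wrapped in small functions. This is only a sketch: the names unique_ips and hits_per_ip are hypothetical, and both assume the client IP is the first space-delimited field, as in the Apache log line shown above.

```shell
# Hypothetical helpers; both assume the client IP is the first
# space-delimited field of each log line.

# Print each distinct client IP once (sort -u combines sort and uniq).
unique_ips() {
    cut -d" " -f1 "$1" | sort -u
}

# Print a request count next to each IP, busiest first.
hits_per_ip() {
    cut -d" " -f1 "$1" | sort | uniq -c | sort -rn
}

# Usage:
#   unique_ips access_log > file
#   hits_per_ip access_log
```

The second function is optional, but a per-address request count may be a handy extra column when the goal is download statistics.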
If you only want info on visitors to a particular page or object, such as an image that only exists on certain pages (or is even linked from other sites):
Use grep to find only the lines that contain that object (image, URL, etc.), then use cut, sort and uniq:
jed@jed-daniels-computer:~$ grep img11.gif access_log |cut -d" " -f 1 |sort|uniq |tee file
Now run the script again, this time appending its output to a file called img11_loads.csv:
jed@jed-daniels-computer:~$ ./lookup_ip.sh `cat file` >> img11_loads.csv
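Note that grep matches the string anywhere in the line, including the referrer field, so a request that merely arrived from a URL containing img11.gif would be counted too. A sketch that matches only the requested path, assuming the Apache log format shown earlier (where the path is the seventh space-separated field), might look like this. The name ips_for_path is hypothetical.

```shell
# Hypothetical helper: list distinct client IPs whose *requested path*
# contains a given string. Assumes the Apache log layout shown earlier,
# where field 1 is the IP and field 7 is the request path.
ips_for_path() {
    # $1 = log file, $2 = literal substring to look for in the path
    awk -v p="$2" 'index($7, p) { print $1 }' "$1" | sort -u
}

# Usage:
#   ips_for_path access_log img11.gif > file
```

Because index() does a literal substring match, the dot in img11.gif is not treated as a regular-expression wildcard the way it is with plain grep.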
How it works:
First, I started with the easy part: finding the domain name for the IP address. A reverse lookup with dig returns the PTR record, so this is pretty easy: dig +short -x 205.158.104.181. Of course, this returns nothing if the connecting IP has no PTR record, but oh well.
Next, I had to figure out a way to get the geographic location. This is as much art as it is science, so I knew not to expect this to be perfect, but a reasonable guess was really all that I was after. After a bit of googling, I found hostip.info, an awesome site that not only has the tool to do it, but was also kind enough to publish an API for it, including example code showing how to access it (note that I’m using links instead of lynx, but either one will work just fine):
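If an empty field in the CSV is a problem, the missing-PTR case can be papered over with a placeholder. The following is a rough sketch rather than what lookup_ip.sh does: the function names and the "unknown" placeholder are my own assumptions.

```shell
# Hypothetical helper: read dig-style output on stdin and print the first
# hostname, or the placeholder "unknown" (an arbitrary choice) if there is
# none. Lines starting with ";" are dig diagnostics and are skipped.
first_ptr() {
    awk 'NF && $1 !~ /^;/ { print $1; found = 1; exit }
         END { if (!found) print "unknown" }'
}

# Reverse-resolve one address; dig +short -x prints only the PTR target(s).
lookup_ptr() {
    dig +short -x "$1" | first_ptr
}

# Usage:
#   lookup_ptr 205.158.104.181
```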
jed@jed-daniels-computer:~/Desktop$ links -dump "http://api.hostip.info/rough.php?ip=205.158.104.181"
Country: UNITED STATES Country Code: US City: San Mateo, CA Guessed: false
The API outputs a bunch of data, so all I had to do after that was slice and dice it the way I wanted to get a nice, pretty CSV file that I could bring into Excel and send to manager-type people. Note that the data isn’t perfect: it reports Network Physics as being in San Mateo instead of Mountain View, but for my purposes, anywhere within several hundred miles is good enough.
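The slicing and dicing could be sketched roughly as follows. This is not necessarily how lookup_ip.sh does it: the function names are hypothetical, and the parsing assumes the response contains the "Country Code:", "City:" and "Guessed:" labels in the order shown above, whether on one line or several.

```shell
# Hypothetical helper: turn a hostip.info "rough" response (read from stdin)
# into "City,State,CountryCode". State is left empty when the City field has
# no ", XX" suffix. The parentheses run the body in a subshell so the
# variables do not leak into the caller.
parse_hostip() (
    # Join lines so single-line and multi-line responses parse the same way.
    text=$(tr '\n' ' ')
    country=$(printf '%s\n' "$text" | sed -n 's/.*Country Code: \([A-Z][A-Z]\).*/\1/p')
    city_state=$(printf '%s\n' "$text" | sed -n 's/.*City: \(.*\) Guessed:.*/\1/p')
    case $city_state in
        *", "*) city=${city_state%, *}; state=${city_state##*, } ;;
        *)      city=$city_state;       state= ;;
    esac
    printf '%s,%s,%s\n' "$city" "$state" "$country"
)

# Fetch and parse the location for one address, using the API URL shown above.
lookup_geo() {
    links -dump "http://api.hostip.info/rough.php?ip=$1" | parse_hostip
}

# Usage:
#   lookup_geo 205.158.104.181
```

Keeping the parsing separate from the network call means it can be tried out on a saved response without hitting the API each time.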
Download the script: lookup_ip.sh (MD5: 11065ee2887597aa127aeafea62288a9)
NOTE: Bash 3.X or higher is required to run this script. If you don’t have a recent version of bash, stay tuned to this site and I’ll walk you through updating on various platforms soon.
API for geolocating IP addresses [Hostip.info]
Other comments: my scripting ability is pretty weak. There are probably MANY ways to do this better. Feel free to let me know what they are and to teach them to me. If you make improvements to the script and want to share them, or if you do something really exciting with this script, or even just find it useful, please let us know at tips@itsnotthenetwork.com.
I wrote and tested this script on my OS X system. It should work fine on any system running a bash 3.x shell, which should cover most versions of Linux, FreeBSD, or Unix (note that you may need to upgrade the shell, as I did, but that is another article for another day). If you are using Windows and want to run this script, you can do so using Cygwin, which I will also write about in another article.