Wednesday, September 01, 2010

Disallow access to server stat files

So after 10 years on the web, the spiders have found my server stat files, in this case AWStats. This is the program I use to see data about site visitors: number of visits, page views, visit duration, keywords, and so on. The program stores the data in large text [txt] files but presents it to me as charts and graphs. The txt files are very large, multi-megabyte files; the graphs are somewhat smaller and more readable.

I would recommend that every webmaster add a 'Disallow' line to the robots.txt file to stop web spiders from reading the stat files. In my case the line looks like this: Disallow: /awstats/
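A quick way to check that such a rule does what you expect is to run it through a robots.txt parser before uploading it. The sketch below uses Python's standard urllib.robotparser module. It assumes the Disallow line sits under a "User-agent: *" group (a Disallow line needs a User-agent line above it), and the www.example.com URLs and file names are made up for illustration; they are not my real stat file names.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the Disallow line from the post,
# placed under a catch-all User-agent group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /awstats/
"""


def is_allowed(url, user_agent="*"):
    """Return True if a well-behaved spider may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for url in (
        "http://www.example.com/awstats/stats.txt",
        "http://www.example.com/index.html",
    ):
        status = "allowed" if is_allowed(url) else "blocked"
        print(url, "->", status)
```

With this rule, anything under /awstats/ should come back as blocked while the rest of the site stays allowed. Keep in mind that robots.txt is only a request: polite spiders honor it, but it does not password-protect the files.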

The bottom curve is server bandwidth ~ the increase occurred as my stat files started showing up in my search results. Because the graph is set up to show number of visitors, the bandwidth is normalized to the same scale. So the 100,000 horizontal line, which indicates 100,000 visits or page views, indicates 10GB of server bandwidth for the bottom curve. The 200,000 line indicates 20GB of bandwidth.

I only noticed the server text files showing up in search results about a month ago, because normally I don't need to search my own engineering site ~ right, I wrote it. So I only added a 'block' to the robots file a few weeks ago; however, I recommend that you block access now even if you don't have an issue. It only takes a few minutes to add, and if you pay for bandwidth or get blocked when you exceed your bandwidth limit, it may well be worth the time.

If you look back to 2006/2007 you can see that bandwidth tracked unique visits, but by 2008 the gap started to widen. Two years before Google started to rank pages on download speed, I had already started to make the web site more efficient.

Unfortunately the bandwidth data is now meaningless, because it only shows these large txt statistic files being downloaded. For example, one 2.52MB txt file was downloaded 230 times last month, and a 4.11MB file was downloaded 98 times. That's 328 visitors who used the search bar and received bogus results; are they going to come back for a second visit? Really it's much worse: before I stopped counting, there were 1,206 people last month who thought that one of those text files was a valid search return.
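To put a number on the wasted bandwidth, the two example files can be totaled with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the sizes and download counts quoted above and assumes every download transferred the whole file.

```python
# (file size in MB, downloads last month) for the two example files.
downloads = [
    (2.52, 230),
    (4.11, 98),
]

# Total number of downloads across both files.
total_hits = sum(count for _, count in downloads)

# Total data transferred, assuming each download fetched the full file.
total_mb = sum(size * count for size, count in downloads)

print(f"{total_hits} downloads, about {total_mb:.0f} MB transferred")
```

Those two files alone come to roughly 982 MB, close to a full gigabyte in one month, and that does not include the other stat files behind the remaining bogus search results.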