Sunday, August 01, 2010

Server Bandwidth increasing

Even as visitors to the web site have been decreasing over the last few months, bandwidth has increased. A drop in site hits is normal for my site at this time of the year.

In March there were 190,622 unique visitors to the site using 14.64 GB of bandwidth.
In June there were 156,355 unique visitors to the site using 16.15 GB of bandwidth.
In July there were 150,100 unique visitors to the site using 17.03 GB of bandwidth.
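
To put those three months on the same footing, here is a minimal Python sketch that divides bandwidth by unique visitors. The visitor and bandwidth numbers are the ones quoted above; the function names and the assumption that the stats counter reports binary gigabytes (1 GB = 1024 x 1024 KB) are mine, so treat the absolute values as approximate.

```python
# Monthly figures quoted in the post: (unique visitors, bandwidth in GB).
MONTHS = {
    "March": (190_622, 14.64),
    "June": (156_355, 16.15),
    "July": (150_100, 17.03),
}

# Assumption: the stats counter reports binary gigabytes.
KB_PER_GB = 1024 * 1024


def kb_per_visitor(visitors: int, gigabytes: float) -> float:
    """Average bandwidth served per unique visitor, in KB."""
    return gigabytes * KB_PER_GB / visitors


def per_visitor_table(months=MONTHS) -> dict:
    """Map each month name to its average KB per unique visitor."""
    return {name: kb_per_visitor(v, gb) for name, (v, gb) in months.items()}


if __name__ == "__main__":
    for name, kb in per_visitor_table().items():
        print(f"{name}: {kb:.1f} KB per unique visitor")
```

Whatever unit convention the counter uses, the ratio between months is the same: the average data served per visitor in July is a bit under one and a half times the March figure, so the increase is not coming from more people visiting.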

Checking Googlebot Crawl Stats, the amount of data downloaded did double toward the end of July, but only to a high of 28kBytes per day. However, the number of pages crawled stayed about average at 812 pages per day. So the only assumption I can make is that Google was reading pages with more pictures than normal. Some pages have more graphics than others; also, some of the graphic files are local to the server and some are hosted on Picasa. Of course the ones located on Google Picasa do not affect my bandwidth.

Now I have removed a few dozen pic files from Picasa over the last few months. Google started to rank web sites based on download speed, and they considered getting the pic files from 'Google' Picasa as slow. Not because Picasa is slow, although it may be, but because they consider looking up the DNS [address] of another web site as being inherently slow. The most relevant posting was Web Site Speed Enhancements.

Last month I did upload a new sitemap to the server, which Google has been reading every few days. The sitemap is 300k Bytes, which is the size of about 60 html files, or maybe 30 files if you were to count the pic files too. The reason for uploading the sitemap was to try to get more pages included in Google's index, which I have, but maybe at the cost of server bandwidth. June 30 had 1,504 pages in Google's search index [URL's in Web Index], and as of July 28 there were 1,782 files included in Google's Index.

The server counter AWSTATS indicates that Googlebot used 285.97MB of server bandwidth, and the spider from Yahoo used 227.35MB of bandwidth. I have a [BaiDuSpider] spider from China that used 681.53MBytes of bandwidth. I noticed that a number of internet posts report issues with the amount of data being read by this particular spider. It would appear that over the previous few months BaiDuSpider was only reading about 15MB/month, so maybe it just got around to reading the entire web site.
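
As a rough way to compare the three spiders named above, this small Python sketch ranks them by bandwidth and works out each one's share of their combined total. The MB figures are the ones quoted from AWStats; the dictionary and function names are just illustrative, and the shares are relative to these three crawlers only, not to the whole server.

```python
# Crawler bandwidth quoted in the post, in MB (from the AWStats report).
CRAWLER_MB = {
    "Googlebot": 285.97,
    "Yahoo": 227.35,
    "BaiDuSpider": 681.53,
}


def rank_crawlers(usage_mb: dict) -> list:
    """Return (name, MB, percent of combined total), largest first."""
    total = sum(usage_mb.values())
    ranked = sorted(usage_mb.items(), key=lambda item: item[1], reverse=True)
    return [(name, mb, 100.0 * mb / total) for name, mb in ranked]


if __name__ == "__main__":
    for name, mb, share in rank_crawlers(CRAWLER_MB):
        print(f"{name}: {mb:.2f} MB ({share:.1f}% of these three)")
```

BaiDuSpider on its own accounts for more than half of the data pulled by these three spiders, which is why it stands out in the report.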

Now checking Google Analytics, I see no real increase in visits from China: a little over 3,000 a month for the last three months. The top 10 robots used almost 3G Bytes of bandwidth; it's too bad they're not sending me more traffic.

Even Alexa used over 329MBytes to spider my site; too bad my traffic rank is down 26,000 [-10% Reach], but then that would figure because my hits are down as well.
Oh and I just noticed that Google is also reading my text files from my stats counter, so that's another 20M of data it got off my server.

I guess I should also mention that over the last thirty days I've added maybe 30 pages and maybe that many pic files, so that would account for a bit of the increase too. Anyway, check out the attached chart; the lowest trend is bandwidth [normalized].

2 comments:

Leroy said...

8/2/10 I just noticed that hits from Google are up from 400/day last month to 750/day this month. This is Google the search engine not Googlebot the spider.

Now what I also noticed is that Google image search now shows up under "Google" in the referring sites with a new name instead of Google/images or whatever. So image pulls from Google have doubled on my site, which could explain the higher bandwidth issues I blogged about.

So a hit from Google Image search used to be listed as images.google, but those hits have gone to zero over the last few months. Now it appears that an image hit from Google is called google/imgres. I checked by doing an image search for pics on my site. I don't think Google caches images, but I'm not sure.

This could help explain some of the bandwidth issues. Just keep in mind that image searches from Google may be called something new in your stats report.

Leroy said...

8/18/10 I just added a line of text to the robots.txt file so Google will no longer read my server stats. Each of those files Google was reading was 9MB, so I think that was what was going on with the increase in bandwidth usage. So I assume by next month the bandwidth usage should go back to normal. The server text files are still indexed by Google, but I can't believe anyone would pull one up ~ it was Google reading them, not normal people.

I also added a line to the robots.txt file to indicate the location of my xml sitemap, so now other search engines besides Google should be able to find the sitemap too.
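
For anyone wanting to check the same kind of change, here is a minimal Python sketch that uses the standard library's urllib.robotparser to test a robots.txt before relying on it. The /stats/ directory, the example.com addresses and the stats file name are all made-up placeholders, not the actual paths on my server, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /stats/ path and the sitemap URL are
# placeholders chosen for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/

Sitemap: http://www.example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    # A stats file under the disallowed directory (hypothetical name).
    print(rules.can_fetch("Googlebot", "http://www.example.com/stats/usage.txt"))
    # An ordinary page outside that directory.
    print(rules.can_fetch("Googlebot", "http://www.example.com/index.html"))
    # Sitemap locations declared in the file.
    print(rules.site_maps())
```

With rules like these, a well-behaved spider should be refused the stats file, still be allowed the ordinary page, and be able to discover the sitemap address from the same file. Keep in mind that robots.txt only asks spiders to stay away; it does not remove files that are already indexed.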
