Showing posts with label Spider. Show all posts

Thursday, October 15, 2009

Googlebot Crawl Rate


I took a look at how Google's spider was reading my site today. The search engine spider used by Google is called Googlebot. So far this month Googlebot has used 43 MB of bandwidth [as seen by AWStats]. Last month Googlebot used 64 MB of bandwidth via 7,327 hits to the server. The attached graphic shows a high of 1,184 hits, an average of 557 hits, and a low of 198 hits per day. The website update rate is:
Oct.: 202 page updates
Sep.: 336 page updates
Aug.: 253 page updates
July: 165 page updates
So it would appear that Google is just barely picking up on the pages that get updated, if only because Googlebot will sometimes reread the same page more than once.
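
As a rough check on last month's figures, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes 1 MB = 1,000,000 bytes, and the function names are made up for illustration. Note that the hit count covers every page Googlebot requested, not just the updated ones, so the ratio is only a loose indicator.

```python
# Back-of-the-envelope figures for Googlebot's activity last month (September).
# Assumption (not stated by AWStats in the post): 1 MB = 1,000,000 bytes.

def bytes_per_hit(total_mb, hits):
    """Average bytes transferred per Googlebot hit."""
    return total_mb * 1_000_000 / hits

def hits_per_update(hits, page_updates):
    """Googlebot hits for every page update made in the same month."""
    return hits / page_updates

sept_mb, sept_hits, sept_updates = 64, 7327, 336

print(f"Average bytes per hit:  {bytes_per_hit(sept_mb, sept_hits):.0f}")
print(f"Hits per updated page:  {hits_per_update(sept_hits, sept_updates):.1f}")
```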

Related Posts:
Googlebot Crawl Stats; Jan 11 2009.
Spider Crawl Stats; Oct 24 2008.
Googlebot Crawl Stats; July 4 2008.
Robot Visits; Sep 14 2007.
Google Crawl Rate; April 14 2007.
Each one of these older posts provides a graphic of the crawl rate.

Friday, October 24, 2008

Spider Crawl Stats


I just checked the crawl stats for interfacebus.com using Google's Webmaster Tools. The attached graphic shows Googlebot's [Google's spider] activity over the last 90 days.


The first two graphs appear normal, but the last graph does not. Check out the reduction in time to spider a page even as the amount of data downloaded has increased.

Does this mean that my server has gotten faster over the last few months? Or maybe Google is coming by at 3am when there are fewer people using the server?
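
One way to test the 3am theory would be to count Googlebot hits by hour of day from the server's raw access log. The sketch below assumes the log is in the Apache common or combined format and that the spider identifies itself with "Googlebot" in the user-agent string; the function name and the sample lines are made up for illustration. The hours are whatever time zone the server writes into the log.

```python
import re
from collections import Counter

# Matches the timestamp in an Apache common/combined log line,
# e.g. [24/Oct/2008:03:15:42 -0500], and captures the hour.
HOUR_RE = re.compile(r"\[\d{2}/[A-Za-z]{3}/\d{4}:(\d{2}):\d{2}:\d{2} ")

def googlebot_hits_by_hour(lines):
    """Count log lines mentioning Googlebot, grouped by hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue  # skip ordinary visitors and other robots
        match = HOUR_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

# Made-up sample lines, for illustration only.
BOT_UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
sample = [
    '192.0.2.1 - - [24/Oct/2008:03:15:42 -0500] "GET /a.html HTTP/1.1" 200 5120 "-" ' + BOT_UA,
    '192.0.2.1 - - [24/Oct/2008:03:47:10 -0500] "GET /b.html HTTP/1.1" 200 7300 "-" ' + BOT_UA,
    '192.0.2.1 - - [24/Oct/2008:14:02:33 -0500] "GET /c.html HTTP/1.1" 200 4100 "-" ' + BOT_UA,
    '192.0.2.9 - - [24/Oct/2008:03:20:00 -0500] "GET /a.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
]

for hour, hits in sorted(googlebot_hits_by_hour(sample).items()):
    print(f"{hour:02d}:00  {hits} hits")
```

If most of the hits pile up in the early morning hours, that would support the idea that the faster page times come from a quieter server.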

Wednesday, August 13, 2008

Time to spider the site again


This weekend I'll run the Xenu spider again to check for bad links. Some time after that I'll run the GsiteCrawler to generate an XML site map to upload to Google Webmaster Tools [after I fix any issues found by the Xenu crawler]. I never run a crawler during normal office hours, as they put a strain on the server. Both crawlers have to check each page on the server: 1,523 active pages and 209 inactive pages.

I've updated 1,440 pages since the Xenu program was last run, but most have been HTML updates [a few gif files may be included in that count]. However, I've hand checked most pages over that time, so I should not find that many issues. I was running the spider program once a month, but it seemed to miss a few issues, so I've been hand checking links. The last check by the program indicated 5,978 links.

Twenty new pages were added to the site since 6/7/08, so it's about time for a Google site map via GsiteCrawler. There may be a few page-to-page link updates that have been made as well, but not many ~ a new crawl should make for a more up-to-date site map regardless.
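
For anyone curious what the generated file looks like, here is a minimal sketch of building a site map in Python using the sitemaps.org XML format. It is not how GsiteCrawler works internally; the function name, the second page URL, and the dates are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified is a YYYY-MM-DD string, or None to leave <lastmod> out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if last_modified:
            ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical page list; a real crawler would collect these by following links.
pages = [
    ("http://www.interfacebus.com/", "2008-08-13"),
    ("http://www.interfacebus.com/example-page.html", None),
]

print(build_sitemap(pages))
```

A crawler is still needed to discover the pages in the first place; this only covers the last step of writing them out.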

Most of the page updates are due to one or more of these issues [most updated at the same time / page]:

JavaScript code change for Google Analytics [page visit counter]
JavaScript code change for Google Ads
Removal of all JavaScript code for Google Referral Ads [program terminated]
Removal of two different meta tags [HTML coding issues, redundant code that was not required]
Re-direct gif links as Google pages go out of service [Google web site terminated]
JavaScript code change for the Google Search Bar [code update, but not required]
Non-US ['dot'com] pages getting the HTML rel 'nofollow' code [non-visible HTML change]

The attached pic is of a Georgia map indicating 188 visits from that country so far this year. Russia just invaded them.....

Friday, July 04, 2008

Googlebot Crawl Stats


Here is the latest graph showing Googlebot activity in the last 90 days crawling interfacebus.com. It appears that the web site was off-line back in April and the spider stopped coming by for a few days. At most the site was off-line for less than half a day.

The same page that provides this data also shows a bar chart of pages with page rank 'PR'. There are four columns: High, Medium, Low, and Not yet assigned. I can't really tell if I have any pages ranked as high (above PR 5), as I can only see a sliver of color. Medium (PR 5) may contain a few pages, at least the index page. Most pages show as low (below PR 5). What I don't see are any pages listed as 'not yet assigned', a column which has always shown many pages in the past. So Google has ranked all or most of the new pages that have been added to the web site ~ and that tells me I have stopped adding pages at the same rate I had been.

Friday, August 24, 2007

No Page Rank


I came across a page on the site yesterday that does not yet have a Google Page Rank. The page covers RF Phase Detector Manufacturers. ~ Not very well because the page only lists one manufacturer.

Anyhow, I always assume a wait of 4 months to receive a page rank, maybe 5 if the spider is running late. The 'Last Modified' date on the page is 3/17. All the other pages in that section already have a ranking: RF Device Manufacturers.

The Phase Detector page shows up in a Google search, so I know it's been spidered?

Of course all the new pages added within the last few months don't have a page rank either, but I expect that.

Sunday, October 22, 2006

Web Spider


Crawled Pages

This picture shows how the Google spider [Googlebot] crawled www.interfacebus.com over the last three months. It appears that most pages are hit and spidered within one day, then are spidered again during the month. The site has about 1,040 pages as determined by the cached pages on Google: “site:interfacebus.com”. Those spikes appear to reach 800 to 850 pages.