Saturday, December 13, 2008

Sitemap Generator


I uploaded a new XML sitemap to Google [Google, Webmaster Tools, Sitemaps]. Google reports that the current XML sitemap contains 1,859 submitted URLs. The previous XML sitemap was last uploaded on August 16th and contained 1,727 URLs, so 132 new pages were added to the web site over the last 4 months. The previous sitemap resulted in 1,318 URLs being indexed by Google.

During this same time frame, 968 HTML files were changed or updated in some fashion. However, in some cases the updates could be just HTML fixes.

The program GSiteCrawler is used to spider the site and generate an XML sitemap. The program has to check every page on the web site, and because of the bandwidth that requires I only generate a sitemap once every few months. Once the sitemap is generated and uploaded to the server, however, the bandwidth usage remains the same: Google reads the sitemap once every few days regardless of how old it is. So if Google is going to download the sitemap from the server, it may as well be up to date.
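For readers who have not seen one, the file a sitemap generator produces is a short XML document in the sitemaps.org format. The following is only a minimal sketch of that format, not how GSiteCrawler works internally; the example.com addresses, the dates, and the function name build_sitemap are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified is a YYYY-MM-DD string, or None to omit <lastmod>.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; a real crawl would supply every page on the site.
pages = [
    ("http://www.example.com/", "2008-12-13"),
    ("http://www.example.com/whats-new.html", None),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

A real sitemap would also start with an XML declaration and list every page the crawl found, which is why the crawl itself is the expensive part.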

Other search engines find the sitemap via a comment in the robots.txt file on the server. That comment happens to point to another web site which also holds the sitemap.xml file. So only Google downloads the sitemap from my server; all other search engines get the same file from another server [saving me bandwidth].
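As a rough illustration of how a crawler can pick up a sitemap address from robots.txt, here is a small sketch using Python's standard library. It assumes the entry is written as the standard "Sitemap:" line; the robots.txt contents and the mirror.example.org address are hypothetical, not the actual file on my server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the sitemap address points at a second host.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://mirror.example.org/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() requires Python 3.8 or later and returns None when no entry exists.
sitemap_urls = parser.site_maps() or []
print(sitemap_urls)
```

Because the address is a full URL, nothing requires it to be on the same server as the robots.txt file being read.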

I also started to update the HTML version of the sitemap, the human-readable version. I added 18 page addresses to it for pages that had been added to the site from 8/16 to 10/16. I uploaded what I had and will add the pages from the most recent two months as time permits.

A few days ago I spidered the site using Xenu to check for broken web links. Xenu indicated that 2% of the 6,105 URLs on the site were bad. However, giving them a few days to come back showed that only about a dozen links were really gone. I did end up removing a few links.
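The idea of waiting a few days before trusting a bad-link report can be sketched as comparing two crawls and keeping only the links that failed both times. This is an illustrative sketch, not a feature of Xenu; the function name and the example addresses are made up, and only the 2% and 6,105 figures come from the report above.

```python
def persistently_broken(first_pass, second_pass):
    """Return the URLs that failed in both crawls, sorted."""
    return sorted(set(first_pass) & set(second_pass))


# Rough size of the first report: 2% of 6,105 URLs.
estimated_bad = round(6105 * 0.02)

# Hypothetical results from two crawls a few days apart.
first = {
    "http://a.example.com/gone.html",
    "http://b.example.com/slow-server.html",
    "http://c.example.com/moved.html",
}
second = {
    "http://a.example.com/gone.html",
    "http://c.example.com/moved.html",
}

really_gone = persistently_broken(first, second)
print(estimated_bad, really_gone)
```

Links that fail only once are usually servers that were down or slow on the day of the crawl, so they are left alone.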

6 comments:

Leroy said...

12/15/08: Overnight, Google downloaded the new sitemap. So now it should be a few more days until Google starts to see some of the new pages that have been added over the last few months.

Keep in mind that Google also sees the new pages because the new page addresses are listed in the 'what's new blog' [address listed on the website's index page].

Leroy said...

12/20/08: Google Sitemaps now reports 1,417 indexed URLs, so the search engine just found about a hundred new pages. They should now start to show up in the search listings.

Anonymous said...

1/6/09: Google downloaded the sitemap again on December 26, and then again on the 3rd of January.

I did note that the tool is only telling me today, 1/6/09, that the system looked at the upload 3 days ago.

Leroy said...

1/20/09: Google now reports 1,428 indexed URLs.

Leroy said...

2/2/09: Now Google indicates that there are 1,414 URLs indexed. You can see that the number of pages indexed in the search results changes each time Google re-downloads the sitemap. It was last downloaded on 1/31.

Leroy said...

2/06/09: Indexed URLs are back up to 1,429 pages.
