Thursday, December 02, 2010

URLs in Web Index

This is the third revision, or update, to the posting of the web stats showing URLs in Web Index.
The stats relate to this Engineering Web Portal.

The table of data below shows the number of URLs, or pages, in Google's web index. First of all, I don't like to generate new sitemaps that often because of the bandwidth requirements of the sitemap program. That is, the program I run needs to check every page I have on the server, so it's a big hit to my server. But if you look at the data and the increasing number of URLs in the web index, you will notice that generating a sitemap over and over again is not required.

However, I do recommend running a new sitemap if you have a large number of new pages. The number of URLs in the web index always increases after a sitemap is generated. However, you'll notice that in some cases the number of indexed pages falls off a few days later. I assume the program reading the sitemap finds the new pages and adds them to the web index, but sometime later another algorithm determines that those pages should not be indexed, and they drop out of the index.

01/21/11 = 1,938 urls in web index [Site-map loaded]
11/30/10 = 1,875 urls in web index.
11/04/10 = 1,854 urls in web index.
10/23/10 = 1,805 urls in web index.
07/24/10 = 1,779 indexed pages.  [Site-map loaded]
07/01/10 = 1,535 indexed pages.  [Site-map loaded]
06/30/10 = 1,504 urls in web index.
06/24/10 = 1,455 urls in web index.
06/18/10 = 1,426 urls in web index.
05/31/10 = 1,400 urls in web index.
05/22/10 = 1,394 urls in web index.
04/07/10 = 1,309 urls in web index.
03/27/10 = 1,322 indexed pages. [Site-map loaded]
12/19/09 = 1,481 indexed pages. [Site-map loaded]
12/13/08 = 1,318 indexed pages. [Site-map loaded]
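
As a rough check on the table above, the change between readings can be worked out with a few lines of Python. This is only a sketch: the `readings` list is copied by hand from the 2010 and 2011 rows above, and the function name `changes` is made up for this example.

```python
from datetime import date

# Readings copied from the table above: (date, URLs in web index).
readings = [
    (date(2010, 4, 7), 1309),
    (date(2010, 5, 22), 1394),
    (date(2010, 5, 31), 1400),
    (date(2010, 6, 18), 1426),
    (date(2010, 6, 24), 1455),
    (date(2010, 6, 30), 1504),
    (date(2010, 7, 1), 1535),
    (date(2010, 7, 24), 1779),
    (date(2010, 10, 23), 1805),
    (date(2010, 11, 4), 1854),
    (date(2010, 11, 30), 1875),
    (date(2011, 1, 21), 1938),
]


def changes(rows):
    """Return (start, end, url_delta, days) for each pair of consecutive readings."""
    out = []
    for (d0, n0), (d1, n1) in zip(rows, rows[1:]):
        out.append((d0, d1, n1 - n0, (d1 - d0).days))
    return out


for start, end, delta, days in changes(readings):
    print(f"{start} to {end}: {delta:+d} URLs over {days} days")
```

The biggest single step in this list is the one ending 07/24/10, which is one of the dates marked [Site-map loaded].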


There are two reasons why the number of URLs in the web index is increasing. First, I generate at least two or three new pages every month, sometimes many more. So you would expect to see an increasing number of pages included in the web index as those new pages are found and included. Second, I'm always updating preexisting pages already residing on the web site. So over time the updates may allow a page to be indexed, usually because it has more content [text]. Many times I'll add a new page topic that is little more than a graphic and a small description. But over time I get back to updating the page to include more data and so on. Once the page gets 'the required amount' of content, Google includes the URL in the web index.

How soon your new page shows up in the web index, without a sitemap, depends on how often your pages are crawled. The chart shows that my site has 500 pages crawled every day, although you can't tell from that how many pages are re-crawled. So for my site any new page added is found fairly quickly, I assume within 15 days ~ so I don't really need a sitemap.


Although it has little to do with the number of URLs in the web index, over and over I see new pages not receiving any incoming hits for months. That is, even after a new page gets included in the web index, it takes three months for the page to really start getting any visits. Webmasters call that the Google sandbox. So don't think that just because a page gets indexed it will bring in a large number of hits the next day. Also, I have groups or sections of pages that have been dropped from the index for years now. Currently I have 1,946 URLs submitted and 1,875 URLs included. That three month "down-time" is also the same amount of time it takes a page to get a Google Page Rank [if it ever gets one].

2N3485 Transistor Derating. 3 year old page with zero page rank.
Semiconductor Manufacturers 'L'. 5 year old page with zero page rank.
Electron Tube Classification. 2 week old page with 0 page rank.
74L121 Monostable Multivibrator IC. 1 week old page with 0 page rank.

Note: always run your sitemap generator program at night, or whenever you expect low incoming traffic. Remember that the program is talking to your server, so it is in direct competition with your visitors.

Also, I have no idea how to determine which pages of my site are not included in the web index; otherwise I might fix those pages.

The number of URLs in the web index changes every week or so; the numbers listed above are just the ones I wrote down.
Click the title to read the comments if you didn't come to this specific topic directly. The blog collapses the comments section unless you're on that particular page. The comments are updates to the blog post.

9 comments:

Leroy said...

12/2/10 I should also say that the location of my sitemap is listed in my robots.txt file, so any search engine that reads that file will find the location of my sitemap. Only Google reads my site as often as shown in the graph; any other search engine would need the help of the sitemap.

Because I use Webmaster Tools from Google, I 'upload' my sitemap to them too ~ really I just tell them its address.
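
For anyone wondering what a crawler sees, here is a small sketch of how the sitemap location can be read back out of a robots.txt file using Python's standard `urllib.robotparser` module (the `site_maps()` call needs Python 3.8 or later). The robots.txt text and the example.com address are placeholders made up for this example, not my real file, and `sitemap_urls` is just a helper name I picked.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; example.com is a placeholder address.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
"""


def sitemap_urls(robots_text):
    """Return the sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_TXT))
```

The point is only that the `Sitemap:` line is plain text any crawler can read, so the sitemap does not have to be submitted to each search engine separately.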

Leroy said...

12/2/10 Number of URLs in Web Index 1,887 pages.

I should also say that for the last 3 weeks I've been running on a new computer. So I haven't even downloaded a sitemap generator program yet. I would have to spin up my old computer to figure out which program I like, although I think there are a few blog posts regarding which program(s) it is.

However, the number of URLs in the web index has increased by 12 pages between last night and today ~ without a sitemap.

Leroy said...

12/2/10 I just added a new page yesterday to hold a large graphic, one that wouldn't fit on the page it was intended for. So I added a thumb-nail picture [smaller version] to the first page:

Spherical Coordinate System

Then I added the larger graphic to the new page:

Spherical Coordinate System

Many sites use thumbs pointing to the larger version of the pic. The problem is that if either the page holding all the thumbs or the page holding the large graphic has no text, then that page will never be indexed. In my case I want the page with the thumb indexed, which it is, but I don't care if the page holding the larger pic file ever gets indexed. Why? Because the original page has all the text and a graphic, so it will be found using either a text search or a picture search. If a visitor does find the page, then they will visit the other page if and only if they want to see that larger graphic.

Yes, it does make it appear I have another page that is not being indexed, but I still don't care. I'll add text to the other page sometime later.

Plus, because the page holding the smaller graphic gets indexed, it gets spidered all the time. So the small 10k file gets spidered instead of the larger 50k file ~ that saves me 40k of server bandwidth each time the page gets spidered. Remember, in an image search both graphics appear the same [just a bit smaller].

Leroy said...

12/3/10 In the next blog post [later today] I'll detail how many web pages are getting how many visits, compared to last year. That would be how many pages get 1 to 99 hits, 100 to 199 and so on.

Anyway, the important thing to remember is that adding more pages does not mean more site visits. The standard Google recommendation is to always work on the pages that bring in the most visitors.

Leroy said...

12-7-10 Number of URLs in web index is now 1,892, up five pages from last week. However, I can't tell when those new pages were added to the web site, either last week or three months ago. At least the new pages are being picked up; that's the important thing.

Leroy said...

12-11-10 Like I said, these indexed numbers go up and down. The number of URLs in the web index now reads 1,888, or down 4 from the last comment three days ago.

Leroy said...

12-12-10 An external site is linking to my sitemap. Google found the link and now shows the xml file in the search results. So the only way to stop that is to delete my sitemap from Webmaster Tools.

Leroy said...

12-27-10 I just generated a new sitemap, using a different file name [so it's not indexed].

About an hour after I uploaded it, Google indicated that 1,754 URLs were submitted and 1,707 pages were indexed. I don't know why these numbers are lower than what they had been. But my PC has 1,966 html files stored, so it would appear that the sitemap generator I just used did not pick them all up.
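
One way to see which files a generator skipped would be to compare the `<loc>` entries in the sitemap against the html files stored locally. The following is a rough sketch only: the sample sitemap, the example.com address, the file names, and the function names are all made up for illustration, and it assumes the sitemap uses the standard sitemaps.org namespace and that local paths mirror the URL paths on the server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap listing two pages on a placeholder domain.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/index.html</loc></url>
  <url><loc>http://www.example.com/parts/diode.html</loc></url>
</urlset>"""

# Hypothetical list of html files on the local PC, relative to the site root.
LOCAL_FILES = ["index.html", "parts/diode.html", "parts/tube.html"]


def sitemap_paths(sitemap_xml):
    """Return the set of URL paths (without the leading slash) listed in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        urlparse(loc.text.strip()).path.lstrip("/")
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
    }


def missing_from_sitemap(local_files, sitemap_xml):
    """Return local files that have no matching entry in the sitemap, sorted."""
    return sorted(set(local_files) - sitemap_paths(sitemap_xml))


print(missing_from_sitemap(LOCAL_FILES, SAMPLE_SITEMAP))
```

On a real site the local list could be gathered with something like `pathlib.Path(site_root).rglob("*.html")`, where `site_root` is wherever the local copy of the site lives.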

Leroy said...

1-21-11 I generated a new sitemap tonight using a different sitemap generator. Now I have 1,952 URLs submitted with 1,938 URLs in the web index.

The change tells me two things: first, the last sitemap generator did not find all of my web pages and so did not include them in the sitemap; second, the 'URLs in web index' number only refers to URLs within the sitemap.

The 'URLs in web index' number must count the pages in Google's index that are also included in the sitemap. It does not count the total number of html files that are indexed, but only those that are indexed and in the sitemap.

So I would assume that if I uploaded a sitemap with 1 URL, Google would indicate that I have one URL indexed, regardless of the total number of URLs that really are indexed.
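
Put another way, my guess is that the reported number is the overlap between two sets. The sketch below just illustrates that assumption with made-up page names; it is my reading of the numbers, not anything Google has documented, and `reported_indexed` is a name invented for the example.

```python
def reported_indexed(sitemap_urls, indexed_urls):
    """Count URLs that are both listed in the sitemap and present in the index."""
    return len(set(sitemap_urls) & set(indexed_urls))


# Hypothetical data: four pages are indexed, but the sitemap lists only two
# URLs, and one of those two is not in the index.
indexed = {"/a.html", "/b.html", "/c.html", "/d.html"}
sitemap = ["/a.html", "/e.html"]

print(reported_indexed(sitemap, indexed))
```

If that guess is right, a sitemap that misses pages will make the reported number go down even though nothing has actually dropped out of the index.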
