Wednesday, January 14, 2009

Differentiating Browser Cache From Search Engine Cache

If you need to clear your cache, you must first understand that "cache" is a generic term. My dog used to bury bones in the same hole, which was his "cache" of bones. Online, "cache" refers to a storage mechanism that various applications use to make themselves faster. There are two prominent caches for Online users: one we don't care about, and one we really do...

Browser Cache, as defined by Wikipedia...

Web caching is the caching of web documents (e.g., HTML pages, images) in order to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.

It is not to be confused with a web archive, a site that keeps old versions of web pages.
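To make the "certain conditions" in that definition concrete, here is a minimal Python sketch of a toy cache, assuming the simplest possible condition: a stored copy is reused only while it is younger than a fixed age limit. The `ToyWebCache` class, its `fetch` callback, and `max_age` are hypothetical names for illustration; real browsers decide freshness from HTTP headers such as `Cache-Control` and `Expires`, which this sketch ignores.

```python
import time
from typing import Callable, Dict, Tuple


class ToyWebCache:
    """Hypothetical in-memory cache: keeps a copy of each fetched document
    and serves it again while it is younger than max_age seconds."""

    def __init__(self, fetch: Callable[[str], str], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch        # function that downloads a document (assumed)
        self.max_age = max_age    # seconds a stored copy counts as "fresh"
        self.clock = clock        # injectable clock so the behavior is testable
        self.store: Dict[str, Tuple[str, float]] = {}

    def get(self, url: str) -> str:
        entry = self.store.get(url)
        if entry is not None:
            body, stored_at = entry
            if self.clock() - stored_at < self.max_age:
                return body       # condition met: answer from the stored copy
        body = self.fetch(url)    # stale or missing: go back to the server
        self.store[url] = (body, self.clock())
        return body
```

A second request for the same URL within `max_age` seconds returns the stored copy without calling `fetch` at all, which is exactly why a cached page can look current when it is not.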

The last sentence leads us to our important interpretation of cache, which is anything that is archived, as discussed below. Browser caches give the impression that your Internet connection is much faster than it really is. By saving the text and images from your browsing history, the browser lets you return to the same pages and view the stored text and images rather than downloading them again. You think you browsed the same site, but you really browsed the stored files in your browser. This becomes a problem when you want to see the very latest information straight from the web server's mouth, not your old stored files. If you do, hold down the SHIFT key while clicking the Refresh/Reload button in your browser. This should tell your browser to bypass the local file cache and collect the newest version from the server.
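Browsers typically implement a forced reload like this by adding request headers that ask any cache along the way not to answer from a stored copy. The minimal Python sketch below builds such a request with the standard library; the `build_request` helper is hypothetical, and the exact headers a given browser sends on SHIFT+Reload may differ.

```python
import urllib.request


def build_request(url: str, force_reload: bool = False) -> urllib.request.Request:
    """Hypothetical helper: build a GET request, optionally asking caches
    to skip their stored copy and revalidate with the origin server."""
    headers = {}
    if force_reload:
        # HTTP/1.1 way of saying "don't answer from a stored copy"
        headers["Cache-Control"] = "no-cache"
        # Older HTTP/1.0 equivalent, commonly sent alongside it
        headers["Pragma"] = "no-cache"
    return urllib.request.Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen()` would then fetch the page. Note that `urllib` keeps no local cache of its own, so here the headers only matter to proxies or other caches sitting between you and the server.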

Now, do we care if your browser has stored old media? No! You won't get sued for having old content in your browser, and nobody aside from you should be able to view your local cache. Let's move on to the important cache, which is Web Archives or Web Caches...

With a different goal from your browser's, Online resources seek to archive your website, FTP server, and whatever else they can get their hands on. We'll start with Google, the almighty brain and dominant intelligence in the world. When you go to Google and perform a search, you get search results pages that list a title, a synopsis, and some other information for each listing. If you look closely, you'll notice the links "Cached" and "Similar Pages" at the bottom of each listing.


Go ahead and click the "Cached" link and read the statement at the top. Google tells you that what you're viewing is a "snapshot" of the site listed in the search results page, taken on a previous date. I clicked the top link, and Google states...

This is Google's cache of http://www.ultimatechocolate.com/. It is a snapshot of the page as it appeared on Jan 10, 2009 13:11:56 GMT. The current page could have changed in the meantime.

These search terms are highlighted: chocolate treats


This is pretty important stuff! Google visited this website and collected more than just the text. It collected the source code (text included) and all the reasonable images, style sheets, and JavaScript code. We could ask whether this is itself copyright infringement, but we'll get into philosophical discussions elsewhere. Most companies WANT Google to index and archive their content, since it helps the world find their site, buy their products, and generate more revenue.
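To get a feel for what "more than just the text" means, here is a rough Python sketch that lists the images, style sheets, and scripts an HTML page references, i.e. the extra files an archiver would have to download to rebuild a snapshot. It is an illustration built on the standard library's `html.parser`, not a description of how Google's crawler actually works; `ResourceCollector` and `list_resources` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class ResourceCollector(HTMLParser):
    """Hypothetical helper: records the URLs of images, style sheets,
    and external scripts referenced by a page, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: List[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.resources.append(attrs["href"])


def list_resources(html: str) -> List[str]:
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    return collector.resources
```

A snapshot that stored only the HTML would break as soon as these referenced files disappeared from the original server, which is why an archive grabs them as well.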

The downside is that Google may have archived something you are freaking out about. A Cease and Desist letter is enough to give the average person some minor brain hemorrhaging! Now you find that Google continues to display your infringing or sensitive information and images seemingly forever, even after your own site was cleansed. Google is not alone in holding on to historical information this way, but the concern is how to get your content removed! I'll blog on each removal topic separately. This page is intended to make sure you know the difference between the types of caches, which I hopefully drove home.

The question at hand for most people is how to get Google and other Online entities to delete all previously cached content while allowing them to continue caching in the future. We'll get into removal of specific pages and specific images, and future Google cache management and control.
