Search engine cache

A search engine cache is a stored copy of a web page that shows the page as it was when a web crawler last indexed it. Cached versions of web pages can be used to view the contents of a page when the live version cannot be reached, has been altered, or has been taken down.[1]

[Image: The link for the cached version of a web page in search results from Google (top), Bing (middle) and Yandex (bottom)]

When a web crawler crawls the web, it collects the contents of each web page so that the page can be indexed by the search engine. At the same time, it can store a full copy of that page, and the search engine may make this copy accessible to users in its search results. Web crawlers that obey the webmaster's restrictions, such as a noarchive directive given in a robots meta tag or X-Robots-Tag header,[2][3] will not make a cached copy available to search engine users when instructed not to.
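As a rough illustration of this convention, the sketch below (Python, standard library only; not the implementation of any actual search engine) fetches a page and checks both the X-Robots-Tag response header and the robots meta tag for a noarchive directive before allowing a cached copy to be offered to users. The cache_store dictionary in the usage comment is a hypothetical placeholder.

    import re
    from urllib.request import urlopen

    def may_cache(url: str) -> tuple[bool, str]:
        """Fetch a page and decide whether a cached copy may be offered to
        users, based on a 'noarchive' directive in the X-Robots-Tag header
        or the robots meta tag. Minimal illustration only."""
        with urlopen(url) as response:
            html = response.read().decode("utf-8", errors="replace")
            # HTTP header form, e.g.  X-Robots-Tag: noarchive
            if "noarchive" in (response.headers.get("X-Robots-Tag") or "").lower():
                return False, html

        # Meta tag form, e.g.  <meta name="robots" content="noarchive">
        meta = re.search(
            r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
            html,
            re.IGNORECASE,
        )
        if meta and "noarchive" in meta.group(1).lower():
            return False, html

        return True, html

    # Usage: store and expose the copy only when caching is allowed.
    # allowed, html = may_cache("https://example.com/")
    # if allowed:
    #     cache_store["https://example.com/"] = html  # hypothetical cache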

Search engine caches can be used for crime investigation,[4] legal proceedings[5] and journalism.[6][1] Examples of search engines that offer their users cached versions of web pages are Google Search, Bing, Yandex Search, and Baidu.

A search engine cache may not be fully protected by the usual laws that shield technology providers from copyright infringement claims.[7]

Some search engine caches offer additional functionality, such as the ability to view the page as plain unstyled hypertext or to view its source code; Google Cache, for example, provides "Full version", "Text-only version" and "View source" views.
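A text-only view of a cached page can be thought of as the stored HTML with scripts, styling and images removed. The sketch below (Python, standard library only) shows one way such a view could be approximated; it is purely illustrative and is not how Google Cache or any other service actually generates its views.

    import re

    def text_only_view(cached_html: str) -> str:
        """Approximate a 'text-only' rendering of a cached page by removing
        scripts, inline styles, stylesheet links and images from the stored
        HTML. Illustrative sketch only."""
        flags = re.IGNORECASE | re.DOTALL
        stripped = re.sub(r"<script\b.*?</script>|<style\b.*?</style>", "",
                          cached_html, flags=flags)
        # Drop <img ...> tags and <link ...> references to external stylesheets.
        stripped = re.sub(r"<img\b[^>]*>|<link\b[^>]*>", "", stripped,
                          flags=re.IGNORECASE)
        return stripped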

References

  1. Wilfried Ruetten (2012). The Data Journalism Handbook. O'Reilly Media, Inc. ISBN 9781449330064. When a page becomes controversial, the publishers may take it down or alter it without acknowledgment. If you suspect you're running into the problem, the first place to turn is Google's cache of the page as it was when it did its last crawl.
  2. "Robots meta tag, data-nosnippet, and X-Robots-Tag specifications". noarchive: Do not show a cached link in search results.
  3. "Special tags that Google understands - Search Console Help". noarchive - Don't show a Cached link for a page in search results.
  4. Todd G. Shipley, Art Bowker (2013). Investigating Internet Crimes: An Introduction to Solving Crimes in Cyberspace. Newnes. ISBN 9780124079298. For the investigator this can be a valuable piece of information. Depending on when Google crawled the site, the last page may contain information different from the current page. Documenting and capturing Google's cached page of a webpage can therefore be important step to ensure this time snapshot is preserved.
  5. Steven Mark Levy (2011). Regulation of Securities: SEC Answer Book. Aspen Publishers Online. ISBN 9781454805434. The World Wide Web is not as ephemeral as one might think. An increasing number of older web pages are available online through such services as the Wayback Machine, Google Cache, Yahoo Cache, or Bing Cache. Some plaintiffs' lawyers and corporate gadflies use these services as a matter of routine.
  6. Cleland Thom (2014-10-23). "Google's caches and .com search engine provide 'right to be forgotten' solutions". Press Gazette. Journalists can also access delisted content via the Google cache.
  7. Herman De Bauw, Valerie Vandenweghe (June 2011). "Brussels Court of Appeal upholds judgment against Google News and Google Cache". Archived from the original on 2015-04-26. For the cache function, the Court rejected the exception of a "technically necessary copy". This exception exempts temporary reproduction which is a necessary part of a technical process applied by an intermediary for transmission in a network between third parties. According to the Court, the cache copy that Google stores on its server is not technically necessary for efficient transmission.

