The deep web is one of the newest phenomena in the Internet world. The World Wide Web that most people know, now often called the surface web, has another side that is vastly larger and was mostly unknown until recently. This is the deep web: web content that resides in searchable databases and can be retrieved only through a direct query. The deep web is also known as the invisible web. It is not really invisible, but because today’s search engines cannot index or query searchable databases, that content appears invisible to the average user searching the Internet.
Search engines are web sites whose primary purpose is to help people find information on the web. These sites and their underlying software have the formidable task of indexing, or attempting to index, the entire World Wide Web. Every search engine creates and maintains its own enormous searchable database. Currently, even the best and biggest search engines index only one third to one half of the publicly available documents on the Internet.
Search engines are designed to read flat web pages, such as this one or one that you have created yourself. As web sites evolve and grow, it becomes more and more difficult to create individual pages by hand because of the sheer size of many sites. Many web sites have therefore turned to databases to create web pages on the fly when a user requests them. The database contains the information, which is inserted into a web page template on demand. As a result, no flat pages are ever created, hence the problem stated above: if there are no flat web pages, there is nothing for the spider, crawler, or bot to index, and thus no listing in the search engine's database.
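The difference between flat pages and database-driven pages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the page template, the paths, and the function names (render_result, crawl) are all hypothetical, and the "spider" is a deliberately naive link follower, not how any real search engine works.

```python
import re
from string import Template

# Hypothetical records standing in for a site's database.
RECORDS = {
    "aspirin": "Pain reliever and anti-inflammatory.",
    "ibuprofen": "Nonsteroidal anti-inflammatory drug.",
}

# Hypothetical page template; record data is inserted on demand.
PAGE_TEMPLATE = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")

# The only flat pages on this imaginary site. The home page links to
# "/about" and offers a search form, but links to no individual record.
FLAT_PAGES = {
    "/": '<a href="/about">About</a><form action="/search"><input name="q"></form>',
    "/about": "<p>About this site.</p>",
}


def render_result(query):
    """Build a page on the fly by inserting a database record into the template."""
    body = RECORDS.get(query.lower(), "No matching record.")
    return PAGE_TEMPLATE.substitute(title=query, body=body)


def crawl(start="/"):
    """Follow href links between flat pages, as a very simple spider would."""
    seen, queue = set(), [start]
    while queue:
        path = queue.pop()
        if path in seen or path not in FLAT_PAGES:
            continue
        seen.add(path)
        # Naive link extraction: only href targets are followed;
        # the spider never types a query into the search form.
        queue.extend(re.findall(r'href="([^"]+)"', FLAT_PAGES[path]))
    return seen


if __name__ == "__main__":
    print("Pages the spider found:", sorted(crawl()))
    print("Page a user gets by querying:", render_result("aspirin"))
```

The spider reaches only the flat pages, while the record pages exist only at the moment someone submits a query, which is why they never make it into a search engine's index.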
Estimates by Completeplanet.com put the surface web at 2-3 billion documents, while the deep web is estimated at a mind-boggling 575 billion documents. So, is there a way to access the deep web? Is there some new type of web searching technology that can reach it? Yes, there is. A number of sites have been created to catalog the vast number of online databases or to search them directly.
Deep Web Facts
• 400-500 times larger than the surface Web
• 7,500 terabytes of information vs. 19 terabytes on the surface Web
• 575 billion documents vs. 2-3 billion documents
• Over 200,000 deep Web sites presently exist
• Deep Web sites receive 50% more monthly traffic than surface sites (on average)
• Fastest-growing category of new information on the Internet
• Narrower, deeper content than surface sites
• Quality 1,000% to 2,000% greater than that of the surface Web
• Content is highly relevant to every information need, market, and domain
• More than half of deep Web content resides in topic-specific databases
• 95% of the deep Web is publicly accessible information, not subject to subscription fees
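As a quick sanity check, the size figures quoted above can be compared with the "400-500 times larger" claim. The sketch below uses only the numbers from the list; the variable names are illustrative.

```python
# Figures as quoted in the list above.
deep_tb, surface_tb = 7_500, 19                       # terabytes
deep_docs = 575_000_000_000                           # documents
surface_docs_low, surface_docs_high = 2_000_000_000, 3_000_000_000

# Ratio of deep Web to surface Web by storage size.
size_ratio = deep_tb / surface_tb

# Ratio by document count, using both ends of the 2-3 billion range.
doc_ratio_low = deep_docs / surface_docs_high
doc_ratio_high = deep_docs / surface_docs_low

print(f"Storage ratio: about {size_ratio:.0f}x")
print(f"Document ratio: about {doc_ratio_low:.0f}x to {doc_ratio_high:.0f}x")
```

By storage size the ratio comes out a little under 400, close to the low end of the quoted range. By document count it is noticeably lower, roughly 190 to 290, so the "400-500 times" figure appears to rest on the terabyte estimate rather than on the document counts.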
The following is a short list of the more popular deep web search tools.
Deep Web Search Engines
https://Archive.org – Internet Archive. A huge collection of media, much of it now in the public domain: rare books, sound recordings, video, free audio books, and some 20 years of archived snapshots of old websites.
www.beaucoup.com Beaucoup This site contains a collection of over 2,500 searchable databases and search engines.
http://www.deepwebtech.com Deep Web Technologies. This site has information on mining the deep web - the part of the Internet that encompasses vast and diverse content including commercial subscription databases, content buried within publicly available websites, and internally-generated documents scattered throughout an organization. They can build custom solutions starting with their proprietary Explorit™ software.
http://adswww.harvard.edu/ – Digital Library for Physics and Astronomy The SAO/NASA Astrophysics Data System (ADS) is a Digital Library portal for researchers in Astronomy and Physics, operated by the Smithsonian Astrophysical Observatory (SAO) under a NASA grant. The ADS maintains three bibliographic databases containing more than 11.6 million records covering publications in Astronomy and Astrophysics, Physics, and the arXiv e-prints. Abstracts and full text of major astronomy and physics publications are indexed and searchable through the new ADS "Bumblebee" interface as well as the traditional "Classic" search forms. A set of browsable interfaces is also available.
http://www.firstgov.gov FirstGov This site is a government information gateway. Comprehensive and well maintained, FirstGov.gov is the official U.S. gateway to all government information and the catalyst for a growing electronic government.
http://www.istl.org/01-winter/internet.html Freely Accessible Databases for the Public This site is from Sandy Lewis, Sciences and Engineering Librarian and Library Instruction Coordinator at the University of California, Santa Barbara.
http://www.loc.gov/ Library of Congress Phenomenal digitized archives; “American Memory” is especially interesting. Includes a good newspaper archive.
http://lii.org Librarians' Index to the Internet This site offers two services: a searchable, browsable collection of over 16,000 high-quality Websites, and a weekly newsletter of high-quality Websites, available by email or RSS [they have over 20,000 subscribers in 85 countries]. LII can also lead you to Invisible Web databases: type in a broad topic and add the words "and databases" (e.g., biology and databases).
http://people.hws.edu/hunter/deepwebgate03.htm Recommended Gateway Sites - Deep Web. This site is a portal type site with a large variety of related links and search engines in a number of categories.
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html The Invisible Web Finding Information on the Internet: A Tutorial. The Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity
www.osti.gov – Office of Scientific and Technical Information. Government research archives: if your tax dollars paid for it, the results are here. Also a huge collection of science presentation videos.
http://www.quandl.com/ – Quandl An awesome collection of 9,000,000 financial, economic, and social datasets
www.search.com Search This site is a gateway to over 800 engines. Subject access is through Directory Categories, then subtopics; then enter your search term(s).
http://vlib.org/ WWW Virtual Library – a listing of indexes to industries. Need to know about Architecture? Biochemical war? Zoology? This may get you there.
http://databank.worldbank.org/data/home.aspx – World DataBank Specialty statistical data on all kinds of subjects, from countries’ GDP to levels of blindness.
Deep Web Informational Sites
The Invisible Web http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
These deep web search tools are taken from a listing in The Ultimate Guide to the Invisible Web from OEDB:
- About WebSearch — Christmas 2006 web search guide.
- About Websearch — The deep web — find out more about the deep web — deep web search.
- ALA — American Library Association.
- Deep Web Research — A gigantic list of resources.
- Deep Web Technologies.
- Lifehacker — How to search the invisible web.
- QProber — Classifying and searching hidden-web text databases.
- The Invisible Web Weblog.
- University of California, Berkeley — Invisible or deep web.