

The next step in searching: trawling the Deep Net
  Megan Hamilton | 28 September 02
 

Searching is about to change as a new breed of search engine enters the market to uncover the vast array of information that current engines miss. Research shows that today's search engines only index a fraction of the information available on the Internet.

A survey conducted by U.S.-based BrightPlanet estimates the web is some 500 times larger than indicated by the results provided by popular search engines such as Yahoo!, AltaVista and Google. Traditional search engines are simply scratching the surface of the vast information reservoir because the majority use crawlers to index the web.

But a wealth of information lies in searchable databases that just cannot be accessed by such tools. Search-engine crawlers can index only static pages, rather than the dynamic information stored in databases.

That means they are able to index the home pages of large databases, but generally go no further. For example, when searching for a movie, a crawler will lead you to the Internet Movie Database's homepage (www.imdb.com), but it will not be able to search the valuable film information contained therein. Crawlers can lead you right up to the front door, but they can't take you inside.
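The front-door limitation can be illustrated with a minimal sketch of what a link-following crawler sees on a database homepage. The page markup, the paths and the `LinkAndFormScanner` class are all hypothetical and are not taken from any real search engine; the point is only that static links can be followed, while a search form offers nothing to follow until someone types a query.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Hypothetical homepage of a searchable database: two static links and a search form.
HOMEPAGE = """
<html><body>
  <a href="/about.html">About</a>
  <a href="/help.html">Help</a>
  <form action="/search" method="get">
    <input type="text" name="q">
  </form>
</body></html>
"""


class LinkAndFormScanner(HTMLParser):
    """Collects static links (crawlable) and form actions (not crawlable without a query)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.forms: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            # A simple crawler can queue this URL and index the page behind it.
            self.links.append(attr_map["href"])
        elif tag == "form":
            # The records behind this form only appear in response to a query,
            # so a link-following crawler has no URL to visit.
            self.forms.append(attr_map.get("action") or "")


def scan(html: str) -> Tuple[List[str], List[str]]:
    scanner = LinkAndFormScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.links, scanner.forms


links, forms = scan(HOMEPAGE)
print("Crawlable links:", links)
print("Query-only entry points:", forms)
```

In this sketch the crawler would index the two static pages and stop at the form, which is the "front door" described above.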

And it is not just a problem of the depth of information that can be accessed but also of timeliness. The limitations of traditional search engines became particularly apparent after the events of Sept. 11, when Google and its ilk were unable to provide current results for keywords such as "World Trade Center" and "terrorist attack." In indexing sites, engines only show the past while ignoring the present.

This inability to access valuable data has created a major information gap for Internet searchers using traditional engines, and the deficit has spawned a number of new industry terms, such as "deep Net," "invisible web" and "deep web."

Deep Net best describes this uncharted territory, as the information is not actually invisible and valuable information resources extend beyond the web. The deep Net consists of a vast array of information gold mines contained in specialist databases from business associations, universities, libraries and government departments.

A good example of a deep Net resource is AustLII - the Australasian Legal Information Institute (www.austlii.org). AustLII is one of the largest sources of Australasian legal materials on the Net, with more than 7GB of raw text materials and more than 1.5 million searchable legal documents. The challenge for the search-engine market is how to unlock these quality information resources for the online searcher.

A few sites have made the deep Net more accessible by identifying some of the resources that are normally overlooked by traditional search engines. Invisible Web, for example, provides a comprehensive list of niche databases that help a searcher locate a specialist resource. Another site, Purge, has adopted a two-tiered approach: The first step is to select a category, such as "legal," and the second is to select a particular database from this category. Turbo10 enables searchers to query multiple specialist databases at one time.

Deploying search-engine adapter technology, Turbo10 is able to dynamically connect to hundreds of deep Net databases in all topic categories and languages. Turbo10's metasearch functionality allows it to query up to 10 heterogeneous databases, retrieve the results and package them into one consistent interface. Turbo10 has started to divide resources into topic-specific collections, such as legal, reference, news, sports and finance. Each collection combines a selection of surface web (traditional search) and deep Net engines to broaden the search.
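The adapter-plus-metasearch idea can be sketched roughly as follows. This is an illustration under stated assumptions, not Turbo10's actual implementation: the `Result` type, the two adapter functions, the example.org URLs and the five-second timeout are all hypothetical. Each adapter hides the quirks of one database and returns results in a common shape, and the metasearch function queries at most 10 adapters in parallel and merges what comes back.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

MAX_ENGINES = 10  # the article mentions up to 10 databases per query


@dataclass(frozen=True)
class Result:
    title: str
    url: str
    source: str


# An adapter takes a query and returns results already normalised to Result.
Adapter = Callable[[str], List[Result]]


def legal_db_adapter(query: str) -> List[Result]:
    # Hypothetical database that answers with (title, link) tuples.
    raw = [(f"Case law on {query}", "https://example.org/legal/1")]
    return [Result(title=t, url=u, source="legal-db") for t, u in raw]


def news_db_adapter(query: str) -> List[Result]:
    # Hypothetical database that answers with dictionaries instead.
    raw = [{"headline": f"News about {query}", "href": "https://example.org/news/1"}]
    return [Result(title=r["headline"], url=r["href"], source="news-db") for r in raw]


def metasearch(query: str, adapters: Dict[str, Adapter]) -> List[Result]:
    """Query up to MAX_ENGINES adapters in parallel and merge their results."""
    selected = list(adapters.values())[:MAX_ENGINES]
    if not selected:
        return []
    merged: List[Result] = []
    seen_urls = set()
    with ThreadPoolExecutor(max_workers=len(selected)) as pool:
        futures = [pool.submit(adapter, query) for adapter in selected]
        for future in futures:  # submission order keeps the output stable
            try:
                results = future.result(timeout=5)
            except Exception:
                continue  # one failing database should not sink the whole search
            for result in results:
                if result.url not in seen_urls:  # drop duplicates across engines
                    seen_urls.add(result.url)
                    merged.append(result)
    return merged


for hit in metasearch("copyright", {"legal": legal_db_adapter, "news": news_db_adapter}):
    print(f"[{hit.source}] {hit.title} - {hit.url}")
```

Keeping the normalisation inside each adapter means a new database can be added without touching the merging code, which is one plausible reading of what "one consistent interface" requires.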

For example, the legal collection also connects to a host of niche legal sites, such as AustLII and CanLII (Canadian Legal Information Institute), enabling searchers to perform specific queries across targeted legal databases. By tapping into the deep Net, this new breed of search engine is expanding the depth of information available to the online searcher and increasing the speed of accessing it.

The challenge is to make sure searchers don't drown in the abundance of information in the deep Net. Search engines must not jeopardize quality for the sake of quantity. As more deep Net engines are brought to the surface, it becomes difficult to choose which databases should be queried for each search.

For example, when a user searches for "cats," is that person looking for a general description, a research paper, the latest news, the musical or the acronym? The future of search-engine technology lies in the ability to solve this matching problem and to draw the most relevant results from both the surface web and the deep Net.
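One naive way to approach the matching problem is to score each topic collection by how many query words overlap with a small vocabulary for that collection. The sketch below is only that: the collection names echo the ones mentioned earlier, but the vocabularies, the `rank_collections` function and the fallback rule are hypothetical, and a real engine would need far richer signals.

```python
from typing import Dict, List, Set

# Hypothetical vocabularies; a real system would learn or curate these.
COLLECTIONS: Dict[str, Set[str]] = {
    "legal": {"law", "court", "statute", "case", "legislation"},
    "news": {"latest", "news", "today", "breaking"},
    "reference": {"definition", "description", "acronym", "meaning"},
    "entertainment": {"musical", "film", "movie", "theatre"},
}


def rank_collections(
    query: str,
    collections: Dict[str, Set[str]] = COLLECTIONS,
    fallback: str = "reference",
) -> List[str]:
    """Return collections ordered by keyword overlap with the query."""
    terms = set(query.lower().split())
    scores = {name: len(terms & vocab) for name, vocab in collections.items()}
    matched = [name for name, score in scores.items() if score > 0]
    # Highest overlap first; ties are broken alphabetically for stable output.
    ranked = sorted(matched, key=lambda name: (-scores[name], name))
    # With no clue about intent, fall back to a general-purpose collection.
    return ranked or [fallback]


print(rank_collections("cats"))
print(rank_collections("cats the musical"))
print(rank_collections("latest court case news"))
```

A bare query such as "cats" gives this scorer nothing to work with, so it falls back to the general collection, which shows why the ambiguity described above is hard to resolve from the query text alone.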

   
  Megan Hamilton is a co-founder of Turbo10 Search Engine. Email her at megan@turbo10.com
 
Reproduction rights
You may reproduce this article in any format as long as the content is not edited and the "About the author" portion above remains intact.
 


 
© SearchEngineSpy 2003