Searching is about to change as a new breed of search engine enters the market
to uncover the vast array of information that current engines miss.
Research shows that today's search engines only index a fraction
of the information available on the Internet.
A survey conducted by U.S.-based BrightPlanet estimates the web
is some 500 times larger than indicated by the results provided
by popular search engines such as Yahoo!, AltaVista and Google.
Traditional search engines are simply scratching the surface of the vast information
reservoir because the majority use crawlers to index the web.
But a wealth of information lies in searchable databases that just
cannot be accessed by such tools.
Search-engine crawlers can index
only static pages, rather than the dynamic information stored in
databases.
That means they are able to index the home pages of large databases,
but generally go no further. For example, when searching for a movie,
a crawler will lead you to the Internet Movie Database's homepage
(www.imdb.com), but it will not
be able to search the valuable film information contained therein.
Crawlers can lead you right up to the front door, but they can't
take you inside.
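The front-door limitation can be illustrated with a minimal sketch. Everything here is a hypothetical toy: the page paths, the two sample records and the function names are assumptions made for illustration, not a model of any real crawler or of the Internet Movie Database. The crawler follows hyperlinks only, so it reaches the search page but never the records behind it.

```python
from collections import deque

# Hypothetical toy site: each static page and the hyperlinks it contains.
STATIC_PAGES = {
    "/": ["/about", "/search"],
    "/about": [],
    "/search": [],  # a search form: no hyperlinks to individual records
}

# Hypothetical database records, reachable only by submitting a query.
DATABASE = {
    "casablanca": "Casablanca (1942), directed by Michael Curtiz",
    "vertigo": "Vertigo (1958), directed by Alfred Hitchcock",
}


def crawl(start):
    """Breadth-first crawl that follows hyperlinks and nothing else."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def query_database(term):
    """What a visitor gets by typing a term into the search form."""
    return DATABASE.get(term.lower())


print(sorted(crawl("/")))         # pages a link-following crawler can index
print(query_database("Vertigo"))  # content only a direct query can reach
```

In this sketch the crawler's index ends at "/search", while the film records surface only when a query is sent to the database itself.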
And it is not just a problem of the depth of information that can
be accessed but also of timeliness.
The limitations of traditional
search engines became particularly apparent after the events of
Sept. 11, when Google and its ilk were unable to provide current
results for keywords such as "World Trade Center" and
"terrorist attack."
Because they rely on previously built indexes, engines show only
the past while ignoring the present.
This inability to access valuable
data has created a major information gap for Internet searchers
using traditional engines, and the deficit has spawned a number
of new industry terms, such as "deep Net," "invisible
web" and "deep web."
Deep Net best describes this uncharted territory, as the information
is not actually invisible and valuable information resources extend
beyond the web.
The deep Net consists of a vast array of information
gold mines contained in specialist databases from business associations,
universities, libraries and government departments.
A good example of a deep Net resource is AustLII - the Australasian
Legal Information Institute (www.austlii.org).
AustLII is one of the largest sources of Australasian legal materials
on the Net, with more than 7GB of raw text materials and more than
1.5 million searchable legal documents.
The challenge for the search-engine
market is how to unlock these quality information resources for
the online searcher.
A few sites have made the deep Net more accessible by identifying
some of the resources that are normally overlooked by traditional
search engines. Invisible Web, for example, provides a comprehensive
list of niche databases that help a searcher locate a specialist resource.
Another site, Purge, has adopted a two-tiered approach:
The first step is to select a category, such as "legal,"
and the second is to select a particular database from this category.
Turbo10 enables searchers to query multiple specialist databases at one
time.
Deploying search-engine adapter technology, Turbo10 is able to dynamically
connect to hundreds of deep Net databases in all topic categories
and languages.
Turbo10's metasearch functionality allows it to query
up to 10 heterogeneous databases, retrieve the results and package
them into one consistent interface.
Turbo10 has started to divide
resources into topic-specific collections, such as legal, reference,
news, sports and finance. Each collection combines a selection of
surface web (traditional search) and deep Net engines to broaden
the search.
For example, the legal collection also connects to a host of niche
legal sites, such as AustLII and CanLII (Canadian Legal Information
Institute), enabling searchers to perform specific queries across
targeted legal databases.
By tapping into the deep Net, this new
breed of search engine is expanding the depth of information available
to the online searcher and increasing the speed of accessing it.
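The metasearch pattern described above (send one query to several different engines, then package the answers into one consistent list) can be sketched roughly as follows. This is an illustration under stated assumptions, not Turbo10's actual implementation: the adapter functions, the Result type and the example.org URLs are all hypothetical, and each adapter returns canned data instead of contacting a real database. Only the limit of 10 engines per query comes from the article.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    url: str
    source: str


# Each hypothetical adapter hides one engine's native answer format and
# returns results in the shared Result shape.
def legal_adapter(query):
    rows = [{"name": f"Case law on {query}", "link": "https://example.org/legal/1"}]
    return [Result(row["name"], row["link"], "legal-db") for row in rows]


def news_adapter(query):
    rows = [(f"Latest news about {query}", "https://example.org/news/1")]
    return [Result(title, url, "news-db") for title, url in rows]


def web_adapter(query):
    # Returns the same URL as news_adapter to show duplicate handling.
    rows = [(f"Overview of {query}", "https://example.org/news/1")]
    return [Result(title, url, "web-engine") for title, url in rows]


MAX_ENGINES = 10  # per-query limit stated in the article


def metasearch(query, adapters):
    """Query several engines in parallel and merge into one result list."""
    chosen = adapters[:MAX_ENGINES]
    if not chosen:
        return []
    with ThreadPoolExecutor(max_workers=len(chosen)) as pool:
        # pool.map keeps the batches in the same order as the adapters.
        batches = list(pool.map(lambda adapter: adapter(query), chosen))
    merged, seen_urls = [], set()
    for batch in batches:
        for result in batch:
            if result.url not in seen_urls:  # drop duplicates across engines
                seen_urls.add(result.url)
                merged.append(result)
    return merged


for hit in metasearch("copyright", [legal_adapter, news_adapter, web_adapter]):
    print(hit.source, "|", hit.title, "|", hit.url)
```

Keeping each engine behind its own adapter means a new database can be added without touching the merge step, which is one plausible reading of what "adapter technology" buys a metasearch engine.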
The challenge is to make sure searchers don't drown in the abundance
of information in the deep Net. Search engines must not jeopardize
quality for the sake of quantity.
As more deep Net engines are brought
to the surface, it becomes difficult to choose which databases should
be queried for each search.
For example, when a user searches for "cats," is that
person looking for a general description, a research paper, the
latest news, the musical or the acronym?
The future of search-engine
technology lies in the ability to solve this matching problem and
to draw the most relevant results from both the surface web and
the deep Net.
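To make the matching problem concrete, here is a deliberately naive sketch that ranks collections by keyword overlap with the query. The collection names, their keyword sets and the scoring rule are all assumptions made for illustration; the article does not say how any engine actually chooses databases, and a production system would need far richer signals.

```python
# Hypothetical collections, each described by a handful of topic keywords.
COLLECTIONS = {
    "reference": {"definition", "description", "animal", "pet", "breed"},
    "research": {"paper", "study", "journal", "genetics"},
    "news": {"latest", "today", "breaking", "news"},
    "arts": {"musical", "theatre", "tickets", "broadway"},
}


def rank_collections(query, collections=COLLECTIONS):
    """Rank collections by how many query words match their keywords."""
    words = set(query.lower().split())
    scores = {name: len(words & keywords) for name, keywords in collections.items()}
    matched = [name for name, score in scores.items() if score > 0]
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(matched, key=lambda name: (-scores[name], name))


print(rank_collections("cats musical tickets"))
print(rank_collections("latest cats study"))
print(rank_collections("cats"))
```

The bare query "cats" matches nothing in this toy setup, so the ranking comes back empty. That is the ambiguity described above: without extra context, the engine has no basis for preferring one collection over another.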