Search Engines

George Klemic
WWW seminar
Spring 1997

This is my talk on search engines, mainly as presented in the WWW seminar, with some touch-ups. Contents:
  1. Getting Started
  2. Basics
  3. Advanced Search Techniques
  4. Nitty Gritty Comparison
  5. A Quick Glance
  6. About Robots
  7. Summary
  8. References

Getting Started

Since this is a presentation on search engines, I figured the best place to go for information was to...the search engines!

Clicking the "Net Search" button on the Netscape screen takes you directly to:
http://home.netscape.com/home/internet-search.html
This brings up Yahoo, or one of 4 other search engines.

Since I'm giving a talk on search engines, I decided to start with this as a topic. Typing "search engines" into the box and clicking Search took me to:
http://search.yahoo.com/bin/search?p=search+engines&a=n
This search returned 3 category matches and 597 site matches.
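The address above shows how the typed words end up in the URL: the space in "search engines" becomes a "+" in the p parameter. Here is a minimal sketch of that encoding using Python's standard library. The helper name build_search_url is made up for this example, and the a=n parameter is simply copied from the URL above, since I don't know what it controls. The code only builds and takes apart the string; it does not contact the server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base address copied from the search URL shown above.
BASE = "http://search.yahoo.com/bin/search"

def build_search_url(terms, extra=None):
    """Return a search URL with the terms form-encoded in the p parameter.

    Hypothetical helper: `extra` carries any other parameters (such as
    a=n, whose meaning is not explained in this talk) unchanged.
    """
    params = {"p": terms}
    params.update(extra or {})
    # urlencode() form-encodes values, so a space becomes '+'.
    return BASE + "?" + urlencode(params)

url = build_search_url("search engines", {"a": "n"})
print(url)

# Decoding goes the other way: '+' turns back into a space.
print(parse_qs(urlsplit(url).query)["p"][0])
```

Building the query with urlencode instead of pasting strings together also takes care of characters like "&" or "?" inside the search words.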

A category match means that the search key appears in the name of a branch of the subject tree. A site match means that the key appears in either the title of a listed URL or its description.
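To make the two kinds of match concrete, here is a small sketch. It assumes a plain case-insensitive substring test, which is my guess at the behavior and not Yahoo's actual code. The category names are the ones from this search, plus one made-up non-matching branch; the three sites are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Site:
    title: str
    description: str
    category: str  # branch of the subject tree the site is filed under

def category_matches(categories, key):
    """Categories whose branch name contains the search key."""
    key = key.lower()
    return [c for c in categories if key in c.lower()]

def site_matches(sites, key):
    """Sites whose title or description contains the search key."""
    key = key.lower()
    return [s for s in sites
            if key in s.title.lower() or key in s.description.lower()]

WEB = "Computers and Internet: Internet: World Wide Web: Searching the Web"
PI = "Science: Mathematics: Numbers: Specific Numbers: Pi"

CATEGORIES = [
    WEB + ": Search Engines",
    WEB + ": Comparing Search Engines",
    PI + ": Search Engines",
    "Recreation: Travel",  # made-up branch that should not match
]

# Invented sample sites, for illustration only.
SITES = [
    Site("Guide to Search Engines", "Tips for beginners", WEB),
    Site("Web Tools", "Reviews of popular search engines", WEB),
    Site("Pi Search", "Look for a number in the digits of pi",
         PI + ": Search Engines"),
]

print(len(category_matches(CATEGORIES, "search engines")))
print([s.title for s in site_matches(SITES, "search engines")])
```

Note that the invented "Pi Search" site sits in a matching category but is not itself a site match, because neither its title nor its description contains the key.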

Category matches made for search engines:

1) Computers and Internet: Internet: World Wide Web: Searching the Web: Search Engines

The link to this location is:
http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Searching_the_Web/Search_Engines/
This site can also be reached by choosing "computers and internet" from the main Yahoo menu, then choosing "internet", etc.

This site contains links to many search engines: some popular ones, like AltaVista and WWWW (the WWW Worm), and some unfamiliar ones, including many designed specifically to look up information about other countries. We will look at these later.

2) Computers and Internet: Internet: World Wide Web: Searching the Web: Comparing Search Engines

http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Searching_the_Web/Comparing_Search_Engines/

This has links to articles comparing different search engines. There are hundreds, of which I looked over about 20-30 and chose some of the better ones for this presentation. Credits for the articles are in the References section.

3) Science: Mathematics: Numbers: Specific Numbers: Pi: Search Engines

This has nothing to do with search engines in the sense we want. It takes you to a site where you can enter a number and find out whether it appears in the first 50,000,000 digits of pi. This is just an early lesson that not everything a search engine returns is what you want.

(PS: I was disappointed to see that my Social Security number did not appear in the first 50 million digits of pi.)

Basics

Tyner's article, Sink or Swim: Internet Search Tools & Techniques, is a good starting place. It explains the difference between search engines and subject guides, covers the basics of setting up queries, and gives some statistics on a few of the common search engines.

Eagan and Bender wrote a similarly useful article: Spiders and Worms and Crawlers, Oh My: Searching on the World Wide Web. It presents basic search engine strategies in a slightly different fashion and also compares search engines. This comparison is better than many in that it explains, in plain English, the basics of each engine and why one would choose a particular engine over another.

Advanced Search Techniques

In the article How to Search the Web - A Guide to Search Tools, Gray explains many of the functions that are useful in creating a successful search query. These include Boolean conditions, wildcards, proximity operators, and word phrases.

Also in this article is a table summarizing the advanced features available across many search engines. And, of course, more search tips.
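The query functions Gray describes (Boolean conditions, wildcards, proximity operators, and phrases) can be sketched on a toy collection of documents. This is only an illustration of what each operator means, not how any particular engine implements it or what syntax it uses. The four sample documents and all function names are made up for the example.

```python
import re
from fnmatch import fnmatchcase

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_word(words, pattern):
    """Wildcard match: 'librar*' matches 'library' and 'libraries'."""
    return any(fnmatchcase(w, pattern.lower()) for w in words)

def has_phrase(words, phrase):
    """Phrase match: the words must be adjacent and in order."""
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return False
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def near(words, a, b, distance):
    """Proximity match: a and b occur within `distance` words of each other."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

# Invented sample documents.
DOCS = {
    "a": "Public libraries offer free Internet access",
    "b": "A search engine indexes pages; a subject guide is built by hand",
    "c": "The library catalog supports a keyword search",
    "d": "Engine repair and search for spare parts",
}

def search(predicate):
    """Names of the documents whose word list satisfies the predicate."""
    return sorted(name for name, text in DOCS.items()
                  if predicate(tokenize(text)))

# Boolean AND, and AND NOT, are ordinary Python boolean expressions here.
print(search(lambda w: has_word(w, "search") and has_word(w, "engine")))
print(search(lambda w: has_word(w, "search") and not has_word(w, "engine")))
print(search(lambda w: has_word(w, "librar*")))
print(search(lambda w: has_phrase(w, "search engine")))
print(search(lambda w: near(w, "search", "engine", 3)))
```

The phrase query is the strictest of these: document "d" contains both words within three positions of each other, so it satisfies the AND and proximity queries, but it does not contain the phrase itself.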

The article Advanced Searching: Tricks of the Trade discusses advanced features of four search engines: Alta Vista, InfoSeek, Lycos, and Open Text. Three complex sample queries were tested on each of these search engines, and the article gives the results and overall conclusions. One important point to note is the conclusion that no search engine performed best in all cases.

Nitty Gritty Comparison

One frequently asked question is "which search engine is the fastest?" Well...for the answer to that, as well as many other questions comparing search engines, refer to Search Engine Showdown: IW Labs Tests Seven Internet Search Tools.

In Search Engines for the World Wide Web: A Comparative Study and Evaluation Methodology, a detailed study of Alta Vista, Lycos, and Excite is done. Included in this work is numeric data, which seems difficult to come by.

Finally, a large scale analysis was performed in 1995. This analysis was on the number of relevant hits made for subject searches. See Quantitative Analysis of Five WWW "Search Engines".

A Quick Glance

These pages are really short but to the point.

This site has info on the size of each database, what its contents are, how searching is done, search tips, how results are returned, its address, update frequency, and how to get additional information. This is the only central location that I have found (so far) that has this information for multiple search engines. These include Yahoo, Alta Vista, Excite, HotBot, InfoSeek, Lycos, and Open Text.

Finally, here is a one-page summary that includes a brief description, pros, and cons of about a dozen search engines.

About Robots

The robot FAQ is the best place to go for introductory information. It covers what a robot is and how to prevent robots from visiting your site, and it also has information on META tags.
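As a rough illustration of how a site keeps robots out: the usual mechanism is a robots.txt file at the top level of the site (the robots exclusion standard). The sketch below uses Python's standard urllib.robotparser module to check URLs against such a file. The robot name ExampleBot, the www.example.com addresses, and the /private/ directory are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical robot name and URLs.
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/private/notes.html")

print(allowed)
print(blocked)
```

A well-behaved robot would make this check before every fetch. The META tag approach works per page instead: a page can carry a tag such as <meta name="robots" content="noindex, nofollow"> in its head, which asks robots not to index the page or follow its links.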

Those interested in creating their own robot should read the Guidelines for Robot Writers.

Summary

After reading many articles and trying out numerous search engines, my non-quantitative opinion is to go with Alta Vista.

Here is a compilation of the most valuable tables presented in these articles. No new information is here; it is simply a collection of comparisons and results, all in one place.

References

July 96
Chu, H. and Rosenthal, M. Search Engines for the World Wide Web: A Comparative Study and Evaluation Methodology. Heting Chu, Palmer School of Library & Information Science, Long Island University.
http://www.asis.org/annual-96/ElectronicProceedings/chu.html

Apr 96
Eagan, Ann and Bender, Laura (April 1996). Spiders and Worms and Crawlers, Oh My: Searching on the World Wide Web. "Untangling the Web. Proceedings of the Conference, April 26, 1996, University of California, Santa Barbara"
http://www.library.ucsb.edu/untangle/eagan.html

Dec 96
Features: Searching is My Business: A Gumshoe's Guide to the Web (12/96). PC World Communications
http://www.ballehs.dk/PCWorld_links.html

July 96
Gray, Terry A. How to Search the Web - A Guide to Search Tools
http://issfw.palomar.edu/Library/TGSEARCH.HTM

Sept 96
Internet Search Tool Details (Digital Library SunSITE). UC Regents.
http://sunsite.berkeley.edu/Help/searchdetails.html

1993
Koster, Martijn. Guidelines for Robot Writers
http://info.webcrawler.com/mak/projects/robots/guidelines.html

?
Koster, Martijn. The Web Robots FAQ
http://info.webcrawler.com/mak/projects/robots/faq.html

Dec 95
Tomaiuolo, N. and Packer, J. Quantitative Analysis of Five WWW "Search Engines"
http://neal.ctstateu.edu:2001/htdocs/websearch.html

Feb 97
Tyner, Ross (1996). Sink or Swim: Internet Search Tools & Techniques. A hands-on workshop at Connections '96, May 10, 1996
http://www.sci.ouc.bc.ca/libr/connect96/search.htm

May 96
Venditto, Gus. Search Engine Showdown: IW Labs Tests Seven Internet Search Tools.
http://www.qcfurball.com/iworld/showdown.html

May 96
Zorn, Emanoil, Marshall, Panek. Advanced Searching: Tricks of the Trade
http://www.onlineinc.com/onlinemag/MayOL/zorn5.html