Searching with Google

Table of Contents

Search Engines vs Web Directories
Google's Results Layout
Basic Search Mechanisms   
    Basic Indexing Features  
    Basic Search Features  
Advanced Search Features  
    Advanced Search Menu Page  
    Term Qualification  

For the last several years, Google has maintained the largest index of pages on the World Wide Web (see SearchEngineWatch.com).  Today, in the latter half of 2003, Google's database is estimated to index over 3.3 billion web pages.

This document is about searching Google.  As you will soon find out, Google has its own system of search features and devices.  You must know them in order to make effective use of Google for your Internet searching needs.

Search Engines vs Web Directories

First, understand that Google is a search engine.  In other words, it uses a "spider" to "crawl" the World Wide Web, looking for pages (and other documents, like Word files and PDF files) that are new or that have changed since the last time it visited a particular URL.  It then adds that new data to its already huge database that indexes the World Wide Web.
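
The crawl-and-update cycle described above can be sketched in a few lines of Python.  This is only a toy model, not Google's spider: the "web" is a hypothetical in-memory dictionary (TOY_WEB), the URLs are made up, and a content fingerprint stands in for whatever change detection a real crawler uses.

```python
import hashlib
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "http://example.com/": ("home page", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/"]),
    "http://example.com/b": ("page b", []),
}

def crawl(web, start_url, index):
    """Breadth-first crawl; return (new URLs, changed URLs).

    `index` maps URL -> content fingerprint recorded by the previous
    crawl, and is updated in place.
    """
    new, changed = [], []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: nothing to fetch
        text, links = web[url]
        fingerprint = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if url not in index:
            new.append(url)        # never seen at this URL before
        elif index[url] != fingerprint:
            changed.append(url)    # content differs from the last visit
        index[url] = fingerprint
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return new, changed

index = {}
first_new, first_changed = crawl(TOY_WEB, "http://example.com/", index)

# Simulate an edit to one page, then crawl again.
TOY_WEB["http://example.com/a"] = ("page a, revised", ["http://example.com/"])
second_new, second_changed = crawl(TOY_WEB, "http://example.com/", index)
```

On the first pass every page counts as new; on the second pass only the revised page is reported as changed, which is the "new pages and changed pages" distinction made above.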

It is not, in and of itself, a web directory, as the very popular Yahoo! site was when it started.  Web directories are much smaller datasets: a few thousand up to a couple of million web sites in a very large web directory like Open Directory Project (see below), as opposed to the more than 3 billion web pages referred to in a search engine like Google.  Web directories, however, are very useful finding tools because they are filled with web sites that have been classified by human beings who are experts in some area of information.  Search engines are added to by programs which scan the publicly accessible World Wide Web for new or updated pages; web directories are organized, carefully chosen lists of the better and best web sites on a wide range of topics.

Indeed, web directories are so important that many search engines offer web directory services at their sites alongside their all-web search engines.  Google, for example, has a separate tab or button labeled "Directory" which lays out 16 general categories under which, altogether, over 3 million web sites are organized.  Google didn't create this information; it licensed the right to present the web directory of the Open Directory Project (ODP).

The ODP is also known as DMOZ, a contraction of Directory Mozilla.  This name reflects its loose association with Netscape's Mozilla project, an Open Source browser initiative.  The ODP was developed in the spirit of Open Source, where development and maintenance are done by net-citizens, and results are made freely available to all net-citizens.

Basic Search Mechanisms

The default page brought up by http://google.com, shown above, presents the input box for searching the World Wide Web.

There are three main types of web searches that Google facilitates:

  1. searching the Web itself, from this main page (the default condition)

  2. browsing through a directory of web sites made available by Open Directory Project (ODP)

  3. searching through an advanced search page

You should also notice that Google has five tabs above its search box: Web, Images, Groups, Directory, and News.  One of them, Directory (circled item #2), has already been mentioned; it is the Google presentation of Open Directory Project.  The other tabs are just as easy to use: Images lets the user locate images that are used on web pages; Groups lets the user search through the millions of messages posted to newsgroups since the early 1980s; News lets users get current news in a variety of areas and search for past news stories.

The area to the right of the search box contains three text links:

Advanced Search
Preferences
Language Tools

We will return to Preferences and Language Tools later; Advanced Search will be taken up below.

Basic Indexing Features

  • Lower Case and Upper Case.  Capitalization of search words has no effect on the results one gets from Google: all words are stored in Google's index as lower case, so searches using any combination of upper and lower case letters obtain the same results:

    Homer

    HOMER

    homer

  • Stop Words.  There are a small number of very common words that Google does not use for indexing purposes.  Called "stop words" or "delete words," these 30 or so words (listed below) occur so frequently in English text that they have little retrieval value: they are common-language "placeholders" that do not let us discriminate effectively, for search purposes, between pages that contain them and pages that do not.  It makes little sense, for example, to search for only those web pages that contain the word "in."

    If a glossary of all the words appearing in all the WWW's pages were produced, most of the occurrences would be of words like "of," "for," "by," "with," "to," etc.  Since these words are of almost no search significance, the producers of search engine indexes like Google save extraordinary amounts of database space by not indexing them.  There are, however, special circumstances under which a user might wish to search for these words, and Google provides special procedures to force a search for the occurrence of, say, the word "the" ("the Who," for example).

    Google stop words

    a      at    in  that  when
    about  be    is  the   where
    an     by    it  this  which
    and    for   of  to    who
    are    from  on  was   will
    as     I     or  what  with

  • Forced Inclusion of a Stop Word.  Although stop words are normally not searched for by Google, they can be forced into a Google search specification in one of two ways.  First, one may place a plus sign (+) directly in front of the word that Google would usually leave out of a search; the plus means that the following word must be found in the search results:

    already paid +for

    Second, one may simply include the usually excluded word in a phrase, enclosed in quotes:

    "already paid for"

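The indexing rules above (case folding, stop words, and the two ways of forcing a stop word back in) can be illustrated with a short Python sketch.  It is a toy model under stated assumptions, not Google's actual query processing: the function name searchable_terms is made up for this example, and the stop-word set is simply copied from the table above.

```python
import re

# Stop-word list copied from the table above (stored in lower case).
STOP_WORDS = {
    "a", "about", "an", "and", "are", "as", "at", "be", "by", "for",
    "from", "i", "in", "is", "it", "of", "on", "or", "that", "the",
    "this", "to", "was", "what", "when", "where", "which", "who",
    "will", "with",
}

def searchable_terms(query):
    """Return the terms a stop-word-aware engine would search for (toy model)."""
    terms = []
    # Each match is either a quoted phrase or a run of non-space characters.
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        if phrase:
            # Quoted phrase: keep every word, stop words included.
            terms.append(phrase.lower())
        elif word.startswith("+") and len(word) > 1:
            # Leading plus: force the word in even if it is a stop word.
            terms.append(word[1:].lower())
        elif word and word.lower() not in STOP_WORDS:
            # Ordinary word: fold to lower case, drop it if it is a stop word.
            terms.append(word.lower())
    return terms

print(searchable_terms("HOMER"))               # same terms as "homer"
print(searchable_terms("already paid for"))    # "for" is dropped
print(searchable_terms("already paid +for"))   # "for" is forced in
print(searchable_terms('"already paid for"'))  # the phrase is kept whole
```

Under these assumptions, "Homer", "HOMER", and "homer" all reduce to the same term, and "for" survives only when it carries a plus sign or sits inside a quoted phrase.
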
Basic Search Features

  • Default logical AND operation. When the user puts more than one word into the search box, the engine assumes that the user wishes for the two (or more) words to be ANDed together.  In other words, the default logical operation used by Google in the absence of any specification at all is the logical AND operator.  Note also that logical operators must always be typed in capital letters.

    cats dogs

    . . . will retrieve the same results as . . .

    cats AND dogs

  • Logical OR condition. When the user seeks to expand search results by giving Google several alternative words, such as a list of synonyms or near-synonyms, the user should specify the OR logical operator between each pair of words.  In other words, the OR operator is not implied, as the logical AND operator is.  One must indicate the logical OR by placing it, in capital letters, between two words:

    cats OR dogs

  • Logical negation.  The last logical operation--negation--is performed in Google by placing a minus sign directly in front of the word.  Therefore, were we searching for web pages that were about cats, but not about dogs, we would use this formulation:

    cats -dogs

  • Phrase Binding.  To ask Google to search for a string of words occurring in a fixed order, enclose the words--what we call a phrase--in quotes. 

    As shown above in the discussion of stop words, this phrase-binding process can include words that Google normally doesn't allow the user to specify:

    "gone with the wind"

    "vitamin a"

    This phrase searching technique is particularly useful for finding odd or non-standard phraseology in pieces of well-known text.  Were you searching for a copy of Lincoln's Gettysburg Address, for example, you should go ahead and specify . . .

    "four score and seven years ago our fathers"

    Punctuation, by the way, is ignored, so this specification returns the same results as the one above:

    "four score and seven years ago, our fathers"

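The four search features above (implied AND, explicit OR, negation with a minus sign, and quoted phrases) can be tried out on a tiny corpus.  The following Python sketch is a toy model only: the four pages in PAGES and the search function with its keyword arguments are hypothetical names invented for this illustration, and nothing here reflects how Google evaluates queries internally.

```python
import re

# Hypothetical four-page corpus, used only for illustration.
PAGES = {
    "page1": "Cats and dogs can share a home.",
    "page2": "Cats sleep most of the day.",
    "page3": "Dogs, unlike cats, enjoy long walks.",
    "page4": "Dogs need daily exercise.",
}

def words(text):
    """Lower-case the text and drop punctuation, keeping word order."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively in `text`."""
    haystack, needle = words(text), words(phrase)
    return any(haystack[i:i + len(needle)] == needle
               for i in range(len(haystack) - len(needle) + 1))

def search(required=(), any_of=(), excluded=(), phrases=()):
    """Return the names of pages that satisfy every condition given."""
    hits = []
    for name, text in PAGES.items():
        vocab = set(words(text))
        if (all(w.lower() in vocab for w in required)               # cats dogs
                and (not any_of
                     or any(w.lower() in vocab for w in any_of))    # cats OR dogs
                and not any(w.lower() in vocab for w in excluded)   # -dogs
                and all(has_phrase(text, p) for p in phrases)):     # "..."
            hits.append(name)
    return hits

print(search(required=["cats", "dogs"]))            # cats dogs
print(search(any_of=["cats", "dogs"]))              # cats OR dogs
print(search(required=["cats"], excluded=["dogs"])) # cats -dogs
print(search(phrases=["dogs, unlike cats"]))        # "dogs, unlike cats"
```

Because the words helper discards punctuation before comparing, the phrase "dogs, unlike cats" and the phrase "dogs unlike cats" match the same page, mirroring the punctuation rule described above.
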
Advanced Search Features

Advanced Search Menu Page

Google's Results Layout