For the last several years, Google has maintained the largest index of pages on the World Wide Web (see SearchEngineWatch.com).
Today, in the latter half of 2003, Google's database is estimated to index over 3.3 billion web pages.
This document is about
searching Google. As you will soon
find out, Google has its own system of
search features and devices. You must know them in order to make
effective use of Google for your Internet
searching needs.
Search Engines vs Web Directories
First, understand that Google is a search engine. In other words, it uses a "spider" to "crawl" the World Wide Web, looking for pages (and other documents, like Word files and PDF files) that it did not know about previously, or that have changed since the last time it visited a particular URL. It then adds that new data to its already huge database that indexes the World Wide Web.
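The crawl-and-update cycle described above can be sketched roughly as follows. This is only an illustrative toy, not Google's actual implementation: the `update_index` function, the `fetch` callable, and the use of a content hash to detect changed pages are all assumptions made for the example.

```python
import hashlib

def update_index(index, urls, fetch):
    """Toy crawl pass: record pages that are new or have changed.

    index -- dict mapping URL -> content fingerprint (hypothetical structure)
    urls  -- iterable of URLs to visit
    fetch -- hypothetical callable that returns the page text for a URL
    """
    changed = []
    for url in urls:
        text = fetch(url)
        fingerprint = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # A page is (re)indexed if it was never seen or its content differs.
        if index.get(url) != fingerprint:
            index[url] = fingerprint
            changed.append(url)
    return changed
```

Running the function twice over unchanged pages reports nothing on the second pass; only a page whose text differs from the stored fingerprint is picked up again.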
Google is not, in and of itself, a web directory, as the very popular Yahoo! site was when it started. Web directories are much, much smaller datasets: anywhere from a few thousand web sites up to a couple of million in a very large web directory like the Open Directory Project (see below), as opposed to the more than 3 billion web pages referred to in a search engine like Google.
Web directories, however, are very useful finding tools because they are filled with web sites that have been classified by human beings who are experts in some area of information. Search engine indexes are built by programs that scan the publicly accessible World Wide Web for new or updated pages; web directories are organized, carefully chosen lists of the better and best web sites dealing with all manner of topics.
Indeed, web directories are so important that many search engines actually include web directory services at their sites alongside their all-web search engines. Google, for example, has a separate tab or button labeled "Directory" which lays out 16 general categories under which, altogether, over 3 million web sites are organized. Google didn't create this information; it licensed the right to list the web directory of the Open Directory Project (ODP):
The ODP is also known
as DMOZ, an acronym for Directory Mozilla. This name reflects its
loose association with Netscape's Mozilla
project, an Open Source browser initiative. The ODP was developed in
the spirit of Open Source, where development and maintenance are done
by net-citizens, and results are made freely available for all
net-citizens.
Basic Search Mechanisms
![](top.jpg)
The default page brought up by http://google.com, shown above, presents the input box for searching the World Wide Web.
There are three main types of
web searches that Google facilitates:
-
searching the Web itself, from
this main page (the default condition)
-
browsing through a directory
of web sites made available by Open Directory Project (ODP)
-
an advanced search page
You should also notice that Google has five tabs above its search box: Web, Images, Groups, Directory, and News. One of them, Directory (circled item #2), has already been mentioned--it is the Google presentation of the Open Directory Project. The other tabs are just as easy to use: Images lets the user locate images that appear on web pages; Groups lets the user search through the millions of messages posted to newsgroups since the early 1980s; News lets users get current news in a variety of areas and search for past news stories.
The area to the right of the
search box contains three text links:
Advanced Search
Preferences
Language Tools
We will return to Preferences
and Language Tools later; Advanced Search will be taken up below.
Basic Indexing Features
-
Lower Case and Upper Case. Capitalization of search words has no effect on the results one gets from Google: all words are stored in Google's index as lower case, so searches using any combination of upper and lower case letters obtain the same results:
Homer
HOMER
homer
-
Stop Words. There are a small number of very common words that Google does not use for indexing purposes. Called "stop words" or "delete words," these 30 or so words are used frequently in English text but are, for that very reason, of little retrieval value--they are common language "placeholders" that don't allow us to discriminate effectively for search purposes between web pages that have them and web pages that don't. It makes little sense to say you wish to search for only those web pages that contain the word "in," for example.
If a list of all the words appearing in all the WWW's pages were produced, most of the occurrences would be of words like "of," "for," "by," "with," "to," etc. Since we have evidence that these words are of almost no search significance, the producers of search engine indexes like Google's save extraordinary amounts of database space by not indexing these almost insignificant words.
But there are sometimes special circumstances under which a user might wish to search for these words, and you should know that special procedures are available to force Google to search for the occurrence of a word such as "the" in a search result ("the Who," for example).
Google stop words
| a | at | in | that | when |
| about | be | is | the | where |
| an | by | it | this | which |
| and | for | of | to | who |
| are | from | on | was | will |
| as | I | or | what | with |
-
Forced Inclusion of a Stop Word. Although stop words are normally not searched for by Google, they can be forced into a Google search specification in one of two ways. First, one may place a plus sign (+) directly in front of the word that Google would usually leave out of a search: the plus means that the following word must be found in the search results. Second, one may simply include the usually excluded word in a phrase enclosed in quotes:
already paid +for
"already paid for"
Basic Search Features
-
Default logical AND operation. When the user puts more than one word into the search box, the engine assumes that the user wishes for the two (or more) words to be ANDed together. In other words, in the absence of any specification at all, Google applies the logical AND operator. Please also note that logical operators must always be typed in capital letters.
cats dogs
. . . will retrieve the same results
as . . .
cats AND dogs
-
Logical OR condition. When the user seeks to expand search results by giving Google several different words, or a list of synonyms or near-synonyms, the user should specify the OR logical operator between each of the words. Unlike the logical AND operator, the OR operator is never implied. One must indicate the logical OR by placing it, in capital letters, between two words:
cats OR dogs
-
Logical negation. The last logical
operation--negation--is performed in Google by placing a minus sign
directly in front of the word. Therefore, were we searching for
web pages that were about cats, but not about dogs, we would use this
formulation:
cats -dogs
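The three logical operations above map naturally onto set operations. The following is a minimal sketch, assuming a hypothetical inverted index (`INDEX`, with made-up page IDs) that maps each word to the set of pages containing it; it is not a description of how Google actually stores or evaluates queries.

```python
# Hypothetical inverted index: each word maps to the set of page IDs containing it.
INDEX = {
    "cats": {1, 2, 3},
    "dogs": {2, 3, 4},
}

def pages(word):
    """Pages containing the word (empty set if the word is not indexed)."""
    return INDEX.get(word.lower(), set())

both = pages("cats") & pages("dogs")       # cats dogs  (same as: cats AND dogs)
either = pages("cats") | pages("dogs")     # cats OR dogs
cats_only = pages("cats") - pages("dogs")  # cats -dogs
```

AND narrows the result to pages containing every word, OR widens it to pages containing any of the words, and the minus sign removes pages containing the unwanted word.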
As shown above in the discussion of stop words, enclosing words in quotes binds them together into a phrase. This phrase-binding process can include words that Google normally doesn't allow the user to specify:
"gone with the wind"
"vitamin a"
This phrase searching technique is particularly useful for finding odd or non-standard phraseology in pieces of well-known text. Were you searching for a copy of Lincoln's Gettysburg Address, for example, you could go ahead and specify . . .
"four score and seven years ago our fathers"
Punctuation, by the way, is ignored, so adding a comma to the specification above will return the same results:
"four score and seven years ago, our fathers"
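Why the comma makes no difference can be shown with a short sketch of phrase matching. It assumes that both the page text and the quoted phrase are reduced to lowercase words before comparison; the `normalize` and `contains_phrase` helpers are hypothetical names for this example, not part of any Google interface.

```python
import re

def normalize(text):
    """Lowercase the text and drop punctuation, keeping words in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(page_text, phrase):
    """True if the words of the phrase occur consecutively in the page."""
    words, target = normalize(page_text), normalize(phrase)
    n = len(target)
    if n == 0:
        return False
    # Slide a window of n words across the page and compare each one.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))
```

Because the comma is discarded during normalization, the phrase with the comma and the phrase without it reduce to the same word sequence and therefore match the same pages.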
Advanced Search Features
Advanced Search Menu Page
![](advsrchpage.jpg)
Google's Results Layout
|