WEB CRAWLER ALGORITHM
| Size | 12 million URLs (approx.) |
| Spider Class | Shallow |
| Meta Tag support | YES |
| Frame support | NO |
| Image Map support | NO |
| ALT Text support | NO |
| HTML Comments | NO |
| URL Searching | YES |
| Embedded directory | YES |
| Submission URL | http://www.webcrawler.com/info/add_url/ |
SUBMISSION POLICY
Web Crawler is unique among the search engines discussed in this document.
For one thing, Web Crawler is one of the oldest search engines and one
of the smallest. Although Web Crawler and Excite merged a while ago,
Web Crawler has managed to maintain its own unique system.
Web Crawler has this to say about their submission policy (see ref.):
It seems that more and more website owners and designers have been
"spamming" -- including unsolicited, extra or irrelevant information
on their pages, usually in the form of word lists -- in order to make
search engines display them at the top of their listings. This practice
is something that we strongly discourage. Searching the Internet is
our business, and spamming actively interferes with providing everyone
who uses the World Wide Web with the best search engine we possibly
can.
In order to make our index cleaner and more navigable, and to foster
a more level playing field for everyone, we've started removing these
pages from our index and screening new submissions. If you load pages
with long, repetitive word lists or titles, this will cause WebCrawler
either to ignore the repetition or, in some cases, to ignore such documents
entirely.
Web Crawler is the first search engine here to openly admit that multiple
titles are considered spamming.
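To make the policy above concrete, here is a minimal sketch of the kind of check a submission screen might run. It is not WebCrawler's actual filter, which the quoted policy does not describe. The function name `looks_spammy` and the thresholds `max_ratio` and `min_words` are hypothetical values chosen for illustration.

```python
import re
from collections import Counter


def looks_spammy(html, max_ratio=0.3, min_words=20):
    """Flag a page with more than one <title> or one heavily repeated word.

    Illustrative only: max_ratio and min_words are assumed thresholds,
    not values published by WebCrawler.
    """
    # More than one <title> element counts as spamming under the policy.
    multiple_titles = len(re.findall(r"<title\b", html, re.IGNORECASE)) > 1

    # Crude tag stripping, then split the remaining text into lowercase words.
    text = re.sub(r"<[^>]+>", " ", html)
    words = re.findall(r"[a-z0-9']+", text.lower())

    # Very short pages give a meaningless ratio, so only check the titles.
    if len(words) < min_words:
        return multiple_titles

    # Share of the page taken up by its single most frequent word.
    top_count = Counter(words).most_common(1)[0][1]
    return multiple_titles or top_count / len(words) > max_ratio
```

A page whose body repeats one keyword dozens of times, or one that carries two title tags, is flagged. Ordinary prose, where no single word dominates, passes.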
HTML FACTS
Web Crawler indexes all text on a page (up to 1 megabyte). Web Crawler
does not provide support for frames or image maps. Additionally, Web Crawler
ignores comments and alt text.
Web Crawler was the first system to implement an artificial intelligence
routine to generate a summary for an entry. However, they quickly saw
the problems in using this method and decided to offer support for the
meta description tag. Should your page omit the meta description tag,
Web Crawler will invoke their AI routine to determine a summary for
your page.
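The fallback described above can be sketched as follows. This is an illustration under stated assumptions, not Web Crawler's code: the names `SummaryParser` and `choose_summary` and the `max_chars` limit are hypothetical. The fallback branch simply takes the start of the page text as a stand-in for the AI routine, whose workings are not documented here.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collects the meta description (if any) and the visible page text."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script>, <style> or <title>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = (attr.get("content") or "").strip() or None
        elif tag in ("script", "style", "title"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style", "title") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # HTML comments never reach this method, so they are ignored.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def choose_summary(html, max_chars=200):
    """Prefer the designer's meta description; otherwise fall back to page text."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    if parser.description:
        return parser.description
    # Stand-in for the engine's own summarizer (assumption for illustration).
    return " ".join(parser.text_parts)[:max_chars]
```

The practical point is the first branch: a page that supplies `<meta name="description" content="...">` controls its own listing, and a page that omits it leaves the summary to the engine.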
RANKING METHODS
Web Crawler says this about their ranking method (see ref.):
1. Use a title uniquely descriptive of your page or site. Since WebCrawler's
indexing/relevance algorithm gives slightly more weight to titles than
to body text, pages with titles containing dead-weight words like "Homepage"
or "Home Page on the WWW" don't often get easily found.
2. Make sure that the main page of the site describes to the fullest
extent possible what the site's about. It doesn't have to be over-long
and exhaustive, but as much text with the important words in it as you
can possibly have without sacrificing the design/layout of the site
will help on the indexing front.
The first item is fairly standard: Web Crawler likes to see unique
site titles. The second item is more interesting. For one thing,
it is the opposite of the approach to ranking used by Excite: Web Crawler
wants more descriptive text, not less.
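A toy scoring function shows why both pieces of advice matter. It is only a sketch: Web Crawler's real relevance algorithm is not published in the quoted text, and the function names and the `title_weight` of 1.5 (standing in for "slightly more weight") are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query, title, body, title_weight=1.5):
    """Count query-term matches, weighting title matches above body matches.

    title_weight is an assumed value, not one published by Web Crawler.
    """
    title_tokens = tokenize(title)
    body_tokens = tokenize(body)
    score = 0.0
    for term in set(tokenize(query)):
        score += title_weight * title_tokens.count(term)
        score += body_tokens.count(term)
    return score
```

Under this model, a search for "pottery" scores a page titled "Handmade Pottery Studio" above an otherwise identical page titled "Home Page on the WWW". Extra descriptive body text that uses the important words raises the score further.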
Shown below are two entries taken from Web Crawler. In the first entry,
the page being summarized contained a meta description tag, so control
of the summary remained in the hands of the designer. The second entry
lacked any meta tags whatsoever; the text pulled from the page to generate
the summary came from the bottom of that page and does not adequately
reflect the content of the page.
SUMMARY
The Web Crawler spider is a shallow spider, so be prepared to submit
your primary pages to them.
Although small by comparison, Web Crawler is backed by the folks at
Excite, which translates into all that AOL exposure. You can't make
a mistake by submitting your site to Web Crawler.