I went back and had a look at the Yahoo Search Content Quality Guidelines the other day after Michael Campbell mentioned them in his ezine. They tell us that Yahoo wants to index:

  1. Original and unique content of genuine value
  2. Pages designed primarily for humans, with search engine considerations secondary
  3. Hyperlinks intended to help people find interesting, related content, when applicable
  4. Metadata (including title and description) that accurately describes the contents of a web page
  5. Good web design in general

Immediate Thoughts:

No revelations in (1) & (2).
(3) Seems to shout “anchor text,” but probably simply refers to linking in general, since Yahoo has said this for years.
(4) Makes it plain that the title and meta description tag ARE important.
(5) Reinforces that search engines like HTML that validates, and well-structured sites full of good old plain links.

So what is search engine spam to Yahoo? Well, here’s what Yahoo says it doesn’t want in its index:

  1. Pages that harm accuracy, diversity or relevance of search results
  2. Pages dedicated to directing the user to another page
  3. Pages that have substantially the same content as other pages
  4. Sites with numerous, unnecessary virtual hostnames
  5. Pages in great quantity, automatically generated or of little value
  6. Pages using methods to artificially inflate search engine ranking
  7. The use of text that is hidden from the user
  8. Pages that give the search engine different content than what the end user sees
  9. Excessively cross-linking sites to inflate a site’s apparent popularity
  10. Pages built primarily for the search engines
  11. Misuse of competitor names
  12. Multiple sites offering the same content
  13. Sites that use excessive pop-ups, interfering with user navigation
  14. Pages that seem deceptive, fraudulent, or provide a poor user experience

Immediate Thoughts:

(1) Misleading spam pages with nothing to do with the terms they are optimised for

(2) Redirects, which are especially prevalent amongst affiliate marketers, and Doorway Pages

(3) Duplicate content in any form, whether scrapes, articles or datafeeds

(4) “Virtual hostnames” actually means “maintaining more than one server on one machine, as differentiated by their apparent hostname,” so I’m unsure whether Yahoo has its language confused and is referring to sites that use many subdomains when a directory would do fine, or to several variations on a domain name all pointing back to the same folder. Either way, the objection is to doing it simply to get a greater presence in the search engines.

(5) Obviously auto-generated content, including vanilla datafeed sites and unmodified RSS feeds

(6) Too many examples to list!

(7) Yes, people DO still do this, only now most use CSS. But search engines are making inroads into reading CSS too.

(8) Cloaking done to inflate rankings, with a specially optimised page served to the search engine crawlers and a normal page to regular visitors.

(9) Link networks, mini nets, blog farms, etc.

(10) Well, who else would you build them for if you want them read? Sorry! :P Over-optimised to the point that usability and value are impaired.

(11) Registering misspellings of well-known names, piggy-backing on competitor names to rank, attempting to sabotage competitor rankings, etc.

(12) Again, done to gain greater exposure in the SERPs, but not much of an issue with today’s duplicate content filters.

(13) Clear enough.

(14) Catch-all for anything else.
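
For the curious, here is roughly how a duplicate content check like the one behind points (3) and (12) might work. This is only my own sketch of one common technique (comparing overlapping word groups, or “shingles”), not anything Yahoo has published; the function names, the shingle size of three words and any cut-off you pick for “too similar” are all assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word groups ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty pages are trivially identical
    return len(sa & sb) / len(sa | sb)
```

Two pages that differ only by a swapped word here and there will score close to 1.0, which is why lightly reworded scrapes and untouched datafeeds are so easy to catch, while genuinely different pages score near 0.0.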

Unfortunately though, and I’m guessing Michael doesn’t realise this, the Yahoo Search Content Quality Guidelines have not changed much in years. In fact, the current page is almost identical to the Inktomi Content Guidelines from back in June 2002! (Inktomi is the search engine Yahoo bought years ago because it didn’t have its own. The link sends you to the original page as stored in the Wayback Machine at archive.org.)

It makes you wonder how accurate the Yahoo Search Content Quality Guidelines really are …
