Wednesday, January 14, 2004

Google duplicates - i.e. resellers, affiliates, etc.

ResourceShelf: "See Also: 'Challenges in Web Search Engines'
This twelve-page paper was written by Dr. Monika Henzinger (Research Director, Google), Dr. Rajeev Motwani (Professor at Stanford) and Dr. Craig Silverstein (Director of Technology, Google). From the abstract, '...article presents a high-level discussion of some of the problems with information retrieval that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas.' Content quality, spam, cloaking, duplicate hosts, and vaguely structured data are some of the topics discussed."

More on Google's Upcoming (We Think) IPO
The article quotes Peter Norvig, Google's director of search quality, as saying, "Google is all about providing access to information, not being a venue for ads." The article continues, "Norvig said that although the company offers paid search results, they're always clearly marked and never get in the way of the objective results that remain the company's focus."

And while we're on the Google beat, a couple of comments about this quote and other matters.


A comment or two.
1) All of the major web engines are doing a good job of labeling paid content. The problem is that many people have no idea how it differs from non-paid content.

2) One problem that Google runs into is how much of its database is duplicate or near-duplicate content, something that traditional research databases try to avoid. Expect more on this soon (much more study is needed), but a few examples right now
