Tuesday, January 06, 2004

Slashdot | Better Search Results Than Google?

How people search...
Slashdot | Better Search Results Than Google?: "Searching for info about electronic products is the worst on google.
I use the following along with any thing i want to search and it usually does the trick
-shop -shopping -price -buy -order -shipping'.
This no doubts subtracts one or two sites which are good but atleast filters out most of the shopping sites."...

The greatest single limit on web searching that I have noticed is the operator. I can regularly find things on the net that my co-workers cannot, because I understand Boolean keyword searching at a deeper level than most people.

I blame this on the level of education of the common population, as opposed to being evidence of my own superiority. 8-)

In a world where most people have never actually met or "dealt with" a librarian (archivist, whatever 8-), it should surprise nobody that these self-same people have no idea what it means to take personal responsibility for organizing their own approach to knowing things.

Having grown up around librarians, and having talked to them all my life, I actually understand how to group information. Applying that knowledge to searching for some words and against others isn't that far a stretch.

It is a personal pet peeve of mine to have to listen to people bemoan Google (etc.) when these self-same people have never even *noticed* the advanced search link, nor ever learned the power of the minus ("-") in the standard search bar.
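To make the minus trick concrete, here is a quick Python sketch of the idea: take the words you want, then negate the noise words before handing the string to the search bar. The function name and the noise-word list are mine, purely for illustration; nothing here is an actual Google API.

```python
# Hypothetical helper: build a query that excludes shopping noise,
# in the spirit of the Slashdot comment quoted above.
SHOPPING_NOISE = ["shop", "shopping", "price", "buy", "order", "shipping"]

def build_query(terms, excluded=SHOPPING_NOISE):
    """Join the search terms, then negate each noise word with '-'."""
    positive = " ".join(terms)
    negative = " ".join(f"-{word}" for word in excluded)
    return f"{positive} {negative}"

print(build_query(["nForce2", "audio", "chipset"]))
# nForce2 audio chipset -shop -shopping -price -buy -order -shipping
```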

There is no technology that can "fix" bad user queries that won't in turn "ruin" good ones.

From my own experience developing search technologies for an e-content site, these guys are on the right track. Compared to a lot of search technologies out there, Google is dumb. But it is blazing fast, general purpose, and smarter than most of its (former) competitors. Part of why it is dumb is that it is so general purpose. To make a search engine smarter, you have to add context. Specialized search engines can do this by standardizing their inputs. Google could do this too, but it would require complex parsing of everything that it spiders.
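Here is a rough sketch of what I mean by standardizing inputs; the Doc class and both search functions are hypothetical, just to show the contrast. A specialized engine that indexes a typed field can match it exactly, while a general-purpose engine only sees a bag of words in the body text.

```python
# Hypothetical contrast between fielded (specialized) and free-text
# (general-purpose) search. All names here are made up.
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    product: str  # a standardized field, e.g. a model number
    body: str     # free text, which is all a general engine gets

docs = [
    Doc("Review", "nForce2", "great board, low noise, solid audio"),
    Doc("Shop",   "nForce2", "order today, free shipping"),
]

def search_by_field(docs, product):
    """Exact match on a structured field: context a bag of words lacks."""
    return [d for d in docs if d.product == product]

def search_free_text(docs, term):
    """What a general-purpose engine sees: keyword presence in the body."""
    return [d for d in docs if term.lower() in d.body.lower()]

print(len(search_by_field(docs, "nForce2")))   # 2: the field is unambiguous
print(len(search_free_text(docs, "nForce2")))  # 0: the bodies never say it
```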

Another thing that Google really lacks is detection of duplicates. Google tries to do this, but does it poorly. I remember recently doing a search on Google for an obscure DB2 error code and getting the same page out of the IBM manual over and over again, on different college websites.
This is another area where linguistic/statistical analysis could really help. Most knowledge-base products offer a "More Like This" feature that is an index of linguistic similarities between items. An easy way to detect duplicates with such a system is to use a fine scale and place an upper limit on similarities, i.e. any two items with a similarity > N are likely to be duplicates.
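A toy version of that threshold idea: score pairwise similarity on a fine scale and flag anything above the cutoff as a likely duplicate. Cosine similarity over raw word counts stands in here for whatever linguistic index a real "More Like This" feature actually uses, and the 0.95 cutoff is a number I made up.

```python
# Sketch of threshold-based duplicate detection: any pair of pages
# whose similarity exceeds N is treated as a likely duplicate.
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def likely_duplicates(pages, threshold=0.95):
    """Flag every pair of pages whose similarity exceeds the cutoff."""
    return [(i, j)
            for i in range(len(pages))
            for j in range(i + 1, len(pages))
            if cosine_similarity(pages[i], pages[j]) > threshold]

# Made-up stand-ins for the same manual page mirrored on two sites.
mirrors = ["the current transaction has been rolled back",
           "the current transaction has been rolled back",
           "an unrelated page about something else entirely"]
print(likely_duplicates(mirrors))  # [(0, 1)]
```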


All of this being said, I would be surprised if Google does not address these issues in the very near future. I do not think they have gone down the path that many large companies do, where they stop trying to innovate and instead just try to protect their turf.
