Thursday, September 29, 2005

New Google Patent Application: Variable personalization of search results in a search engine

United States Patent Application: 0050216434: "Variable personalization of search results in a search engine

Abstract
A search engine provides personalized rankings of search results. A user interest profile identifies topics of interest to a user. Each topic is associated with one or more sites, and a boost value, which can be used to augment an information retrieval score of any document from the site. Search results from any search are provided to the user, with a variable control of the ranking of the results. The results can be ranked by their unboosted information retrieval score, thus reflecting no personalization, or by their fully or partially boosted information retrieval scores. This allows the user to selectively control how their interests affect the ranking of the documents. "

Cre8: Variable personalization of search results - Google - Cre8asite forums bragadocchio writes: "This invention would enable a searcher to fill out a profile, perform a normal search, and then use a slider button to indicate how much his or her personal information from the profile should be used to modify (rerank) that search based upon the personalization information that they have entered into the profile, by sliding the button partially, or all the way to a full influence on the results. "
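To make the slider idea concrete, here is a minimal Python sketch. The application's abstract does not say how the boost is combined with the information retrieval score, so the multiplicative boost and the linear blend below are assumptions, as are the names `PROFILE_BOOSTS`, `personalized_score`, and `rerank` and the boost values themselves.

```python
from urllib.parse import urlparse

# Hypothetical user profile: site -> boost multiplier.
# The sites come from the patent's "Health" example; the numbers are invented.
PROFILE_BOOSTS = {"nih.gov": 2.0, "cdc.gov": 1.5}


def site_of(url):
    """Return the host part of a URL, used here as the 'site' key."""
    return urlparse(url).hostname or ""


def personalized_score(ir_score, url, slider, boosts=PROFILE_BOOSTS):
    """Blend the unboosted score with the fully boosted score.

    slider = 0.0 gives the plain IR score (no personalization),
    slider = 1.0 gives the fully boosted score,
    values in between give a partial boost (assumed linear).
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0.0 and 1.0")
    boost = boosts.get(site_of(url), 1.0)  # sites not in the profile are unboosted
    return ir_score * (1.0 + slider * (boost - 1.0))


def rerank(results, slider, boosts=PROFILE_BOOSTS):
    """Sort (url, ir_score) pairs by their blended score, best first."""
    return sorted(
        results,
        key=lambda r: personalized_score(r[1], r[0], slider, boosts),
        reverse=True,
    )


results = [("http://example.com/a", 1.0), ("http://nih.gov/b", 0.6)]
for slider in (0.0, 0.5, 1.0):
    print(slider, [url for url, _ in rerank(results, slider)])
```

With these made-up numbers the nih.gov page only overtakes the other result once the slider is pushed all the way over, which is the kind of user-controlled reranking the forum post describes.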

He highlights the following: "[One of the] interesting aspects of this invention is the discussion on how certain sites are determined to be related to the specific topics.

Quote:

[0045] where nih.gov, cdc.gov, and med.Stanford.edu are various sites that have been determined (either manually or automatically) to be related to the topic "Health". In other words, for each topic in the directory, there is a set of sites that have been determined to be relevant to the topic, and for each of these sites, a boost value is defined.

[0046] The boost for the sites listed in the topical directory is generally determined as follows:

[0047] a) A "site graph" is generated where nodes of the graph are sites (basically, pages on the same host) and edges between nodes are weighted based on the number of pages from one site that link to pages on another. This same type of graph can be used to compute all topic boost maps.

[0048] b) For each topic in the directory, say "Health", a number of sites are selected as "start sites" S0 whose home page is listed in the Open Directory. For example, for a university like Stanford, start sites may be selected as any site ending in .stanford.edu.

[0049] c) A computation is run in two passes:

[0050] i) first identify a set of sites S1 that are linked-to heavily by those sites in S0, with each site in S1 assigned a weight according to how heavily it's linked-to by sites in S0.

[0051] ii) then identify those sites S2 that are linked-to heavily by those sites in S1, weighted as with S1.

[0052] d) The sites in S2 are boosted with their assigned weights."
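Steps a) through d) can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the patent's implementation: the quoted text does not define "linked-to heavily" or the exact weighting, so the sketch keeps the `top_k` most heavily linked-to sites in each pass and normalizes their weights to sum to 1. The function names and the `.example` hosts are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_site_graph(page_links):
    """Step a): build a site graph from (source_url, target_url) page links.

    A 'site' is taken to be the URL's host. The edge weight from site A to
    site B is the number of distinct pages on A that link to pages on B.
    """
    linking_pages = defaultdict(set)
    for src_url, dst_url in page_links:
        src, dst = urlparse(src_url).hostname, urlparse(dst_url).hostname
        if src and dst and src != dst:  # ignore links within one site
            linking_pages[(src, dst)].add(src_url)
    graph = defaultdict(dict)
    for (src, dst), pages in linking_pages.items():
        graph[src][dst] = len(pages)
    return dict(graph)


def propagate(site_graph, sources, top_k):
    """One pass: weight each site by how heavily `sources` link to it.

    Assumption: 'heavily' means the top_k targets by weighted link count,
    and the kept weights are normalized to sum to 1.
    """
    totals = defaultdict(float)
    for src, src_weight in sources.items():
        for dst, link_count in site_graph.get(src, {}).items():
            totals[dst] += src_weight * link_count
    top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    norm = sum(weight for _, weight in top)
    return {site: weight / norm for site, weight in top} if norm else {}


def topic_boosts(site_graph, start_sites, top_k=10):
    """Steps b) to d): start sites S0 -> S1 -> S2; S2's weights are the boosts."""
    s0 = {site: 1.0 for site in start_sites}
    s1 = propagate(site_graph, s0, top_k)
    return propagate(site_graph, s1, top_k)


links = [
    ("http://a.example/1", "http://b.example/x"),
    ("http://a.example/2", "http://b.example/y"),
    ("http://a.example/3", "http://b.example/x"),
    ("http://a.example/1", "http://c.example/x"),
    ("http://b.example/x", "http://d.example/"),
    ("http://c.example/x", "http://d.example/"),
    ("http://c.example/x", "http://e.example/"),
]
graph = build_site_graph(links)
print(topic_boosts(graph, ["a.example"]))
```

Note that in this reading the start sites themselves receive a boost only if the second pass links back to them; the quoted paragraphs say only that the sites in S2 are boosted.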

