Friday, November 12, 2004

Google Blog

Google Blog: "The documents in Google's index are in dozens of file types from HTML to PDF, including PowerPoint, Flash, PostScript and JavaScript. Together these pages represent a good chunk of the world's information, but hardly all of it. That's why we keep building more advanced systems for crawling the web and creating more sophisticated indices to sort what we find. So 8 billion pages is a milestone worth noting, but it's not the end of the road. The real test is how well we do in finding what you want from within those pages. We'll keep improving that too."

Tuesday, November 09, 2004

eyefortravel.com - Travel Distribution News, Events and Analysis

eyefortravel.com - Travel Distribution News, Events and Analysis: "According to ClickZ News, Google has published its much-anticipated AdWords content policy. The policy outlines what is acceptable in its text advertisements and what is unacceptable"

Getting to Know Gmail

Getting to Know Gmail: "EmailLabs estimates that between 1.5 million and 2 million people are now registered Gmail users, and projects that this base could grow to between 5 million and 10 million over the next year."

Affiliate sites in Google: thread & A study of host pairs with replicated content

Anyone besides me not swallowed the "Hilltop" magic pill yet?: Posted by caveman, Nov 4, 2004 (utc 0): "WRT Hilltop, there are two different areas of assessment that we have paid a lot of attention to:

1) affiliation, and its consequences

2) themed links, and their consequences"

Tested: "One thing we did was to identify a pair of very similar sites in different categories. The sites were deemed similar by virtue of size, construction, PR, linking patterns, and performance in the SERPs. Call them site A and site B.

For site A we went and got 20 good backlinks (PR 6-7) from non-affiliated sites, in categories unrelated to site A's category. No help; the site stayed buried.

For site B we went and got 8 good backlinks (PR 5-7) from closely related sites (two hubs, six authority). Within four weeks site B had popped back to its former glory while most webmasters in the immediate post-Florida environment were still bemoaning the disappearance of their sites..."

caveman concludes: "Post-Florida the URLs were typically associated with authority sites. Before Florida, when we saw that, the URLs more typically reflected high PR pages. The assumption here is that a really important backlink is displayed, but that seems a good assumption to me.

On a related note, though I can't call this technically Hilltop, we have virtual certainty that links from unaffiliated, relevant pages that are tightly connected to our own topics perform better than identical links from unrelated pages, for certain kw searches"

Caveman later posts: "The way I read it, the Hilltop/LocalRank "affiliate" filter is quite subtle... would need a pretty heavily cross/interlinked domain farm targeting a single category with relatively few "outside" links for a dramatic drop in the SERPs."

ciml: "Monika Henzinger co-wrote an interesting paper on affiliation detection": A study of host pairs with replicated content

The paper proceeds as follows: in Section 2 we establish a classification of mirroring; Section 3 describes our approach to detecting and classifying mirrored hosts; Section 4 presents data from our experiment; Section 5 discusses motives for mirroring; Section 6 presents other applications of this technique; Section 7 mentions related work and in Section 8 we draw some conclusions.

We define two hosts to be mirrors if:

1) A high percentage of paths (that is, the portions of the URL after the hostname) are valid on both web sites, and

2) These common paths link to documents that have similar content.

Therefore, hosts that replicate content but rename paths are not considered mirrors under our definition.
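To make the two-part mirror definition concrete, here is a rough Python sketch. It is not the paper's method: the function names, the thresholds, the choice to measure path overlap against all paths seen on either host, and the word-set Jaccard measure of "similar content" are all assumptions made for illustration. Each host is represented as a plain dict mapping a path to the page text found there.

```python
def jaccard(text_a, text_b):
    """Word-set Jaccard similarity (an assumed stand-in for 'similar content')."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def looks_like_mirror(host_a, host_b,
                      min_path_overlap=0.8,    # hypothetical threshold
                      min_similarity=0.9,      # hypothetical threshold
                      min_similar_share=0.8):  # hypothetical threshold
    """host_a and host_b map path -> page text for each host."""
    all_paths = set(host_a) | set(host_b)
    common = set(host_a) & set(host_b)
    if not common:
        return False

    # Condition 1: a high percentage of paths are valid on both hosts.
    if len(common) / len(all_paths) < min_path_overlap:
        return False

    # Condition 2: the common paths lead to documents with similar content.
    similar = sum(
        1 for path in common
        if jaccard(host_a[path], host_b[path]) >= min_similarity
    )
    return similar / len(common) >= min_similar_share
```

Under these assumptions, a host that copies the content but renames every path shares no common paths with the original, so it fails the first check, which matches the paper's remark that such hosts are not counted as mirrors.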

Get Banned Fast: mildly amusing thread

Get Banned Fast: "1. take a domain with already an established PR
2. name it google"

Monday, November 08, 2004

Slashdot | Google Image Index Just Not Updated

Slashdot | Google Image Index Just Not Updated: "We ran a story earlier today about the lack of Abu Ghraib photos in Google's image index. We now have a response from Google stating that the image index simply hasn't been updated recently, as well as a fairly convincing demonstration from a Slashdot reader"