Friday, October 17, 2003

Google Guy - URLs & session ids

Google makes inferences about URL parameters: "Google can do some smart stuff looking for duplicates, and sometimes inferring about the url parameters, but in general it's best to play it safe and avoid session-ids whenever you can. ...A session-id lets a site owner give each user a unique identifier. That identifier can reference customer data like the shopping cart contents stored in a database, for example.
Some people would mention that you could use a cookie to do the same thing and keep the urls much cleaner. That's true, but not every user has cookies enabled. Using session-ids is one way to try to guarantee that you know the state of a user, even if they don't allow cookies, for example.

So what's the problem with a session id, and why doesn't Googlebot crawl them? Well, we don't just have one machine for crawling. Instead, there are lots of bot machines fetching pages in parallel. For a really large site, it's easily possible to have many different machines at Google fetch a page from that site. The problem is that the web server would serve up a different session-id to each machine! That means that you'd get the exact same page multiple times--only the url would be different. It's things like that which keep some search engines from crawling dynamic pages, and especially pages with session-ids."
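
To make the duplicate problem concrete, here is a minimal Python sketch of what happens when several crawler machines fetch the same page and each gets its own session id. It is only an illustration, not how Googlebot actually works: the parameter names in SESSION_PARAMS, the example.com URLs, and the strip_session_id helper are all assumptions made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; real sites use many different ones.
SESSION_PARAMS = {"sid", "sessionid", "session_id", "phpsessid", "jsessionid"}


def strip_session_id(url: str) -> str:
    """Return the URL with any assumed session-id query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Three crawler machines fetch the same page; the server hands each one
# a different session id, so the three URLs look like three distinct pages.
fetched = [
    "http://example.com/cart?item=42&sid=aaa111",
    "http://example.com/cart?item=42&sid=bbb222",
    "http://example.com/cart?item=42&sid=ccc333",
]

unique_pages = {strip_session_id(url) for url in fetched}
print(len(fetched), "URLs fetched,", len(unique_pages), "distinct page")
```

A crawler can only clean up URLs this way if it guesses the parameter name correctly, which is why keeping session ids out of the URL in the first place is the safer choice for a site owner.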
