Tuesday, January 20, 2004

Semantics -> High Rankings Search Engine Optimization Forum

Semantics -> High Rankings Search Engine Optimization Forum: "This paper describes the processes that were going on at Applied Semantics before Google bought them. It's long and you probably won't be able to get through it in a single read - pretty heavy stuff in there. But if you can muscle your way through, then go back over the parts you aren't clear on, you'll have an excellent foundation for understanding how all this works.

Cover Page: "Patterns in Unstructured Data
Discovery, Aggregation, and Visualization"

This paper doesn't deal specifically with Google; it's an overview of the whole technology (though it does describe some potential uses for it, a search engine included). Once you wrap your mind around the concepts, you can almost look at Google and see it happening.

Also bear in mind that this is only part of it. There's Topic Sensitive PageRank and LocalRank, both of which are also somewhat new to the game (within the last year) and vastly overlooked by most (I suppose due to the complexities of it all). With the semantics kicking in, the level of integration of these two is also boosted considerably....

Interesting theory:
So, in competitive markets where people vie for the same terms, and everyone is optimizing pages for the "terms" and not the "concept" of a "martin guitar" owned by "Scott Rahin", the whole semantics thing falls apart.

In the end, if there's a sector like this (think "real estate", "hotels", "airfares", and other highly optimized, competitive areas), then the semantics have no hope of working - so either it works poorly, or Google has something in there to kick that part of the algo out. ...

Comments re: August 2009: How Google beat Amazon and Ebay to the Semantic Web (Ftrain.com): "Amazon, remember, scooped up Alexa a while back. Alexa isn't so much a "search" property as an "indexing engine" - which is, as this article points out, the primary problem with the Semantic Web: you have no way of indexing all the documents...

In How Consistent Page Structure Allows Google To Extract and Assume Specific Information, we can see how Google doesn't always need an RDF feed to make its semantic extractions. The page just needs to be set up in a way that lets Google identify what's what. In that example, you can also see that it doesn't necessarily have to be products it's extracting information about. And even if you only have a few pages on your site, you can still be in good shape if enough people in your sector use the same "pointer words" and have employed consistent layouts (even if theirs aren't exactly consistent with your own). That post also talks about how URL structure can help with this too (even though what you name the directories and pages isn't important).
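To make the "pointer words" idea concrete, here's a minimal sketch of that style of extraction (the page, words, and regex are hypothetical illustrations, not Google's actual code):

    import re

    # If every page in a sector labels its facts the same way
    # ("Model:", "Price:", "Year:"), an engine can pull structured data
    # out of plain HTML without any RDF feed.
    POINTER_WORDS = ["Model", "Price", "Year"]

    def extract_facts(html):
        facts = {}
        for word in POINTER_WORDS:
            # Matches "Price: $2,499" as well as "<b>Price:</b> $2,499".
            m = re.search(re.escape(word) + r"\s*:\s*(?:</\w+>\s*)?([^<\n]+)", html)
            if m:
                facts[word.lower()] = m.group(1).strip()
        return facts

    page = "<b>Model:</b> Martin D-28 <br> Price: $2,499 <br> Year: 1968"
    print(extract_facts(page))
    # {'model': 'Martin D-28', 'price': '$2,499', 'year': '1968'}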

This post about navigational structure helps you understand how you can use site structure to achieve the same kinds of results. The pyramid scheme (in this case, that's a good thing) lets your "pointers" point at concepts on deeper pages rather than sitting on the same page, as in the first example.
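A sketch of that pyramid idea, with anchor text on an upper-tier page acting as the "pointer" while the concept page sits a level deeper (the site tree and anchor texts are made up for illustration):

    # Each page maps to a list of (anchor text, target URL) links.
    site = {
        "/":          [("Acoustic Guitars", "/acoustic/")],
        "/acoustic/": [("Martin D-28 Review", "/acoustic/martin-d28.html")],
        "/acoustic/martin-d28.html": [],
    }

    def pointers_into(url):
        # Collect the anchor-text "pointers" aimed at a deeper page.
        return [text for links in site.values()
                for (text, target) in links if target == url]

    print(pointers_into("/acoustic/martin-d28.html"))
    # ['Martin D-28 Review'] - the concept label lives on the page above.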

----

I should point out that these techniques now work very well on larger sites.


...the semantics portion of the algo is only going to increase in importance. It does work - if you present the data to them in the right way. There will be a grace period (how long? I dunno) where you'll still be able to rank well using the "keyword" model, but over the next 12-24 months, expect that model to become progressively less effective as more and more SEOs embrace the new technology and those patterns and semantic extrapolations become the norm in web design.
"

NEW CONCEPTS: TSPR (Topic Sensitive PageRank) and LocalRank
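For a feel of how Topic Sensitive PageRank differs from plain PageRank, here's a minimal sketch along the lines of Haveliwala's paper: the random-jump (teleport) vector is biased toward a seed set of topic pages instead of being uniform. The five-page link graph below is invented for illustration:

    DAMPING = 0.85

    def topic_pagerank(links, topic_pages, iterations=50):
        pages = list(links)
        # Topic-sensitive twist: teleport only to the topic's seed pages,
        # not uniformly to every page as in plain PageRank.
        teleport = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
                    for p in pages}
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            rank = {p: (1 - DAMPING) * teleport[p]
                       + DAMPING * sum(rank[q] / len(links[q])
                                       for q in pages if p in links[q])
                    for p in pages}
        return rank

    links = {
        "home":       ["guitars", "hotels"],
        "guitars":    ["martin-d28", "home"],
        "martin-d28": ["guitars"],
        "hotels":     ["home"],
    }
    # Scores skew toward the guitar neighborhood of the graph.
    print(topic_pagerank(links, topic_pages={"guitars", "martin-d28"}))

A query classified as being about guitars would then be ranked with this biased vector rather than the single global one.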

ARTICLES & PAPERS REFERRED TO ABOVE:
August 2009: How Google beat Amazon and Ebay to the Semantic Web (Ftrain.com)

Cover Page: "Patterns in Unstructured Data
Discovery, Aggregation, and Visualization"

Scientific American: The Semantic Web: "A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. By Tim Berners-Lee, James Hendler and Ora Lassila"

Integrating Applications on the Semantic Web: "For the Web to reach its full potential, it must evolve into this Semantic Web, providing a universally accessible platform that allows data to be shared and processed by automated tools as well as by people"

RDF Semantics: "This is a specification of a precise semantics, and corresponding complete systems of inference rules, for the Resource Description Framework (RDF) and RDF Schema (RDFS)."
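The RDF model those last two references describe boils down to subject-predicate-object triples. A minimal sketch using the rdflib Python library (the namespace and terms are invented; it's the same kind of fact the page-structure extraction above recovers, just declared explicitly):

    from rdflib import Graph, Literal, Namespace, RDF

    EX = Namespace("http://example.org/guitars/")

    g = Graph()
    g.add((EX.d28, RDF.type, EX.Guitar))          # "this thing is a guitar"
    g.add((EX.d28, EX.maker, Literal("Martin")))  # "its maker is Martin"
    g.add((EX.d28, EX.price, Literal("2499")))    # "its price is 2499"

    print(g.serialize(format="turtle"))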

Heavy:


Sitemaps? -> High Rankings Search Engine Optimization Forum: "The page relationships are derived from linking structure, not URL structure."
How Consistent Page Structure Allows Google To Extract and Assume Specific Information


Learn From A Forum's Structural Design -> High Rankings Search Engine Optimization Forum: "a sitemap is murder on your PR. PR works in a way where a page has a certain 'number' assigned to it. 80% of that number is divided up equally amongst all the outgoing links on that page and passed along to the next page. So if you had regular navigation that passed through a couple of levels, your site would start with a high PR out front and get a little lower at the next level, and a little lower at the level after that, and so on. Your 'broader', less focused pages (the ones with the more general and more competitive search terms) end up with a higher PR, and your more specific pages (the ones with focused terms that aren't as competitive and don't need as much PR to rank well) get less PR.

When you link through a sitemap, though, all of your pages get an equal share of a highly diluted amount of PageRank.

---

Oh - and Google's only going to crawl a certain number of links on a page. I think their site says 100, but I suspect it may be higher in some cases. The PR of the page with all the links on it seems to have a bearing on how many of those links Google will bother to crawl.

So, if your site has 125 pages all linked from that one sitemap page, then 25 pages of your site will never get crawled."
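To make the quoted arithmetic concrete, here's a toy comparison using the 80% pass-through figure from the quote (illustrative numbers, not Google's actual formula):

    PASS_THROUGH = 0.80  # the quote's "80% of that number" figure

    def share(page_pr, outgoing_links):
        # Each outgoing link passes an equal slice of 80% of the page's PR.
        return PASS_THROUGH * page_pr / outgoing_links

    home_pr = 10.0

    # Tiered navigation: home -> 5 section pages -> 25 deep pages each.
    section_pr = share(home_pr, 5)         # 1.6 for each broad section page
    deep_pr    = share(section_pr, 25)     # 0.0512 for each deep page

    # Flat sitemap: home -> 1 sitemap page -> all 125 pages at once.
    sitemap_pr = share(home_pr, 1)         # 8.0
    flat_pr    = share(sitemap_pr, 125)    # 0.0512 for every page equally

    print(section_pr, deep_pr, flat_pr)

The deep pages come out the same either way; what the flat sitemap loses is the high-PR middle tier for your broader, more competitive terms - and with 125 links on one page, a 100-link crawl limit would leave 25 of them unvisited.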
