I am writing this article in response to the recently posted “Google Quality Rater Guidelines 2012” document, which was seemingly leaked and, I speculate, deliberately injected into the market. Before I dissect the specimen, let me start with a general overview of Google.
For all practical, technical, scientific, and artistic purposes, Google is not a search engine, at least not for me. Google Search is an advertising engine – period. It is so simply because all the top results it brings are information served to you by rich people, people who have enough SEO money to be on top, or at least enough money to advertise at the top of search terms. In this search mechanism, neither you nor I have any choice but to see what other people, on the strength of their financial power, have decided we should see. This is not a definition of a search engine in my book, nor should it be in anyone’s.
For the sake of argument, let’s you and I have a thought experiment. Imagine that you have a two-column menu in your newly opened restaurant. Can you see it? On the left, you put your free (or very cheap) appetizers. On the right, you have the main courses, the real money makers. Now, how much attention do you want people to pay to the left column versus the right column? In particular, if you have to give the appetizers (or chicken wings or peanuts) away free to draw people in, would you want them to just have the appetizers and then leave?
No. This is the very same delicate issue Google has been secretly battling. Many of Google’s ingenious people earn their lucrative salaries solely to maintain this balance: offering up the freebies to get you to buy the main course, the advertisers’ wares. The rest of Google’s ventures outside this model are like Christmas ornaments for Wall Street.
To continue, just as the appetizers must be good enough to draw the people in to your restaurant, Google search results on the left column should be good too. But, if you (Google) make the appetizers too delicious, then your whole “business” is in trouble. And, if you make them too yucky (irrelevant), again business will suffer. In addition, you cannot list the same dish under both columns, for fairly obvious reasons. This is the balance I am referring to. Hence, improving Google search relevancy senselessly works against this balance. Google (your restaurant) just has to have the right dose of relevancy - nothing more, and nothing less.
To shed more light on this issue, I will make an example using a medical term, because medical advertising is a very expensive main course.
Google’s Circus Balancing Act
Google Search Term: Headache. The results for this query are shown below (search performed from New York on December 2, 2012; Google results may change over time and with the IP address the search originates from).
Now Mr. Google, are you saying that the Advil.com Website has no good, relevant content that deserves to be in the left column of these search results (among the appetizers)? Of course it has. But Advil is already paying an arm and a leg for this term, so why make it a free appetizer? Another obvious oddity here is the situation for Aspirin and its manufacturer Bayer. On the entire first page of results for the same term “headache,” these words are not mentioned at all. How is it possible that one of the most common, popular, and well-known headache medicines in the entire world is nowhere to be seen? Is it not relevant? Where is bayer.com? Where is aspirin.com?
One explanation could be that Bayer does not need Google to sell its aspirin since it is such a household name. Thus, Bayer probably neither invests in SEO nor advertises its product. If you type “Aspirin” then you see Bayer and Aspirin in the appetizer menu, supporting the argument above: they are not getting into a bidding war for the term “headache.” Again, we are explaining search performance in terms of economics, hence the starting line “Relevancy not relevant to its success: What is Google?”
Almost all results from Google Search are distributed with this balancing act for the fat tail (short and popular queries). You can test this yourself. If you have enough time to do a good sampling, you will see that, with some exceptions, the appetizer menu and main course menu on Google are carefully balanced. This complex picture is somewhat simplified in the diagram below.
As a supplement to this diagram, we need to show how and where this balancing act occurs. For the query “headache”, relevancy (in the literal sense) becomes unimportant because there will be millions of pages relevant to this term. This is where Google’s next criterion kicks in, which is quality. Google has several definitions of quality (such as utility, usefulness, etc.), and quality is mainly determined by a mixture of rudimentary patterns (such as how long your description meta tag is) and statistical methods (such as the historical data of user clicks on links following a search). However, what is disguised here is that there is also another criterion, namely economics, which Google uses to assess the dollar value of each page in terms of its effect on advertising revenue.
For example, one simple balancing act is to cut off all results from Advil.com, since they bid heavily on the term “headache” and thus pop up in the advertising column anyhow. But there are other means too, methods known only to a few people inside Google, that affect this balance. Based on these considerations, a formula for Google’s ranking algorithm would be an unknown function of relevancy, quality, and economics:
Page Rank = f (Relevancy, Quality, Economics)
In the graph above, note the blue line which marks the “fringes” of the fat tail, a transition to the long tail. To illustrate, I picked the query “what causes headache after biking” on purpose, because it clearly marks the boundary where Google loses its control over revenue. In response to this query, you will see several results from biking sites on the page, and no advertising will appear. In these search situations the usefulness, utility, and economics criteria are all in limbo. Now, only the sheer “relevance” mechanism (variable) is left in action, and this aspect of Google is actually quite poor. The reader should also note that the long tail is a big business, as proven by iTunes and Amazon. Consequently, Google wants desperately to claim the long tail territory of search.
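To make the formula above a little more tangible, here is a toy sketch in Python. It is only an illustration of my argument, not Google’s algorithm: the real function f is unknown, and the linear form, the weights, the `ad_spend` field, and the two example pages are all hypothetical. It assumes the economics term pushes down pages whose owners already pay heavily for the query, and that on a long tail query only relevance stays switched on.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    relevancy: float  # 0..1, how well the page matches the query (hypothetical score)
    quality: float    # 0..1, usefulness/utility signals (hypothetical score)
    ad_spend: float   # 0..1, how heavily the owner already bids on the query (hypothetical)


def rank_score(page, w_rel=1.0, w_qual=1.0, w_econ=1.0):
    """Toy stand-in for f(Relevancy, Quality, Economics); the real f is unknown."""
    # Assumption: the economics term lowers the score of pages that already pay for the term.
    return w_rel * page.relevancy + w_qual * page.quality - w_econ * page.ad_spend


def ranked(pages, **weights):
    """Return URLs ordered from highest to lowest toy score."""
    ordered = sorted(pages, key=lambda p: rank_score(p, **weights), reverse=True)
    return [p.url for p in ordered]


pages = [
    Page("advertiser.example", relevancy=0.9, quality=0.8, ad_spend=1.0),
    Page("academic.example", relevancy=0.8, quality=0.8, ad_spend=0.0),
]

# Fat tail query: all three criteria are active.
fat_tail = ranked(pages)

# Long tail query: quality and economics are "in limbo", only relevance is left.
long_tail = ranked(pages, w_qual=0.0, w_econ=0.0)

print(fat_tail)
print(long_tail)
```

With these made-up numbers the heavy advertiser drops below the academic page on the fat tail query, while on the long tail query, where only relevance counts, it comes out on top. Change the weights and the ordering changes, which is the whole point of the balancing act.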
Human Labor & Other Factors
Now let’s interpret Google’s leaked document, which serves the purpose of training people for rating URLs, the so-called “Quality Rater Guidelines 2012.” Here are my observations and speculations on this document:
- Google has two different missions for the human labor it uses to rate Web pages for given queries. (1) Identify Web sources that act with clear ill intent (cloaking, spamming, etc.). Google must have a huge blacklist, and repeating this exercise often will keep that blacklist up to date. (2) Rate the quality of the Web pages (URLs) for a given query. This is the interesting part, where the “balancing act” matters, and from here on I will discuss only this part. But I imagine that a big chunk of the effort is aimed at the first mission.
- With its second mission, Google is trying to push the blue line (diagram above) to the right, in order to claim more territory in the long tail, and is doing so by using human labor. I’ll bet all queries supplied to the raters will be on this boundary between the long tail and the fat tail. Then, Google will compare the ratings by human labor to those of its algorithm and use the difference to adjust the algorithm accordingly for the best economic solution.
- Now here’s a key argument. Anyone even vaguely interested in mathematics should know that the long tail is so huge that any such human-powered, inch-by-inch progress toward rating it is absolutely futile. Therefore, I contend that the second mission is a hoax: it keeps people on the payroll and shows Wall Street (and others) that some real effort is being put into progress on relevance.
- Based on this assessment, I would further claim that the Quality Rater document was leaked on purpose. It does not contain any secrets, but it shows 161 pages of supposed hard work meant to demonstrate a commitment to improving search relevancy.
- SEO people will find nothing useful in this document, because they do not know one of the most important criteria, namely the economics rating and the strategy behind it. For example, if your page on headache is highly academic, you may have a better chance of ranking high in the appetizer menu than if your page is commercial, because it does not step on advertisers’ toes. Google has a “no free lunch” mentality. But if your business is a good earner for Google, meaning you advertise heavily on Google, then your Web pages will carry a different importance. If this is the case and the information ever leaks out, a huge uproar would obviously be heard from the likes of Microsoft, all Google competitors, and agencies determined to monitor such machinations.
- For my part, I also found this document entertaining for the information overload it places on raters. I laughed to tears reading “act as a regular Joe Blow.” After presenting the quality raters with a volume of information comparable only to a user’s manual on how to launch a nuclear missile, how can anyone expect human raters to represent ordinary people, given so many directives? This self-conflicting strategy is a sign of sloppiness in their rating mentality at best, and of subterfuge at worst.
Google is an advertising engine with an ingenious principle: keep users in the zone of fat tail queries through under-articulated language (pidgin language), using features such as query auto-fill, so that users do not wander around the long tail, where making money is difficult. I bet you always thought auto-fill was an added user experience feature! As for assessing “what Google is”, any consideration of Google as anything other than an ad engine is superfluous.
Finally, in science we deal in the concrete, as opposed to the intrinsic nature of things. Even so, Google Search is not even “intrinsically” a true search engine, let alone a good one. At its core, a search engine is a computer program designed to find answers to queries from within a collection of information. Once the results deviate from actual fact (pure relevance) and move toward manipulated results (ads and economic science), well, I know you see my point.
ajmalseotips.blogspot.com