Over the last few months, I’ve made a number of observations, starting here, about the growing dominance of Wikipedia over search engine results for common terms. My findings were anecdotal, drawn from the searches I do day-in and day-out as well as a few dozen random ones I did for the express purpose of checking Wikipedia’s rank. Everything I saw seemed to point to an emerging Wikipedian search supremacy, but of course I was seeing only a tiny fraction of searches. It would have been nice to have a more rigorous sample.
Well, now we have such a sample, thanks to a student in Slovenia by the name of Jure Cuhalev. In a research project, Cuhalev gathered a random sample of about 1,000 of the 1.4 million topics covered by Wikipedia. He then ran the terms through the Google, Yahoo, and MSN search engines. He found that Wikipedia did in fact appear with remarkable consistency in the upper reaches of search results. On average, the online encyclopedia appeared in the top-ten search results 65% of the time – and 26% of the time it actually had two results in the top ten. (Cuhalev has posted a summary of his findings on his blog, and the full report can be downloaded here.)
But the findings get more interesting when you look beyond the averages to the particular results turned in by each of the three engines. It turns out that Google’s algorithm absolutely adores Wikipedia and that Yahoo’s passion for the online encyclopedia is nearly as ardent. But Microsoft’s MSN algorithm seems strikingly less enchanted by Wikipedia’s charms. Wikipedia turned up in Google’s top ten a whopping 81% of the time and in Yahoo’s 77%, but it appeared in MSN’s top ten just 38% of the time. What’s up with that?
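As a quick sanity check, the per-engine figures line up with the 65% average quoted above. This is only a sketch: it assumes the overall figure is a simple unweighted mean across the three engines, which the summary here does not spell out, and the variable and function names are purely illustrative.

```python
# Top-ten appearance rates per engine, in percent, as quoted in the post.
top_ten_rate = {"Google": 81, "Yahoo": 77, "MSN": 38}


def unweighted_mean(rates):
    """Simple mean across engines (assumed, not confirmed by the report)."""
    return sum(rates.values()) / len(rates)


average = unweighted_mean(top_ten_rate)
print(f"Average top-ten rate: {average:.1f}%")

# How far MSN sits below the mean of the other two engines.
others = {name: rate for name, rate in top_ten_rate.items() if name != "MSN"}
gap = unweighted_mean(others) - top_ten_rate["MSN"]
print(f"MSN trails the Google/Yahoo mean by {gap:.1f} points")
```

Under that assumption the mean rounds to 65%, and MSN sits about 41 points below the other two engines.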
Cuhalev also found that when Wikipedia does turn up in the top ten it tends to rank very highly indeed. It’s in the top three results 76% of the time at Yahoo, 66% at Google, and 54% at MSN.
I hope other researchers will look at this phenomenon from different angles, and also track changes over time, but in the meantime we now have the first solid evidence that, for Google and Yahoo at least, Wikipedia rules.
Thanks for drawing attention to the research; it is certainly very interesting. It is probably worth noting that Wikipedia turned up in Google’s top ten 81% of the time, not the 89% you cited. It seems pretty clear that MSN is the outlier.
Cool! Let’s see if there’s something I can do with this …
Regarding “But Microsoft’s MSN algorithm seems strikingly less enchanted by Wikipedia’s charms. … What’s up with that?”
Although everybody has heard of PageRank, there are actually a half-dozen factors which are critical to a page’s placement. PageRank is only the most famous factor. It’s well known in SEO circles that Yahoo basically cloned Google’s weightings (with a few tweaks favoring Yahoo’s directory), and that MSN, for whatever reason, has decided not to use the same weightings.
Thanks for noting the error, Oliver. I’ve corrected it in the text. Nick
I think it’s high time for anyone who has a cause at heart to be active on the corresponding Wikipedia entry.
Three obvious points:
– he says nothing about dynamics or an evolution: it’s a title about “attraction”
– what about Ask.com?
– his list of topics should be taken from a proper Zeitgeist.
I’ll e-mail him about it.
This is fascinating, and not at all surprising. Wikipedia has a stranglehold on the search engine rankings.
Nick, when you wrote yesterday about Larrying Wikipedia, my immediate reaction was to wonder how much of Wikipedia’s potential ad revenue is driven by their dominance of the SERPs.
The next step of the analysis would be to try and figure out what percentage of Wikipedia’s overall traffic is via search engine queries, and what percentage is type in / tool bar / bookmark traffic.
My hunch is that a HUGE percentage of Wikipedia’s traffic is via their high SE rankings – traffic which is less valuable than say MySpace’s whose traffic I’d imagine comes primarily from type ins (and is therefore less susceptible to the changing algos of the SEs.)
One factor in Wikipedia’s high rankings might be the fact that they’re not commercial. If they started throwing affiliate links and ad code in there, my guess is they would plummet.
For that reason, a thorough discussion of Wikipedia’s potential value would have to include the SEO impact of commercializing the site.
@lorenzinho: I think you’ve hit the nail on the head – if Wikipedia were to become commercial, other webmasters might not link to it as freely as they do now. So over the longer term, there would be a potential hit to its SEO ranking, which in turn would negatively affect its valuation.
Ashkan Karbasfrooshan, founder of MojoSupreme, makes the same point in this post; incidentally, he also has an interesting post stemming off of Nick’s post here.
One factor in Wikipedia’s high rankings might be the fact that they’re not commercial.
There are different types of “commercial”. For instance, if some brand of candy causes hair loss, and that fact is stated on their WP page, that product might lose sales. If, however, their WP page is full of “studies” showing it doesn’t cause hair loss, they might not lose as many sales.
To give examples:
en.wikipedia.org/wiki/Curt_Weldon
en.wikipedia.org/wiki/Antonio_Villaraigosa
I know very little about the first, but I can tell at a glance that that entry is hugely biased against him. I know quite a bit more about the second, and – as pointed out to some degree on the discussion page – the article does in fact not mention many negatives about him. In fact, I couldn’t find any negatives in the entry at all.
And that WP entry is the third Google result for his name. Perhaps Google and Yahoo should rethink whether they want to weight WP so highly.
I agree with Bertil that the starting point needs to be what users search for not what Wikipedia covers. Who cares what the ranking is of a Wikipedia entry on a search term that no one ever queries? By sampling from existing Wikipedia entries, the researcher is sampling on the dependent variable. By definition the study is controlling for the fact that a relevant Wikipedia entry exists using that query. Note that the search terms were derived from Wikipedia titles! Queries on those exact terms are going to favor pages that have the term in the title. But who is to say that people search for those topics using those terms?
Sure, I realize that Wikipedia covers tons of topics. Regardless, the way Wikipedia organizes its topics may not be how users think about them as they search. It seems to me that this study is somewhat upside down. As is too often the case, some basic problematic choices about methodology up front are limiting its overall utility.
Google searches for “Rough Type” and “Nicholas Carr” both give this site as top hit. Following the logic of the survey, that points to this site being significantly more favoured by Google than Wikipedia. (i.e. what the previous commentators said about the variables used).
Eszter, yes, good point, there really should be normalization for the competitiveness of the term, because a high pagerank site with the search keywords in the title will tend to score well for any obscure term. On the other hand, there’s something to be said about dominance of “volume” results. Even though it’s SEO-obvious, it has some disturbing implications. It means, for example, that Wikipedia’s pages can rank highly for the names of people, and may be the top result for many people without a large Net/Media presence. That’s *scary*.
Danny: Yes, for people with a large Net presence, Google favors their site over Wikipedia. That’s good. It is problematic that MSN does NOT do this in certain cases, i.e. some people’s Wikipedia bio pages come before their own site! For example, search [Tim Bray] on MSN: the Wikipedia page is the first result.
Why is Google so popular, and MSN so marginal? I wanted to recheck the numbers just now, so I typed “search engine popularity” (no quotes) into MSN and Google. And just as anybody who is paying attention would expect, MSN returned a bunch of “link popularity checkers” (read “we get paid for our results”) while Google’s first hit was “Nielsen NetRatings Search Engine Ratings” (what I wanted). IMHO MSN’s 10% and Google’s astounding 50% are due to the fact that Google actually gives search results that are useful for the user.
I remember trying to decide between Metacrawler, Altavista, and Yahoo! once upon a time. Currently, though, I don’t know anybody who uses non-Google Search Engines EXCEPT when they need another perspective.
The reality “on the ground” is that MSN hardly ever gives the user the search results they were imagining in the first place. Google is the best in that sense and that’s reflected in the numbers, and Yahoo! isn’t bad either (hence its almost 25% share).
Wikipedia’s dominant presence in search results reflects the relevance of Google and Yahoo! for users’ interests. MSN has been fighting for years for users’ attention. Perhaps they should focus on giving search results that are actually interesting for the user. They could start by indexing the entire Wikipedia :)
See the Walkthrough for TheKBase Web – “Organize Your Data EVERY Way You Want To”
Daniel Rosenstark said:
I don’t know anybody who uses non-Google Search Engines
This just highlights that the people you know are not a representative sample of Internet users and so you should be careful about whatever conclusions you may draw regarding people’s Internet uses based on the experiences of those whom you know.
After all, data about search engine uses have consistently shown that a significant portion of Internet users (let’s say in the US, but elsewhere, too) do not use Google or certainly do not search at google.com.
Recent figures:
http://searchenginewatch.com/showPage.html?page=2156451
(For more on the distinction between using Google vs searching at google.com, see this piece: http://www.firstmonday.org/issues/issue9_3/hargittai/index.html)
Research I have done in the past comparing the efficiency of users (a diverse group of adult users) across search engines has shown that what matters in finding the relevant information is savvy in choosing the most appropriate search query, not which search engine one uses. Again, this highlights the importance of what queries users actually decide to use, which is all the more reason to start a study of this type with actual user queries as opposed to ones generated from the titles of potential result pages.
I don’t know anybody who uses non-Google Search Engines
To add some more “facts” to the discussion, among North American online households who use a search engine at least once a week 39% do not ever use Google. When we look at just those aged 18 to 40 — those considered the most “tech-savvy” — that number only falls to 36%. Furthermore, only 25% of the tech-savvy use Google exclusively, and only 23% of the population as a whole do the same.
To say that you don’t know anyone who uses non-Google search engines is to say you do not know three-fourths of the online North American population.
Source: Forrester Research’s NACTAS Benchmark Survey 2006
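The “three-fourths” figure follows from taking complements of the survey percentages quoted in this comment. A minimal sketch of that arithmetic, using only the numbers cited above (the variable names are illustrative, and nothing here comes from the Forrester data beyond those four figures):

```python
# Percentages quoted in the comment, among weekly search engine users.
never_google = {"all": 39, "18-40": 36}   # never use Google
only_google = {"all": 23, "18-40": 25}    # use Google exclusively

for group in ("all", "18-40"):
    # Complement of "never": uses Google at least some of the time.
    uses_google = 100 - never_google[group]
    # Complement of "exclusively": uses some non-Google engine at least sometimes.
    uses_other = 100 - only_google[group]
    # Whoever is left uses Google alongside other engines.
    mixes_both = 100 - never_google[group] - only_google[group]
    print(f"{group}: {uses_google}% use Google, "
          f"{uses_other}% use a non-Google engine, "
          f"{mixes_both}% use both")
```

So roughly 75 to 77% of these users turn to a non-Google engine at least some of the time, which is where “three-fourths” comes from, while 61 to 64% use Google at least occasionally.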
Oliver and Eszter’s comments are correct. For several months in 1995 — while I was doing my Master’s in Sociology at the University of Washington (UW) — I did manage to maintain friendships with a random sample of UW students. It was a very tiring experience and I never really got the sample size I was looking for. And the target population was only UW students! To say the least, I don’t think anybody knows a “representative sample of Internet users,” but of course I can’t even make that statement with a high degree of certainty.
What I do know is that the NACTAS 2006 numbers, though interesting, merely point to the strength of Google in a multi-search engine universe. 39% of people NEVER use Google. Which means that 61% occasionally, regularly or always use it. Now I admit that I have a selection bias in my friendships and I would never talk to anyone who admits to using MSN as their search engine of choice.
However, why don’t we all have a quick look at the fine print: neither the Nielsen numbers that I cite nor the Forrester Research numbers are based on random samples from the target population. The “sample used by TNS NFO is not a random sample” (I won’t cite this, you can search on the quote in Google). So yeah, my anecdotal evidence is terrible, and so are the “official” numbers. But all this aside, is anybody (who reads this blog, of course!) willing to question the “fact” that Google is the most used search engine? And furthermore, that a lot of the time, Wikipedia entries are what most Internet users are looking for?
This Wikipedia thing reminds me a bit of banking and loans. If you owe the bank $1000 and you are having trouble repaying the loan, it’s your problem. If, on the other hand, you owe a few million, it’s the bank’s problem.
Purely by virtue of its high ranking in search engine results, Wikipedia now forces everyone to examine its entries for accuracy.
Also like Tom Sawyer’s approach to getting people to paint his fence :-) Enough analogies for now.