The way the creation of the Google search engine was inspired by the traditional method for measuring the value of scholarly works, with links becoming an analogue to citations, has become one of the web’s great origin myths. And the way the new search engine set off a rush to game the system, weakening the usefulness of links as markers of value, has become a lesson in the drawbacks of what might be called the automation of judgment. Every online currency inspires its own debasement, to one degree or another.
Now, in a perverse twist, the circle is completing itself, as Google provides web tools — Google Scholar Citations and Google Scholar Metrics — for tracking and measuring the value of academic articles and other scholarly works. The new tools offer a lot of benefits, but they also provide both the temptation and the means to game the scholarly citation system. Attempts to manipulate citations aren’t new, but now it’s possible to take the shenanigans to web scale, to bring black-hat techniques of search engine optimization to the ivory tower. Nat Torkington points to a 2012 paper, “Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy and Tempting,” in which three Spanish scholars describe how they used fake documents from a fake researcher to skew Google Scholar rankings and measures.
Over the course of a few hours, the researchers cobbled together six documents by cutting and pasting text and figures from other works. All the fake documents were attributed to the same fake author. In each document they included citations to 129 other papers that were authored or coauthored by at least one member of the “EC3” research group to which they belong. They translated the documents into English using Google Translate. Then they created, within the University of Granada’s domain, a web page citing each of the six fake papers and including links to the full texts. At that point, they sat back and let Google take over:
Google indexed these documents nearly a month after they were uploaded, on 12 May, 2012. At that time the members of the research group [cited in the fake documents] along with the three co-authors of this paper, received an alert from GS Citations pointing out that [the fake scholar] had cited their Works. The citation explosion was thrilling, especially in the case of the youngest researchers where their citation rates were multiplied by six, notoriously increasing in size their profiles. …
The results of our experiment show how easy and simple it is to modify the citation profiles offered by Google. This exposes the dangers it may lead to in the hands of editors and researchers tempted to do “citations engineering.”
When the experiment was over, the researchers removed all trace of their work from the web, though the fake papers, and the fake author, lived on in the Google Scholar database. They conclude:
Even if we have previously argued in favour of Google Scholar as a research evaluation tool minimizing its biases and technical and methodological issues, in this paper we alert the research community over how easy it is to manipulate data and bibliometric indicators. Switching from a controlled environment where the production, dissemination and evaluation of scientific knowledge is monitored (even accepting all the shortcomings of peer review) to a environment that lacks any kind of control rather than researchers’ consciousness is a radical novelty that encounters many dangers. … [The Google tools] do not only awaken the Narcissus within researchers, but can unleash malpractices aiming at manipulating the orientation and meaning of numbers as a consequence of the ever growing pressure for publishing fuelled by the research evaluation exercises of each country.
Google, of course, only provides the temptation. It doesn’t force anyone to give in to it. Maybe, in the end, we’ll come to discover that Google was put on this earth to test our ethical mettle. That would give a deeper resonance to the origin myth.
Photo by Carlos Castillo.
Any scholar caught doing this would have their reputation destroyed. And reputation is probably the most important asset a researcher has. It seems no different from making up data to support your thesis: it’s going to ruin your career if you’re caught.
Any tool can be used for purposes other than those it was originally designed for. There’s definitely temptation here to abuse the system, but cheating and plagiarism have been happening since education facilities opened their doors.
The problem I have with this whole thing is the assumption that Google Scholar citations matter to anyone. As a professor of engineering at a tier-1 research institution, I’ve never seen anyone use them. Everyone has access to ISI, and that is what’s used to determine citations.
Perhaps things are different in other fields; I’m not sure.
I work in theoretical computer science, and people definitely use Google Scholar. There is even a standard computer program (Publish or Perish, by Harzing.com) which automatically calculates the h-index and other indices based on GS.
GS isn’t super important in providing definitive citation counts, no, but it IS important in discovery. And since Google’s internal citation count is used in the relevance weighting, you can artificially pump up the apparent significance of your paper in GS by having fake citations attached to it.
Since lots of people will use GS in the course of their research, there is potential damage here in cases where an artificially weighted (but potentially low-quality) article gets found and cited by a “legit” researcher. Once this happens, then the ISI counts are going to start ticking up anyway.
EngProf, what makes you confident ISI is not similarly easy to hack? There have been plenty of such incidents exposed recently, after all. What the exposed cases seem to have in common is how obvious they were (a journal with a single owner and editor, with all papers citing his own work); I would be very surprised if there isn’t a far larger volume of more subtle, undiscovered “optimizations” out there as well. And ISI is worse in a way, as the methods they use are undisclosed, so there is no way to verify their data.
“Any scholar caught doing this would have their reputation destroyed.”
You’re assuming that they have a reputation. Someone without a reputation who wants to rank high and doesn’t care about reputation can game the system.
@Guy / @joe:
“Any scholar caught doing this would have their reputation destroyed.”
This is assuming that the perpetrator actually inflates his/her own ranking… there is nothing stopping you from inflating somebody else’s ranking, in turn potentially destroying their reputation.