Citation counts and indices: Beware of bad data
DOI: 10.1063/PT.3.2463
In recent years, citation counts and associated indices have become increasingly important in a wide range of professional considerations, including hiring, promotions, and grant reviews. Citation counts can make the difference between a cut and an increase in funding for a university department by a national or local government. Also, it is not uncommon to see job applications listing citation counts for every paper on the candidate’s CV.
Citation counts are great for bureaucrats and administrators because, by definition, they are quantitative. They can be added, subtracted, normalized, and plotted. Indices based on these counts abound, the ubiquitous Hirsch or h-index being the most prominent.
Debate continues to rage as to whether such counts and indices actually mean anything. (See, for example, the Commentary by Orion Penner, Alexander Petersen, Raj Pan, and Santo Fortunato, Physics Today, April 2013, page 8.)
One issue that I have not seen debated is the accuracy of the actual citation data. We take for granted that every citation of every paper we have written has been counted. But that is not necessarily so. I offer a case in point:
In 2008 I wrote a paper for Astrophysical Journal Letters on the possibility of testing the black hole no-hair theorems using measurements of stars orbiting the galactic-center black hole Sagittarius A*. A year ago I went to INSPIRE—the high-energy physics information system that now combines SPIRES and CERN’s Invenio digital library technology—to check on the citations. I was astonished to find that it had been cited only 13 times in five years.
So I checked the NASA Astrophysics Data System (ADS) database and found that my article had been cited 73 times. What happened to the other 60 citations? Most of them were in standard journals like Physical Review D and Astrophysical Journal. I had assumed that with modern DOI (digital object identifier) designations and search engines, everything would be caught, but apparently not.
To correct the problem, I had to generate, via NASA/ADS, a list of all the missing citations and send it to INSPIRE. The staff there then entered the information by hand. The list is now accurate—and my own h-index went up by one! But I’m not obsessed. Really, I’m not.
My experience does raise a question: What else might be missing?
In all fairness, SPIRES was set up as a database primarily for the high-energy physics community, and the INSPIRE staff members admit that they have difficulty getting all the references from the various astronomy and astrophysics journals. In view of the increasing links between particle physics and astronomy, they told me, they are considering talking to the NASA/ADS staff about ways of better covering both fields.
But I don’t mean to pick on INSPIRE. Here’s another case. In 1976 I wrote a letter to Physical Review Letters with Mark Haugan on weak interactions and Eötvös experiments testing the equivalence principle. A few months ago, I discovered that the citation counts on that paper were wildly divergent: INSPIRE had 35 citations, NASA/ADS had 98, and Physical Review’s own website had 202. After some detective work, I discovered the problem. My paper with Haugan was Phys. Rev. Lett. 37, 1 (1976) (doi:10.1103/PhysRevLett.37.1).
So caveat emptor: A citation count provided by a system such as INSPIRE or NASA/ADS or even a journal might not be as accurate as you think. Like every good physicist, you should check the quality of the data before worrying too much about the interpretation.
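For readers who want to run that kind of check themselves, here is a minimal sketch in Python that compares the citation count INSPIRE and NASA/ADS report for the same DOI. Both services expose public REST APIs, but the ADS call requires a personal API token, and the specific field names and response layouts shown here are assumptions based on the services' public documentation; treat this as an illustration, not a polished tool.

```python
# Sketch: compare the citation counts that INSPIRE and NASA/ADS report
# for one paper, identified by its DOI. Endpoints and field names are
# as publicly documented at the time of writing and may change.
import os
import requests

DOI = "10.1103/PhysRevLett.37.1"  # the Haugan-Will letter discussed above

def inspire_count(doi: str) -> int:
    # INSPIRE serves records by DOI; the count sits under 'metadata'.
    r = requests.get(f"https://inspirehep.net/api/doi/{doi}", timeout=30)
    r.raise_for_status()
    return r.json()["metadata"]["citation_count"]

def ads_count(doi: str) -> int:
    # NASA/ADS requires a token (see the API settings page on the ADS
    # site); here it is assumed to be in the ADS_TOKEN environment variable.
    token = os.environ["ADS_TOKEN"]
    r = requests.get(
        "https://api.adsabs.harvard.edu/v1/search/query",
        headers={"Authorization": f"Bearer {token}"},
        params={"q": f'doi:"{doi}"', "fl": "citation_count"},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["response"]["docs"][0]["citation_count"]

if __name__ == "__main__":
    n_inspire, n_ads = inspire_count(DOI), ads_count(DOI)
    print(f"INSPIRE: {n_inspire}, NASA/ADS: {n_ads}")
    if n_inspire != n_ads:
        print("Counts disagree; worth a closer look before quoting either.")
```

A discrepancy flagged this way does not say which database is right, only that the two disagree; as in the cases above, resolving it still takes some detective work.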
About the author
Clifford Will (cmw@physics.ufl.edu), University of Florida, Gainesville.