Article Titled “Show Me the Data” (Journal of Cell Biology, Vol. 179, No. 6, pp. 1091-1092, 17 December 2007; doi: 10.1083/jcb.200711140) Is Misleading and Inaccurate
….In the same way that a scientist would use binoculars to view animal behavior, but would use a microscope to view cellular behavior, bibliometrics experts use the JCR to observe the journal, and Web of Science™ to observe the individual article/author….
Thomson Scientific’s Journal Citation Reports (JCR) is designed to provide journal-level statistics on citation and publication activity. Its compilation is intended to be comprehensive in the aggregation of citations at the journal level, and to take as broad a view as possible of the use of the journal in the surrounding literature. Although only the tally of citations seems to consume the interest -- and at times, fan the outrage -- of journal editors, the JCR presents a detailed year-by-year representation of the titles with which more than 6000 journals interact. This is an expansive and instructive view of the structure of the citation network around each title. Similar data are presented for each subject category so as to provide a well-informed context for understanding each journal title. Though its very title makes this plain, it apparently still bears repeating: the JCR reports citations at the journal level.
A recent editorial in the Journal of Cell Biology by Rossner, Van Epps, and Hill argues that Thomson Scientific’s impact factor measure for the evaluation of journals should not be trusted since an article data set purchased from Thomson by The Rockefeller University Press did not exactly replicate the Journal Citation Reports data for its own -- and selected other -- journals.
When these data were questioned by The Rockefeller University Press, Thomson staff explained precisely the content of the data, as well as its derivation and use. Unfortunately for the readers of the Rossner editorial, the authors misunderstood much and, as a result, misled readers on several matters: not only the data themselves, but also what Thomson representatives did and said in many email exchanges between June and September 2007.
As background, Thomson Scientific’s Research Services Group will, at the request of individuals, editors, or publishers, supply further details of the citation network for a journal or journals as an extract from the Web of Science™, including article-by-article citation data. The Journal Analysis Database (JAD) product, which The Rockefeller University Press had purchased previously, provides insight into the particular articles, individuals, and specific subjects that are most influential among the journals’ citing audience. Although the essentials of the data are available to millions of Web of Science subscribers, many publishers choose to contract with our Research Services Group to ensure an accurate collection of the data, as well as to obtain the continuing support of Thomson Scientific experts in the review and analysis of these data. Further, for influential journals like those of The Rockefeller University Press, many thousands of citations are collected each year, and Thomson services provide efficiency in the collection of these items as well as a customized interface for their review. For citation data, as well as for readers, a “journal” is not merely the sum of a year’s articles. The journal, as a whole, is the product of its publisher’s and editorial board’s attention and expertise, as well as the earned prestige resulting from their dedication to the journal’s content over several years. Thomson Scientific is acutely aware of these differences and so makes more than a single metric available -- not just for articles, but for journals, institutions, and nations.
The Rockefeller University Press, as noted, recently purchased the JAD, and was provided with a complete explanation of its content and use, as well as precisely how and why it might vary from the Journal-level, aggregate statistics presented in the JCR. We inform customers in advance of these differences, and further, that data provided using this method will only approximate or simulate the impact factor calculation. This was stated when The Rockefeller University Press ordered article-level analyses of its own and competing journals. The Rockefeller University Press did not make known to us at that time that they anticipated using these data as some sort of “cross-check” of results, or “replication” as in a scientific experiment. Had they done so, we would have recommended a different methodology for using our data. In the same way that a scientist would use binoculars to view animal behavior, but would use a microscope to view cellular behavior, bibliometrics experts use the JCR to observe the journal, and Web of Science to observe the individual article/author. Proper methodology is critical to obtaining appropriate data.
When The Rockefeller University Press inquired about the discrepancy between the JCR and the JAD, we explained the differences again. We then went the extra mile of “replicating” the impact factor data using the Web of Science, supplying the results as an additional data set and using the Journal of Cell Biology as an example to ensure we understood the customer’s needs. Our Web of Science methodology confirmed the accuracy of our data to within 0.2%: the replication came within 14 citation counts of the JCR figure (7,705 vs. 7,719). This result would meet stringent statistical criteria. It satisfied us. We believe any impartial observer would acknowledge that the JCR impact data are statistically valid and useful for the purposes for which they are intended.
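The arithmetic behind the agreement figure can be checked directly. A minimal sketch, using only the two citation counts reported above:

```python
# Check of the agreement reported above: the Web of Science
# replication found 7,705 citations versus 7,719 in the JCR figure
# for the Journal of Cell Biology.
jcr_citations = 7719
wos_replication = 7705

difference = jcr_citations - wos_replication       # 14 citations
pct_difference = 100 * difference / jcr_citations  # ~0.18%, i.e. 0.2% rounded

print(f"Absolute difference: {difference} citations")
print(f"Relative difference: {pct_difference:.2f}%")
```

The relative difference works out to about 0.18%, which rounds to the 0.2% (99.8% agreement) stated in the text.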
Overall, the editorial presents several common misunderstandings of the construction and use of citation data. As these inaccuracies can obscure the discussion, we will address, specifically, some points raised in the JCB editorial:
Misleading Article Statement: “The impact factor calculation contains citation values in the numerator for which there is no corresponding value in the denominator.”
Clarification: The impact factor calculation has been explained for many decades in much detail. Its mathematics are of the simplest type and its content is not much more complicated. The numerator considers citations directed in any formally recognizable way to the journal title. To assess journal impact, such a generalized aggregation of journal citations is necessary. The denominator contains a count of indexed "citable items." In addition to all primary research articles and reviews (whether published in front matter or anywhere else in the journal), citable items include substantive pieces that are, bibliographically and bibliometrically, part of the journal's scholarly contribution to the literature. Research at Thomson has shown that, across all journals, more than 98% of the citations in the numerator of the impact factor are to items considered “citable” and counted in the denominator.
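The ratio described above can be sketched in a few lines. The function below is an illustration of the standard two-year calculation; the denominator of 780 citable items is a hypothetical figure chosen for the example, not an actual JCR value:

```python
def impact_factor(citations_to_prior_two_years: int,
                  citable_items_prior_two_years: int) -> float:
    """Two-year journal impact factor for year Y:

    citations received in year Y to anything the journal published
    in Y-1 and Y-2, divided by the count of 'citable items'
    (articles and reviews) the journal published in Y-1 and Y-2.
    """
    return citations_to_prior_two_years / citable_items_prior_two_years

# Hypothetical journal: 7,719 citations in one year to items from the
# two prior years, and 780 citable items published in those two years.
print(round(impact_factor(7719, 780), 3))  # 9.896
```

Note that the numerator aggregates citations to the journal title as a whole, while the denominator counts only the indexed citable items, which is precisely the point at issue in the statement above.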
Misleading Article Statement: Articles are designated as primary, review, or "front matter" by hand by Thomson Scientific employees examining journals using various bibliographic criteria, such as keywords and number of references.
Clarification: The coding of documents by Thomson Scientific is not based merely on “bibliographic criteria such as keywords and number of references,” as the article suggests. Document type coding is based on a detailed, journal-by-journal review of the presentation and labeling of articles in a journal, expanded by information provided by publishers regarding the content and structure of the journal, as well as key bibliometric characteristics. These methods have proven effective across many years, though they are not always satisfying to publishers and editors who request that certain types of articles not be included as citable.
Misleading Article Statement: “Some publishers negotiate with Thomson Scientific to change these designations in their favor. The specifics of these negotiations are not available to the public, but one can't help but wonder what has occurred when a journal experiences a sudden jump in impact factor.”
Clarification: Thomson Scientific never negotiates with publishers on coding articles, often to their chagrin and sometimes despite their strong objection. Many journals change their content across the years, and most publishers will cooperate with Thomson to alert us to coming changes so that we can ensure the continued correct indexing of materials.
At times, a journal’s content will be significantly modified but the effects of such a change on the impact factor will not be recognized by the publisher for a year or two. It is not uncommon for a publisher or editor to request a review of the indexing of their content and how past changes to that content could have affected the determination of “citable items.” Thomson staff will analyze and review up to three years of content to arrive at a fully informed determination of the proper indexing. Any required changes are then applied – most often from the current year onward rather than retroactively.
Misleading Article Statement: “Citations to retracted articles are counted in the impact factor calculation. In a particularly egregious example, Woo Suk Hwang's stem cell papers in Science from 2004 and 2005, both subsequently retracted, have been cited a total of 419 times (as of November 20, 2007). We won't cite them again here to prevent the creation of even more citations to this work.”
Clarification: True, Thomson Scientific does not eliminate citations to retracted papers from the impact factor calculation. These citations are part of the history of a journal and science itself, and they provide a valuable source of information necessary for the full understanding of the scientific impact of an erroneous or fraudulent paper. The fact that the scientific content of the report is overturned does not change the fact that the materials were published – and cited. To alter the publication history of the journal by refusing to acknowledge citations to any specific article would be editorializing on our part.
When a journal publishes a formal retraction, we index it, incorporating the full title of the original paper, along with all the original authors, and a full citation note of the location of the original paper. We also change the original paper's title to indicate "[Retracted]." This ensures that any title, author, or source search that retrieves the original paper will also retrieve the retraction. Further, we connect the retraction and the original paper by citation to further ensure that the retraction is visible to all.
Misleading Article Statement: “Because the impact factor calculation is a mean, it can be badly skewed by a ‘blockbuster’ paper.”… “When we asked Thomson Scientific if they would consider providing a median calculation in addition to the mean they already publish, they replied, ‘It's an interesting suggestion...The median ... would typically be much lower than the mean. There are other statistical measures to describe the nature of the citation frequency distribution skewness, but the median is probably not the right choice.’ Perhaps so, but it can't hurt to provide the community with measures other than the mean, which, by Thomson Scientific's own admission, is a poor reflection of the average number of citations gleaned by most papers.”
Clarification: The Journal Impact Factor is a ratio of total citations to the number of citable items, which closely approximates the arithmetic mean, or average citation rate, for articles in a journal, as discussed in the first point above. This is clearly stated, often repeated, and transparent in the description of the calculation presented in the product. As a mean, it is not meant to describe citations to any particular paper, but to the collection of papers as a whole population. While the Journal Impact Factor is influenced by papers that are highly cited, these papers are validly considered important contributions to their parent journal. The mean is simply one measure; the median would be another, but owing to the skewed nature of citation distributions, it would be a lower number, much lower.
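The point about skewness is easy to illustrate with a toy citation distribution. The numbers below are invented, chosen only to mimic the typical long tail in which most papers gather few citations and a couple of "blockbusters" gather many:

```python
import statistics

# Hypothetical citation counts for 12 papers in one journal.
citations = [0, 1, 1, 2, 2, 3, 3, 4, 5, 8, 40, 120]

mean = statistics.mean(citations)      # pulled upward by the two outliers
median = statistics.median(citations)  # the "typical" paper

print(f"mean   = {mean:.2f}")    # 15.75
print(f"median = {median:.2f}")  # 3.00
```

In this example the mean is more than five times the median, which is exactly the relationship described in the exchange quoted above: for a right-skewed distribution, the median would typically be much lower than the mean.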
[Incidentally, an invitation to speak with the Chief Scientist at Thomson Scientific about other measures of skewed distributions was not taken up by Dr. Rossner.]
Incorrect Statement: “When queried about the discrepancy, Thomson Scientific explained that they have two separate databases—one for their "Research Group" and one used for the published impact factors (the JCR). We had been sold the database from the ‘Research Group’, which has fewer citations in it because the data have been vetted for erroneous records. ‘The JCR staff matches citations to journal titles, whereas the Research Services Group matches citations to individual articles,’ explained a Thomson Scientific representative. ‘Because some cited references are in error in terms of volume or page number, name of first author, and other data, these are missed by the Research Services Group.’"
Correction: In fact, there is a single data source at Thomson Scientific; various products, however, employ different methodologies for different purposes and applications. The implication that JCR data have not been vetted whereas Research Services Group data have been is simply incorrect. That a citation counted in the JCR impact factor uses a variant of a journal’s name does not mean the citation should not be counted, if it is plainly intended by the author as a citation to the journal.
The authors close their review by speaking of “hidden data” and Thomson Scientific’s “ill-defined and manifestly unscientific number” (meaning the impact factor). To say on this evidence that the impact factor calculation is “ill-defined” is simply incorrect, and to call this calculation “manifestly unscientific” does an injustice to Thomson Scientific staff who have earned the trust of users through careful attention to data integrity.
In closing: because the Journal of Cell Biology editorial was inaccurate, and as a result misleading to its audience and to the scientific community as a whole, we felt it important to provide real transparency on the issues it raised, with the goal of clarifying misconceptions.
David A. Pendlebury
Research Services Group
Message Edited by ThomsonReuters on 06-26-2008 03:30 PM