Friday, November 30, 2012

How to assess publication records

Publish or perish has become such a well-known expression that it should be nothing new even to those who have never toyed with the idea of pursuing an academic career (see Wikipedia, or, more cynically, SMBC). The idea is that there is ever-increasing pressure on academics not necessarily to produce work of high utility and quality but simply to produce lots of publications in renowned journals.

Academics are evaluated on the basis of their publication record in nearly all professionally relevant situations: when applying for a job, when being considered for a promotion, when applying for a research grant. By extension, research groups within an institute, institutes within a university, and whole universities against each other are evaluated based on the publication records of their researchers. So, how does one assess publication records?

There are different ways people do this, and they are listed below in order of decreasing stupidity.

Number of publications

The simplest approach is to look at the record of a colleague (let us call him Nathanial Blinckhorn, PhD) and say, gazooks, Blinckhorn sure has a lot of publications. The only thing that this demonstrates, however, is that he is writing very industriously. What it does not show is that his writing actually makes any sense, that his work has any use for anybody, or even just that he has not committed fraud and manufactured all his data. And even if Blinckhorn's work is okay, quantity is not substance. Coming from my botanical perspective again, maybe he is pushing out five papers per year along the lines of "one new species of plants from Brazil" while his colleague across the corridor published ten new species in a single paper. Surely we need to take more into account than just raw numbers.

Quality of journals

The most straightforward way to reduce the likelihood of useless, faulty or even fraudulent papers being published is known as peer review: before a journal accepts a manuscript for publication, the editor sends it out to two or more scientists from the same field who examine it critically and may recommend acceptance, recommend rejection, or suggest changes that would have to be made to make it acceptable. So if you want to know whether Blinckhorn produces quality work, the first thing to check is whether he publishes in renowned, international, peer-reviewed journals. If most of his publications are in obscure journals that do not have proper peer review, that should raise a red flag right there.

Things have become ever more complicated with the rise of the internet and the subsequent advent of open access publishing in academia. More on that perhaps another time - suffice it to say, there are now literally hundreds of junk journals with legit-sounding names out there that will "publish", online only, any manure that is submitted to them in exchange for a processing fee. To appear more legitimate, they often simulate peer review - I have sometimes been contacted by them with review requests, but no serious journal would have sent those manuscripts out for review in the first place. Ultimately, they will accept submissions no matter how bad they are, because that is their business model.

So, peer review is the gold standard, and publications that did not go through it count less. However, even the number of legitimate, respectable journals is huge. Perhaps we can figure out which of them are better than the others, so that we can judge if Blinckhorn manages to publish in the "good" journals?

Journal impact factors

The pencil pusher mindset that wants to turn everything into numbers to be added up and compared brought us metrics to compare journals. The best known of them is the impact factor. To cite Wikipedia, "in a given year, the impact factor of a journal is the average number of citations received per paper published in that journal during the two preceding years." The impact factors are published in the Journal Citation Reports, but they are only accessible from a computer of an institution that has paid a fee to access them.
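In arithmetic terms the definition is simple. Here is a minimal sketch of the calculation; the journal and its counts are entirely made up for illustration:

```python
def impact_factor(citations_this_year, papers_last_two_years):
    """Impact factor for year Y: citations received in year Y to items
    published in years Y-1 and Y-2, divided by the number of citable
    items published in Y-1 and Y-2."""
    return citations_this_year / papers_last_two_years

# Hypothetical journal: 240 papers in the two preceding years,
# cited 600 times this year.
print(impact_factor(600, 240))  # 2.5
```

Note that a single heavily cited review article can noticeably lift this average for a journal that publishes few papers.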

An astounding number of colleagues are satisfied with this level of analysis. They will look at Blinckhorn's publication record and say, wow, he has got a paper in Nature, that is the journal with the highest impact factor of all, so he must be a really good scientist. There are so many problems with that reasoning that it is not even funny.

First, you may notice that the impact factor counts citations only for the last two years. This leads to some severe distortions between fields of science. Some areas have an extremely fast turnover, such as molecular and developmental biology, where papers that are more than five years old are ancient history. Mathematics, by contrast, moves much more slowly, and much older papers are still regularly cited. The same goes for my field - due to the rule of priority for scientific names, papers are actually more important for taxonomy the older they are.

Another distorting factor is that journals serving fields that have many practitioners, like medicine, have much higher impact factors than those in fields with fewer practitioners. It is thus important not to make comparisons based on impact factors across fields, but sadly not everybody got that memo. Many a university department has replaced its retired plant systematist with a genomics researcher because they get into journals that rank higher in the citation reports. Following this myopic approach, one could just as well shut down all institutions dedicated to human knowledge except the medical ones and be done with it, because those guys get the most and quickest citations (and the most grant money, but that is another issue).

The most obvious problem, however, is that looking at the journals Blinckhorn manages to publish in tells us only that: he can write papers that, with a bit of luck, get into Nature. For some faculty heads, maybe that is enough. After all, "last year, my staff got seven papers into Nature" makes better bragging than "last year, my staff solved the Hankendorff paradox", especially if the competing faculty has never heard of said paradox because that is not their field.

But still, all this name-dropping of high impact journals does not really tell us anything about the utility and quality of Blinckhorn's work. Maybe he got lucky and his paper was accepted there because its topic seemed very promising at the time, but subsequently everybody realized the idea was stupid and ignored it. By the way, even the very people who compose the journal citation reports warn against using them to judge scientists. The idea was to compare journals, perhaps if you want to decide what to subscribe to for your university library or where to submit an article for greatest visibility, but the metric is not meant for comparing people.

Number of citations

Much more relevant for the impact that a scientist has on their field is how often their work is cited. All else being equal, we might assume that somebody who is ignored by other scientists has not done anything useful, and that somebody who is cited frequently is producing interesting work. So often people look at the lifetime citations, i.e. the sum of all the instances in which any work published by that scientist has been cited.

However, there are many pitfalls here. A minor issue is that a particularly bad study that made it into a widely read journal may be cited many times by people who debunk or criticize it. More important, however, is where we get the citation data from. The most popular source is Thomson Reuters' ISI Web of Knowledge, where one can enter an author name and see how often all their papers have been cited; the same data is behind the stats calculated by ResearcherID. The problem is that far from every reputable scientific periodical on the planet is in that database: it accepts only journals above a certain number of issues per year, and books not at all. In my area, floras and series of taxonomic monographs are examples of important research outlets that are not found in Web of Knowledge. Somebody might publish, after years of work, a monograph of all the oaks of North America, for example, but even if it is highly useful and people cite it continually, that never registers. Google Scholar is better in that regard in that it registers any kind of publication, but it is much less accepted in formal evaluations because it has a higher rate of noise.

Another problem is that monographs or floras may be highly useful but still go uncited. I cannot count how often I use floras in my work to identify specimens for a study - but nobody requires me to cite them in the resulting paper.

The fourth issue is one that probably does not get enough attention, or if so then only implicitly. Scientist A has published, together with 19 co-authors, an influential paper receiving 100 citations; scientist B has, without co-authors, published a paper receiving 10 citations. Who deserves the louder accolades? This gets even more complicated because the 20 authors on the first paper will have made contributions of very varying importance, and because co-authorship policies differ widely across the various fields of science. Some fields will rarely publish a study with fewer than six authors, but in some humanities everybody who has even just one co-author is suspected of being too incompetent to write without assistance. In some labs, the department head demands to be senior author of every paper any of their staff publish even if they contributed nothing whatsoever and would not understand a word of it; in other areas, even thesis supervisors only want their name on a paper if they were directly involved with the analysis. Despite all these issues, being one of thirty co-authors on a Nature paper still sounds more impressive to many than being sole author of a small phylogenetic study in a specialist journal, although citations divided by number of co-authors likely comes out the same.
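One crude remedy, mentioned here purely as an illustration rather than as an established standard, is fractional counting: divide each paper's citations by its number of authors before summing. Applied to the two scientists above:

```python
def fractional_citations(papers):
    """papers: list of (citations, n_authors) tuples.
    Returns total citations credited per author, i.e. each paper's
    citation count divided equally among its authors."""
    return sum(citations / n_authors for citations, n_authors in papers)

# Scientist A: one paper, 100 citations, 20 authors.
print(fractional_citations([(100, 20)]))  # 5.0
# Scientist B: one paper, 10 citations, sole author.
print(fractional_citations([(10, 1)]))    # 10.0
```

On this per-author accounting, scientist B actually comes out ahead despite the much lower raw citation count - though splitting credit equally of course ignores the very unequal contributions within large author lists.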

Lastly, and cunningly leading up to the next metric, lots of lifetime citations do not demonstrate that the scientist is consistently doing a good job. Maybe one of their early papers hit the jackpot, introducing a new method that everybody started to use, but after that they never managed to get something reasonable done again.


The h-index

This last problem is what the h-index is meant to solve, and here we are really coming to the number crunching. The index is calculated as follows: sort all of the scientist's publications by the citations they received, from highest to lowest, then go down the list while counting up from one. When the next number you would have to count is higher than the next paper's number of citations, stop. The number you counted last is the h-index. In other words, it is the largest number h such that h of the publications have been cited at least h times each.

Example: Assume that Blinckhorn has ten publications that have been cited 25, 18, 13, 9, 9, 8, 5, 3, 1, and 1 times, respectively, then his h-index is 6. If they have been cited 2718, 3, 2, 1, 1, 1, 1, 0, 0, and 0 times, the h-index would be 2. This shows very nicely what the index is designed to achieve; it is difficult to get a high h-index unless you consistently produce papers that are cited frequently. Conversely, a single windfall does not influence the metric much.
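The counting rule above is easy to mechanize. A minimal sketch, reproducing the two example records:

```python
def h_index(citations):
    """h-index: the largest h such that h of the papers have
    at least h citations each."""
    ranked = sorted(citations, reverse=True)  # highest first
    h = 0
    for rank, cited in enumerate(ranked, start=1):
        if cited >= rank:
            h = rank   # this paper still supports a higher index
        else:
            break      # all following papers are cited even less
    return h

print(h_index([25, 18, 13, 9, 9, 8, 5, 3, 1, 1]))   # 6
print(h_index([2718, 3, 2, 1, 1, 1, 1, 0, 0, 0]))   # 2
```

Note how the second record's single heavily cited paper barely moves the index, which is exactly the behavior described above.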

Still, all the other problems I mentioned under the "citations" section apply in full force also to the h-index: a paper might get citations because it was so offensively horrible; many citations don't show up in Web of Knowledge; some works are used but aren't cited; and some people get high stats out of being fifteenth of seventeen authors.

And finally, the best way to assess a publication record:

Actually reading Blinckhorn's publications. That is really the best way to assess somebody's work. And if I, personally, don't understand them, because these days we are all highly specialized, maybe I can get an external opinion from somebody who does. Some search or promotion committees do just that and have candidates' CVs and publication lists peer-reviewed, but of course many an institution is not willing to invest the effort and relies on number crunching instead.

Note, by the way, also the potential for gaming the system with nearly all the metrics used to evaluate publication records. You assess your staff by number of publications? They will aim for the smallest publishable unit, trying to divide their manuscripts into smaller papers to inflate their list of publications. You assess your staff by the impact factor of the journals they manage to be published in? They will go only for the flashy, charismatic science that is most attractive to those journals, but they may neglect important but less charismatic work. In my field, that would be the writing of taxonomic treatments, a thankless task from the citation report perspective but crucial so that people can, you know, actually identify plant and animal species. You assess your staff by number of citations or by their h-index? They will start citing their own papers, supervisors will demand that students cite theirs, and peer reviewers will demand that prospective authors cite theirs, even when the cited papers have only the most tenuous relevance, or none at all, to the manuscript at hand.


Similarities with any real life Blinckhorns or Hankendorffs are unintended and purely coincidental.
