Friday, September 27, 2013

Manuscript writing: what to avoid

Previously I wrote a post about peer reviewers from hell. But of course I am not only an author of papers but also a peer reviewer myself, so perhaps I should write something from that perspective. What are the things that annoy me about manuscripts? Which mistakes or faults do I seem to encounter particularly often in my field?

Poor English

Unsurprising and unspectacular, of course. If there is no native speaker of English among the authors, it is usually sound advice to ask a colleague who is one for help before submitting the paper to a journal. There are even native speakers who earn a bit of extra money during their PhD/postdoc by offering to correct the English of scientific manuscripts, using their own scientific expertise to get the technical language right, something a non-scientist professional translation service may struggle with.

No study design at all

Some of the saddest manuscripts I have had to review were those where the authors quite clearly started generating data (sometimes large amounts of it, consuming large amounts of time) and only then sat down and wondered what to do with it. They must have thought something like this: "I know how to produce this type of data, so let's do that. Hm. Now I have the data and want to get a publication out of all my effort. But merely a table does not cut it, you have to have some quantitative analysis to be able to publish in a decent journal. Hey, look here, these people used an ordination, they look so sciencey and I always wanted to do one of those! And now, what question should I pretend to have been interested in when I started the study?" Sorry, but it does not work that way.

If you only start to think about what analysis to conduct after you have collected your data, you are doing it wrong. A scientific study should start with the publication that will result from it already in mind. What do you want to find out? From that follows how you have to design the test, i.e. what analysis to use. From that, in turn, follows the type and amount of data you will need to be able to do that analysis. NOW you start collecting data, after all that has been decided.

Uninspired repetition of the one thing the main author knows to do

Already considerably less exasperating than the previous one but still sad is another problem that seems to be particularly evident in manuscripts coming from one specific country. In this case, a scientist has learned one methodology, for example generating microsatellite data and calculating Fst values, or generating Sanger sequences of ITS and chloroplast data and inferring a gene tree, and then they go and do the same one thing over and over and over again, on one study group after the other. Without, and this is the problem, ever stopping to ask (1) whether there is actually a need to do such a study on this specific study group and (2) whether these data are actually useful to answer the research question in this particular case.

In other words, the researchers suffer either from a lack of imagination or from an inability to adopt new, more appropriate methods. If it is the former, the resulting manuscripts are merely very uninspired, leaving the reader to wonder why the study was ever conducted and whether one should waste journal pages on publishing it. If it is the latter, one is sometimes left with no choice but to reject the manuscript outright because it cannot actually answer the questions it claims to answer.

Statistician-speak instead of clear language

What I mean here is that some authors think entirely in terms of the statistical or modelling methodology instead of the biological relevance, and they explain the analysis and its results in words that fail to make clear what is actually going on from the perspective of the research question. As an example, somebody might write "the variable populations influences the variable pollinators", and the reader is left guessing what the heck any of that means. Does this simply mean that different pollinator groups dominated in different plant populations, and if yes, so what?

Sometimes the problem appears to be that a biologist has teamed up with a statistician and let them write the relevant parts of the methods and results sections; in other cases a biologist has become so deeply immersed in the cogs and wheels of the method that they have lost sight of the fact that the readers will lack the same degree of immersion, and that they want the results expressed in a way that clearly describes their biological relevance.

No concept of what the various sections are for

A typical beginner's issue is a lack of understanding of what goes where in a manuscript. As most readers will likely know, the typical scientific research paper has the following sections: abstract (summary), introduction, materials & methods, results, discussion, acknowledgements, references (literature cited). Occasionally there are variants on this. The acknowledgements might be at the beginning instead, the results and discussion sections might be combined into one, perhaps there is an extra conclusions section, and, in a few supposedly prestigious journals (a.k.a. the rainbow press of science), the methods section is placed after the discussion, which means that you bizarrely get the results without being able to assess what they mean and whether the study design is at all defensible. (Really, few things in science publishing are more absurd than reading a Nature paper that presents a new methodology, because the important part, the methodology itself, is at the very end.) But mostly abstract, intro, m&m, results, discussion, refs is the sequence you will find.

Abstract, acknowledgements and references are easy, but the rest appear to be harder. Typical problems in manuscripts written by beginners are:
  • Important information that should have been in the introduction is only mentioned in the discussion for the first time.
  • The discussion repeats parts of the introduction; in extreme cases, I have seen manuscripts where the authors literally copied and pasted several paragraphs from the introduction into the discussion.
  • The results section contains paragraphs that should be in the discussion.
  • The discussion repeats parts of the results section.
  • Some results are not mentioned in the results section but appear in the discussion section for the first time.
  • Additional tests are suddenly found in the results or discussion section that were not explained in the methods section.
I will admit that it is difficult to be sure what to do if some of the results of an early step in the methodological pipeline influenced what analysis had to be used further downstream. In that case, one would basically have to mention some of the results already in the methods section to justify why a certain method was used instead of alternatives. But apart from that, it really should not be that hard to get it right.

The introduction provides all the background information on the state of knowledge, what has been done before and what questions are currently unanswered, and it ends with the aims of the study. It must contain numerous references to document the state of knowledge, of course. The methods section can be really short and should only explain how the study was done. It is often full of references to papers explaining the methods or announcing software tools. The results section can also be really short and should only state the results - describe the phylogenetic trees, list P values or cases of non-significance, describe the position of the dots on a map or graph, etc. There should be no interpretation whatsoever, and consequently a results section also contains zero references.

Finally, the discussion section places the results in the context of the background knowledge mentioned in the introduction and provides suggestions for further study. It should contain numerous references, often greatly overlapping with those already used in the introduction. In fact, if a paragraph in the discussion does not contain any references, one should consider whether it would be better placed in the results or removed from the manuscript entirely.

Manuscripts that are too long

The second typical beginner's mistake is making manuscripts too long. It is clear where that comes from. Just thinking of my own schooldays, I remember how we were always forced to write a certain minimum number of pages in exams or assignments. "At least six pages." "Minimum of four pages." I hated that because I usually thought that I had said everything of importance after two. The same seems to apply at many universities, where it is apparently a big no-no to submit an honours/master's thesis of fewer than a hundred pages or a dissertation of fewer than two hundred and fifty.

In other words, we were systematically trained to needlessly repeat ourselves and to add superfluous blather. The problem is that when you then start writing manuscripts to be submitted to journals, you have to unlearn the needless repetition and blather again. In science publishing, conciseness is a virtue. Some journals levy page charges, others charge only for pages beyond a certain number, and some editors rigorously enforce word limits. Anyway, the major areas where I find people write too much are the introduction and the discussion (see also the next point).

Rambling discussion that has nothing to do with the results

As mentioned above, the point of the discussion is to place one's very specific results into the context of overall research in the field. It is not the point of a discussion section to take the results as a starting point for a long, meandering and rambling discussion of otherwise interesting issues that one's results cannot really address.

I once reviewed a paper where the authors had generated certain measurements for a number of plant species. What they should have done was to compare them against the measurements previously available for the genus and then perhaps mention two possible processes that might explain why some of them were a bit unexpected, in a few sentences and with the appropriate references. What they did instead was fill two thirds of the discussion with highly speculative ruminations on those possible processes that never went anywhere, because they did not have any data that would help them decide which of the two processes was at work. That is simply a waste of everyone's time, and bad writing.