Friday, June 29, 2018

TreeBASE and Dryad

It is now generally expected that scientists, unless working on commercial or otherwise confidential projects, make the data underlying their scientific publications freely and publicly available, so that the studies can be replicated if necessary and so that others can use the data for further research.

Sometimes the data are submitted as supplementary material to be published on the journal website, together with the article itself. Some research organisations have their own data repositories. In many cases, however, specialised databases are used. GenBank, for example, is a repository of DNA sequence data. Further down the analysis pipeline, I have in the past used TreeBASE to make available sequence alignment matrices and phylogenetic trees, and in one case I have reanalysed other people's data after obtaining them from there.

Recently I had reason to submit another such set of data matrices and phylogenetic trees to a database, and I thought I would go back to TreeBASE. Somehow it did not work out as well as it did a few years ago.

I was able to log in, I created a new submission, I submitted my files, and I described our analysis. The latter process is rather clunky, but okay, it works. Then it turned out that we needed to redo one of the phylogenetic analyses minus one sequence, so I had to delete one of the matrices and one of the trees and replace them with updated versions. That is when the fun started.

Although googling around a bit suggests that other people can do so, I find it impossible to delete anything in TreeBASE. There is no delete button next to anything except co-authors and submissions (i.e. the entire studies). Being unable to change data in a submission, I decided to delete the entire submission and start from scratch. That is surely not how it is meant to work, and it is a lot of extra effort, but what can I do?

As it turns out, not even that. When I ask that a submission be deleted, the web interface thinks for a bit and then throws a Java error at me. I now have three submissions under identical names and cannot delete the first two. Hurray.

At some point I thought I could maybe try out the alternative data repository Dryad. Perhaps that would work more reliably? At least I have seen it used in several publications lately. I have now twice submitted my eMail address on their 'sign up for a new account' form, been told twice that a confirmation eMail has been sent, and days later neither my inbox nor my spam folder has received any such message.

Perhaps the journal will accept our manuscript without us having the matrix and trees in a public repository? This process is becoming somewhat off-putting.

Update: After a mere four days I have now finally been sent a confirmation link by Dryad. Will see how that repository works.

Saturday, June 9, 2018

A particularly striking example of how paraphyletic taxa confuse our thinking about evolution

I recently reread Jason Rosenhouse's Among the Creationists and came across the following extended quote from Stephen Jay Gould, a widely admired and famous evolutionary biologist.
If mammals had arisen late and helped to drive dinosaurs to their doom, then we could legitimately propose a scenario of expected progress. But dinosaurs remained dominant and probably became extinct only as a quirky result of the most unpredictable of all events - a mass dying triggered by extraterrestrial impact. If dinosaurs had not died in this event, they would probably still dominate the domain of large-bodied vertebrates, as they had for so long with such conspicuous success, and mammals would still be small creatures in the interstices of their world. [...] Since dinosaurs were not moving toward markedly larger brains, and since such a prospect may lie outside the capabilities of reptilian design, we must assume that consciousness would not have evolved on our planet if a cosmic catastrophe had not claimed the dinosaurs as victims. (Gould 1989, 318)
The context is the controversy around convergence and contingency in evolution. Rosenhouse discusses convergence as one of the hopes of Christians trying to reconcile evolution and Christian teachings, citing various proponents of the idea that their god set up the universe in a way that human-like intelligence was guaranteed to arise, thus producing beings that can have a "relationship" with said god.

Convergence is, of course, not only an observation considered helpful by the proponents of one variant of theistic evolution. To what degree the organisms that evolved on our planet would again turn out to be kind of similar if we replayed the tape, and whether organisms on other planets can be expected to look very similar to those on ours, are questions of broad interest. Even an atheist may ask if we can expect lots of other planets where life arose to produce land plants, something a bit like insects, and perhaps even sentient beings given enough time, or if the vast majority of them will, for example, remain populated only by bacteria, because even evolving as much as multicellularity was a rare fluke.

Rosenhouse cites Gould as a well-known proponent of the importance of contingency. Although I tend much more towards the opposite view, I understand Gould's position. I believe the strongest argument for the contingency side is that while there are many impressive cases of convergence there are also quite a few crucial events in the history of life on this planet that appear to have happened only once: complex eukaryotic cells; colonisation of dry land by multicellular plants; vertebrates; and of course human-like intelligence.

If, for example, the independent evolution of wings by insects, pterosaurs, birds and bats is counted as evidence for the importance of convergence, should something happening only once not be counted as evidence for the importance of contingency? My response would be competition, or in other words the change in the adaptive landscape caused by the first organisms to settle on a new peak. Where there may have been a ridge connecting the niches "kelp" and "large land-living plant" when nobody had occupied the latter, the first lineage to do so quickly became so good at being large land-living plants that the ridge crumbled away and became a canyon. If all land plants were wiped out, however, I would expect the land to be colonised anew, this time perhaps by red or brown algae.

But that is not actually about the main argument Gould is quoted as making in the above excerpt, and not what I found interesting about the quote. To take it in smaller pieces:
If mammals had arisen late and helped to drive dinosaurs to their doom, then we could legitimately propose a scenario of expected progress.
"Expected progress" is a bit of an odd term here. I am not sure if that is what is meant, but it could be read as if any group of animals that does not evolve towards large brains and intelligence is a refutation of the possibility that one group on each planet might evolve towards larger brains. But I do not think that this works as a refutation. And few proponents of the importance of convergence would argue that it is all about one linear progression towards large brains anyway. There are also progressions, for example towards body shapes that work well for swimming, towards parental care for the young, towards powered flight, etc., and all of these happen at the same time but only in those lineages for which they solve relevant problems or create new opportunities.

If I understand the argument correctly, it is like pointing at a hole in the ground and saying, "if I now throw a pebble into the air and it does not end up in this specific hole, gravity is refuted", whereas the argument for convergence is that, what with evolution throwing thousands of pebbles into the air every year, we are very likely to find a few of them at the bottom of this hole as opposed to half way up its wall.
But dinosaurs remained dominant and probably became extinct only as a quirky result of the most unpredictable of all events - a mass dying triggered by extraterrestrial impact. If dinosaurs had not died in this event, they would probably still dominate the domain of large-bodied vertebrates, as they had for so long with such conspicuous success, and mammals would still be small creatures in the interstices of their world.
Although this is not my field, and I understand that it is an active area of research, I believe it can already be said with some confidence that mass extinction is not random. There are generally reasons why an extinction event claims this lineage here but leaves that other one over there largely intact. If a mass extinction of marine life is caused, for example, by a massive drop in the oxygen content of the oceans, then we would expect lineages that can survive under low oxygen conditions to come out in relatively good shape, all things considered, while those with a high oxygen need would be hammered.

In the present case, if we hypothesise that the impact of a large meteorite would have caused massive shockwaves followed by a few years of something like nuclear winter, we could expect the following: Species of small animals may find it easier to survive because they need less food per number of individuals. Bonus points if you have a burrow to hide in when the devastation sweeps across your area (small mammals) or if you can move easily to other areas where a bit more food is left (flight-capable birds). Large animals that can go with little food for long times may also have a good chance, in other words being cold-blooded may help to survive several bad years (crocodiles). If, however, you are large and (!) at the same time you have a high rate of metabolism then you might be in trouble, as you constantly need lots of food per number of individuals. As far as I understand, that describes the non-avian dinosaurs: large and warm-blooded.

The point is, catastrophes do happen from time to time, and whenever one happened it would probably have decimated the largest animals, even if it had come ten million years later than it did. Their niches are then filled up again by small animals evolving to be large (another good example of convergence). What killed off the pterosaur lineage, for example, may well have been that the birds had already out-competed all small pterosaurs, leaving only the very large species when the meteorite struck. But again, this is not my area of expertise really.
Since dinosaurs were not moving toward markedly larger brains, and since such a prospect may lie outside the capabilities of reptilian design, we must assume that consciousness would not have evolved on our planet if a cosmic catastrophe had not claimed the dinosaurs as victims.
And this last part is really what I find the most interesting, because it illustrates so nicely how paraphyletic taxa can confuse the thinking even of the smartest of us, even of experts in evolutionary biology. What is the problem with the argument here?

First, and most obviously, birds are dinosaurs. Second, corvids (crows and ravens) and parrots are highly intelligent. Not quite human-level intelligence, but in some experiments corvids have proved to be smarter even than chimpanzees, our closest relatives. It follows that dinosaurs have actually "moved toward markedly larger brains", meaning here relative to the size of the body as a whole and, crucially, in terms of actual intelligence. Gould's premise is simply false, but his mistake is understandable, because at fault is really a misleading, i.e. non-phylogenetic, classification.

"Outside the capabilities of reptilian design" is, by the way, the same mistake at a deeper phylogenetic level. Mammals were not created fully formed, as mammals. Some of our ancestors were "reptiles", and here we are, having human-like intelligence by definition, what with us being humans and all that, so apparently there was a way of evolving human-like intelligence from a reptilian starting point. And from a fish starting point, and from a worm starting point, and from a bacterial starting point. All it took was lots of time and open niches waiting to be filled.

But I am not saying that anything here decisively refutes the idea that our sentience is a very rare fluke, unlikely to happen again should we go extinct. Maybe it is. The point is really how corrosive paraphyletic taxa are to reasoning about evolutionary processes.


Gould SJ, 1989. Wonderful Life: The Burgess Shale and the Nature of History. W.W. Norton, New York.

Wednesday, June 6, 2018

Manuscript submission then and now

When I started in science, back in the dark ages, submitting a manuscript to a journal was still quite simple, if perhaps a bit inefficient:
  1. Print the manuscript in triplicate.
  2. Write a cover letter and print it.
  3. Put everything into an envelope and send it off to the editor.
And that was that.

The first innovation was that you only had to send the manuscript to the editor as an eMail attachment, which was actually faster and saved a lot of paper. Unfortunately, however, things have changed again since then.

This is how it works today:
  1. Log into the editorial management software of the journal of my choice. If I do not have an account with that journal yet, create one first.
  2. Go to the author interface, click new submission.
  3. Select the type of article.
  4. Paste the title and abstract into an online form.
  5. Select key words or topics that supposedly help the journal to assign editors and/or reviewers. Click 'save and continue'.
  6. Upload main manuscript file, generally as an MS Word document.
  7. Upload all the figures as separate files, generally as TIF or EPS, although JPGs may be acceptable at the review stage. Paste figure legends and write 'link texts' into the form fields.
  8. Upload all the supplementary data files. If necessary, update the order of the files. Click 'save and continue'.
  9. Next, the authorship page. As the corresponding author with an account at that journal I am already in, but I may be asked to link my ORCID. (I have no idea if anybody actually uses it for anything - I only ever look people up with their ResearcherID or Google Scholar.)
  10. Search for my co-authors by name or eMail. I find the second co-author, great. The first and third co-authors aren't in the system, so I create entries for them. Click 'save and continue'.
  11. Error: No telephone number provided for second co-author. But he was in the system, so you accepted him before! Also, will any editor really ever want to use it? Argh. Let's look up his number. Okay, edited. Click 'save and continue'.
  12. Suggesting an editor for the manuscript. Oh dear, that's a long list. Hm. I know this guy hates one of the methods we used, he is out. This one is highly qualified but he will probably require us to add this other analysis that he likes. Ah well, worse things could happen. This one is also very qualified, but she works at a university I have a connection to - is that already a conflict of interest? Well, they can always choose somebody else, done.
  13. Okay, suggesting peer reviewers and providing their contact information. This guy is an obvious choice as he is the expert for one of the analyses we used, but darn, he is currently between institutions. Let's google his name. No, that's outdated. This one too. Ah, I'm lucky: he has an updated CV on this third page I found, complete with the new phone number and eMail address. Okay, now for reviewer suggestion number two. She is another obvious choice as one of the world experts on our study group. Easy to find her information on a staff page, so that's good. Who else? Maybe two more experts on the study group? Ah yes, she would be interested in this, and I have her contact details. And then this other guy from Europe. Google. Darn, nothing, despite the unique name. Perhaps there is contact info on recent papers. No, he is too senior, the corresponding authors are always others. Ah, wait, here? No, an eMail address from 2012 going "" sounds fishy, most likely somebody else is director now. More Google. Ah, finally, was able to click myself through to a staff website, well hidden and not in English. Ye gods. Four qualified reviewers, that should be enough to get going. Click 'save and continue'.
  14. Long, complicated page with miscellaneous information and declarations. First, write or upload cover letter. Done.
  15. Next, declare that we have not submitted this manuscript elsewhere. Okay.
  16. Is this a resubmission? No.
  17. Declare that we have followed protocol so-and-so on ethical collection practices. Yes.
  18. Declare that we have added a section on data availability. Wait, was that in the instructions to authors? Don't remember that. Argh. Save. Open manuscript file. Add data availability section. Back to file upload. Delete manuscript file. Re-upload manuscript file. Reorder files. Click 'save and continue'. Back to declaration. Yes, we now have a section on data availability.
  19. Declare no conflicts of interest. Okay. Click 'save and continue'.
  20. Large summary page. Check everything I entered so far. Down at the bottom: have to check PDF proofs before being allowed to submit. Click button, wait while the editorial manager bundles everything into a PDF.
  21. Open PDF. One of the EPS figures does not display. Argh. Argh. Argh. Back to file upload. Delete offending figure. Re-upload figure - as a TIF this time, that should be foolproof. Reorder files. Click 'save and continue'. Back to summary page.
  22. Re-check everything I entered so far. Click button, wait while the editorial manager generates a new PDF. Looks good this time.
  23. The big moment is here: click here to submit. "Are you certain? This will submit your manuscript." Yes!
Yay, progress?