Sunday, August 30, 2015

PhyloPic

A few weeks ago we discussed a paper in our journal club that used dinosaur silhouettes in its figures and referenced PhyloPic as the source. So today I finally decided to check the website out.

PhyloPic is a repository of black/white silhouettes of organisms across the tree of life. The idea is that everybody can sign up and submit their artwork under some kind of Creative Commons or public domain license, and everybody can use the silhouettes to do just what the authors of the aforementioned paper did: decorate phylogenetic tree figures in publications or talk slides, educate, etc.

The idea sounds great, so I played around a bit today to see how well it works. The experience was, alas, a bit mixed.

There are currently around 2,000 images in there. It sounds like a lot, but for now the content seems to be mostly focused on dinosaurs, primates and a few arthropods and birds. If you want plant pictures, you are out of luck. Still, that is for the user base to change.

So I drew a quick outline of a horsetail (Equisetum telmateia) I had photographed in southern France and submitted it. That at least works fairly smoothly. The website guessed the species name from my file name, sorted it into the right place in the classification hierarchy, transformed the SVG I provided into different size files in its database, and even populated higher level taxa that did not have a proper picture, such as the whole genus and family. All good.

What works much less well is searching for pictures. There appear to be only two options at the moment. The first is to click "browse", in which case the website will show the last 72 silhouettes that have been submitted together with a little Load More button. But you may be interested in, say, seeing all the flowering plants, so to use the second option you click on "search" and enter angiosperms.

Sadly, the result is not, as you would expect, a page showing all silhouettes of flowering plants. Instead, the website searches for taxa. One of the hits will be the angiosperms, so you click on the taxon name and are shown one or perhaps two representative pictures and the next lower taxa in the system. From there on you can navigate through the classification by clicking on these lower taxa, but as there are gazillions of levels in the hierarchy that can take a distressing amount of time.

The situation isn't helped by the fact that the database seems to be rather slow; when I arrived at the Asteraceae (daisy/sunflower) family, loading the list of next lower taxa took so long that my browser suggested the script on the website might have crashed.

To make matters worse, you are likely to go through all that for nothing. The creators of PhyloPic have apparently copied the classification of uBio to fill their own database, but that doesn't mean that any of the taxa in it necessarily have pictures, and of course nearly all of them don't. So you navigate all the way down to, say, a certain plant family only to find that there aren't any pictures for that family at all. Just as an aside, the website is also in a bit of trouble when you submit a silhouette of a species that isn't yet in its classification, as I did with my second submission.

Anyway, to make the website user-friendly, it would be better if there were a search function that does what I expected it to do in the first place: retrieve all pictures from the whole clade, down to species, whose name was entered into the search field. For the navigation through the classification it would be good if the website at least showed whether there are any pictures at all downstream in the clade (perhaps with different colour buttons?).
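
To make the two wishes above concrete, here is a minimal sketch in Python. It is not how PhyloPic works; the `CHILDREN` and `IMAGES` tables, the file name and the function names are all made up for illustration, and the classification is a heavily simplified toy one.

```python
# Hypothetical toy data, not PhyloPic's schema or API.
# CHILDREN maps each taxon to its next lower taxa (simplified classification).
CHILDREN = {
    "Tracheophyta": ["Equisetopsida", "Angiospermae"],
    "Equisetopsida": ["Equisetaceae"],
    "Equisetaceae": ["Equisetum"],
    "Equisetum": ["Equisetum telmateia"],
    "Equisetum telmateia": [],
    "Angiospermae": ["Asteraceae"],
    "Asteraceae": [],
}
# IMAGES maps a taxon to the silhouettes attached directly to it.
IMAGES = {"Equisetum telmateia": ["horsetail.svg"]}


def images_in_clade(root):
    """Collect every silhouette attached to root or to any of its descendants."""
    found = []
    stack = [root]
    while stack:
        taxon = stack.pop()
        found.extend(IMAGES.get(taxon, []))
        stack.extend(CHILDREN.get(taxon, []))
    return found


def has_images_downstream(taxon):
    """True if the taxon or anything below it has at least one silhouette."""
    return bool(images_in_clade(taxon))
```

With this toy data, `images_in_clade("Tracheophyta")` finds the horsetail, while `has_images_downstream("Asteraceae")` is false, which is the kind of signal a differently coloured button could convey. Note that the search visits every taxon in the clade, so it gets slower the larger the clade is.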

So in summary, this is an idea with a lot of potential, but at this stage the implementation leaves something to be desired.

8 comments:

  1. Hi Alex. I'm the creator of PhyloPic. Glad you found some use in it, and I agree there's a long way to go before it's all ideally set up. Unfortunately it's just a project I work on alone in my increasingly rare spare time.

    The initial idea for the site was to enable people to find illustrations of smaller taxa, like species, or close approximations when not available. For higher taxa, it simply shows silhouettes that approximate the ancestral form (which has the nice side-effect of making the lineage pages possible). I would like to provide a way to view all silhouettes within a clade, in line with your expectation, but it wasn't an initial goal of the project.

    The problem with finding all silhouettes in a large clade is not a trivial one. Searches from ancestral nodes into descendant nodes are enormously expensive, and any solution has to work for the largest clade (Panbiota). I do have a solution in mind, but it will take a while to implement. There's an outstanding ticket for it.

    I agree that getting more plant silhouettes is a priority. I said as much in this interview.

    Unfortunately uBio went down last month, which threw the site into a bit of chaos. I've been meaning to switch over to Encyclopedia of Life, but the APIs are quite different and that will require some time. Hopefully the interim process for entering taxonomic names isn't too painful, though.

    As a final note, the site is open-source, so other developers are free to clone the repository, work on issues, and make pull requests. (None have yet.) As well, if you'll allow a bit of self-promotion, I recently launched a Patreon page for people who want to support more work on it (and other stuff).

    Tschüß!

    Replies
    1. Thanks for your comment! I understand that there are severe constraints, and much of it is down to what contributors are willing to supply in terms of silhouettes anyway.

      I am, however, a bit surprised by what you describe as the initial goal of the project. I would have thought that by far the most obvious use of such a database would be for somebody who needs to illustrate a paper or talk containing a phylogeny of, say, ferns to enter "pteridophyta" into the search field and find a representative selection of silhouettes across all major lineages of ferns.

      Other uses did not really occur to me, and that is what informed my post. If somebody simply wants to know what a single species looks like, Google seems like a more promising venue than PhyloPic, for example.

    2. But Google can't take a search for, say, "Sinosaurus silhouette" and find you a Zupaysaurus silhouette that could work just as well. It doesn't have the phylogenetic/taxonomic know-how to find relatives.

      The problem with finding a "representative" selection for large clades is that there isn't really an objective way to do it. What's a "representative" vertebrate? The closest objective solution to this I can think of is to show the ancestral form (or an approximation), which is what PhyloPic currently does.

      I do understand that this doesn't work for all purposes (such as if you're trying to differentiate sister clades), and I do want to work on that. But that functionality isn't something I was going to delay the launch of the site for.

    3. One of the big goals of PhyloPic is to provide an automated generator for illustrated cladograms. The idea would be to enter a Newick string (and, later, other formats) and get back a cladogram with silhouettes for all nodes. For this, the ability to query taxonomic units (which are often species, or other small taxa) is necessary.

    4. I may still misunderstand, but it seems to me as if you are conflating two things. Yes, just Googling won't work if you want the next best silhouette, but it will if you want to know what species A looks like, especially given that "looks like" usually includes colours and surface structures.

      And somebody who wants silhouettes to decorate a phylogeny will, in my opinion at least, be better served with a search function that returns all silhouettes of the group they are working on, allowing them to pick six to ten representatives. An automated generator might be a bit overly optimistic except for a very small number of groups (basically birds, apes and non-avian dinosaurs), and I am skeptical of the concept of inferring "the ancestral form" from one extant side branch anyway.

      You appear to be worried that the function of displaying all pics for one clade will not work for panbiota or vertebrates, but really how many people do phylogenies at that level? Most of us are doing phylogenies of that family of frogs, that order of insects, or this genus of flowering plants. The real challenge would appear to be having more than one or two silhouettes at that level, not how to display five hundred.

    5. Of course, Google Images is great for that, but that's not the point of PhyloPic. The point is to allow users to find freely-reusable silhouettes for a given taxon, or at least to find freely-reusable silhouettes for close relatives in cases when there are none for that exact taxon.

      "birds, apes and non-avian dinosaurs"

      You can just say "apes and dinosaurs". :)
      (Great apes, really -- hylobatid coverage is not very good at present.)

      That is, of course, an artifact of the current coverage, not intrinsic to the structure of the site itself. Imagine if the coverage for angiosperms were as good as it currently is for amniotes. I hope that some day it will be (or even better, really).

      "I am skeptical of the concept of inferring 'the ancestral form' from one extant side branch anyway."

      Doesn't have to be extant. And there are some silhouettes done expressly as hypothetical ancestral forms, although admittedly that's just a few.

      I will say this doesn't work quite as well for certain plant taxa, at least for the silhouettes that show the whole organism (or the above-ground portion), since some plant lineages can change in gross form rather quickly and frequently (vines, shrubs, trees, etc.). But I think it can still work reasonably well for silhouettes of parts of the anatomy in those cases.

      And, despite some issues, I find this approach more sound than arbitrarily picking some derived species as the "typical" one. Not a fan of that kind of typological thinking in general.

      "You appear to be worried that the function of displaying all pics for one clade will not work for panbiota or vertebrates, but really how many people do phylogenies at that level?"

      People working on higher-level animal phylogeny certainly might use Vertebrata! At least in overview diagrams, if not as an OTU in a matrix itself.

      But you are proposing having an arbitrary cutoff level for the size of a clade? How does one come up with that?

    6. Re ancestral form: I think we are talking past each other. We agree entirely on the problems with the typological approach, that is just what I meant. But as far as I can tell, at the moment the 'lineage' function of PhyloPic relies mostly on using "typical" derived species.

      Re displaying pics for a clade: I was thinking that PhyloPic already has a function for displaying large numbers of silhouettes, as happens when the user clicks on "browse". It displays the last 72 submissions and a "more" button. I merely envisioned that the same could happen with all vertebrate submissions if somebody uses the relevant search function and enters vertebrata or whatever.

    7. "But as far as I can tell, at the moment the 'lineage' function of PhyloPic relies mostly on using 'typical' derived species."

      It shouldn't! If there are any instances where you think a silhouette attached to a clade is not a reasonable approximation of the ancestral form, please let me know. I've done my best, but I'm not an expert on all branches of the Tree of Life.

      Yes, PhyloPic does have a method for displaying large numbers of images, and that will be repurposed. But that's not the problem. The problem is querying for the right images. It's very easy to make a query that pulls all of the images. But with the current data schema, it's practically impossible to create a performant query that finds all images in a large clade, since it would involve traversing every node within that clade.

      The solution will be first to create image sets. (These will work similarly to the name sets that are used for lineage pages with multiple termini.) The primary purpose of this is to allow people to create a sort of "shopping cart" of images, with auto-generated attributions, a single download, etc. That will make it a lot easier to collect images for a diagram, without losing the license information, etc.

      Once this is done, I'll be able to add associations between taxa and image sets. When an image is added to a lower taxon, a script will run to update the image sets for all ancestral nodes. Traversing the tree toward the root is far, far faster than traversing toward the leaves.

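
The approach described in the last comment, updating a stored set of images for every ancestral node whenever a silhouette is added, can be sketched roughly as follows. This is only an illustration under assumptions: the `PARENT` table, the `IMAGE_SETS` store, the image identifier and the function names are hypothetical, and the hierarchy is a drastically shortened toy version of the real one.

```python
# Hypothetical toy data, not PhyloPic's schema.
# PARENT maps each taxon to its parent; None marks the root.
PARENT = {
    "Panbiota": None,
    "Vertebrata": "Panbiota",
    "Dinosauria": "Vertebrata",
    "Sinosaurus": "Dinosauria",
    "Zupaysaurus": "Dinosauria",
}
# One precomputed image set per taxon, covering its whole clade.
IMAGE_SETS = {taxon: set() for taxon in PARENT}


def add_image(taxon, image_id):
    """Register an image and update the sets of all ancestors up to the root."""
    node = taxon
    while node is not None:
        IMAGE_SETS[node].add(image_id)
        node = PARENT[node]


def images_in_clade(taxon):
    """Look up all images in a clade without visiting any descendants."""
    return IMAGE_SETS[taxon]


def nearest_images(taxon):
    """Walk toward the root until some clade containing the taxon has images."""
    node = taxon
    while node is not None:
        if IMAGE_SETS[node]:
            return node, IMAGE_SETS[node]
        node = PARENT[node]
    return None, set()
```

Adding one image touches only the handful of nodes between the taxon and the root, and a clade query then becomes a single lookup. The same upward walk also covers the "close relative" case from the thread: after `add_image("Zupaysaurus", "zupaysaurus.svg")`, calling `nearest_images("Sinosaurus")` stops at Dinosauria and returns the Zupaysaurus silhouette. The trade-off in this sketch is extra storage and the need to keep the sets consistent when images are removed or the classification changes.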