Why you should use ete for tree exploration and visualisation in python!
09 Jan 2016
While each one of those packages has its own unique strengths and weaknesses, I particularly like the ETE module. Here is why!
This post is based on one of my past presentations at monbug. I converted the ipython notebook to this markdown with nbconvert, as described by Christopher S. Corley on his blog. The config I used with nbconvert can be found here. The github repository with all the original files for the presentation can be found here : monbug_ete. You can use nbviewer to view the notebook directly if you prefer.
What’s ETE?
ETE is a Python Environment for Tree Exploration created by Jaime Huerta-Cepas.
It’s a framework that assists in the manipulation of any type of hierarchical tree (i.e. reading, writing, visualisation, annotation, etc). At the time of writing, the latest version is ete3.
You can install ETE with pip :
pip install ete3
Check this link for more details about optional/unmet dependencies : http://etetoolkit.org/download/
Quick introduction to the API
A great in-depth tutorial for working with the tree data structure in ETE is provided by the authors : http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html. I’m going to give a light introduction to the API here, but I really recommend reading the official docs!
Let’s take a quick glance at the available tree data structures in ete :
['ClusterTree', 'EvolTree', 'NexmlTree', 'PhyloTree', 'PhyloxmlTree', 'Tree']
As you can see, you have a basic tree data structure (Tree) and more specialized tree structures, like PhyloTree for phylogenetics.
=> ETE can read a tree from a string or a file
=> In ete, a tree is a Node. This implies that the root is a Node, so are all its descendants.
            /-a
         /-|
      /-|   \-b
     |  |
   /-|   \-c
  |  |
--|   \-d
  |
  |   /-e
   \-|
      \-f
=> You can add information to nodes by adding features
The following code will traverse the tree t1 and add a feature sexiness to each leaf.
=> Features are just attributes.
            /-a, 8
         /-|
      /-|   \-b, 1
     |  |
   /-|   \-c, 9
  |  |
--|   \-d, 3
  |
  |   /-e, 9
   \-|
      \-f, 3
=> You can search by features
[Tree node 'a' (-0x7ffff810443aa570)]
[Tree node 'a' (-0x7ffff810443aa570)]
=> Here is a quick list of useful functions
SISTERS of a : [Tree node 'b' (0x7efbbc55ab0)]
FIRST CHILD OF ROOT
         /-a
      /-|
   /-|   \-b
  |  |
--|   \-c
  |
   \-d
LCA (a, b) :
   /-a
--|
   \-b
RF DISTANCE between t1 and t2 : 0
Introduction to tree visualization with ete
Data : a random tree with random branches
* Tree rendering
* Tree Style
              /-G, 0.47936
     /, 0.11319
    |        |         /-F, 0.53403
    |         \, 0.52094
-, 1.0                 \-E, 0.89822
    |
    |         /-L, 0.27682
     \, 0.32620
             |         /-K, 0.50173
              \, 0.07320
                      |         /-J, 0.14208
                       \, 0.93141
                               |         /-I, 0.05555
                                \, 0.87512
                                         \-H, 0.81088
=> Trees can be saved as images. Supported formats are png, pdf and svg.
=> You can use TreeStyle to change how the tree is displayed
Let’s draw a circular tree now
Faces are wonderful
Faces allow you to attach graphical information to a node. A face can be a simple Text or an Image, or something richer like a Chart or Sequence domains.
Here is the list of available faces :
['AttrFace', 'BarChartFace', 'CircleFace', 'DynamicItemFace', 'Face', 'ImgFace', 'OLD_SequenceFace', 'PieChartFace', 'ProfileFace', 'RandomFace', 'RectFace', 'SeqMotifFace', 'SequenceFace', 'SequencePlotFace', 'StackedBarFace', 'StaticItemFace', 'TextFace', 'TreeFace']
Faces can be added at different areas around a node.
With Faces, you can actually make things like this (treeception) :
It’s also possible to define a layout function that will determine how a node will be rendered. Let’s see how to do that and in which cases this could be useful with the next example.
Application 1 : Duplication|Loss history of a gene family
Data : gene tree newick where I have already added a feature (states) :
- states = 1 ==> internal node with duplication
- states = 0 ==> internal node with speciation
      /-Dre_1, 0
   /, 0
  |  |   /-Cfa_1, 0
  |   \, 0
-, 1     \-Hsa_1, 0
  |
  |   /-Dre_2, 0
   \, 0
      \-Cfa_2, 0
Application 2 : Phylogenetic tree, protein sequence and information content
Data :
- An alignment
- A tree constructed using that alignment
(Actually those two were randomly generated)
>A
MAEIPDETIQQFMALT---SNIAVQYLSEFGDLNEALNSY
>B
MAEIPDATIQQFMALTNVSHNIAVQY--EFGDLNEALNSY
>C
MAEIPDATIQ----LTNVSHNIAVQYLSEFGDLNEALNSY
>D
MAEAPDETIQQFMALTNVSHNIAVQYLSEFGDLNEAL---
(A,(D,(B,C)));
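The computation behind the "information content" part is not shown in this export. One common definition, which I assume here, is log2(20) minus the Shannon entropy of each alignment column, ignoring gaps and without any small-sample correction. A plain Python sketch on the alignment above:

```python
import math
from collections import Counter

# Alignment copied from the data above.
alignment = {
    "A": "MAEIPDETIQQFMALT---SNIAVQYLSEFGDLNEALNSY",
    "B": "MAEIPDATIQQFMALTNVSHNIAVQY--EFGDLNEALNSY",
    "C": "MAEIPDATIQ----LTNVSHNIAVQYLSEFGDLNEALNSY",
    "D": "MAEAPDETIQQFMALTNVSHNIAVQYLSEFGDLNEAL---",
}


def information_content(column, alphabet_size=20):
    """Information content (in bits) of one alignment column, gaps ignored."""
    residues = [residue for residue in column if residue != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


# Transpose the sequences into columns and score each column.
columns = list(zip(*alignment.values()))
profile = [information_content(column) for column in columns]

for position, bits in enumerate(profile[:8]):
    print(position, round(bits, 3))
```

A fully conserved column reaches the maximum of log2(20) bits, and a column split evenly between two residues loses exactly one bit. A per-column profile like this is the kind of data that can then be drawn next to the tree with a face.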
You can do a lot of things with ete if you take the time to learn how to use it. I didn’t have time to talk about EvolNode or all the other great modules of ete, but I hope this post sparked your interest and was useful to you.
Also, READ THE DOCS.