Plot a phylogeny in R using ggtree

3 minute read

Published:

This post will show you how to generate a phylogeny plot in R, with a heatmap alongside it. This example uses binary data, but it can be adapted to be plotted with a continuous variable.

First open your tree result (e.g. from IQTREE-2 or MrBayes) in FigTree, and format it there (rooting, displaying bootstrap or posterior probability values, branch swapping, etc.). Then save it as a Nexus file using the Export Trees option and select the checkboxes for Save as currently displayed and Include Annotations. The Nexus file can then be read into R using the treeio library function read.mrbayes() (even though this is an ML tree from IQTREE).

The binary data Excel file should follow a format like the one shown below. Sample names, exactly as they are in the tree file, should be in the first column. The other column names could be anything for which you have presence-absence data (e.g. presence on a particular host plant, sex, dead or alive, etc.).

sample_IDArundo_donaxCynodon_dactylonEragrostis_bifloraEragrostis_capensis
MN1332850011
MN1332881000
MN1332960010
KCP_2020_000370100
GFS_2020_006140100
GFS_2020_005620011

The image below is a cropped version of a phylogeny I have created using the code below.


################################
# Maximum Likelihood Phylogeny
################################

iqtree_result = treeio::read.mrbayes("C:/Users/s1000334/Documents/PhD_Tetramesa/Raw Sequences/Tetramesa/ALIGNED_READY/Tetramesa/Concatenated/Bruchophagus_removed/Singletons_removed/ML_concat_tree_R.nex")

# Have a look at the names of the features associated with the tree. Use the name given for the support values in the next bit of code (e.g boots, UFboot, bootstrap, in the case of an ML tree, or prob if it's a Bayesian tree)

iqtree_result 

#change between UFboot and boots depending on how you have named this in FigTree

iqtree_iotree = iqtree_result %>% ggtree::ggtree(., color="black", layout = "rectangular", lwd=0.5) + 
  geom_tiplab(size=1, color="black", font = 1) +
  geom_label2(aes(subset=!isTip, label=round(as.numeric(boots),2)), size=3, color="black", alpha=0, label.size = 0, nudge_x = -0.001) +  # add node numbers  +
  geom_treescale(fontsize=4, linesize=1)

# have a quick look at the tree

iqtree_iotree 

##############################
# add host plant info
##############################

# read in an Excel file with values associated with each sequence in the phylogeny (e.g. presence-absence data)

host.plants = readxl::read_xlsx("G:/My Drive/Tetramesa Project/Source Modifiers/source_modifiers_updated_10_06_2021.xlsx", 
                                sheet = "host_plants")
                                
head(host.plants)

# change the NA cells to zeros

host.plants[is.na(host.plants)] <- 0

# change the numeric values to characters

host.plants[] <- lapply(host.plants, as.character)

# make the first column the row names

host.plants = tibble::column_to_rownames(host.plants, var = "sample_id")

# add heatmap alongside the phylogeny plotting the binary values

binary_phylo =  ggtree::gheatmap(iqtree_iotree, host.plants, 
                         color="black",
                         offset=0.05, width=0.2, colnames_position = "top", 
                         colnames_angle = 90, font.size=1,  
                         colnames_offset_y = 2
                         ) + 
  scale_fill_manual(values=c("white", "darkblue")) +
  theme(legend.position="bottom") +
  geom_treescale() 

plot(binary_phylo)

# Additional plot:
# Plot the phylogeny skeleton (no tip labels)

frame = iqtree_result %>% ggtree::ggtree(., color="black", layout = "rectangular", lwd=0.5) + 
  #geom_tiplab(size=1, color="black", font = 1) +
  #geom_label2(aes(subset=!isTip, label=round(as.numeric(boots),2)), size=3, color="black", alpha=0, label.size = 0, nudge_x = -0.001) +  # add node numbers  +
  geom_treescale(fontsize=4, linesize=1)

frame