Blog posts

2022

Multiple sequence alignment Fasta file manipulation

5 minute read

Published:

This bit of code helps you subset a FASTA alignment based on conditions taken from a sample information file (e.g. keep only the samples collected in Spring, or those collected in Winter and greater than y cm, or everything except those collected in Autumn).
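To give a flavour of the idea, here is a minimal Python sketch, not the code from the post itself. It assumes a sample information table in CSV form with hypothetical columns `sample`, `season` and `length_cm`, and that the FASTA headers match the `sample` column exactly. The toy sequences and the helper names `read_fasta` and `subset_alignment` are made up for illustration.

```python
import csv
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def subset_alignment(fasta_handle, info_handle, keep):
    """Return the records whose sample-information row satisfies keep(row)."""
    wanted = {row["sample"] for row in csv.DictReader(info_handle) if keep(row)}
    return [(name, seq) for name, seq in read_fasta(fasta_handle) if name in wanted]


# Hypothetical toy data standing in for a real alignment and sample sheet.
fasta_text = ">s1\nACGT\n>s2\nAC-T\n>s3\nACGA\n"
info_text = (
    "sample,season,length_cm\n"
    "s1,Spring,4.0\n"
    "s2,Winter,7.5\n"
    "s3,Autumn,6.0\n"
)

# Everything except the Autumn samples.
not_autumn = subset_alignment(
    io.StringIO(fasta_text),
    io.StringIO(info_text),
    lambda row: row["season"] != "Autumn",
)

# Winter samples greater than 5 cm.
winter_long = subset_alignment(
    io.StringIO(fasta_text),
    io.StringIO(info_text),
    lambda row: row["season"] == "Winter" and float(row["length_cm"]) > 5,
)
```

Passing the condition as a function keeps the filter flexible: any combination of columns can be tested without touching the FASTA-reading code.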

Plot a phylogeny in R using ggtree

3 minute read

Published:

This post will show you how to generate a phylogeny plot in R, with a heatmap alongside it. This example uses binary data, but it can be adapted to plot a continuous variable instead.

Creating a climate map with GPS coordinate points

3 minute read

Published:

This R code shows you how to plot a map for a desired country or geographic region, and how to display a chosen climatic variable (e.g. mean annual temperature, annual precipitation) as an overlay. The WorldClim dataset has 19 variables to choose from.

Creating an elevation map with GPS coordinate points

1 minute read

Published:

This R script plots the map of a desired country, and colours it by elevation. It then adds GPS coordinate points to it (for example sampling sites) from an Excel file. Check out the final image at the bottom of this post 😃. The North arrow and Tetramesa wasp were added manually later in Inkscape.

2021

Renaming all the sequences in a FASTA file automatically

2 minute read

Published:

I came across the problem of renaming sequences in a FASTA sequence alignment after downloading over 200 sequences from GenBank for four different genes. Each sequence name consisted of the GenBank accession number only (e.g. MK1526). I wanted the sequence names to include the species name as well, for example MK1526_Canis_lupus. To avoid the tedious task of doing this manually, I wrote a few lines of R code that I hope will be of use to others with the same issue.
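The post itself uses R; purely as an illustration of the renaming step, here is a small Python sketch. It assumes you already have a lookup from accession number to species name (the dictionary below is hypothetical), and that the accession is the first whitespace-separated token of each header. The function name `rename_headers` is my own.

```python
def rename_headers(fasta_text, species_by_accession):
    """Append the species name to every FASTA header found in the lookup."""
    renamed = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            accession = tokens[0] if tokens else ""
            species = species_by_accession.get(accession)
            if species:
                # Spaces become underscores so the name stays a single token.
                line = ">" + accession + "_" + species.replace(" ", "_")
        renamed.append(line)
    return "\n".join(renamed) + "\n"


# Hypothetical lookup table and toy sequences for illustration only.
species_by_accession = {"MK1526": "Canis lupus"}
fasta_text = ">MK1526\nACGT\n>MK9999\nTTGA\n"

renamed_text = rename_headers(fasta_text, species_by_accession)
```

Headers with no entry in the lookup (like `MK9999` here) are left unchanged, which makes missing species names easy to spot afterwards.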

Download multiple GenBank sequences from R

1 minute read

Published:

Perhaps a published paper lists their GenBank accession numbers as a range; for example KC664779 - KC665461. In R, one can use the following code to download all the sequences in this range, and save them as a FASTA file:
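The R code lives in the full post. As a rough Python sketch of the first half of the job, the snippet below only expands a range into the individual accession numbers; the download step, which needs network access to NCBI, is deliberately left out. It assumes both ends of the range share the same letter prefix, and the function name `expand_accession_range` is my own.

```python
import re


def expand_accession_range(first, last):
    """List every accession from first to last, inclusive."""
    pattern = re.compile(r"([A-Za-z_]+)(\d+)")
    start_match = pattern.fullmatch(first)
    stop_match = pattern.fullmatch(last)
    if not start_match or not stop_match:
        raise ValueError("expected accessions like KC664779")
    prefix = start_match.group(1)
    if prefix != stop_match.group(1):
        raise ValueError("both accessions must share the same letter prefix")
    width = len(start_match.group(2))  # keep any zero padding intact
    start, stop = int(start_match.group(2)), int(stop_match.group(2))
    if stop < start:
        raise ValueError("range end comes before range start")
    return [f"{prefix}{number:0{width}d}" for number in range(start, stop + 1)]


accessions = expand_accession_range("KC664779", "KC665461")
print(len(accessions))  # 683 accession numbers in this range
```

The resulting list can then be handed to whichever GenBank client you prefer, ideally in batches rather than one request per accession.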

Combining single fasta sequences into a combined multiple sequence alignment file

1 minute read

Published:

This snippet of Python or R code (whichever you prefer!) lets you point to a folder of FASTA files, each containing a single sequence, and combine them all into one FASTA file. I initially found it cumbersome to manually copy and paste the FASTA sequence from each trimmed chromatogram file into one big FASTA file. This automates the process, and could be useful to folks who are generating FASTA files for alignments 😃. The Python script works correctly in Python 3.9. If you're more of an R person, the R script is here too 😎.
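For a quick idea of what this looks like, here is a minimal Python sketch of the approach; it is not necessarily identical to the script in the post. The function name `combine_fasta`, the list of file extensions and the toy files written to a temporary folder are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def combine_fasta(folder, output_path, extensions=(".fasta", ".fas", ".fa")):
    """Concatenate every FASTA file in folder into output_path.

    Returns the number of input files that were combined.
    """
    folder = Path(folder)
    output_path = Path(output_path)
    # Collect inputs first, skipping the output file in case it sits in the same folder.
    inputs = sorted(
        path
        for path in folder.iterdir()
        if path.is_file()
        and path.suffix.lower() in extensions
        and path.resolve() != output_path.resolve()
    )
    with output_path.open("w") as out:
        for path in inputs:
            text = path.read_text().strip()
            if text:
                # Guarantee exactly one newline between consecutive records.
                out.write(text + "\n")
    return len(inputs)


# Toy demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "a.fasta").write_text(">a\nACGT\n")
    (tmp / "b.fasta").write_text(">b\nTTGA")  # no trailing newline on purpose
    n_combined = combine_fasta(tmp, tmp / "combined.fasta")
    combined = (tmp / "combined.fasta").read_text()
```

Stripping each file and re-adding a single newline avoids the classic problem where a file without a trailing newline glues its sequence onto the next header.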