Download multiple GenBank sequences from R

A published paper sometimes lists its GenBank accession numbers as a range, for example KC664779 - KC665461. In R, the following code downloads every sequence in that range and saves them as a FASTA file:

start = 664779
end = 665461

# generate all the sequence IDs in the range
# (paste0 is vectorised, so no loop or index counter is needed)
desired_sequences = paste0("KC", start:end)

# download the GenBank sequences in the vector using the ape read.GenBank function
output = ape::read.GenBank(desired_sequences)
# store the GenBank sequences as a FASTA file
ape::write.dna(output, file = "my_seqs.fasta", format = "fasta", append = FALSE, nbcol = 6, colsep = "", colw = 10)
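
The range above works because the numeric part of these accessions has no leading zeros. As a minimal sketch for the more general case, the helper below takes the first and last accession of a range and rebuilds every ID in between, keeping the original zero padding. The function name `expand_accession_range` is hypothetical (it is not part of ape), and it assumes both accessions share the same letter prefix and the same number of digits.

```r
# Hypothetical helper (not part of ape): expand a range such as
# "KC664779" to "KC665461" into every accession in between.
expand_accession_range = function(first, last) {
  # split each accession into its letter prefix and its digits
  prefix_first = sub("[0-9]+$", "", first)
  prefix_last = sub("[0-9]+$", "", last)
  digits_first = sub("^[A-Za-z_]+", "", first)
  digits_last = sub("^[A-Za-z_]+", "", last)

  # assumption: same prefix, same digit count, ascending range
  stopifnot(prefix_first == prefix_last,
            nchar(digits_first) == nchar(digits_last),
            as.integer(digits_first) <= as.integer(digits_last))

  numbers = as.integer(digits_first):as.integer(digits_last)
  # pad with leading zeros back to the original width
  padded = formatC(numbers, width = nchar(digits_first), flag = "0", format = "d")
  paste0(prefix_first, padded)
}
```

The returned character vector can be passed to `ape::read.GenBank()` in the same way as `desired_sequences`.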

Alternatively, you can list specific accession numbers in a vector, as shown below. These accession numbers are all the COI and 28S sequences from the hymenopterans in the study by Chen et al. (2004).

# 28S sequences
desired_seqs_28S = c("AY317172", "AY317173", "AY317170", "AY317164", "AY317169", "AY317160", "AY317155", "AY317171", "AY317163", "AY317178", "AY317175", "AY317166", "AY317162", "AY317168", "AY317176", "AY317179", "AY317181", "AY317167", "AY317159", "AY317158", "AY317157", "AY317174", "AY317180", "AY317156", "AY317161", "AY317165", "AY317177")

eurytomid_28S = ape::read.GenBank(desired_seqs_28S)

ape::write.dna(eurytomid_28S, file = "eurytomids_28S_genbank.fas", format = "fasta", append = FALSE)

# COI sequences
desired_seqs_coi = c("AY317223", "AY317226", "AY317239", "AY317227", "AY317233", "AY317225", "AY317236", "AY317224", "AY317235", "AY317238", "AY317228", "AY317222", "AY317232", "AY317230", "AY317231", "AY317234", "AY317242", "AY317229", "AY317243", "AY317240", "AY317241", "AY317221")

eurytomid_coi = ape::read.GenBank(desired_seqs_coi)

ape::write.dna(eurytomid_coi, file = "eurytomids_COI_genbank.fas", format = "fasta", append = FALSE)
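
When downloading from a hand-typed vector like the ones above, it can be worth checking that every requested accession actually came back. The sketch below compares the requested IDs against the names of the downloaded object. The helper name `missing_accessions` is hypothetical, and it assumes the sequence names are accession numbers, possibly followed by a version suffix such as `.1`.

```r
# Hypothetical helper: list requested accessions that are absent
# from a download. `returned` is expected to be names(x), where x
# is the object returned by ape::read.GenBank().
missing_accessions = function(requested, returned) {
  # assumption: drop anything after the first dot, e.g. a ".1" version suffix
  returned = sub("\\..*$", "", returned)
  setdiff(requested, returned)
}
```

For example, `missing_accessions(desired_seqs_coi, names(eurytomid_coi))` should return an empty character vector if every COI sequence was retrieved.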