Download multiple GenBank sequences from R
A published paper may list its GenBank accession numbers as a range, for example KC664779 - KC665461. In R, the following code uses the ape package to download every sequence in that range and save them as a FASTA file:
start = 664779
end = 665461
# create an empty vector to store the sequences
desired_sequences = c()
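# run a loop to generate all the sequence IDs
# (R vectors are indexed from 1, so the counter must start at 1;
# starting at 0 silently drops the first accession)
k = 1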
for (i in start:end){
desired_sequences[k] = paste("KC", i, sep = "")
k = k + 1
}
# access the GenBank sequences in the list using the ape read.GenBank function
output = ape::read.GenBank(desired_sequences)
# store the GenBank sequences as a FASTA file
ape::write.dna(output, file = "my_seqs.fasta", format = "fasta", append = FALSE, nbcol = 6, colsep = "", colw = 10)
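The ID-generating loop above can also be written without an explicit counter. The following is a minimal sketch rather than part of the original workflow: `accession_range` is a hypothetical helper name, and it assumes both accessions share the same letter prefix and the same number of digits. It pads the numeric part so that leading zeros, if any, are preserved.

```r
# Hypothetical helper: expand a range such as "KC664779" - "KC665461"
# into every accession number in between (inclusive).
# Assumes both ends share one letter prefix and the same digit count.
accession_range <- function(first, last) {
  prefix <- sub("[0-9]+$", "", first)
  stopifnot(identical(prefix, sub("[0-9]+$", "", last)))

  digits_first <- sub("^[A-Za-z_]+", "", first)
  digits_last <- sub("^[A-Za-z_]+", "", last)
  stopifnot(nchar(digits_first) == nchar(digits_last))

  from <- as.integer(digits_first)
  to <- as.integer(digits_last)
  stopifnot(from <= to)

  # zero-pad to the original width so leading zeros survive
  fmt <- paste0("%0", nchar(digits_first), "d")
  paste0(prefix, sprintf(fmt, seq(from, to)))
}

desired_sequences <- accession_range("KC664779", "KC665461")
length(desired_sequences)  # 683
```

The resulting vector can be passed to `ape::read.GenBank` exactly as in the loop-based version.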
Alternatively, you can list specific accession numbers in a character vector, as shown below. These accession numbers are all the COI and 28S sequences from the hymenopterans in the study by Chen et al. (2004).
# 28S sequences
desired_seqs_28S = c("AY317172", "AY317173", "AY317170", "AY317164", "AY317169", "AY317160", "AY317155", "AY317171", "AY317163", "AY317178", "AY317175", "AY317166", "AY317162", "AY317168", "AY317176", "AY317179", "AY317181", "AY317167", "AY317159", "AY317158", "AY317157", "AY317174", "AY317180", "AY317156", "AY317161", "AY317165", "AY317177")
eurytomid_28S = ape::read.GenBank(desired_seqs_28S)
ape::write.dna(eurytomid_28S, file = "eurytomids_28S_genbank.fas", format = "fasta", append = FALSE)
# COI sequences
desired_seqs_coi = c("AY317223", "AY317226", "AY317239", "AY317227", "AY317233", "AY317225", "AY317236", "AY317224", "AY317235", "AY317238", "AY317228", "AY317222", "AY317232", "AY317230", "AY317231", "AY317234", "AY317242", "AY317229", "AY317243", "AY317240", "AY317241", "AY317221")
eurytomid_coi = ape::read.GenBank(desired_seqs_coi)
ape::write.dna(eurytomid_coi, file = "eurytomids_COI_genbank.fas", format = "fasta", append = FALSE)
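With hand-typed accession lists it can be worth confirming that every requested record actually came back. The sketch below is an illustration only: `missing_accessions` is a hypothetical helper, and it assumes the names of the downloaded object are the accession numbers, which is the default for `ape::read.GenBank` when `seq.names` is left unchanged. A plain named list stands in for the download so the example runs offline.

```r
# Hypothetical helper: return requested accessions that are absent
# from the downloaded object.
# Assumes names(downloaded) holds the accession numbers.
missing_accessions <- function(requested, downloaded) {
  setdiff(requested, names(downloaded))
}

# Stand-in for a read.GenBank result (made-up sequences, for illustration only)
example_download <- list(AY317221 = "acgt", AY317222 = "ttga")

missing_accessions(c("AY317221", "AY317222", "AY317223"), example_download)
# "AY317223"
```

In practice one could call, for example, `missing_accessions(desired_seqs_coi, eurytomid_coi)` after the download and inspect any accessions it reports before writing the FASTA file.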