Combining single fasta sequences into a combined multiple sequence alignment file
Published:
This snippet of Python or R code (whichever you prefer!) enables you to input a folder directory containing many FASTA files containing a single sequence in each, and combine them all into one FASTA file. I initially found it cumbersome to manually copy and paste the FASTA sequence from each trimmed chromatogram file into one big FASTA file. This automates the process, and could be useful to folks who are generating FASTA files for alignments ๐. This works correctly in Python 3.9. If youโre more of an R person, the R script is here too ๐.
import os
DIR = input("\nInput folder path containing FASTA files to combine into one FASTA file: ")
os.chdir(DIR)
FILE_NAME = input("\nWhat would you like to name your output file (e.g. combo.fas)? Note: "
"Please add the .fas extension: ")
output_fas = open(FILE_NAME, 'w')
file_count = 0
for f in os.listdir(DIR):
if f.endswith((".fas", ".fasta", ".FASTA")):
file_count += 1
fh = open(os.path.join(DIR, f))
for line in fh:
output_fas.write(line)
fh.close()
output_fas.close()
print(str(file_count) + " FASTA files were merged into one file, which can be found here: " + DIR)
And hereโs the R code:
# get all the fasta files from your desired directory
fasta_files = list.files(path = "C:/Users/s1000334/Documents/R_fasta_combiner/210508EN-005",
pattern = ".fasta|.fas|.FASTA",
full.names = TRUE)
# have a quick check
fasta_files
# confirm the number of fasta files
length(fasta_files)
# create an empty file to which each single fasta sequence will be written to
file.create("combo.fas")
# loop through each fasta file, and write it to the empty file created above
for( i in 1:length(fasta_files) ){
f = ape::read.FASTA(fasta_files[i])
ape::write.FASTA(f, file = "combo.fas", append = TRUE)
}