Combining single fasta sequences into a combined multiple sequence alignment file

1 minute read

Published:

This snippet of Python or R code (whichever you prefer!) enables you to input a folder directory containing many FASTA files containing a single sequence in each, and combine them all into one FASTA file. I initially found it cumbersome to manually copy and paste the FASTA sequence from each trimmed chromatogram file into one big FASTA file. This automates the process, and could be useful to folks who are generating FASTA files for alignments ๐Ÿ˜ƒ. This works correctly in Python 3.9. If youโ€™re more of an R person, the R script is here too ๐Ÿ˜Ž.


import os

DIR = input("\nInput folder path containing FASTA files to combine into one FASTA file: ")
os.chdir(DIR)
FILE_NAME = input("\nWhat would you like to name your output file (e.g. combo.fas)? Note: "
                  "Please add the .fas extension: ")
output_fas = open(FILE_NAME, 'w')
file_count = 0

for f in os.listdir(DIR):
    if f.endswith((".fas", ".fasta", ".FASTA")):
        file_count += 1
        fh = open(os.path.join(DIR, f))
        for line in fh:
            output_fas.write(line)
        fh.close()

output_fas.close()
print(str(file_count) + " FASTA files were merged into one file, which can be found here: " + DIR)

And hereโ€™s the R code:


# get all the fasta files from your desired directory

fasta_files = list.files(path = "C:/Users/s1000334/Documents/R_fasta_combiner/210508EN-005",
           pattern = ".fasta|.fas|.FASTA", 
           full.names = TRUE)


# have a quick check

fasta_files

# confirm the number of fasta files

length(fasta_files)

# create an empty file to which each single fasta sequence will be written to

file.create("combo.fas")

# loop through each fasta file, and write it to the empty file created above

for( i  in 1:length(fasta_files) ){
  f = ape::read.FASTA(fasta_files[i])
  ape::write.FASTA(f, file = "combo.fas", append = TRUE)
}