# HG changeset patch
# User jasper
# Date 1486144211 18000
# Node ID 851b9da82fb03a3d02855faa23c01b53ea517bc1
# Parent 11fd914a3dfe2d4661b0ba45db876f9d68961ea6
Uploaded

diff -r 11fd914a3dfe -r 851b9da82fb0 align_back_trans.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/align_back_trans.xml Fri Feb 03 12:50:11 2017 -0500
@@ -0,0 +1,129 @@
+
+ Gives a codon aware alignment
+
+ biopython
+ Bio
+
+
+
+
+
+
+ align_back_trans.py --version
+
+align_back_trans.py $prot_align.ext "$prot_align" "$nuc_file" "$out_nuc_align" "$table"
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+Takes an input file of aligned protein sequences (typically FASTA or Clustal
+format), and a matching file of unaligned nucleotide sequences (FASTA format,
+using the same identifiers), and threads the nucleotide sequences onto the
+protein alignment to produce a codon aware nucleotide alignment - which can
+be viewed as a back translation.
+
+If you specify one of the standard NCBI genetic codes (recommended), then the
+translation is verified. This will allow fuzzy matching if stop codons in the
+protein sequence have been represented as X, and will allow for a trailing stop
+codon present in the nucleotide sequences but not the protein.
+
+Note - the protein and nucleotide sequences must use the same identifiers.
+
+Note - If no translation table is specified, the provided nucleotide sequences
+should be exactly three times the length of the protein sequences (excluding the gaps).
+
+Note - the nucleotide FASTA file may contain extra sequences not in the
+protein alignment; they will be ignored. This can be useful if for example
+you have a nucleotide FASTA file containing all the genes in an organism,
+while the protein alignment is for a specific gene family.
+
+**Example**
+
+Given this protein alignment in FASTA format::
+
+    >Alpha
+    DEER
+    >Beta
+    DE-R
+    >Gamma
+    D--R
+
+and this matching unaligned nucleotide FASTA file::
+
+    >Alpha
+    GATGAGGAACGA
+    >Beta
+    GATGAGCGU
+    >Gamma
+    GATCGG
+
+the tool would return this nucleotide alignment::
+
+    >Alpha
+    GATGAGGAACGA
+    >Beta
+    GATGAG---CGU
+    >Gamma
+    GAT------CGG
+
+Notice that all the gaps are multiples of three in length.
+
+
+**Citation**
+
+This tool uses Biopython, so if you use this Galaxy tool in work leading to a
+scientific publication please cite the following paper:
+
+Cock et al (2009). Biopython: freely available Python tools for computational
+molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
+http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
+
+This tool is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/align_back_trans
+
+
+ 10.7717/peerj.167
+ 10.1093/bioinformatics/btp163
+
+
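
Reviewer note: the threading step described in the help text can be illustrated with a short sketch. This is not the code in align_back_trans.py, which is not part of this patch. It is a minimal pure-Python illustration of the "no translation table" case, where each nucleotide sequence must be exactly three times the ungapped protein length. The function names `thread_codons` and `thread_alignment` are hypothetical, and plain dictionaries stand in for parsed FASTA records.

```python
def thread_codons(aligned_protein, nucleotides, gap="-"):
    """Thread one unaligned nucleotide sequence onto one aligned protein.

    Hypothetical helper for illustration only. No translation check is
    done, so the nucleotide length must be exactly three times the
    number of non-gap residues in the protein.
    """
    ungapped_length = len(aligned_protein) - aligned_protein.count(gap)
    if len(nucleotides) != 3 * ungapped_length:
        raise ValueError(
            "expected %d nucleotides, got %d"
            % (3 * ungapped_length, len(nucleotides))
        )
    pieces = []
    position = 0  # index of the next unused codon in the nucleotide sequence
    for residue in aligned_protein:
        if residue == gap:
            # One protein gap becomes a three-letter nucleotide gap.
            pieces.append(gap * 3)
        else:
            pieces.append(nucleotides[position:position + 3])
            position += 3
    return "".join(pieces)


def thread_alignment(protein_alignment, nucleotide_seqs, gap="-"):
    """Apply thread_codons to every protein in the alignment.

    Both arguments map identifier -> sequence. Nucleotide entries with
    no matching protein identifier are ignored; a protein with no
    matching nucleotide entry raises KeyError.
    """
    return {
        name: thread_codons(protein, nucleotide_seqs[name], gap)
        for name, protein in protein_alignment.items()
    }


if __name__ == "__main__":
    proteins = {"Alpha": "DEER", "Beta": "DE-R", "Gamma": "D--R"}
    nucleotides = {
        "Alpha": "GATGAGGAACGA",
        "Beta": "GATGAGCGU",
        "Gamma": "GATCGG",
        "Unrelated": "ATGAAA",  # extra sequence, ignored
    }
    for name, seq in thread_alignment(proteins, nucleotides).items():
        print(">%s\n%s" % (name, seq))
```

Run on the sequences from the help text's example, this reproduces the alignment shown there. Because every gap is emitted as three characters, gap lengths are always multiples of three.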