comparison align_back_trans.xml @ 2:851b9da82fb0 draft

Uploaded
author jasper
date Fri, 03 Feb 2017 12:50:11 -0500
comparison
1:11fd914a3dfe 2:851b9da82fb0
1 <tool id="align_back_trans" name="Thread nucleotides onto a protein alignment (back-translation)" version="0.0.6">
2 <description>Gives a codon-aware alignment</description>
3 <requirements>
4 <requirement type="package" version="1.63">biopython</requirement>
5 <requirement type="python-module">Bio</requirement>
6 </requirements>
7 <stdio>
8 <!-- Anything other than zero is an error -->
9 <exit_code range="1:" />
10 <exit_code range=":-1" />
11 </stdio>
12 <version_command interpreter="python">align_back_trans.py --version</version_command>
13 <command interpreter="python">
14 align_back_trans.py $prot_align.ext "$prot_align" "$nuc_file" "$out_nuc_align" "$table"
15 </command>
16 <inputs>
17 <param name="prot_align" type="data" format="fasta,muscle,clustal" label="Aligned protein file" help="Multiple sequence file in FASTA, ClustalW or PHYLIP format." />
18 <param name="table" type="select" label="Genetic code" help="Tables from the NCBI; these determine the start and stop codons">
19 <option value="1">1. Standard</option>
20 <option value="2">2. Vertebrate Mitochondrial</option>
21 <option value="3">3. Yeast Mitochondrial</option>
22 <option value="4">4. Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma</option>
23 <option value="5">5. Invertebrate Mitochondrial</option>
24 <option value="6">6. Ciliate Macronuclear and Dasycladacean</option>
25 <option value="9">9. Echinoderm Mitochondrial</option>
26 <option value="10">10. Euplotid Nuclear</option>
27 <option value="11">11. Bacterial</option>
28 <option value="12">12. Alternative Yeast Nuclear</option>
29 <option value="13">13. Ascidian Mitochondrial</option>
30 <option value="14">14. Flatworm Mitochondrial</option>
31 <option value="15">15. Blepharisma Macronuclear</option>
32 <option value="16">16. Chlorophycean Mitochondrial</option>
33 <option value="21">21. Trematode Mitochondrial</option>
34 <option value="22">22. Scenedesmus obliquus</option>
35 <option value="23">23. Thraustochytrium Mitochondrial</option>
36 <option value="0">Don't check the translation</option>
37 </param>
38 <param name="nuc_file" type="data" format="fasta" label="Unaligned nucleotide sequences" help="FASTA format, using same identifiers as your protein alignment" />
39 </inputs>
40 <outputs>
41 <data name="out_nuc_align" format_source="prot_align" metadata_source="prot_align" label="${prot_align.name} (back-translated)"/>
42 </outputs>
43 <tests>
44 <test>
45 <param name="prot_align" value="demo_prot_align.fasta" />
46 <param name="nuc_file" value="demo_nucs.fasta" />
47 <param name="table" value="0" />
48 <output name="out_nuc_align" file="demo_nuc_align.fasta" />
49 </test>
50 <test>
51 <param name="prot_align" value="demo_prot_align.fasta" />
52 <param name="nuc_file" value="demo_nucs_trailing_stop.fasta" />
53 <param name="table" value="11" />
54 <output name="out_nuc_align" file="demo_nuc_align.fasta" />
55 </test>
56 </tests>
57 <help>
58 **What it does**
59
60 Takes an input file of aligned protein sequences (typically FASTA or Clustal
61 format), and a matching file of unaligned nucleotide sequences (FASTA format,
62 using the same identifiers), and threads the nucleotide sequences onto the
63 protein alignment to produce a codon-aware nucleotide alignment, which can
64 be viewed as a back-translation.
65
66 If you specify one of the standard NCBI genetic codes (recommended), then the
67 translation is verified. This will allow fuzzy matching if stop codons in the
68 protein sequence have been represented as X, and will allow for a trailing stop
69 codon present in the nucleotide sequences but not the protein.
70
71 Note - the protein and nucleotide sequences must use the same identifiers.
72
73 Note - If no translation table is specified, the provided nucleotide sequences
74 should be exactly three times the length of the protein sequences (excluding the gaps).
75
76 Note - the nucleotide FASTA file may contain extra sequences not in the
77 protein alignment; they will be ignored. This can be useful if, for example,
78 you have a nucleotide FASTA file containing all the genes in an organism,
79 while the protein alignment is for a specific gene family.
80
81 **Example**
82
83 Given this protein alignment in FASTA format::
84
85 >Alpha
86 DEER
87 >Beta
88 DE-R
89 >Gamma
90 D--R
91
92 and this matching unaligned nucleotide FASTA file::
93
94 >Alpha
95 GATGAGGAACGA
96 >Beta
97 GATGAGCGU
98 >Gamma
99 GATCGG
100
101 the tool would return this nucleotide alignment::
102
103 >Alpha
104 GATGAGGAACGA
105 >Beta
106 GATGAG---CGU
107 >Gamma
108 GAT------CGG
109
110 Notice that all the gaps are multiples of three in length.
111
112
113 **Citation**
114
115 This tool uses Biopython, so if you use this Galaxy tool in work leading to a
116 scientific publication please cite the following paper:
117
118 Cock et al. (2009). Biopython: freely available Python tools for computational
119 molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
120 http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
121
122 This tool is available to install into other Galaxy Instances via the Galaxy
123 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/align_back_trans
124 </help>
125 <citations>
126 <citation type="doi">10.7717/peerj.167</citation>
127 <citation type="doi">10.1093/bioinformatics/btp163</citation>
128 </citations>
129 </tool>
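
The threading step described in the help text can be illustrated with a minimal Python sketch. This is not the code in align_back_trans.py: the function name thread_nucleotides and the dictionaries below are hypothetical, and the sketch assumes the "Don't check the translation" case, where each nucleotide sequence must be exactly three times the ungapped protein length. It reuses the Alpha/Beta/Gamma sequences from the help example.

```python
def thread_nucleotides(aligned_protein, nucleotides, gap="-"):
    """Thread an unaligned nucleotide sequence onto one gapped protein sequence.

    Hypothetical helper for illustration only; no translation check is done.
    """
    ungapped_length = len(aligned_protein) - aligned_protein.count(gap)
    if len(nucleotides) != 3 * ungapped_length:
        raise ValueError(
            "Expected %d nucleotides for %d residues, got %d"
            % (3 * ungapped_length, ungapped_length, len(nucleotides))
        )
    codons = []
    position = 0  # index of the next unused nucleotide
    for residue in aligned_protein:
        if residue == gap:
            # One protein gap becomes three nucleotide gap characters
            codons.append(gap * 3)
        else:
            # One residue consumes the next codon from the nucleotide sequence
            codons.append(nucleotides[position:position + 3])
            position += 3
    return "".join(codons)


# Sequences taken from the help example, keyed by shared identifier
protein_alignment = {"Alpha": "DEER", "Beta": "DE-R", "Gamma": "D--R"}
nucleotide_seqs = {"Alpha": "GATGAGGAACGA", "Beta": "GATGAGCGU", "Gamma": "GATCGG"}

aligned = {
    name: thread_nucleotides(protein, nucleotide_seqs[name])
    for name, protein in protein_alignment.items()
}

for name, sequence in aligned.items():
    print(">%s\n%s" % (name, sequence))
```

Because every protein gap expands to three gap characters, all gaps in the output are multiples of three in length, matching the note in the help text.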