Galaxy workflow for the identification of candidate gene clusters
------------------------------------------------------------------

This approach screens three protein sequences against a given genome sequence and
identifies genome positions where all three genes are located near each other.
As usual in Galaxy workflows, every parameter, including the proximity distance,
can be changed, and additional steps can easily be added: for example, additional
filtering to refine the initial BLAST hits, or inclusion of a further query sequence.

.. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/find_three_genes_located_nearby.png


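
The proximity search described above can be illustrated outside Galaxy with a
short Python sketch. This is not the workflow's actual implementation: the
``Hit`` tuple, the ``find_clusters`` function and all coordinates are
hypothetical, and "nearby" is assumed to mean that at least one hit per query
fits inside a window of ``max_distance`` bases.

.. code-block:: python

    from collections import defaultdict
    from typing import NamedTuple


    class Hit(NamedTuple):
        """One BLAST hit reduced to its genomic interval (hypothetical layout)."""
        query: str
        chrom: str
        start: int
        end: int


    def find_clusters(hits, queries, max_distance):
        """Return merged (chrom, start, end) regions containing every query."""
        by_chrom = defaultdict(list)
        for hit in hits:
            by_chrom[hit.chrom].append(hit)

        clusters = []
        for chrom, chrom_hits in sorted(by_chrom.items()):
            ordered = sorted(chrom_hits, key=lambda h: h.start)
            for i, anchor in enumerate(ordered):
                # Keep hits that end within max_distance of the anchor start.
                limit = anchor.start + max_distance
                window = [h for h in ordered[i:] if h.end <= limit]
                if {h.query for h in window} != set(queries):
                    continue
                region = (chrom, anchor.start, max(h.end for h in window))
                last = clusters[-1] if clusters else None
                if last and last[0] == chrom and region[1] <= last[2]:
                    # Overlaps the previously reported region: merge them.
                    clusters[-1] = (chrom, last[1], max(last[2], region[2]))
                else:
                    clusters.append(region)
        return clusters


    QUERIES = {"P61920", "P61921", "Q6LDH1"}

    # Made-up coordinates, for illustration only.
    hits = [
        Hit("P61920", "chr1", 1000, 1400),
        Hit("P61921", "chr1", 6000, 6400),
        Hit("Q6LDH1", "chr1", 15000, 15400),
        Hit("P61920", "chr2", 500, 900),
        Hit("Q6LDH1", "chr2", 900000, 900400),
    ]

    print(find_clusters(hits, QUERIES, max_distance=20000))
    print(find_clusters(hits, QUERIES, max_distance=10000))

With these made-up hits, only the 20000-base window is wide enough to contain
all three queries on ``chr1``. Shrinking ``max_distance`` plays the same role
as lowering the proximity distance parameter in the workflow.
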
Sample Data
===========

As an example, we will use three protein sequences from *Pan troglodytes* (chimpanzee)
that are part of the β-globin cluster.

You can upload all sequences directly into Galaxy using the "Upload tool"
with any of these URLs; Galaxy should recognise them as FASTA files.

Query sequences:

* `P61920.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61920.fasta>`_
* `P61921.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61921.fasta>`_
* `Q6LDH1.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/Q6LDH1.fasta>`_

Genome sequence:

* http://hgdownload.cse.ucsc.edu/goldenPath/rn6/bigZips/rn6.fa.gz


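
If you prefer to inspect a downloaded copy of the gzip-compressed genome file
before uploading it, a minimal sketch such as the following lists the record
names it contains. The function names and the in-memory demo data are
hypothetical; for a real file you would pass its local path instead.

.. code-block:: python

    import gzip
    import io


    def fasta_record_names(handle):
        """Yield the first word of every FASTA header line in a text handle."""
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                yield fields[0] if fields else ""


    def gzipped_fasta_record_names(path_or_file):
        """Same as above, for a gzip-compressed FASTA path or binary file object."""
        with gzip.open(path_or_file, "rt") as handle:
            yield from fasta_record_names(handle)


    # Tiny made-up example held in memory instead of a real genome download.
    demo = io.BytesIO(gzip.compress(b">chrA some description\nACGT\n>chrB\nGGCC\n"))
    print(list(gzipped_fasta_record_names(demo)))

A file whose first non-empty line does not start with ``>`` is unlikely to be
detected as FASTA.
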

In addition you can find the query sequences at the UniProt server:

* http://www.uniprot.org/uniprot/P61920 (Hemoglobin subunit gamma-1)

::

    >sp|P61920|HBG1_PANTR Hemoglobin subunit gamma-1 OS=Pan troglodytes GN=HBG1 PE=1 SV=2
    MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
    VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
    KEFTPEVQASWQKMVTAVASALSSRYH


* http://www.uniprot.org/uniprot/P61921 (Hemoglobin subunit gamma-2)

::

    >sp|P61921|HBG2_PANTR Hemoglobin subunit gamma-2 OS=Pan troglodytes GN=HBG2 PE=1 SV=2
    MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
    VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
    KEFTPEVQASWQKMVTGVASALSSRYH


* http://www.uniprot.org/uniprot/Q6LDH1 (Hemoglobin subunit epsilon)

::

    >sp|Q6LDH1|HBE_PANTR Hemoglobin subunit epsilon OS=Pan troglodytes GN=HBE1 PE=2 SV=3
    MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPK
    VKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFG
    KEFTPEVQAAWQKLVSAVAIALAHKYH


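
The two gamma-globin queries are nearly identical, so their BLAST hits can be
expected to overlap heavily. The following sketch, which is not part of the
workflow, parses the two records exactly as listed above and reports where they
differ. The helper names ``parse_fasta``, ``accession`` and ``differences`` are
hypothetical.

.. code-block:: python

    FASTA_TEXT = """\
    >sp|P61920|HBG1_PANTR Hemoglobin subunit gamma-1 OS=Pan troglodytes GN=HBG1 PE=1 SV=2
    MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
    VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
    KEFTPEVQASWQKMVTAVASALSSRYH
    >sp|P61921|HBG2_PANTR Hemoglobin subunit gamma-2 OS=Pan troglodytes GN=HBG2 PE=1 SV=2
    MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
    VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
    KEFTPEVQASWQKMVTGVASALSSRYH
    """


    def parse_fasta(text):
        """Return {header: sequence} for each record in FASTA-formatted text."""
        records = {}
        header = None
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = ""
            elif header is not None:
                records[header] += line
        return records


    def accession(header):
        """Extract the accession from a header of the form 'sp|ACCESSION|NAME ...'."""
        return header.split("|")[1]


    def differences(seq_a, seq_b):
        """List (1-based position, residue_a, residue_b) where the sequences differ."""
        if len(seq_a) != len(seq_b):
            raise ValueError("sequences must have the same length")
        return [
            (pos, a, b)
            for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
            if a != b
        ]


    records = {accession(h): seq for h, seq in parse_fasta(FASTA_TEXT).items()}
    gamma1, gamma2 = records["P61920"], records["P61921"]
    print(len(gamma1), differences(gamma1, gamma2))

For the sequences as listed here, the comparison reports a single differing
residue, which is worth keeping in mind when filtering the initial BLAST hits.
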
Citation
========

If you use this workflow directly, or a derivative of it, or the associated
NCBI BLAST wrappers for Galaxy, in work leading to a scientific publication,
please cite:

Peter J. A. Cock, John M. Chilton, Björn Grüning, James E. Johnson, Nicola Soranzo
NCBI BLAST+ integrated into Galaxy

* http://biorxiv.org/content/early/2015/01/21/014043
* http://dx.doi.org/10.1101/014043


Availability
============

This workflow is available on the main Galaxy Tool Shed:

 http://toolshed.g2.bx.psu.edu/view/bgruening/find_three_genes_located_nearby_workflow

Development takes place on GitHub:

 https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby


Dependencies
============

These dependencies should be resolved automatically via the Galaxy Tool Shed:

* http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus