comparison readme.rst @ 3:257453ff1f3d draft default tip

Uploaded
author bgruening
date Tue, 17 Mar 2015 13:39:34 -0400
parents b876c71cc0b1
children
2:b876c71cc0b1 3:257453ff1f3d
1 Galaxy workflow for the identification of candidate genes clusters 1 Galaxy workflow for the identification of candidate genes clusters
2 ------------------------------------------------------------------ 2 ------------------------------------------------------------------
3 3
4 This approach screens two proteins against all nucleotide sequence from the 4 This approach screens three proteins against a given genome sequence, leading to a genome position
5 NCBI nt database within hours on our cluster, leading to all organisms with an inter- 5 were all three genes are located nearby. As usual in Galaxy workflows every
6 esting gene structure for further investigation. As usual in Galaxy workflows every
7 parameter, including the proximity distance, can be changed and additional steps 6 parameter, including the proximity distance, can be changed and additional steps
8 can be easily added. For example additional filtering to refine the initial BLAST 7 can be easily added. For example additional filtering to refine the initial BLAST
9 hits, or inclusion of a third query sequence. 8 hits, or inclusion of a third query sequence.
10 9
11 .. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/find_genes_located_nearby.png 10 .. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/find_three_genes_located_nearby.png
12 11
13 12
14 Sample Data 13 Sample Data
15 =========== 14 ===========
16 15
17 As an example, we will use two protein sequences from *Streptomyces aurantiacus* 16 As an example, we will use three protein sequences from *Pan troglodytes* (Chimpanzee)
18 that are part of a gene cluster, responsible for metabolite producion. 17 which are part of the β-globin cluster.
19 18
20 You can upload both sequences directly into Galaxy using the "Upload File" tool 19 You can upload all sequences directly into Galaxy using the "Upload tool"
21 with either of these URLs - Galaxy should recognise this is FASTA files. 20 with either of these URLs - Galaxy should recognise this is FASTA files.
22 21
23 * `WP_037658548.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658548.fasta>`_ 22 Query sequences:
24 * `WP_037658557.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658557.fasta>`_
25 23
26 In addition you can find both sequences at the NCBI server: 24 * `P61920.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61920.fasta>`_
27 * http://www.ncbi.nlm.nih.gov/protein/739806622 (cytochrome P450) 25 * `P61921.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61921.fasta>`_
28 :: 26 * `Q6LDH1.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/Q6LDH1.fasta>`_
29 27
30 >gi|739806622|ref|WP_037658557.1| cytochrome P450 [Streptomyces aurantiacus] 28 Genome sequence:
31 MQRTCPFSVPPVYTKFREESPITQVVLPDGGKAWLVTKYDDVRAVMANPKLSSDRRAPDFPVVVPGQNAA 29
32 LAKHAPFMIILDGAEHAAARRPVISEFSVRRVAAMKPRIQEIVDGFIDDMLKMPKPVDLNQVFSLPVPSL 30 * http://hgdownload.cse.ucsc.edu/goldenPath/rn6/bigZips/rn6.fa.gz
33 VVSEILGMPYEGHEYFMELAEILLRRTTDEQGRIAVSVELRKYMDKLVEEKIENPGDDLLSRQIELQRQQ
34 GGIDRPQLASLCLLVLLAGHETTANMINLGVFSMLTKPELLAEIKADPSKTPKAVDELLRFYTIPDFGAH
35 RLALDDVEIGGVLIRKGEAVIASTFAANRDPAVFDDPEELDFGRDARHHVAFGYGPHQCLGQNLGRLELQ
36 VVFDTLFRRLPELRLAVPEEELSFKSDALVYGLYELPVTW
37 31
38 32
39 * http://www.ncbi.nlm.nih.gov/protein/739806613 (beta-ACP synthase) 33 In addition you can find the query sequences at the UniProt server:
34 * http://www.uniprot.org/uniprot/P61920 (Hemoglobin subunit gamma-1)
40 :: 35 ::
41 36
42 >gi|739806613|ref|WP_037658548.1| beta-ACP synthase [Streptomyces aurantiacus] 37 >sp|P61920|HBG1_PANTR Hemoglobin subunit gamma-1 OS=Pan troglodytes GN=HBG1 PE=1 SV=2
43 MSGRRVVVTGMEVLAPGGVGTDNFWSLLSEGRTATRGITFFDPAQFRSRVAAEIDFDPYAHGLTPQEVRR 38 MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
44 MDRAAQFAVVAARGAVADSGLDTDTLDPYRIGVTIGSAVGATMSLDEDYRVVSDAGRLDLVDHTYADPFF 39 VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
45 YNYFVPSSFATEVARLVGAQGPSSVVSAGCTSGLDSVGYAVELIREGTADVMVAGATDAPISPITMACFD 40 KEFTPEVQASWQKMVTAVASALSSRYH
46 AIKATTPRHDDPEHASRPFDDTRNGFVLGEGTAVFVLEELESARRRGARIYAEIAGYATRSNAYHMTGLR 41
47 PDGAEMAEAITVALDEARMNPTAIDYINAHGSGTKQNDRHETAAFKRSLGEHAYRTPVSSIKSMVGHSLG 42
48 AIGSIEIAASILAIQHDVVPPTANLHTPDPQCDLDYVPLNAREQIVDAVLTVGSGFGGFQSAMVLAQPER 43 * http://www.uniprot.org/uniprot/P61921 (Hemoglobin subunit gamma-2)
49 NAA 44 ::
45
46 >sp|P61921|HBG2_PANTR Hemoglobin subunit gamma-2 OS=Pan troglodytes GN=HBG2 PE=1 SV=2
47 MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
48 VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
49 KEFTPEVQASWQKMVTGVASALSSRYH
50
51
52 * http://www.uniprot.org/uniprot/Q6LDH1 (Hemoglobin subunit epsilon)
53 ::
54
55 >sp|Q6LDH1|HBE_PANTR Hemoglobin subunit epsilon OS=Pan troglodytes GN=HBE1 PE=2 SV=3
56 MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPK
57 VKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFG
58 KEFTPEVQAAWQKLVSAVAIALAHKYH
50 59
51 60
52 Citation 61 Citation
53 ======== 62 ========
54 63
66 Availability 75 Availability
67 ============ 76 ============
68 77
69 This workflow is available on the main Galaxy Tool Shed: 78 This workflow is available on the main Galaxy Tool Shed:
70 79
71 http://toolshed.g2.bx.psu.edu/view/bgruening/find_genes_located_nearby_workflow 80 http://toolshed.g2.bx.psu.edu/view/bgruening/find_three_genes_located_nearby_workflow
72 81
73 Development is being done on github: 82 Development is being done on github:
74 83
75 https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_genes_located_nearby 84 https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby
76 85
77 86
78 Dependencies 87 Dependencies
79 ============ 88 ============
80 89