comparison readme.rst @ 2:b876c71cc0b1 draft

Uploaded
author bgruening
date Tue, 17 Mar 2015 13:33:43 -0400
parents 3dd8f7b4703b
children 257453ff1f3d
1:3dd8f7b4703b 2:b876c71cc0b1
1 Galaxy workflow for the identification of candidate gene clusters 1 Galaxy workflow for the identification of candidate gene clusters
2 ------------------------------------------------------------------ 2 ------------------------------------------------------------------
3 3
4 This approach screens three proteins against a given genome sequence, leading to a genome position 4 This approach screens two proteins against all nucleotide sequences from the
5 where all three genes are located nearby. As usual in Galaxy workflows every 5 NCBI nt database within hours on our cluster, leading to all organisms with an
6 interesting gene structure for further investigation. As usual in Galaxy workflows every
6 parameter, including the proximity distance, can be changed and additional steps 7 parameter, including the proximity distance, can be changed and additional steps
7 can be easily added. For example additional filtering to refine the initial BLAST 8 can be easily added. For example additional filtering to refine the initial BLAST
8 hits, or inclusion of a third query sequence. 9 hits, or inclusion of a third query sequence.
9 10
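The proximity idea described above can be illustrated outside Galaxy with a short Python sketch. This is not how the workflow is implemented. It assumes two sets of BLAST hits in the standard 12-column tabular format, where columns 9 and 10 hold the subject start and end. The function names ``parse_blast_tabular`` and ``find_nearby_pairs`` and the ``max_distance`` parameter are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def parse_blast_tabular(text):
    """Parse 12-column BLAST tabular text into (query, subject, start, end) tuples.

    Subject coordinates are normalised so that start <= end, because hits on
    the minus strand are reported with start > end.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        s_start, s_end = int(cols[8]), int(cols[9])
        hits.append((query, subject, min(s_start, s_end), max(s_start, s_end)))
    return hits


def find_nearby_pairs(hits_a, hits_b, max_distance):
    """Return pairs of hits on the same subject separated by at most max_distance bases."""
    by_subject = defaultdict(list)
    for _, subject, start, end in hits_b:
        by_subject[subject].append((start, end))

    pairs = []
    for _, subject, a_start, a_end in hits_a:
        for b_start, b_end in by_subject.get(subject, []):
            # Gap between the two intervals; overlapping hits count as distance 0.
            gap = max(max(a_start, b_start) - min(a_end, b_end), 0)
            if gap <= max_distance:
                pairs.append((subject, (a_start, a_end), (b_start, b_end), gap))
    return pairs
```

Each returned tuple names a subject sequence on which both queries produced hits within the chosen distance, which corresponds to the candidate regions the workflow reports for further investigation.
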
10 .. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/find_three_genes_located_nearby.png 11 .. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/find_genes_located_nearby.png
11 12
12 13
13 Sample Data 14 Sample Data
14 =========== 15 ===========
15 16
16 As an example, we will use three protein sequences from *Pan troglodytes* (Chimpanzee) 17 As an example, we will use two protein sequences from *Streptomyces aurantiacus*
17 which are part of the β-globin cluster. 18 that are part of a gene cluster responsible for metabolite production.
18 19
19 You can upload all sequences directly into Galaxy using the "Upload tool" 20 You can upload both sequences directly into Galaxy using the "Upload File" tool
20 with either of these URLs - Galaxy should recognise these as FASTA files. 21 with either of these URLs - Galaxy should recognise these as FASTA files.
21 22
22 Query sequences. 23 * `WP_037658548.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658548.fasta>`_
23 * `P61920.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61920.fasta>`_ 24 * `WP_037658557.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658557.fasta>`_
24 * `P61921.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61921.fasta>`_
25 * `Q6LDH1.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/Q6LDH1.fasta>`_
26 25
27 Genome sequence: 26 In addition you can find both sequences at the NCBI server:
28 * http://hgdownload.cse.ucsc.edu/goldenPath/rn6/bigZips/rn6.fa.gz 27 * http://www.ncbi.nlm.nih.gov/protein/739806622 (cytochrome P450)
28 ::
29
30 >gi|739806622|ref|WP_037658557.1| cytochrome P450 [Streptomyces aurantiacus]
31 MQRTCPFSVPPVYTKFREESPITQVVLPDGGKAWLVTKYDDVRAVMANPKLSSDRRAPDFPVVVPGQNAA
32 LAKHAPFMIILDGAEHAAARRPVISEFSVRRVAAMKPRIQEIVDGFIDDMLKMPKPVDLNQVFSLPVPSL
33 VVSEILGMPYEGHEYFMELAEILLRRTTDEQGRIAVSVELRKYMDKLVEEKIENPGDDLLSRQIELQRQQ
34 GGIDRPQLASLCLLVLLAGHETTANMINLGVFSMLTKPELLAEIKADPSKTPKAVDELLRFYTIPDFGAH
35 RLALDDVEIGGVLIRKGEAVIASTFAANRDPAVFDDPEELDFGRDARHHVAFGYGPHQCLGQNLGRLELQ
36 VVFDTLFRRLPELRLAVPEEELSFKSDALVYGLYELPVTW
29 37
30 38
31 In addition you can find the query sequences at the UniProt server: 39 * http://www.ncbi.nlm.nih.gov/protein/739806613 (beta-ACP synthase)
32 * http://www.uniprot.org/uniprot/P61920 (Hemoglobin subunit gamma-1)
33 :: 40 ::
34 41
35 >sp|P61920|HBG1_PANTR Hemoglobin subunit gamma-1 OS=Pan troglodytes GN=HBG1 PE=1 SV=2 42 >gi|739806613|ref|WP_037658548.1| beta-ACP synthase [Streptomyces aurantiacus]
36 MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK 43 MSGRRVVVTGMEVLAPGGVGTDNFWSLLSEGRTATRGITFFDPAQFRSRVAAEIDFDPYAHGLTPQEVRR
37 VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG 44 MDRAAQFAVVAARGAVADSGLDTDTLDPYRIGVTIGSAVGATMSLDEDYRVVSDAGRLDLVDHTYADPFF
38 KEFTPEVQASWQKMVTAVASALSSRYH 45 YNYFVPSSFATEVARLVGAQGPSSVVSAGCTSGLDSVGYAVELIREGTADVMVAGATDAPISPITMACFD
39 46 AIKATTPRHDDPEHASRPFDDTRNGFVLGEGTAVFVLEELESARRRGARIYAEIAGYATRSNAYHMTGLR
40 47 PDGAEMAEAITVALDEARMNPTAIDYINAHGSGTKQNDRHETAAFKRSLGEHAYRTPVSSIKSMVGHSLG
41 * http://www.uniprot.org/uniprot/P61921 (Hemoglobin subunit gamma-2) 48 AIGSIEIAASILAIQHDVVPPTANLHTPDPQCDLDYVPLNAREQIVDAVLTVGSGFGGFQSAMVLAQPER
42 :: 49 NAA
43
44 >sp|P61921|HBG2_PANTR Hemoglobin subunit gamma-2 OS=Pan troglodytes GN=HBG2 PE=1 SV=2
45 MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
46 VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
47 KEFTPEVQASWQKMVTGVASALSSRYH
48
49
50 * http://www.uniprot.org/uniprot/Q6LDH1 (Hemoglobin subunit epsilon)
51 ::
52
53 >sp|Q6LDH1|HBE_PANTR Hemoglobin subunit epsilon OS=Pan troglodytes GN=HBE1 PE=2 SV=3
54 MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPK
55 VKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFG
56 KEFTPEVQAAWQKLVSAVAIALAHKYH
57 50
58 51
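The sample sequences above are plain FASTA records: a header line starting with ``>`` followed by one or more sequence lines. As a quick sanity check before uploading, a file can be read with a minimal Python sketch like the one below. The function name ``read_fasta`` is hypothetical, and the sketch only mirrors the basic FASTA layout; it is not a description of how Galaxy detects the format.

```python
def read_fasta(text):
    """Split FASTA-formatted text into a list of (header, sequence) tuples.

    Raises ValueError if sequence data appears before the first '>' header,
    which is a simple sign that the text is not FASTA.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # wrapped sequence lines are joined later
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Calling ``read_fasta`` on the content of one of the sample files should yield a single record whose header is the text after ``>`` and whose sequence is the protein with the line wrapping removed.
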
59 Citation 52 Citation
60 ======== 53 ========
61 54
73 Availability 66 Availability
74 ============ 67 ============
75 68
76 This workflow is available on the main Galaxy Tool Shed: 69 This workflow is available on the main Galaxy Tool Shed:
77 70
78 http://toolshed.g2.bx.psu.edu/view/bgruening/find_three_genes_located_nearby_workflow 71 http://toolshed.g2.bx.psu.edu/view/bgruening/find_genes_located_nearby_workflow
79 72
80 Development is being done on GitHub: 73 Development is being done on GitHub:
81 74
82 https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby 75 https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_genes_located_nearby
83 76
84 77
85 Dependencies 78 Dependencies
86 ============ 79 ============
87 80