comparison readme.rst @ 3:257453ff1f3d draft default tip
Uploaded
author | bgruening |
---|---|
date | Tue, 17 Mar 2015 13:39:34 -0400 |
parents | b876c71cc0b1 |
children |
comparison
2:b876c71cc0b1 | 3:257453ff1f3d |
---|---|
1 Galaxy workflow for the identification of candidate genes clusters | 1 Galaxy workflow for the identification of candidate genes clusters |
2 ------------------------------------------------------------------ | 2 ------------------------------------------------------------------ |
3 | 3 |
4 This approach screens two proteins against all nucleotide sequences from the | 4 This approach screens three proteins against a given genome sequence, leading to a genome position |
5 NCBI nt database within hours on our cluster, leading to all organisms with an | 5 where all three genes are located nearby. As usual in Galaxy workflows every |
6 interesting gene structure for further investigation. As usual in Galaxy workflows every | |
7 parameter, including the proximity distance, can be changed and additional steps | 6 parameter, including the proximity distance, can be changed and additional steps |
8 can be easily added. For example, additional filtering to refine the initial BLAST | 7 can be easily added. For example, additional filtering to refine the initial BLAST |
9 hits, or inclusion of a third query sequence. | 8 hits, or inclusion of a third query sequence. |
10 | 9 |
11 .. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/find_genes_located_nearby.png | 10 .. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/find_three_genes_located_nearby.png |
12 | 11 |
13 | 12 |
14 Sample Data | 13 Sample Data |
15 =========== | 14 =========== |
16 | 15 |
17 As an example, we will use two protein sequences from *Streptomyces aurantiacus* | 16 As an example, we will use three protein sequences from *Pan troglodytes* (Chimpanzee) |
18 that are part of a gene cluster responsible for metabolite production. | 17 which are part of the β-globin cluster. |
19 | 18 |
20 You can upload both sequences directly into Galaxy using the "Upload File" tool | 19 You can upload all sequences directly into Galaxy using the "Upload tool" |
21 with either of these URLs - Galaxy should recognise these as FASTA files. | 20 with any of these URLs - Galaxy should recognise these as FASTA files. |
22 | 21 |
23 * `WP_037658548.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658548.fasta>`_ | 22 Query sequences: |
24 * `WP_037658557.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658557.fasta>`_ | |
25 | 23 |
26 In addition you can find both sequences at the NCBI server: | 24 * `P61920.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61920.fasta>`_ |
27 * http://www.ncbi.nlm.nih.gov/protein/739806622 (cytochrome P450) | 25 * `P61921.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61921.fasta>`_ |
28 :: | 26 * `Q6LDH1.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/Q6LDH1.fasta>`_ |
29 | 27 |
30 >gi|739806622|ref|WP_037658557.1| cytochrome P450 [Streptomyces aurantiacus] | 28 Genome sequence: |
31 MQRTCPFSVPPVYTKFREESPITQVVLPDGGKAWLVTKYDDVRAVMANPKLSSDRRAPDFPVVVPGQNAA | 29 |
32 LAKHAPFMIILDGAEHAAARRPVISEFSVRRVAAMKPRIQEIVDGFIDDMLKMPKPVDLNQVFSLPVPSL | 30 * http://hgdownload.cse.ucsc.edu/goldenPath/rn6/bigZips/rn6.fa.gz |
33 VVSEILGMPYEGHEYFMELAEILLRRTTDEQGRIAVSVELRKYMDKLVEEKIENPGDDLLSRQIELQRQQ | |
34 GGIDRPQLASLCLLVLLAGHETTANMINLGVFSMLTKPELLAEIKADPSKTPKAVDELLRFYTIPDFGAH | |
35 RLALDDVEIGGVLIRKGEAVIASTFAANRDPAVFDDPEELDFGRDARHHVAFGYGPHQCLGQNLGRLELQ | |
36 VVFDTLFRRLPELRLAVPEEELSFKSDALVYGLYELPVTW | |
37 | 31 |
38 | 32 |
39 * http://www.ncbi.nlm.nih.gov/protein/739806613 (beta-ACP synthase) | 33 In addition you can find the query sequences at the UniProt server: |
34 * http://www.uniprot.org/uniprot/P61920 (Hemoglobin subunit gamma-1) | |
40 :: | 35 :: |
41 | 36 |
42 >gi|739806613|ref|WP_037658548.1| beta-ACP synthase [Streptomyces aurantiacus] | 37 >sp|P61920|HBG1_PANTR Hemoglobin subunit gamma-1 OS=Pan troglodytes GN=HBG1 PE=1 SV=2 |
43 MSGRRVVVTGMEVLAPGGVGTDNFWSLLSEGRTATRGITFFDPAQFRSRVAAEIDFDPYAHGLTPQEVRR | 38 MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK |
44 MDRAAQFAVVAARGAVADSGLDTDTLDPYRIGVTIGSAVGATMSLDEDYRVVSDAGRLDLVDHTYADPFF | 39 VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG |
45 YNYFVPSSFATEVARLVGAQGPSSVVSAGCTSGLDSVGYAVELIREGTADVMVAGATDAPISPITMACFD | 40 KEFTPEVQASWQKMVTAVASALSSRYH |
46 AIKATTPRHDDPEHASRPFDDTRNGFVLGEGTAVFVLEELESARRRGARIYAEIAGYATRSNAYHMTGLR | 41 |
47 PDGAEMAEAITVALDEARMNPTAIDYINAHGSGTKQNDRHETAAFKRSLGEHAYRTPVSSIKSMVGHSLG | 42 |
48 AIGSIEIAASILAIQHDVVPPTANLHTPDPQCDLDYVPLNAREQIVDAVLTVGSGFGGFQSAMVLAQPER | 43 * http://www.uniprot.org/uniprot/P61921 (Hemoglobin subunit gamma-2) |
49 NAA | 44 :: |
45 | |
46 >sp|P61921|HBG2_PANTR Hemoglobin subunit gamma-2 OS=Pan troglodytes GN=HBG2 PE=1 SV=2 | |
47 MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK | |
48 VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG | |
49 KEFTPEVQASWQKMVTGVASALSSRYH | |
50 | |
51 | |
52 * http://www.uniprot.org/uniprot/Q6LDH1 (Hemoglobin subunit epsilon) | |
53 :: | |
54 | |
55 >sp|Q6LDH1|HBE_PANTR Hemoglobin subunit epsilon OS=Pan troglodytes GN=HBE1 PE=2 SV=3 | |
56 MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPK | |
57 VKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFG | |
58 KEFTPEVQAAWQKLVSAVAIALAHKYH | |
50 | 59 |
51 | 60 |
52 Citation | 61 Citation |
53 ======== | 62 ======== |
54 | 63 |
66 Availability | 75 Availability |
67 ============ | 76 ============ |
68 | 77 |
69 This workflow is available on the main Galaxy Tool Shed: | 78 This workflow is available on the main Galaxy Tool Shed: |
70 | 79 |
71 http://toolshed.g2.bx.psu.edu/view/bgruening/find_genes_located_nearby_workflow | 80 http://toolshed.g2.bx.psu.edu/view/bgruening/find_three_genes_located_nearby_workflow |
72 | 81 |
73 Development is being done on github: | 82 Development is being done on github: |
74 | 83 |
75 https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_genes_located_nearby | 84 https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby |
76 | 85 |
77 | 86 |
78 Dependencies | 87 Dependencies |
79 ============ | 88 ============ |
80 | 89 |