comparison readme.rst @ 2:b876c71cc0b1 draft
Uploaded
author | bgruening |
---|---|
date | Tue, 17 Mar 2015 13:33:43 -0400 |
parents | 3dd8f7b4703b |
children | 257453ff1f3d |
1:3dd8f7b4703b | 2:b876c71cc0b1 |
---|---|
1 Galaxy workflow for the identification of candidate gene clusters | 1 Galaxy workflow for the identification of candidate gene clusters |
2 ------------------------------------------------------------------ | 2 ------------------------------------------------------------------ |
3 | 3 |
4 This approach screens three proteins against a given genome sequence, leading to a genome position | 4 This approach screens two proteins against all nucleotide sequences from the |
5 where all three genes are located nearby. As usual in Galaxy workflows every | 5 NCBI nt database within hours on our cluster, leading to all organisms with an |
| | 6 interesting gene structure for further investigation. As usual in Galaxy workflows every |
6 parameter, including the proximity distance, can be changed and additional steps | 7 parameter, including the proximity distance, can be changed and additional steps |
7 can be easily added. For example, additional filtering to refine the initial BLAST | 8 can be easily added. For example, additional filtering to refine the initial BLAST |
8 hits, or inclusion of a third query sequence. | 9 hits, or inclusion of a third query sequence. |
9 | 10 |
10 .. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/find_three_genes_located_nearby.png | 11 .. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/find_genes_located_nearby.png |
11 | 12 |
12 | 13 |
13 Sample Data | 14 Sample Data |
14 =========== | 15 =========== |
15 | 16 |
16 As an example, we will use three protein sequences from *Pan troglodytes* (Chimpanzee) | 17 As an example, we will use two protein sequences from *Streptomyces aurantiacus* |
17 which are part of the β-globin cluster. | 18 that are part of a gene cluster responsible for metabolite production. |
18 | 19 |
19 You can upload all sequences directly into Galaxy using the "Upload tool" | 20 You can upload both sequences directly into Galaxy using the "Upload File" tool |
20 with either of these URLs - Galaxy should recognise these as FASTA files. | 21 with either of these URLs - Galaxy should recognise these as FASTA files. |
21 | 22 |
22 Query sequences. | 23 * `WP_037658548.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658548.fasta>`_ |
23 * `P61920.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61920.fasta>`_ | 24 * `WP_037658557.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658557.fasta>`_ |
24 * `P61921.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61921.fasta>`_ | |
25 * `Q6LDH1.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/Q6LDH1.fasta>`_ | |
26 | 25 |
27 Genome sequence: | 26 In addition you can find both sequences at the NCBI server: |
28 * http://hgdownload.cse.ucsc.edu/goldenPath/rn6/bigZips/rn6.fa.gz | 27 * http://www.ncbi.nlm.nih.gov/protein/739806622 (cytochrome P450) |
| | 28 :: |
| | 29 |
| | 30 >gi|739806622|ref|WP_037658557.1| cytochrome P450 [Streptomyces aurantiacus] |
| | 31 MQRTCPFSVPPVYTKFREESPITQVVLPDGGKAWLVTKYDDVRAVMANPKLSSDRRAPDFPVVVPGQNAA |
| | 32 LAKHAPFMIILDGAEHAAARRPVISEFSVRRVAAMKPRIQEIVDGFIDDMLKMPKPVDLNQVFSLPVPSL |
| | 33 VVSEILGMPYEGHEYFMELAEILLRRTTDEQGRIAVSVELRKYMDKLVEEKIENPGDDLLSRQIELQRQQ |
| | 34 GGIDRPQLASLCLLVLLAGHETTANMINLGVFSMLTKPELLAEIKADPSKTPKAVDELLRFYTIPDFGAH |
| | 35 RLALDDVEIGGVLIRKGEAVIASTFAANRDPAVFDDPEELDFGRDARHHVAFGYGPHQCLGQNLGRLELQ |
| | 36 VVFDTLFRRLPELRLAVPEEELSFKSDALVYGLYELPVTW |
29 | 37 |
30 | 38 |
31 In addition you can find the query sequences at the UniProt server: | 39 * http://www.ncbi.nlm.nih.gov/protein/739806613 (beta-ACP synthase) |
32 * http://www.uniprot.org/uniprot/P61920 (Hemoglobin subunit gamma-1) | |
33 :: | 40 :: |
34 | 41 |
35 >sp|P61920|HBG1_PANTR Hemoglobin subunit gamma-1 OS=Pan troglodytes GN=HBG1 PE=1 SV=2 | 42 >gi|739806613|ref|WP_037658548.1| beta-ACP synthase [Streptomyces aurantiacus] |
36 MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK | 43 MSGRRVVVTGMEVLAPGGVGTDNFWSLLSEGRTATRGITFFDPAQFRSRVAAEIDFDPYAHGLTPQEVRR |
37 VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG | 44 MDRAAQFAVVAARGAVADSGLDTDTLDPYRIGVTIGSAVGATMSLDEDYRVVSDAGRLDLVDHTYADPFF |
38 KEFTPEVQASWQKMVTAVASALSSRYH | 45 YNYFVPSSFATEVARLVGAQGPSSVVSAGCTSGLDSVGYAVELIREGTADVMVAGATDAPISPITMACFD |
39 | 46 AIKATTPRHDDPEHASRPFDDTRNGFVLGEGTAVFVLEELESARRRGARIYAEIAGYATRSNAYHMTGLR |
40 | 47 PDGAEMAEAITVALDEARMNPTAIDYINAHGSGTKQNDRHETAAFKRSLGEHAYRTPVSSIKSMVGHSLG |
41 * http://www.uniprot.org/uniprot/P61921 (Hemoglobin subunit gamma-2) | 48 AIGSIEIAASILAIQHDVVPPTANLHTPDPQCDLDYVPLNAREQIVDAVLTVGSGFGGFQSAMVLAQPER |
42 :: | 49 NAA |
43 | |
44 >sp|P61921|HBG2_PANTR Hemoglobin subunit gamma-2 OS=Pan troglodytes GN=HBG2 PE=1 SV=2 | |
45 MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK | |
46 VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG | |
47 KEFTPEVQASWQKMVTGVASALSSRYH | |
48 | |
49 | |
50 * http://www.uniprot.org/uniprot/Q6LDH1 (Hemoglobin subunit epsilon) | |
51 :: | |
52 | |
53 >sp|Q6LDH1|HBE_PANTR Hemoglobin subunit epsilon OS=Pan troglodytes GN=HBE1 PE=2 SV=3 | |
54 MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPK | |
55 VKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFG | |
56 KEFTPEVQAAWQKLVSAVAIALAHKYH | |
57 | 50 |
58 | 51 |
59 Citation | 52 Citation |
60 ======== | 53 ======== |
61 | 54 |
73 Availability | 66 Availability |
74 ============ | 67 ============ |
75 | 68 |
76 This workflow is available on the main Galaxy Tool Shed: | 69 This workflow is available on the main Galaxy Tool Shed: |
77 | 70 |
78 http://toolshed.g2.bx.psu.edu/view/bgruening/find_three_genes_located_nearby_workflow | 71 http://toolshed.g2.bx.psu.edu/view/bgruening/find_genes_located_nearby_workflow |
79 | 72 |
80 Development is being done on GitHub: | 73 Development is being done on GitHub: |
81 | 74 |
82 https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby | 75 https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_genes_located_nearby |
83 | 76 |
84 | 77 |
85 Dependencies | 78 Dependencies |
86 ============ | 79 ============ |
87 | 80 |