comparison readme.md @ 0:904dd53d622c draft
Uploaded
author | bgruening |
---|---|
date | Sun, 22 Feb 2015 12:20:55 -0500 |
Galaxy workflow for the identification of candidate gene clusters
-----------------------------------------------------------------

This approach screens two protein sequences against all nucleotide sequences in the
NCBI nt database within hours on our cluster, and reports all organisms with an
interesting gene structure for further investigation. As usual in Galaxy workflows, every
parameter, including the proximity distance, can be changed, and additional steps
can easily be added: for example, additional filtering to refine the initial BLAST
hits, or the inclusion of a third query sequence.

![Workflow Image](https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/find_genes_located_nearby.png)

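The proximity idea can be illustrated with a minimal Python sketch. It is not the workflow's actual implementation: the function name `nearby_hits`, the `(subject_id, start, end)` input tuples, and the default distance of 10000 bases are all assumptions made for illustration. The sketch pairs hits of the two query proteins that fall on the same subject sequence and lie within the chosen distance of each other.

```python
from collections import defaultdict


def nearby_hits(hits_a, hits_b, max_distance=10000):
    """Pair hits from two queries that lie close together on the same subject.

    hits_a, hits_b: iterables of (subject_id, start, end) tuples (assumed format).
    max_distance: largest allowed gap in bases between two hits (assumed default).
    """
    # Index the second query's hits by subject sequence.
    by_subject = defaultdict(list)
    for subject, start, end in hits_b:
        # Normalise coordinates so that lo <= hi even when start > end.
        by_subject[subject].append((min(start, end), max(start, end)))

    pairs = []
    for subject, start, end in hits_a:
        a_lo, a_hi = min(start, end), max(start, end)
        for b_lo, b_hi in by_subject.get(subject, []):
            # Gap between the two intervals; zero when they overlap.
            gap = max(0, max(a_lo, b_lo) - min(a_hi, b_hi))
            if gap <= max_distance:
                pairs.append((subject, (a_lo, a_hi), (b_lo, b_hi)))
    return pairs


# Made-up coordinates, purely to show the call.
example_a = [("seq1", 100, 400), ("seq2", 50, 300)]
example_b = [("seq1", 900, 600), ("seq3", 10, 200)]
print(nearby_hits(example_a, example_b, max_distance=500))
```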

Sample Data
===========

As an example, we will use two protein sequences from *Streptomyces aurantiacus*
that are part of a gene cluster responsible for metabolite production.

You can upload both sequences directly into Galaxy using the "Upload File" tool
with the following URLs; Galaxy should recognise them as FASTA files.

* https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658548.fasta
* https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658557.fasta

In addition, you can find both sequences on the NCBI server:

* http://www.ncbi.nlm.nih.gov/protein/739806622 (cytochrome P450)

```text
>gi|739806622|ref|WP_037658557.1| cytochrome P450 [Streptomyces aurantiacus]
MQRTCPFSVPPVYTKFREESPITQVVLPDGGKAWLVTKYDDVRAVMANPKLSSDRRAPDFPVVVPGQNAA
LAKHAPFMIILDGAEHAAARRPVISEFSVRRVAAMKPRIQEIVDGFIDDMLKMPKPVDLNQVFSLPVPSL
VVSEILGMPYEGHEYFMELAEILLRRTTDEQGRIAVSVELRKYMDKLVEEKIENPGDDLLSRQIELQRQQ
GGIDRPQLASLCLLVLLAGHETTANMINLGVFSMLTKPELLAEIKADPSKTPKAVDELLRFYTIPDFGAH
RLALDDVEIGGVLIRKGEAVIASTFAANRDPAVFDDPEELDFGRDARHHVAFGYGPHQCLGQNLGRLELQ
VVFDTLFRRLPELRLAVPEEELSFKSDALVYGLYELPVTW
```

* http://www.ncbi.nlm.nih.gov/protein/739806613 (beta-ACP synthase)

```text
>gi|739806613|ref|WP_037658548.1| beta-ACP synthase [Streptomyces aurantiacus]
MSGRRVVVTGMEVLAPGGVGTDNFWSLLSEGRTATRGITFFDPAQFRSRVAAEIDFDPYAHGLTPQEVRR
MDRAAQFAVVAARGAVADSGLDTDTLDPYRIGVTIGSAVGATMSLDEDYRVVSDAGRLDLVDHTYADPFF
YNYFVPSSFATEVARLVGAQGPSSVVSAGCTSGLDSVGYAVELIREGTADVMVAGATDAPISPITMACFD
AIKATTPRHDDPEHASRPFDDTRNGFVLGEGTAVFVLEELESARRRGARIYAEIAGYATRSNAYHMTGLR
PDGAEMAEAITVALDEARMNPTAIDYINAHGSGTKQNDRHETAAFKRSLGEHAYRTPVSSIKSMVGHSLG
AIGSIEIAASILAIQHDVVPPTANLHTPDPQCDLDYVPLNAREQIVDAVLTVGSGFGGFQSAMVLAQPER
NAA
```

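If you want to check a downloaded sample file before uploading it, a small standard-library parser is enough. The sketch below is only an illustration: the function name `read_fasta` is hypothetical and not part of Galaxy or BLAST+, and it handles only plain FASTA text of the kind shown above.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts; collect its sequence lines in a list.
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


# Tiny made-up input, purely to show the call.
demo = ">seq1 demo\nMQRT\nCPFS\n>seq2\nMSGR\n"
for name, sequence in read_fasta(demo).items():
    print(name, len(sequence))
```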

Citation
========

If you use this workflow directly, or a derivative of it, or the associated
NCBI BLAST wrappers for Galaxy, in work leading to a scientific publication,
please cite:

Peter J. A. Cock, John M. Chilton, Björn Grüning, James E. Johnson, Nicola Soranzo
NCBI BLAST+ integrated into Galaxy

http://biorxiv.org/content/early/2015/01/21/014043
http://dx.doi.org/10.1101/014043


Availability
============

This workflow is available on the main Galaxy Tool Shed:

http://toolshed.g2.bx.psu.edu/view/bgruening/find_genes_located_nearby_workflow

Development takes place on GitHub:

https://github.com/bgruening/galaxytools/workflows/ncbi_blast_plus/


Dependencies
============

These dependencies should be resolved automatically via the Galaxy Tool Shed:

* http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus