annotate readme.md @ 0:904dd53d622c draft

Uploaded
author bgruening
date Sun, 22 Feb 2015 12:20:55 -0500
Galaxy workflow for the identification of candidate gene clusters
------------------------------------------------------------------

This approach screens two proteins against all nucleotide sequences from the
NCBI nt database within hours on our cluster, yielding all organisms with an
interesting gene structure for further investigation. As usual in Galaxy
workflows, every parameter, including the proximity distance, can be changed and
additional steps can easily be added, for example additional filtering to refine
the initial BLAST hits, or inclusion of a third query sequence.

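The proximity idea can be illustrated outside Galaxy with a minimal Python sketch. It is not the workflow's actual implementation, which is built from Galaxy tools. The `Hit` record, the `find_nearby` and `gap` helpers, the subject names, and the distance of 10000 bases are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit: which query matched which subject sequence, and where."""
    query: str
    subject: str
    start: int
    end: int


def gap(a: Hit, b: Hit) -> int:
    """Distance in bases between two hits on one subject (0 if they overlap)."""
    # BLAST reports subject start > end for minus-strand hits, so normalise first.
    a_lo, a_hi = sorted((a.start, a.end))
    b_lo, b_hi = sorted((b.start, b.end))
    return max(0, max(a_lo, b_lo) - min(a_hi, b_hi))


def find_nearby(hits, query_a, query_b, max_distance):
    """Return (subject, hit_a, hit_b) for hit pairs at most max_distance apart."""
    by_subject = defaultdict(lambda: ([], []))
    for hit in hits:
        if hit.query == query_a:
            by_subject[hit.subject][0].append(hit)
        elif hit.query == query_b:
            by_subject[hit.subject][1].append(hit)
    pairs = []
    for subject, (hits_a, hits_b) in by_subject.items():
        for a in hits_a:
            for b in hits_b:
                if gap(a, b) <= max_distance:
                    pairs.append((subject, a, b))
    return pairs


# Made-up hits: only subject_1 carries both queries close together.
example_hits = [
    Hit("WP_037658557.1", "subject_1", 1000, 2200),
    Hit("WP_037658548.1", "subject_1", 4500, 3200),  # minus strand
    Hit("WP_037658557.1", "subject_2", 100, 1300),
    Hit("WP_037658548.1", "subject_2", 90000, 91200),
    Hit("WP_037658557.1", "subject_3", 100, 1300),
]
nearby = find_nearby(example_hits, "WP_037658557.1", "WP_037658548.1", 10000)
for subject, a, b in nearby:
    print(subject, gap(a, b))  # prints: subject_1 1000
```

Grouping hits by subject before comparing keeps the pairwise check local to each sequence, which is also where an extra filtering step or a third query would slot in.
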
![Workflow Image](https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/find_genes_located_nearby.png)


Sample Data
===========

As an example, we will use two protein sequences from *Streptomyces aurantiacus*
that are part of a gene cluster responsible for metabolite production.

You can upload both sequences directly into Galaxy using the "Upload File" tool
with the URLs below; Galaxy should recognise them as FASTA files.

* https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658548.fasta
* https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658557.fasta

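If you want to check a file before uploading it, the following Python sketch parses FASTA text into records. It is only an illustration: `parse_fasta` and `fetch_fasta` are hypothetical helper names, the sample record is made up, and Galaxy's own format detection works independently of this code.

```python
from urllib.request import urlopen


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_fasta(url):
    """Download a URL and parse it as FASTA (requires network access)."""
    with urlopen(url) as response:
        return parse_fasta(response.read().decode("ascii"))


# Made-up record, used here so the example runs without network access.
sample = """>example_protein hypothetical record for illustration
MKTAYIAKQR
QISFVKSHFS
"""
records = parse_fasta(sample)
print(records[0][0], len(records[0][1]))
```

Calling `fetch_fasta` with one of the URLs above should return a single record whose header starts with the protein's identifier, provided the file is still hosted at that address.
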
In addition, you can find both sequences on the NCBI server:

* http://www.ncbi.nlm.nih.gov/protein/739806622 (cytochrome P450)

```text
>gi|739806622|ref|WP_037658557.1| cytochrome P450 [Streptomyces aurantiacus]
MQRTCPFSVPPVYTKFREESPITQVVLPDGGKAWLVTKYDDVRAVMANPKLSSDRRAPDFPVVVPGQNAA
LAKHAPFMIILDGAEHAAARRPVISEFSVRRVAAMKPRIQEIVDGFIDDMLKMPKPVDLNQVFSLPVPSL
VVSEILGMPYEGHEYFMELAEILLRRTTDEQGRIAVSVELRKYMDKLVEEKIENPGDDLLSRQIELQRQQ
GGIDRPQLASLCLLVLLAGHETTANMINLGVFSMLTKPELLAEIKADPSKTPKAVDELLRFYTIPDFGAH
RLALDDVEIGGVLIRKGEAVIASTFAANRDPAVFDDPEELDFGRDARHHVAFGYGPHQCLGQNLGRLELQ
VVFDTLFRRLPELRLAVPEEELSFKSDALVYGLYELPVTW
```

* http://www.ncbi.nlm.nih.gov/protein/739806613 (beta-ACP synthase)

```text
>gi|739806613|ref|WP_037658548.1| beta-ACP synthase [Streptomyces aurantiacus]
MSGRRVVVTGMEVLAPGGVGTDNFWSLLSEGRTATRGITFFDPAQFRSRVAAEIDFDPYAHGLTPQEVRR
MDRAAQFAVVAARGAVADSGLDTDTLDPYRIGVTIGSAVGATMSLDEDYRVVSDAGRLDLVDHTYADPFF
YNYFVPSSFATEVARLVGAQGPSSVVSAGCTSGLDSVGYAVELIREGTADVMVAGATDAPISPITMACFD
AIKATTPRHDDPEHASRPFDDTRNGFVLGEGTAVFVLEELESARRRGARIYAEIAGYATRSNAYHMTGLR
PDGAEMAEAITVALDEARMNPTAIDYINAHGSGTKQNDRHETAAFKRSLGEHAYRTPVSSIKSMVGHSLG
AIGSIEIAASILAIQHDVVPPTANLHTPDPQCDLDYVPLNAREQIVDAVLTVGSGFGGFQSAMVLAQPER
NAA
```


Citation
========

If you use this workflow directly, or a derivative of it, or the associated
NCBI BLAST wrappers for Galaxy, in work leading to a scientific publication,
please cite:

Peter J. A. Cock, John M. Chilton, Björn Grüning, James E. Johnson, Nicola Soranzo
NCBI BLAST+ integrated into Galaxy

http://biorxiv.org/content/early/2015/01/21/014043
http://dx.doi.org/10.1101/014043


Availability
============

This workflow is available on the main Galaxy Tool Shed:

http://toolshed.g2.bx.psu.edu/view/bgruening/find_genes_located_nearby_workflow

Development takes place on GitHub:

https://github.com/bgruening/galaxytools/workflows/ncbi_blast_plus/


Dependencies
============

These dependencies should be resolved automatically via the Galaxy Tool Shed:

* http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus