diff readme.rst @ 3:257453ff1f3d draft default tip

Uploaded
author bgruening
date Tue, 17 Mar 2015 13:39:34 -0400
parents b876c71cc0b1
children
--- a/readme.rst	Tue Mar 17 13:33:43 2015 -0400
+++ b/readme.rst	Tue Mar 17 13:39:34 2015 -0400
@@ -1,52 +1,61 @@
 Galaxy workflow for the identification of candidate genes clusters
 ------------------------------------------------------------------
 
-This approach screens two proteins against all nucleotide sequence from the
-NCBI nt database within hours on our cluster, leading to all organisms with an inter-
-esting gene structure for further investigation. As usual in Galaxy workflows every
+This approach screens three proteins against a given genome sequence, leading to a genome position
+where all three genes are located nearby. As usual in Galaxy workflows every
 parameter, including the proximity distance, can be changed and additional steps
 can be easily added. For example additional filtering to refine the initial BLAST
 hits, or inclusion of a third query sequence.
 
-.. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/find_genes_located_nearby.png
+.. image:: https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/find_three_genes_located_nearby.png
 
 
 Sample Data
 ===========
 
-As an example, we will use two protein sequences from *Streptomyces aurantiacus*
-that are part of a gene cluster, responsible for metabolite producion.
+As an example, we will use three protein sequences from *Pan troglodytes* (Chimpanzee)
+which are part of the β-globin cluster.
 
-You can upload both sequences directly into Galaxy using the "Upload File" tool
+You can upload all sequences directly into Galaxy using the "Upload tool"
 with any of these URLs - Galaxy should recognise these as FASTA files.
 
-* `WP_037658548.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658548.fasta>`_
-* `WP_037658557.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658557.fasta>`_
+Query sequences:
 
-In addition you can find both sequences at the NCBI server:
- * http://www.ncbi.nlm.nih.gov/protein/739806622 (cytochrome P450)
-   ::
-   
-     >gi|739806622|ref|WP_037658557.1| cytochrome P450 [Streptomyces aurantiacus]
-     MQRTCPFSVPPVYTKFREESPITQVVLPDGGKAWLVTKYDDVRAVMANPKLSSDRRAPDFPVVVPGQNAA
-     LAKHAPFMIILDGAEHAAARRPVISEFSVRRVAAMKPRIQEIVDGFIDDMLKMPKPVDLNQVFSLPVPSL
-     VVSEILGMPYEGHEYFMELAEILLRRTTDEQGRIAVSVELRKYMDKLVEEKIENPGDDLLSRQIELQRQQ
-     GGIDRPQLASLCLLVLLAGHETTANMINLGVFSMLTKPELLAEIKADPSKTPKAVDELLRFYTIPDFGAH
-     RLALDDVEIGGVLIRKGEAVIASTFAANRDPAVFDDPEELDFGRDARHHVAFGYGPHQCLGQNLGRLELQ
-     VVFDTLFRRLPELRLAVPEEELSFKSDALVYGLYELPVTW
+* `P61920.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61920.fasta>`_
+* `P61921.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/P61921.fasta>`_
+* `Q6LDH1.fasta <https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby/Q6LDH1.fasta>`_
+
+Genome sequence:
+
+* http://hgdownload.cse.ucsc.edu/goldenPath/rn6/bigZips/rn6.fa.gz
 
 
- * http://www.ncbi.nlm.nih.gov/protein/739806613 (beta-ACP synthase)
+In addition you can find the query sequences at the UniProt server:
+ * http://www.uniprot.org/uniprot/P61920 (Hemoglobin subunit gamma-1)
+   ::
+
+     >sp|P61920|HBG1_PANTR Hemoglobin subunit gamma-1 OS=Pan troglodytes GN=HBG1 PE=1 SV=2
+     MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
+     VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
+     KEFTPEVQASWQKMVTAVASALSSRYH
+
+
+ * http://www.uniprot.org/uniprot/P61921 (Hemoglobin subunit gamma-2)
    ::
-  
-     >gi|739806613|ref|WP_037658548.1| beta-ACP synthase [Streptomyces aurantiacus]
-     MSGRRVVVTGMEVLAPGGVGTDNFWSLLSEGRTATRGITFFDPAQFRSRVAAEIDFDPYAHGLTPQEVRR
-     MDRAAQFAVVAARGAVADSGLDTDTLDPYRIGVTIGSAVGATMSLDEDYRVVSDAGRLDLVDHTYADPFF
-     YNYFVPSSFATEVARLVGAQGPSSVVSAGCTSGLDSVGYAVELIREGTADVMVAGATDAPISPITMACFD
-     AIKATTPRHDDPEHASRPFDDTRNGFVLGEGTAVFVLEELESARRRGARIYAEIAGYATRSNAYHMTGLR
-     PDGAEMAEAITVALDEARMNPTAIDYINAHGSGTKQNDRHETAAFKRSLGEHAYRTPVSSIKSMVGHSLG
-     AIGSIEIAASILAIQHDVVPPTANLHTPDPQCDLDYVPLNAREQIVDAVLTVGSGFGGFQSAMVLAQPER
-     NAA
+
+     >sp|P61921|HBG2_PANTR Hemoglobin subunit gamma-2 OS=Pan troglodytes GN=HBG2 PE=1 SV=2
+     MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
+     VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
+     KEFTPEVQASWQKMVTGVASALSSRYH
+
+
+ * http://www.uniprot.org/uniprot/Q6LDH1 (Hemoglobin subunit epsilon)
+   ::
+
+     >sp|Q6LDH1|HBE_PANTR Hemoglobin subunit epsilon OS=Pan troglodytes GN=HBE1 PE=2 SV=3
+     MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPK
+     VKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFG
+     KEFTPEVQAAWQKLVSAVAIALAHKYH
 
 
 Citation
@@ -68,11 +77,11 @@
 
 This workflow is available on the main Galaxy Tool Shed:
 
-http://toolshed.g2.bx.psu.edu/view/bgruening/find_genes_located_nearby_workflow
+http://toolshed.g2.bx.psu.edu/view/bgruening/find_three_genes_located_nearby_workflow
 
 Development is being done on github:
 
-https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_genes_located_nearby
+https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/find_three_genes_located_nearby
 
 
 Dependencies
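
The proximity screening that the readme describes (three query proteins, one genome, report positions where all three hits lie close together) can be illustrated outside Galaxy with a short Python sketch. This is not the workflow's implementation, which runs as Galaxy tool steps. It assumes the BLAST hits have already been reduced to `(query, chrom, start, end)` records with `start <= end`. The names `Hit`, `find_clusters` and `max_span` are hypothetical, and the coordinates in the example are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    query: str  # query protein ID, e.g. "P61920"
    chrom: str  # name of the genome sequence that was hit
    start: int  # hit start on that sequence (start <= end)
    end: int    # hit end on that sequence


def find_clusters(hits, queries, max_span):
    """Return (chrom, start, end) windows of at most max_span bases
    that contain at least one hit for every query."""
    queries = set(queries)
    by_chrom = defaultdict(list)
    for hit in hits:
        if hit.query in queries:
            by_chrom[hit.chrom].append(hit)

    clusters = []
    for chrom, chrom_hits in by_chrom.items():
        chrom_hits.sort(key=lambda h: h.start)
        for i, anchor in enumerate(chrom_hits):
            limit = anchor.start + max_span
            if anchor.end > limit:
                continue  # the anchor hit alone is longer than the window
            seen = set()
            window_end = anchor.end
            for hit in chrom_hits[i:]:
                if hit.start > limit:
                    break  # sorted by start: nothing further can fit
                if hit.end > limit:
                    continue  # starts inside the window but overruns it
                seen.add(hit.query)
                window_end = max(window_end, hit.end)
                if seen == queries:
                    clusters.append((chrom, anchor.start, window_end))
                    break
    return clusters


# Invented coordinates, for illustration only.
example_hits = [
    Hit("P61920", "chr1", 100, 200),
    Hit("P61921", "chr1", 500, 600),
    Hit("Q6LDH1", "chr1", 900, 1000),
    Hit("P61920", "chr2", 100, 200),
    Hit("Q6LDH1", "chr2", 300, 400),
    Hit("P61921", "chr2", 50000, 50100),
]
print(find_clusters(example_hits, {"P61920", "P61921", "Q6LDH1"}, 10000))
# prints [('chr1', 100, 1000)]
```

Here `max_span` plays the role of the proximity distance mentioned in the readme. On `chr2` the third hit lies too far from the other two, so no window is reported there.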