Binary file N_abberans_piechart_mouseover.png has changed
--- a/README.rst	Mon Mar 30 11:46:13 2015 -0400
+++ b/README.rst	Wed Feb 01 13:21:32 2017 -0500
@@ -1,180 +1,99 @@
-Introduction
-============
-
-Galaxy is a web-based platform for biological data analysis, supporting
-extension with additional tools (often wrappers for existing command line
-tools) and datatypes. See http://www.galaxyproject.org/ and the public
-server at http://usegalaxy.org for an example.
+This package is a Galaxy workflow for the identification of candidate
+secreted proteins from a given protein FASTA file.

-The NCBI BLAST suite is a widely used set of tools for biological sequence
-comparison. It is available as standalone binaries for use at the command
-line, and via the NCBI website for smaller searches. For more details see
-http://blast.ncbi.nlm.nih.gov/Blast.cgi
+It runs SignalP v3.0 (Bendtsen et al. 2004) and selects only proteins with a
+strong predicted signal peptide, and then runs TMHMM v2.0 (Krogh et al. 2001)
+on those, and selects only proteins without a predicted trans-membrane helix.
+This workflow was used in Kikuchi et al. (2011), and is a simplification of
+the candidate effector protocol described in Jones et al. (2009).

-This is an example workflow using the Galaxy wrappers for NCBI BLAST+,
-see https://github.com/peterjc/galaxy_blast
+See http://www.galaxyproject.org for information about the Galaxy Project.


-Galaxy workflow for counting species of top BLAST hits
-======================================================
+Availability
+============

-This Galaxy workflow (file ``blast_top_hit_species.ga``) is intended for an
-initial assessment of a transcriptome assembly to give a crude indication of
-any major contamination present based on the species of the top BLAST hit
-of 1000 representative sequences.
+This workflow is available to download and/or install from the main
+Galaxy Tool Shed:

-.. image:: https://raw.githubusercontent.com/peterjc/galaxy_blast/master/workflows/blast_top_hit_species/blast_top_hit_species.png
-
-In words, the workflow proceeds as follows:
+http://toolshed.g2.bx.psu.edu/view/peterjc/secreted_protein_workflow

-1. Upload/import your transcriptome assembly or any nucleotide FASTA file.
-2. Samples 1000 representative sequences, selected uniformly/evenly though
-   the file.
-3. Convert the sampled FASTA file into a three column tabular file.
-4. Runs NCBI BLASTX of the sampled FASTA file against the latest NCBI ``nr``
-   database (assuming this is already available setup on your local Galaxy
-   under the alias ``nr``), requesting tabular output including the taxonomy
-   fields, and at most one matching target sequence.
-5. Remove any duplicate alignments (multiple HSPs for the same match).
-6. Combine the filtered BLAST output with the tabular version of the 1000
-   sequences to give a new tabular file with exactly 1000 lines, adding
-   ``None`` for sequences missing a BLAST hit.
-7. Count the BLAST species names in this file.
-8. Sort the counts.
+Test releases (which should not normally be used) are on the Test Tool Shed:
+
+http://testtoolshed.g2.bx.psu.edu/view/peterjc/secreted_protein_workflow

-Finally we would suggest visualising the sorted tally table as a Pie Chart.
+Development is being done on GitHub here:
+
+https://github.com/peterjc/pico_galaxy/tree/master/workflows/secreted_protein_workflow


 Sample Data
 ===========

-As an example, you can upload the transcriptome assembly of the nematode
-*Nacobbus abberans* from Eves van den Akker *et al.* (2015),
-http://dx.doi.org/10.1093/gbe/evu171 using this URL:
-
-http://nematode.net/Data/nacobbus_aberrans_transcript_assembly/N.abberans_reference_no_contam.zip
-
-Running this workflow with a copy of the NCBI non-redundant ``nr`` database
-from 16 Oct 2014 (which did **not** contain this *N. abberans* dataset) gave
-the following results - note 609 out of the 1000 sequences gave no BLAST hit.
-
-===== ==================
-Count Subject Blast Name
------ ------------------
-  609 None
-  244 nematodes
-   30 ascomycetes
-   27 eukaryotes
-    8 basidiomycetes
-    6 aphids
-    5 eudicots
-    5 flies
-  ... ...
-===== ==================
+This workflow was developed and run on several nematode species. For example,
+try the protein set for *Bursaphelenchus xylophilus* (Kikuchi et al. 2011):

-As you might guess from	the filename ``N.abberans_reference_no_contam.fasta``,
-this transcriptome assembly has already had obvious contamination removed.
-
-At the time of writing, Galaxy's visualizations could not be included in
-a workflow. You can generate a pie chart from the final count file using
-the counts (c1) and labels (c2), like this:
-
-.. image:: https://raw.githubusercontent.com/peterjc/galaxy_blast/master/workflows/blast_top_hit_species/N_abberans_piechart_mouseover.png
-
-Note the nematode count in this image was shown as a mouse-over effect.
-
-
-Disclaimer
-==========
-
-Species assignment by top BLAST hit is not suitable for any in depth
-analysis. It is particularly prone to false positives where contaminants
-in public datasets are mislabelled. See for example Ed Yong (2015),
-"There's No Plague on the NYC Subway. No Platypuses Either.":
-
-http://phenomena.nationalgeographic.com/2015/02/10/theres-no-plague-on-the-nyc-subway-no-platypuses-either/
-
-
-Known Issues
-============
+ftp://ftp.sanger.ac.uk/pub/pathogens/Bursaphelenchus/xylophilus/Assembly-v1.2/BUX.v1.2.genedb.protein.fa.gz

-Counts
-------
-
-This workflow uses the Galaxy "Count" tool, version 1.0.0, as shipped with
-the current stable release (Galaxy v15.03, i.e. March 2015).
-
-The updated "Count" tool version 1.0.1 includes a fix not to remove spaces
-in the fields being counted. In the example above, while the top hits are
-not affected, minor entries like "cellular slime molds" are shown as
-"cellularslimemolds" instead (look closely at the Pie Chart key)..
-
-The updated "Count" tool version 1.0.1 also adds a new option to sort the
-output, which avoids the additional sorting step in the current version of
-the workflow.
-
-A future update to this workflow will use the revised "Count" tool, once
-this is included in the next stable Galaxy release - or migrated to the
-Galaxy Tool Shed.
-
-NCBI nr database
-----------------
-
-The use of external datasets within Galaxy via the ``*.loc`` configuration
-files undermines provenance tracking within Galaxy. This is exacerbated
-by the lack of officially versioned BLAST database releases by the NCBI.
-
-This workflow assumes that you have an entry ``nr`` in your ``blastdb_p.loc``
-(the configuration file listing locally installed BLAST databases external
-to Galaxy - consult the NCBI BLAST+ wrapper documentation for more details),
-and that this points to a mirror of the latest NCBI "non-redundant" database
-from ftp://ftp.ncbi.nlm.nih.gov/blast/db/
-
-i.e. The workflow is intended to be used against the *latest* nr database,
-and thus is not reproducible over the long term as the database changes.
-
-
-Availability
-============
-
-This workflow is available to download and/or install from the main Galaxy Tool Shed:
-
-http://toolshed.g2.bx.psu.edu/view/peterjc/blast_top_hit_species
-
-Test releases (which should not normally be used) are on the Test Tool Shed:
-
-http://testtoolshed.g2.bx.psu.edu/view/peterjc/blast_top_hit_species
-
-Development is being done on github here:
-
-https://github.com/peterjc/galaxy_blast/tree/master/workflows/blast_top_hit_species
+You can upload this directly into Galaxy via this URL. Galaxy will handle
+removing the gzip compression to give you the FASTA protein file which has
+18,074 sequences. The expected result (selecting organism type Eukaryote)
+is a FASTA protein file of 2,297 predicted secreted protein sequences.


 Citation
 ========

-Please cite the following paper (currently available as a preprint):
+If you use this workflow directly, or a derivative of it, in work leading
+to a scientific publication, please cite:

-NCBI BLAST+ integrated into Galaxy.
-P.J.A. Cock, J.M. Chilton, B. Gruening, J.E. Johnson, N. Soranzo
-bioRxiv DOI: http://dx.doi.org/10.1101/014043 (preprint)
+Cock, P.J.A. and Pritchard, L. (2014). Galaxy as a platform for identifying
+candidate pathogen effectors. Chapter 1 in "Plant-Pathogen Interactions:
+Methods and Protocols (Second Edition)"; P. Birch, J. Jones, and J.I. Bos, eds.
+Methods in Molecular Biology. Humana Press, Springer. ISBN 978-1-62703-985-7.
+http://www.springer.com/life+sciences/plant+sciences/book/978-1-62703-985-7

-You should also cite Galaxy, and the NCBI BLAST+ tools:
+Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167

-BLAST+: architecture and applications.
-C. Camacho et al. BMC Bioinformatics 2009, 10:421.
-DOI: http://dx.doi.org/10.1186/1471-2105-10-421
+Bendtsen, J.D., Nielsen, H., von Heijne, G., Brunak, S. (2004)
+Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340: 783–95.
+http://dx.doi.org/10.1016/j.jmb.2004.05.028
+
+Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E. (2001)
+Predicting transmembrane protein topology with a hidden Markov model:
+application to complete genomes. J Mol Biol 305: 567–580.
+http://dx.doi.org/10.1006/jmbi.2000.4315


-Automated Installation
-======================
+Additional References
+=====================
+
+Kikuchi, T., Cotton, J.A., Dalzell, J.J., Hasegawa, K., et al. (2011)
+Genomic insights into the origin of parasitism in the emerging plant
+pathogen *Bursaphelenchus xylophilus*. PLoS Pathog 7: e1002219.
+http://dx.doi.org/10.1371/journal.ppat.1002219

-Installation via the Galaxy Tool Shed should take care of the dependencies
-on Galaxy tools including the NCBI BLAST+ wrappers and associated binaries.
+Jones, J.T., Kumar, A., Pylypenko, L.A., Thirugnanasambandam, A., et al. (2009)
+Identification and functional characterization of effectors in expressed
+sequence tags from various life cycle stages of the potato cyst nematode
+*Globodera pallida*. Mol Plant Pathol 10: 815–28.
+http://dx.doi.org/10.1111/j.1364-3703.2009.00585.x
+

-However, this workflow requires a current version of the NCBI nr protein
-BLAST database to be listed in ``blastdb_p.loc`` with the key ``nr`` (lower
-case).
+Dependencies
+============
+
+These dependencies should be resolved automatically via the Galaxy Tool Shed:
+
+* http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
+* http://toolshed.g2.bx.psu.edu/view/peterjc/seq_filter_by_id
+
+However, at the time of writing, those Galaxy tools have their own
+dependencies (SignalP v3.0 and TMHMM v2.0), which are required for this
+workflow and must be installed manually.


 History
@@ -183,7 +102,13 @@
 ======= ======================================================================
 Version Changes
 ------- ----------------------------------------------------------------------
-v0.1.0  - Initial Tool Shed release, targetting NCBI BLAST+ 2.2.29
+v0.0.1  - Initial release to Tool Shed (May, 2013)
+        - Expanded README file to include example data
+v0.0.2  - Updated versions of the tools used, including core Galaxy Filter
+          tool to avoid warning about new ``header_lines`` parameter.
+        - Added link to Tool Shed in the workflow annotation explaining there
+          is a README file with sample data, and a requested citation.
+v0.0.3  - Use MIT licence.
 ======= ======================================================================


@@ -192,20 +117,18 @@

 This workflow is under source code control here:

-https://github.com/peterjc/galaxy_blast/tree/master/workflows/blast_top_hit_species
+https://github.com/peterjc/pico_galaxy/tree/master/workflows/secreted_protein_workflow

 To prepare the tar-ball for uploading to the Tool Shed, I use this:

-    $ tar -cf blast_top_hit_species.tar.gz README.rst repository_dependencies.xml blast_top_hit_species.ga blast_top_hit_species.png N_abberans_piechart_mouseover.png
+    $ tar -czf secreted_protein_workflow.tar.gz README.rst repository_dependencies.xml secreted_protein_workflow.ga

 Check this,

-    $ tar -tzf blast_top_hit_species.tar.gz
+    $ tar -tzf secreted_protein_workflow.tar.gz
     README.rst
     repository_dependencies.xml
-    blast_top_hit_species.ga
-    blast_top_hit_species.png
-    N_abberans_piechart_mouseover.png
+    secreted_protein_workflow.ga


 Licence (MIT)
--- a/blast_top_hit_species.ga	Mon Mar 30 11:46:13 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,331 +0,0 @@
-{
-    "a_galaxy_workflow": "true",
-    "annotation": "",
-    "format-version": "0.1",
-    "name": "Species of top BLAST hits",
-    "steps": {
-        "0": {
-            "annotation": "",
-            "id": 0,
-            "input_connections": {},
-            "inputs": [
-                {
-                    "description": "",
-                    "name": "Transcriptome FASTA file"
-                }
-            ],
-            "label": null,
-            "name": "Input dataset",
-            "outputs": [],
-            "position": {
-                "left": 242,
-                "top": 119
-            },
-            "tool_errors": null,
-            "tool_id": null,
-            "tool_state": "{\"name\": \"Transcriptome FASTA file\"}",
-            "tool_version": null,
-            "type": "data_input",
-            "user_outputs": [],
-            "uuid": "e445b44b-02a7-4fd1-8944-cd680f967062"
-        },
-        "1": {
-            "annotation": "This workflow is deliberately a simple/crude assessment, and there is no need to run BLASTX on all the sequences - a sample of 1000 should be enough.",
-            "id": 1,
-            "input_connections": {
-                "input_file": {
-                    "id": 0,
-                    "output_name": "output"
-                }
-            },
-            "inputs": [],
-            "label": null,
-            "name": "Sub-sample sequences files",
-            "outputs": [
-                {
-                    "name": "output_file",
-                    "type": "input"
-                }
-            ],
-            "position": {
-                "left": 435,
-                "top": 119
-            },
-            "post_job_actions": {
-                "RenameDatasetActionoutput_file": {
-                    "action_arguments": {
-                        "newname": "1000 sequences from #{input_file}"
-                    },
-                    "action_type": "RenameDatasetAction",
-                    "output_name": "output_file"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "toolshed.g2.bx.psu.edu/repos/peterjc/sample_seqs/sample_seqs/0.2.1",
-            "tool_state": "{\"__page__\": 0, \"input_file\": \"null\", \"__rerun_remap_job_id__\": null, \"sampling\": \"{\\\"count\\\": \\\"1000\\\", \\\"type\\\": \\\"desired_count\\\", \\\"__current_case__\\\": 2}\", \"chromInfo\": \"\\\"/mnt/galaxy/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"interleaved\": \"\\\"False\\\"\"}",
-            "tool_version": "0.2.1",
-            "type": "tool",
-            "user_outputs": [],
-            "uuid": "87ce69ef-5fb0-41b0-9575-d3b96544f8be"
-        },
-        "2": {
-            "annotation": "We only want one line per query, so limit this to the best scoring target sequence. Assumes current NCBI nr database is available locally as \"nr\".",
-            "id": 2,
-            "input_connections": {
-                "query": {
-                    "id": 1,
-                    "output_name": "output_file"
-                }
-            },
-            "inputs": [],
-            "label": null,
-            "name": "NCBI BLAST+ blastx",
-            "outputs": [
-                {
-                    "name": "output1",
-                    "type": "tabular"
-                }
-            ],
-            "position": {
-                "left": 489,
-                "top": 263
-            },
-            "post_job_actions": {
-                "RenameDatasetActionoutput1": {
-                    "action_arguments": {
-                        "newname": "Top BLAST match"
-                    },
-                    "action_type": "RenameDatasetAction",
-                    "output_name": "output1"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastx_wrapper/0.1.01",
-            "tool_state": "{\"evalue_cutoff\": \"\\\"0.001\\\"\", \"__page__\": 0, \"adv_opts\": \"{\\\"adv_optional_id_files_opts\\\": {\\\"adv_optional_id_files_opts_selector\\\": \\\"none\\\", \\\"__current_case__\\\": 0}, \\\"matrix\\\": \\\"BLOSUM62\\\", \\\"adv_opts_selector\\\": \\\"advanced\\\", \\\"ungapped\\\": \\\"False\\\", \\\"filter_query\\\": \\\"True\\\", \\\"word_size\\\": \\\"0\\\", \\\"__current_case__\\\": 1, \\\"parse_deflines\\\": \\\"False\\\", \\\"strand\\\": \\\"-strand both\\\", \\\"max_hits\\\": \\\"1\\\"}\", \"__rerun_remap_job_id__\": null, \"db_opts\": \"{\\\"db_opts_selector\\\": \\\"db\\\", \\\"subject\\\": \\\"\\\", \\\"histdb\\\": \\\"\\\", \\\"__current_case__\\\": 0, \\\"database\\\": \\\"nr\\\"}\", \"query_gencode\": \"\\\"1\\\"\", \"query\": \"null\", \"output\": \"{\\\"out_format\\\": \\\"cols\\\", \\\"std_cols\\\": [\\\"qseqid\\\", \\\"sseqid\\\", \\\"pident\\\", \\\"length\\\", \\\"mismatch\\\", \\\"gapopen\\\", \\\"qstart\\\", \\\"qend\\\", \\\"sstart\\\", \\\"send\\\", \\\"evalue\\\", \\\"bitscore\\\"], \\\"ids_cols\\\": null, \\\"tax_cols\\\": [\\\"staxids\\\", \\\"sscinames\\\", \\\"scomnames\\\", \\\"sblastnames\\\", \\\"sskingdoms\\\"], \\\"__current_case__\\\": 2, \\\"misc_cols\\\": null, \\\"ext_cols\\\": null}\", \"chromInfo\": \"\\\"/mnt/galaxy/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\"}",
-            "tool_version": "0.1.01",
-            "type": "tool",
-            "user_outputs": [],
-            "uuid": "1559a0b0-0b66-40f9-b777-2f062fcda4cc"
-        },
-        "3": {
-            "annotation": "Having a tabular file of all 1000 sequences is used in the \"join\" step to count the sequences giving no BLAST hit.",
-            "id": 3,
-            "input_connections": {
-                "input": {
-                    "id": 1,
-                    "output_name": "output_file"
-                }
-            },
-            "inputs": [],
-            "label": null,
-            "name": "FASTA-to-Tabular",
-            "outputs": [
-                {
-                    "name": "output",
-                    "type": "tabular"
-                }
-            ],
-            "position": {
-                "left": 696,
-                "top": 139
-            },
-            "post_job_actions": {
-                "HideDatasetActionoutput": {
-                    "action_arguments": {},
-                    "action_type": "HideDatasetAction",
-                    "output_name": "output"
-                },
-                "RenameDatasetActionoutput": {
-                    "action_arguments": {
-                        "newname": "1000 sequences as tabular"
-                    },
-                    "action_type": "RenameDatasetAction",
-                    "output_name": "output"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/fasta_to_tabular/fasta2tab/1.1.0",
-            "tool_state": "{\"__page__\": 0, \"keep_first\": \"\\\"0\\\"\", \"descr_columns\": \"\\\"2\\\"\", \"input\": \"null\", \"chromInfo\": \"\\\"/mnt/galaxy/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"__rerun_remap_job_id__\": null}",
-            "tool_version": "1.1.0",
-            "type": "tool",
-            "user_outputs": [],
-            "uuid": "31f11208-b2bd-4d9d-9745-dc1a6ed7ccf9"
-        },
-        "4": {
-            "annotation": "Some BLAST matches will give multiple HSPs, and thus multiple lines in the tabular output. We only want one line per query.",
-            "id": 4,
-            "input_connections": {
-                "input": {
-                    "id": 2,
-                    "output_name": "output1"
-                }
-            },
-            "inputs": [],
-            "label": null,
-            "name": "Unique",
-            "outputs": [
-                {
-                    "name": "outfile",
-                    "type": "input"
-                }
-            ],
-            "position": {
-                "left": 665,
-                "top": 376
-            },
-            "post_job_actions": {
-                "HideDatasetActionoutfile": {
-                    "action_arguments": {},
-                    "action_type": "HideDatasetAction",
-                    "output_name": "outfile"
-                },
-                "RenameDatasetActionoutfile": {
-                    "action_arguments": {
-                        "newname": "One HSP per BLAST hit"
-                    },
-                    "action_type": "RenameDatasetAction",
-                    "output_name": "outfile"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/unique/bg_uniq/0.3",
-            "tool_state": "{\"__page__\": 0, \"ignore_case\": \"\\\"False\\\"\", \"adv_opts\": \"{\\\"column_end\\\": {\\\"__class__\\\": \\\"UnvalidatedValue\\\", \\\"value\\\": \\\"2\\\"}, \\\"column_start\\\": {\\\"__class__\\\": \\\"UnvalidatedValue\\\", \\\"value\\\": \\\"1\\\"}, \\\"adv_opts_selector\\\": \\\"advanced\\\", \\\"__current_case__\\\": 1}\", \"__rerun_remap_job_id__\": null, \"is_numeric\": \"\\\"False\\\"\", \"input\": \"null\", \"chromInfo\": \"\\\"/mnt/galaxy/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\"}",
-            "tool_version": "0.3",
-            "type": "tool",
-            "user_outputs": [],
-            "uuid": "acf948e3-71dc-4f35-8357-3998bd0abdd8"
-        },
-        "5": {
-            "annotation": "We don't need all the columns in this join, but the key is to assign \"None\" to the sequences with no BLAST hits.",
-            "id": 5,
-            "input_connections": {
-                "input1": {
-                    "id": 3,
-                    "output_name": "output"
-                },
-                "input2": {
-                    "id": 4,
-                    "output_name": "outfile"
-                }
-            },
-            "inputs": [],
-            "label": null,
-            "name": "Join two Datasets",
-            "outputs": [
-                {
-                    "name": "out_file1",
-                    "type": "input"
-                }
-            ],
-            "position": {
-                "left": 827,
-                "top": 263
-            },
-            "post_job_actions": {
-                "HideDatasetActionout_file1": {
-                    "action_arguments": {},
-                    "action_type": "HideDatasetAction",
-                    "output_name": "out_file1"
-                },
-                "RenameDatasetActionout_file1": {
-                    "action_arguments": {
-                        "newname": "Top BLAST hits or None"
-                    },
-                    "action_type": "RenameDatasetAction",
-                    "output_name": "out_file1"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "join1",
-            "tool_state": "{\"input2\": \"null\", \"__page__\": 0, \"field1\": \"{\\\"__class__\\\": \\\"UnvalidatedValue\\\", \\\"value\\\": \\\"1\\\"}\", \"partial\": \"\\\"\\\"\", \"field2\": \"{\\\"__class__\\\": \\\"UnvalidatedValue\\\", \\\"value\\\": \\\"1\\\"}\", \"__rerun_remap_job_id__\": null, \"fill_empty_columns\": \"{\\\"fill_empty_columns_switch\\\": \\\"fill_empty\\\", \\\"do_fill_empty_columns\\\": {\\\"column_fill_type\\\": \\\"single_fill_value\\\", \\\"fill_value\\\": \\\"None\\\", \\\"__current_case__\\\": 0}, \\\"fill_columns_by\\\": \\\"fill_unjoined_only\\\", \\\"__current_case__\\\": 1}\", \"unmatched\": \"\\\"-u\\\"\", \"input1\": \"null\", \"chromInfo\": \"\\\"/mnt/galaxy/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\"}",
-            "tool_version": "2.0.2",
-            "type": "tool",
-            "user_outputs": [],
-            "uuid": "4c280b0e-b4a6-4ae4-8a81-d6e93932ef71"
-        },
-        "6": {
-            "annotation": "Here we make a tally table of the BLAST species name column",
-            "id": 6,
-            "input_connections": {
-                "input": {
-                    "id": 5,
-                    "output_name": "out_file1"
-                }
-            },
-            "inputs": [],
-            "label": null,
-            "name": "Count",
-            "outputs": [
-                {
-                    "name": "out_file1",
-                    "type": "tabular"
-                }
-            ],
-            "position": {
-                "left": 952,
-                "top": 398
-            },
-            "post_job_actions": {
-                "HideDatasetActionout_file1": {
-                    "action_arguments": {},
-                    "action_type": "HideDatasetAction",
-                    "output_name": "out_file1"
-                },
-                "RenameDatasetActionout_file1": {
-                    "action_arguments": {
-                        "newname": "Top BLAST hit species counts (unsorted)"
-                    },
-                    "action_type": "RenameDatasetAction",
-                    "output_name": "out_file1"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "Count1",
-            "tool_state": "{\"__page__\": 0, \"column\": \"{\\\"__class__\\\": \\\"UnvalidatedValue\\\", \\\"value\\\": [\\\"19\\\"]}\", \"__rerun_remap_job_id__\": null, \"delim\": \"\\\"T\\\"\", \"input\": \"null\", \"chromInfo\": \"\\\"/mnt/galaxy/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\"}",
-            "tool_version": "1.0.0",
-            "type": "tool",
-            "user_outputs": [],
-            "uuid": "d3322137-1911-426d-87a7-c82b5fc16825"
-        },
-        "7": {
-            "annotation": "Sorting the counts makes the results easier to interpret directly.",
-            "id": 7,
-            "input_connections": {
-                "input": {
-                    "id": 6,
-                    "output_name": "out_file1"
-                }
-            },
-            "inputs": [],
-            "label": null,
-            "name": "Sort",
-            "outputs": [
-                {
-                    "name": "out_file1",
-                    "type": "input"
-                }
-            ],
-            "position": {
-                "left": 1056,
-                "top": 506
-            },
-            "post_job_actions": {
-                "RenameDatasetActionout_file1": {
-                    "action_arguments": {
-                        "newname": "Top BLAST hit species counts"
-                    },
-                    "action_type": "RenameDatasetAction",
-                    "output_name": "out_file1"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "sort1",
-            "tool_state": "{\"__page__\": 0, \"style\": \"\\\"num\\\"\", \"column\": \"{\\\"__class__\\\": \\\"UnvalidatedValue\\\", \\\"value\\\": \\\"1\\\"}\", \"__rerun_remap_job_id__\": null, \"column_set\": \"[]\", \"input\": \"null\", \"chromInfo\": \"\\\"/mnt/galaxy/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"order\": \"\\\"DESC\\\"\"}",
-            "tool_version": "1.0.3",
-            "type": "tool",
-            "user_outputs": [],
-            "uuid": "c81cc61d-52a3-44ee-b646-b23e0e004c38"
-        }
-    },
-    "uuid": "9fe8754a-3a87-4f6a-89a2-141b02b4793e"
-}
\ No newline at end of file
Binary file blast_top_hit_species.png has changed
--- a/repository_dependencies.xml	Mon Mar 30 11:46:13 2015 -0400
+++ b/repository_dependencies.xml	Wed Feb 01 13:21:32 2017 -0500
@@ -1,9 +1,7 @@
 <?xml version="1.0"?>
-<repositories description="This workflow requires the NCBI BLAST+ tools etc">
-    <repository changeset_revision="5e9d5e536b79" name="ncbi_blast_plus" owner="devteam" toolshed="https://testtoolshed.g2.bx.psu.edu" />
-    <repository changeset_revision="ae709fd50581" name="fasta_to_tabular" owner="devteam" toolshed="https://testtoolshed.g2.bx.psu.edu" />
-    <repository changeset_revision="4231c585b6dd" name="sample_seqs" owner="peterjc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
-    <repository changeset_revision="2064ae2602b1" name="unique" owner="bgruening" toolshed="https://testtoolshed.g2.bx.psu.edu" />
-    <!-- Also uses tool_id join1, Count1, and sort1 which are currently
-         still shipped with Galaxy itself rather than via the Tool Shed -->
+<repositories description="This requires my SignalP and TMHMM wrappers, and my FASTA filtering tool.">
+    <!-- Revision 15:6abd809cefdd on the main tool shed is v0.2.4, the current latest - but older should be OK -->
+    <repository changeset_revision="3cb02adf4326" name="tmhmm_and_signalp" owner="peterjc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
+    <!-- Revision 2:abdd608c869b on the main tool shed is v0.0.5, the current latest - but older should be OK -->
+    <repository changeset_revision="bc263e94ea98" name="seq_filter_by_id" owner="peterjc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
 </repositories>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/secreted_protein_workflow.ga	Wed Feb 01 13:21:32 2017 -0500
@@ -0,0 +1,288 @@
+{
+    "a_galaxy_workflow": "true",
+    "annotation": "Runs SignalP v3.0 and TMHMM v2.0 to look for secreted proteins.<br />\n<br />\nThis workflow is <a href=\"http://toolshed.g2.bx.psu.edu/view/peterjc/secreted_protein_workflow\" target=\"_blank\">available on the Galaxy Tool Shed</a> with a README file giving more information including sample data, and full citation details (Cock and Pritchard 2014).",
+    "format-version": "0.1",
+    "name": "Find secreted proteins with TMHMM and SignalP",
+    "steps": {
+        "0": {
+            "annotation": "",
+            "id": 0,
+            "input_connections": {},
+            "inputs": [
+                {
+                    "description": "",
+                    "name": "Input Dataset"
+                }
+            ],
+            "name": "Input dataset",
+            "outputs": [],
+            "position": {
+                "left": 200,
+                "top": 200
+            },
+            "tool_errors": null,
+            "tool_id": null,
+            "tool_state": "{\"name\": \"Input Dataset\"}",
+            "tool_version": null,
+            "type": "data_input",
+            "user_outputs": []
+        },
+        "1": {
+            "annotation": "",
+            "id": 1,
+            "input_connections": {
+                "fasta_file": {
+                    "id": 0,
+                    "output_name": "output"
+                }
+            },
+            "inputs": [
+                {
+                    "description": "runtime parameter for tool SignalP 3.0",
+                    "name": "organism"
+                }
+            ],
+            "name": "SignalP 3.0",
+            "outputs": [
+                {
+                    "name": "tabular_file",
+                    "type": "tabular"
+                }
+            ],
+            "position": {
+                "left": 240,
+                "top": 341
+            },
+            "post_job_actions": {
+                "HideDatasetActiontabular_file": {
+                    "action_arguments": {},
+                    "action_type": "HideDatasetAction",
+                    "output_name": "tabular_file"
+                }
+            },
+            "tool_errors": null,
+            "tool_id": "signalp3",
+            "tool_state": "{\"__page__\": 0, \"truncate\": \"\\\"60\\\"\", \"chromInfo\": \"\\\"/opt/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"fasta_file\": \"null\", \"organism\": \"{\\\"__class__\\\": \\\"RuntimeValue\\\"}\", \"__rerun_remap_job_id__\": null}",
+            "tool_version": "0.0.12",
+            "type": "tool",
+            "user_outputs": []
+        },
+        "2": {
+            "annotation": "Select proteins with predicted signal peptide (SignalP NN D-Score or HMM)",
+            "id": 2,
+            "input_connections": {
+                "input": {
+                    "id": 1,
+                    "output_name": "tabular_file"
+                }
+            },
+            "inputs": [],
+            "name": "Filter",
+            "outputs": [
+                {
+                    "name": "out_file1",
+                    "type": "input"
+                }
+            ],
+            "position": {
+                "left": 323,
+                "top": 528
+            },
+            "post_job_actions": {
+                "HideDatasetActionout_file1": {
+                    "action_arguments": {},
+                    "action_type": "HideDatasetAction",
+                    "output_name": "out_file1"
+                },
+                "RenameDatasetActionout_file1": {
+                    "action_arguments": {
+                        "newname": "Filtered SignalP results"
+                    },
+                    "action_type": "RenameDatasetAction",
+                    "output_name": "out_file1"
+                }
+            },
+            "tool_errors": null,
+            "tool_id": "Filter1",
+            "tool_state": "{\"__page__\": 0, \"__rerun_remap_job_id__\": null, \"cond\": \"\\\"c14=='Y' or c15=='S'\\\"\", \"input\": \"null\", \"header_lines\": \"\\\"0\\\"\", \"chromInfo\": \"\\\"/opt/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\"}",
+            "tool_version": "1.1.0",
+            "type": "tool",
+            "user_outputs": []
+        },
+        "3": {
+            "annotation": "Select those sequences with signal peptides.",
+            "id": 3,
+            "input_connections": {
+                "input_file": {
+                    "id": 0,
+                    "output_name": "output"
+                },
+                "input_tabular": {
+                    "id": 2,
+                    "output_name": "out_file1"
+                }
+            },
+            "inputs": [],
+            "name": "Filter sequences by ID",
+            "outputs": [
+                {
+                    "name": "output_pos",
+                    "type": "fasta"
+                },
+                {
+                    "name": "output_neg",
+                    "type": "fasta"
+                }
+            ],
+            "position": {
+                "left": 527,
+                "top": 200
+            },
+            "post_job_actions": {
+                "HideDatasetActionoutput_neg": {
+                    "action_arguments": {},
+                    "action_type": "HideDatasetAction",
+                    "output_name": "output_neg"
+                },
+                "HideDatasetActionoutput_pos": {
+                    "action_arguments": {},
+                    "action_type": "HideDatasetAction",
+                    "output_name": "output_pos"
+                }
+            },
+            "tool_errors": null,
+            "tool_id": "seq_filter_by_id",
+            "tool_state": "{\"__page__\": 0, \"output_choice_cond\": \"{\\\"output_choice\\\": \\\"pos\\\", \\\"__current_case__\\\": 1}\", \"input_file\": \"null\", \"__rerun_remap_job_id__\": null, \"input_tabular\": \"null\", \"chromInfo\": \"\\\"/opt/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"columns\": \"{\\\"__class__\\\": \\\"UnvalidatedValue\\\", \\\"value\\\": [\\\"1\\\"]}\"}",
+            "tool_version": "0.0.5",
+            "type": "tool",
+            "user_outputs": []
+        },
+        "4": {
+            "annotation": "",
+            "id": 4,
+            "input_connections": {
+                "fasta_file": {
+                    "id": 3,
+                    "output_name": "output_pos"
+                }
+            },
+            "inputs": [],
+            "name": "TMHMM 2.0",
+            "outputs": [
+                {
+                    "name": "tabular_file",
+                    "type": "tabular"
+                }
+            ],
+            "position": {
+                "left": 643,
+                "top": 443
+            },
+            "post_job_actions": {
+                "HideDatasetActiontabular_file": {
+                    "action_arguments": {},
+                    "action_type": "HideDatasetAction",
+                    "output_name": "tabular_file"
+                }
+            },
+            "tool_errors": null,
+            "tool_id": "tmhmm2",
+            "tool_state": "{\"__page__\": 0, \"fasta_file\": \"null\", \"chromInfo\": \"\\\"/opt/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"__rerun_remap_job_id__\": null}",
+            "tool_version": "0.0.11",
+            "type": "tool",
+            "user_outputs": []
+        },
+        "5": {
+            "annotation": "Select proteins with no predicted transmembrane helices.",
+            "id": 5,
+            "input_connections": {
+                "input": {
+                    "id": 4,
+                    "output_name": "tabular_file"
+                }
+            },
+            "inputs": [],
+            "name": "Filter",
+            "outputs": [
+                {
+                    "name": "out_file1",
+                    "type": "input"
+                }
+            ],
+            "position": {
+                "left": 729,
+                "top": 566
+            },
+            "post_job_actions": {
+                "HideDatasetActionout_file1": {
+                    "action_arguments": {},
+                    "action_type": "HideDatasetAction",
+                    "output_name": "out_file1"
+                },
+                "RenameDatasetActionout_file1": {
+                    "action_arguments": {
+                        "newname": "Filtered TMHMM results"
+                    },
+                    "action_type": "RenameDatasetAction",
+                    "output_name": "out_file1"
+                }
+            },
+            "tool_errors": null,
+            "tool_id": "Filter1",
+            "tool_state": "{\"__page__\": 0, \"__rerun_remap_job_id__\": null, \"cond\": \"\\\"c5== 0\\\"\", \"input\": \"null\", \"header_lines\": \"\\\"0\\\"\", \"chromInfo\": \"\\\"/opt/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\"}",
+            "tool_version": "1.1.0",
+            "type": "tool",
+            "user_outputs": []
+        },
+        "6": {
+            "annotation": "Select those sequences with no transmembrane helices (from those with signal peptides).",
+            "id": 6,
+            "input_connections": {
+                "input_file": {
+                    "id": 3,
+                    "output_name": "output_pos"
+                },
+                "input_tabular": {
+                    "id": 5,
+                    "output_name": "out_file1"
+                }
+            },
+            "inputs": [],
+            "name": "Filter sequences by ID",
+            "outputs": [
+                {
+                    "name": "output_pos",
+                    "type": "fasta"
+                },
+                {
+                    "name": "output_neg",
+                    "type": "fasta"
+                }
+            ],
+            "position": {
+                "left": 893,
+                "top": 281
+            },
+            "post_job_actions": {
+                "HideDatasetActionoutput_neg": {
+                    "action_arguments": {},
+                    "action_type": "HideDatasetAction",
+                    "output_name": "output_neg"
+                },
+                "RenameDatasetActionoutput_pos": {
+                    "action_arguments": {
+                        "newname": "Secreted proteins"
+                    },
+                    "action_type": "RenameDatasetAction",
+                    "output_name": "output_pos"
+                }
+            },
+            "tool_errors": null,
+            "tool_id": "seq_filter_by_id",
+            "tool_state": "{\"__page__\": 0, \"output_choice_cond\": \"{\\\"output_choice\\\": \\\"pos\\\", \\\"__current_case__\\\": 1}\", \"input_file\": \"null\", \"__rerun_remap_job_id__\": null, \"input_tabular\": \"null\", \"chromInfo\": \"\\\"/opt/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"columns\": \"{\\\"__class__\\\": \\\"UnvalidatedValue\\\", \\\"value\\\": [\\\"1\\\"]}\"}",
+            "tool_version": "0.0.5",
+            "type": "tool",
+            "user_outputs": []
+        }
+    }
+}
\ No newline at end of file