# HG changeset patch
# User peterjc
# Date 1416575856 18000
# Node ID 41a42022f815e62bf7609a0db63623884f04e012
# Parent ee10017fcd80cd138c2ba225887a21b54a34930b
Uploaded v0.2.6, embedded citations

diff -r ee10017fcd80 -r 41a42022f815 tools/protein_analysis/README.rst
--- a/tools/protein_analysis/README.rst Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/README.rst Fri Nov 21 08:17:36 2014 -0500
@@ -41,23 +41,23 @@
 First install those command line tools you wish to use the wrappers for:
 
-1. Install the command line version of SignalP 3.0 and ensure "signalp" is
+1. Install the command line version of SignalP 3.0 and ensure ``signalp`` is
    on the PATH, see: http://www.cbs.dtu.dk/services/SignalP/
 
-2. Install the command line version of TMHMM 2.0 and ensure "tmhmm" is on
+2. Install the command line version of TMHMM 2.0 and ensure ``tmhmm`` is on
    the PATH, see: http://www.cbs.dtu.dk/services/TMHMM/
 
-3. Install the command line version of Promoter 2.0 and ensure "promoter" is
+3. Install the command line version of Promoter 2.0 and ensure ``promoter`` is
    on the PATH, see: http://www.cbs.dtu.dk/services/Promoter
 
-4. Install the WoLF PSORT v0.2 package, and ensure "runWolfPsortSummary"
+4. Install the WoLF PSORT v0.2 package, and ensure ``runWolfPsortSummary``
    is on the PATH (we use an extra wrapper script to change to the WoLF PSORT
    directory, run runWolfPsortSummary, and then change back to the original
    directory), see: http://wolfpsort.org/WoLFPSORT_package/version0.2/
 
 5. Install hmmsearch from HMMER 2.3.2 (the last stable release of HMMER 2)
-   but put it on the path under the name hmmsearch2 (allowing it to co-exist
-   with HMMER 3), or edit rlxr_motif.py accordingly.
+   but put it on the path under the name ``hmmsearch2`` (allowing it to
+   co-exist with HMMER 3), or edit ``rlxr_motif.py`` accordingly.
 
 Verify each of the tools is installed and working from the command line
 (when logged in as the Galaxy user if appropriate).
@@ -66,37 +66,36 @@
 Manual Installation
 ===================
 
-1. Create a folder tools/protein_analysis under your Galaxy installation.
+1. Create a folder ``tools/protein_analysis`` under your Galaxy installation.
    This folder name is not critical, and can be changed if desired - you
-   must update the paths used in tool_conf.xml to match.
+   must update the paths used in ``tool_conf.xml`` to match.
 
 2. Copy/move the following files (from this archive) there:
 
-   * tmhmm2.xml (Galaxy tool definition)
-   * tmhmm2.py (Python wrapper script)
+   * ``tmhmm2.xml`` (Galaxy tool definition)
+   * ``tmhmm2.py`` (Python wrapper script)
 
-   * signalp3.xml (Galaxy tool definition)
-   * signalp3.py (Python wrapper script)
+   * ``signalp3.xml`` (Galaxy tool definition)
+   * ``signalp3.py`` (Python wrapper script)
 
-   * promoter2.xml (Galaxy tool definition)
-   * promoter2.py (Python wrapper script)
+   * ``promoter2.xml`` (Galaxy tool definition)
+   * ``promoter2.py`` (Python wrapper script)
 
-   * psortb.xml (Galaxy tool definition)
-   * psortb.py (Python wrapper script)
+   * ``psortb.xml`` (Galaxy tool definition)
+   * ``psortb.py`` (Python wrapper script)
 
-   * wolf_psort.xml (Galaxy tool definition)
-   * wolf_psort.py (Python wrapper script)
+   * ``wolf_psort.xml`` (Galaxy tool definition)
+   * ``wolf_psort.py`` (Python wrapper script)
 
-   * rxlr_motifs.xml (Galaxy tool definition)
-   * rxlr_motifs.py (Python script)
+   * ``rxlr_motifs.xml`` (Galaxy tool definition)
+   * ``rxlr_motifs.py`` (Python script)
 
-   * seq_analysis_utils.py (shared Python code)
-   * LICENCE
-   * README.rst (this file)
+   * ``seq_analysis_utils.py`` (shared Python code)
+   * ``LICENCE``
+   * ``README.rst`` (this file)
 
-3. Edit your Galaxy conjuration file tool_conf.xml (to use the tools) AND
-   also tool_conf.xml.sample (to run the tests) to include the new tools
-   by adding::
+3. Edit your Galaxy conjuration file ``tool_conf.xml`` to include the
+   new tools by adding::
@@ -111,22 +110,24 @@
    Leave out the lines for any tools you do not wish to use in Galaxy.
 
-4. Copy/move the test-data files (from this archive) to Galaxy's
-   subfolder test-data.
+4. Copy/move the ``test-data/*`` files (from this archive) to Galaxy's
+   subfolder ``test-data/``.
 
 5. Run the Galaxy functional tests for these new wrappers with::
 
-    ./run_functional_tests.sh -id tmhmm2
-    ./run_functional_tests.sh -id signalp3
-    ./run_functional_tests.sh -id Psortb
-    ./run_functional_tests.sh -id rxlr_motifs
+    $ ./run_tests.sh -id tmhmm2
+    $ ./run_tests.sh -id signalp3
+    $ ./run_tests.sh -id Psortb
+    $ ./run_tests.sh -id rxlr_motifs
 
-   Alternatively, this should work (assuming you left the name and id as shown in
-   the XML file tool_conf.xml.sample)::
+   Alternatively, this should work (assuming you left the seciont name and id
+   as shown above in your XML file ``tool_conf.xml``)::
 
-    ./run_functional_tests.sh -sid Protein_sequence_analysis-protein_analysis
+    $ ./run_tests.sh -sid Protein_sequence_analysis-protein_analysis
 
-   To check the section ID expected, use ./run_functional_tests.sh -list
+   To check the section ID expected, use:
+
+    $ ./run_tests.sh -list
 
 6. Restart Galaxy and check the new tools are shown and work.
 
@@ -139,7 +140,7 @@
 ------- ----------------------------------------------------------------------
 v0.0.1  - Initial release
 v0.0.2  - Corrected some typos in the help text
-        - Renamed test output file to use Galaxy convention of *.tabular
+        - Renamed test output file to use Galaxy convention of ``*.tabular``
 v0.0.3  - Check for tmhmm2 silent failures (no output)
         - Additional unit tests
 v0.0.4  - Ignore comment lines in tmhmm2 output.
@@ -150,11 +151,11 @@
 v0.0.8  - Added WoLF PSORT wrapper to the suite.
 v0.0.9  - Added our RXLR motifs tool to the suite.
 v0.1.0  - Added Promoter 2.0 wrapper (similar to SignalP & TMHMM wrappers)
-        - Support Galaxy's tag for SignalP, TMHMM & Promoter
+        - Support Galaxy's ```` tag for SignalP, TMHMM & Promoter
 v0.1.1  - Fixed an error in the header of the tabular output from Promoter
 v0.1.2  - Use the new settings in the XML wrappers to catch errors
-        - Use SGE style $NSLOTS for thread count (otherwise default to 4)
-v0.1.3  - Added missing file whisson_et_al_rxlr_eer_cropped.hmm to Tool Shed
+        - Use SGE style ``$NSLOTS`` for thread count (otherwise default to 4)
+v0.1.3  - Added missing file ``whisson_et_al_rxlr_eer_cropped.hmm`` to Tool Shed
 v0.2.0  - Added PSORTb wrapper to the suite, based on earlier work
           contributed by Konrad Paszkiewicz.
 v0.2.1  - Use a script to create the Tool Shed tar-ball (removed some stray
@@ -170,13 +171,16 @@
         - Adopted standard MIT licence.
         - Use reStructuredText for this README file.
         - Development moved to GitHub, https://github.com/peterjc/pico_galaxy
+v0.2.6  - Use the new ``$GALAXY_SLOTS`` environment variable for thread count.
+        - Updated the ``suite_config.xml`` file (overdue).
+        - Tool definition now embeds citation information.
 ======= ======================================================================
 
 Developers
 ==========
 
-This script and other tools are being developed on the following hg branches:
+This script and other tools were initially developed on the following hg branches:
 http://bitbucket.org/peterjc/galaxy-central/src/seq_analysis
 http://bitbucket.org/peterjc/galaxy-central/src/tools
 
diff -r ee10017fcd80 -r 41a42022f815 tools/protein_analysis/promoter2.xml
--- a/tools/protein_analysis/promoter2.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/promoter2.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,13 +1,10 @@
-
+
     Find eukaryotic PolII promoters in DNA sequences
-        promoter2.py "\$NSLOTS" $fasta_file $tabular_file
-        ##I want the number of threads to be a Galaxy config option...
-        ##Set the number of threads in the runner entry in universe_wsgi.ini
-        ##which (on SGE at least) will set the $NSLOTS environment variable.
+        promoter2.py "\$GALAXY_SLOTS" "$fasta_file" "$tabular_file"
         ##If the environment variable isn't set, get "", and the python wrapper
         ##defaults to four threads.
@@ -85,4 +82,8 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy
 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
+
+        10.7717/peerj.167
+        10.1093/bioinformatics/15.5.356
+
diff -r ee10017fcd80 -r 41a42022f815 tools/protein_analysis/psortb.xml
--- a/tools/protein_analysis/psortb.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/psortb.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,14 +1,11 @@
-
+
     Determines sub-cellular localisation of bacterial/archaeal protein sequences
     psortb.py --version
-        psortb.py "\$NSLOTS" "$type" "$long" "$cutoff" "$divergent" "$sequence" "$outfile"
-        ##I want the number of threads to be a Galaxy config option...
-        ##Set the number of threads in the runner entry in universe_wsgi.ini
-        ##which (on SGE at least) will set the $NSLOTS environment variable.
+        psortb.py "\$GALAXY_SLOTS" "$type" "$long" "$cutoff" "$divergent" "$sequence" "$outfile"
         ##If the environment variable isn't set, get "", and python wrapper
         ##defaults to four threads.
@@ -19,9 +16,9 @@
+            label="Input sequences for which to predict localisation (protein FASTA format)" />
+            label="Organism type (N.B. all sequences in the above file must be of the same type)" >
@@ -34,11 +31,11 @@
+            label="Sets a cutoff value for reported results (e.g. 7.5)"
+            help="Leave blank or use zero for no cutoff." />
+            label="Sets a cutoff value for the multiple localization flag (e.g. 4.5)"
+            help="Leave blank or use zero for no cutoff." />
@@ -102,5 +99,9 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy
 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
+
+        10.7717/peerj.167
+        10.1093/bioinformatics/btq249
+
diff -r ee10017fcd80 -r 41a42022f815 tools/protein_analysis/rxlr_motifs.xml
--- a/tools/protein_analysis/rxlr_motifs.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/rxlr_motifs.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,8 +1,7 @@
-
+
     Find RXLR Effectors of Plant Pathogenic Oomycetes
-        rxlr_motifs.py $fasta_file 8 $model $tabular_file
-        ##I want the number of threads to be a Galaxy config option...
+        rxlr_motifs.py "$fasta_file" "\$GALAXY_SLOTS" $model "$tabular_file"
@@ -176,4 +175,14 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy
 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
+
+        10.7717/peerj.167
+
+        10.1038/nature06203
+        10.1105/tpc.107.051037
+        10.1371/journal.ppat.0020050
+        10.1101/gr.910003
+        10.1093/bioinformatics/14.9.755
+        10.1093/protein/10.1.1
+
diff -r ee10017fcd80 -r 41a42022f815 tools/protein_analysis/seq_analysis_utils.py
--- a/tools/protein_analysis/seq_analysis_utils.py Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/seq_analysis_utils.py Fri Nov 21 08:17:36 2014 -0500
@@ -91,6 +91,7 @@
             #between records (starting with hash).
             pass
         else:
+            handle.close()
             raise ValueError("Bad FASTA line %r" % line)
     handle.close()
     if title:
diff -r ee10017fcd80 -r 41a42022f815 tools/protein_analysis/signalp3.xml
--- a/tools/protein_analysis/signalp3.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/signalp3.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,12 +1,10 @@
-
+
     Find signal peptides in protein sequences
-        signalp3.py $organism $truncate "\$NSLOTS" $fasta_file $tabular_file
-        ##Set the number of threads in the runner entry in universe_wsgi.ini
-        ##which (on SGE at least) will set the $NSLOTS environment variable.
+        signalp3.py $organism $truncate "\$GALAXY_SLOTS" $fasta_file $tabular_file
         ##If the environment variable isn't set, get "", and the python wrapper
         ##defaults to four threads.
@@ -197,4 +195,10 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy
 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
+
+        10.7717/peerj.167
+        10.1016/j.jmb.2004.05.028
+        10.1093/protein/10.1.1
+
+
diff -r ee10017fcd80 -r 41a42022f815 tools/protein_analysis/suite_config.xml
--- a/tools/protein_analysis/suite_config.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/suite_config.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,15 +1,21 @@
-
+
     TMHMM, SignalP, RXLR motifs, WoLF PSORT
-
+
         Find transmembrane domains in protein sequences
-
+
         Find signal peptides in protein sequences
-
+
+        Find eukaryotic PolII promoters in DNA sequences
+
+
+        Bacteria/archaea protein subcellular localization prediction
+
+
         Eukaryote protein subcellular localization prediction
-
+
         Find RXLR Effectors of Plant Pathogenic Oomycetes
diff -r ee10017fcd80 -r 41a42022f815 tools/protein_analysis/tmhmm2.xml
--- a/tools/protein_analysis/tmhmm2.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/tmhmm2.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,13 +1,10 @@
-
+
     Find transmembrane domains in protein sequences
-        tmhmm2.py "\$NSLOTS" $fasta_file $tabular_file
-        ##I want the number of threads to be a Galaxy config option...
-        ##Set the number of threads in the runner entry in universe_wsgi.ini
-        ##which (on SGE at least) will set the $NSLOTS environment variable.
+        tmhmm2.py "\$GALAXY_SLOTS" $fasta_file $tabular_file
         ##If the environment variable isn't set, get "", and the python wrapper
         ##defaults to four threads.
@@ -119,4 +116,9 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy
 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
+
+        10.7717/peerj.167
+        10.1006/jmbi.2000.4315
+
+
diff -r ee10017fcd80 -r 41a42022f815 tools/protein_analysis/wolf_psort.xml
--- a/tools/protein_analysis/wolf_psort.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/wolf_psort.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,10 +1,7 @@
-
+
     Eukaryote protein subcellular localization prediction
-        wolf_psort.py $organism "\$NSLOTS" "$fasta_file" "$tabular_file"
-        ##I want the number of threads to be a Galaxy config option...
-        ##Set the number of threads in the runner entry in universe_wsgi.ini
-        ##which (on SGE at least) will set the $NSLOTS environment variable.
+        wolf_psort.py $organism "\$GALAXY_SLOTS" "$fasta_file" "$tabular_file"
         ##If the environment variable isn't set, get "", and python wrapper
         ##defaults to four threads.
@@ -150,4 +147,8 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy
 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
+
+        10.7717/peerj.167
+        10.1093/nar/gkm259
+
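Note on the thread count fallback: the command lines in this changeset pass "\$GALAXY_SLOTS" to each wrapper as a quoted argument, and the retained comments say that an unset variable arrives as "" and the Python wrapper then defaults to four threads. The wrapper code for this is not part of the patch, so the following is only a minimal sketch of how such a fallback might look. The function name `parse_thread_count` and the handling of zero or negative values are assumptions for illustration, not the actual implementation.

```python
DEFAULT_THREADS = 4  # fallback stated in the tool XML comments


def parse_thread_count(value, default=DEFAULT_THREADS):
    """Return a positive thread count, or `default` if `value` is unusable.

    `value` is the raw command line argument, which is "" when the
    GALAXY_SLOTS environment variable was not set by the job runner.
    """
    try:
        count = int(value)
    except (TypeError, ValueError):
        # Empty string (variable unset) or any non-numeric text.
        return default
    # Assumption: zero or negative counts are treated like an unset value.
    return count if count >= 1 else default
```

A wrapper might call this as `parse_thread_count(sys.argv[1])`, so that the same script works both under a cluster runner that sets the variable and on a local install that does not.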