Mercurial > repos > peterjc > tmhmm_and_signalp
annotate tools/protein_analysis/rxlr_motifs.py @ 34:7a2e20baacee draft default tip
"v0.2.13 - Python 3 fix for raising StopIteration"
| author | peterjc | 
|---|---|
| date | Thu, 17 Jun 2021 17:58:23 +0000 | 
| parents | 20da7f48b56f | 
| children | 
| rev | line source | 
|---|---|
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 1 #!/usr/bin/env python | 
| 32 | 2 """Implements assorted RXLR motif methods from the literature. | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 3 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 4 This script takes exactly four command line arguments: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 5 * Protein FASTA filename | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 6 * Number of threads | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 7 * Model name (Bhattacharjee2006, Win2007, Whisson2007) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 8 * Output tabular filename | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 9 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 10 The model names are: | 
| 32 | 11 * Bhattacharjee2006: Simple regular expression search for RXLR | 
| 12 with additional requirements for positioning and signal peptide. | |
| 13 * Win2007: Simple regular expression search for RXLR, but with | |
| 14 different positional requirements. | |
| 15 * Whisson2007: As Bhattacharjee2006 but with a more complex regular | |
| 16 expression to look for RXLR-EER domain, and additionally calls HMMER. | |
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 17 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 18 See the help text in the accompanying Galaxy tool XML file for more | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 19 details including the full references. | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 20 | 
| 32 | 21 Note | 
| 22 ---- | |
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 23 Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0, | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 24 which is no longer available. The current release is SignalP v3.0 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 25 (Mar 5, 2007). We have therefore opted to use the NN Ymax position for | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 26 the predicted cleavage site, as this is expected to be more accurate. | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 27 Also note that the HMM score values have changed from v2.0 to v3.0. | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 28 Whisson et al. (2007) used SignalP v3.0 anyway. | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 29 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 30 Whisson et al. (2007) used HMMER 2.3.2, and althought their HMM model | 
| 30 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 31 can still be used with hmmsearch from HMMER 3, sadly this does give | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 32 slightly different results. We expect the hmmsearch from HMMER 2.3.2 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 33 (the last stable release of HMMER 2) to be present on the path under | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 34 the name hmmsearch2 (allowing it to co-exist with HMMER 3). | 
| 30 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 35 | 
| 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 36 If using Conda, you should therefore install the special "hmmer2" | 
| 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 37 package from BioConda which provides "hmmsearch2" etc:: | 
| 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 38 | 
| 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 39 conda install -c bioconda hmmer2 | 
| 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 40 | 
| 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 41 See https://bioconda.github.io/recipes/hmmer2/README.html and | 
| 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 42 https://anaconda.org/bioconda/hmmer2 | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 43 """ | 
| 30 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 44 | 
| 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 45 from __future__ import print_function | 
| 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 46 | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 47 import os | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 48 import re | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 49 import subprocess | 
| 30 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 50 import sys | 
| 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 51 | 
| 29 | 52 from seq_analysis_utils import fasta_iterator | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 53 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 54 if "-v" in sys.argv: | 
| 34 | 55 print("RXLR Motifs v0.0.17") | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 56 sys.exit(0) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 57 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 58 if len(sys.argv) != 5: | 
| 32 | 59 sys.exit( | 
| 60 "Requires four arguments: protein FASTA filename, threads, " | |
| 61 "model, and output filename" | |
| 62 ) | |
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 63 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 64 fasta_file, threads, model, tabular_file = sys.argv[1:] | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 65 hmm_output_file = tabular_file + ".hmm.tmp" | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 66 signalp_input_file = tabular_file + ".fasta.tmp" | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 67 signalp_output_file = tabular_file + ".tabular.tmp" | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 68 min_signalp_hmm = 0.9 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 69 hmmer_search = "hmmsearch2" | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 70 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 71 if model == "Bhattacharjee2006": | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 72 signalp_trunc = 70 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 73 re_rxlr = re.compile("R.LR") | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 74 min_sp = 10 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 75 max_sp = 40 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 76 max_sp_rxlr = 100 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 77 min_rxlr_start = 1 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 78 # Allow signal peptide to be at most 40aa, and want RXLR to be | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 79 # within 100aa, therefore for the prescreen the max start is 140: | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 80 max_rxlr_start = max_sp + max_sp_rxlr | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 81 elif model == "Win2007": | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 82 signalp_trunc = 70 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 83 re_rxlr = re.compile("R.LR") | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 84 min_sp = 10 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 85 max_sp = 40 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 86 min_rxlr_start = 30 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 87 max_rxlr_start = 60 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 88 # No explicit limit on separation of signal peptide clevage | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 89 # and RXLR, but shortest signal peptide is 10, and furthest | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 90 # away RXLR is 60, so effectively limit is 50. | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 91 max_sp_rxlr = max_rxlr_start - min_sp + 1 | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 92 elif model == "Whisson2007": | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 93 signalp_trunc = 0 # zero for no truncation | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 94 re_rxlr = re.compile("R.LR.{,40}[ED][ED][KR]") | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 95 min_sp = 10 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 96 max_sp = 40 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 97 max_sp_rxlr = 100 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 98 min_rxlr_start = 1 | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 99 max_rxlr_start = max_sp + max_sp_rxlr | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 100 else: | 
| 32 | 101 sys.exit( | 
| 102 "Did not recognise the model name %r\n" | |
| 103 "Use Bhattacharjee2006, Win2007, or Whisson2007" % model | |
| 104 ) | |
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 105 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 106 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 107 def get_hmmer_version(exe, required=None): | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 108 try: | 
| 32 | 109 child = subprocess.Popen( | 
| 110 [exe, "-h"], | |
| 111 universal_newlines=True, | |
| 112 stdout=subprocess.PIPE, | |
| 113 stderr=subprocess.PIPE, | |
| 114 ) | |
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 115 except OSError: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 116 raise ValueError("Could not run %s" % exe) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 117 stdout, stderr = child.communicate() | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 118 if required: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 119 return required in stdout | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 120 elif "HMMER 2" in stdout: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 121 return 2 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 122 elif "HMMER 3" in stdout: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 123 return 3 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 124 else: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 125 raise ValueError("Could not determine version of %s" % exe) | 
| 29 | 126 | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 127 | 
| 29 | 128 # Run hmmsearch for Whisson et al. (2007) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 129 if model == "Whisson2007": | 
| 32 | 130 hmm_file = os.path.join( | 
| 131 os.path.split(sys.argv[0])[0], "whisson_et_al_rxlr_eer_cropped.hmm" | |
| 132 ) | |
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 133 if not os.path.isfile(hmm_file): | 
| 29 | 134 sys.exit("Missing HMM file for Whisson et al. (2007)") | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 135 if not get_hmmer_version(hmmer_search, "HMMER 2.3.2 (Oct 2003)"): | 
| 29 | 136 sys.exit("Missing HMMER 2.3.2 (Oct 2003) binary, %s" % hmmer_search) | 
| 18 | 137 | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 138 hmm_hits = set() | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 139 valid_ids = set() | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 140 for title, seq in fasta_iterator(fasta_file): | 
| 29 | 141 name = title.split(None, 1)[0] | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 142 if name in valid_ids: | 
| 29 | 143 sys.exit("Duplicated identifier %r" % name) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 144 else: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 145 valid_ids.add(name) | 
| 18 | 146 if not valid_ids: | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 147 # Special case, don't need to run HMMER if there are no sequences | 
| 18 | 148 pass | 
| 149 else: | |
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 150 # I've left the code to handle HMMER 3 in situ, in case | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 151 # we revisit the choice to insist on HMMER 2. | 
| 32 | 152 hmmer3 = 3 == get_hmmer_version(hmmer_search) | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 153 # Using zero (or 5.6?) for bitscore threshold | 
| 18 | 154 if hmmer3: | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 155 # The HMMER3 table output is easy to parse | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 156 # In HMMER3 can't use both -T and -E | 
| 32 | 157 cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" % ( | 
| 158 hmmer_search, | |
| 159 hmm_output_file, | |
| 160 hmm_file, | |
| 161 fasta_file, | |
| 162 ) | |
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 163 else: | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 164 # For HMMER2 we are stuck with parsing stdout | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 165 # Put 1e6 to effectively have no expectation threshold (otherwise | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 166 # HMMER defaults to 10 and the calculated e-value depends on the | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 167 # input FASTA file, and we can loose hits of interest). | 
| 32 | 168 cmd = "%s -T 0 -E 1e6 %s %s > %s" % ( | 
| 169 hmmer_search, | |
| 170 hmm_file, | |
| 171 fasta_file, | |
| 172 hmm_output_file, | |
| 173 ) | |
| 18 | 174 return_code = os.system(cmd) | 
| 175 if return_code: | |
| 32 | 176 sys.stderr.write("Error %i from hmmsearch:\n%s\n" % (return_code, cmd)) | 
| 177 sys.exit(return_code) | |
| 18 | 178 | 
| 179 handle = open(hmm_output_file) | |
| 180 for line in handle: | |
| 181 if not line.strip(): | |
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 182 # We expect blank lines in the HMMER2 stdout | 
| 18 | 183 continue | 
| 184 elif line.startswith("#"): | |
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 185 # Header | 
| 18 | 186 continue | 
| 187 else: | |
| 29 | 188 name = line.split(None, 1)[0] | 
| 189 # Should be a sequence name in the HMMER3 table output. | |
| 190 # Could be anything in the HMMER2 stdout. | |
| 18 | 191 if name in valid_ids: | 
| 192 hmm_hits.add(name) | |
| 193 elif hmmer3: | |
| 29 | 194 sys.exit("Unexpected identifer %r in hmmsearch output" % name) | 
| 18 | 195 handle.close() | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 196 # if hmmer3: | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 197 # print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids)) | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 198 # else: | 
| 29 | 199 # print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids)) | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 200 # print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids)) | 
| 18 | 201 os.remove(hmm_output_file) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 202 del valid_ids | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 203 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 204 | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 205 # Prepare short list of candidates containing RXLR to pass to SignalP | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 206 assert min_rxlr_start > 0, "Min value one, since zero based counting" | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 207 count = 0 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 208 total = 0 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 209 handle = open(signalp_input_file, "w") | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 210 for title, seq in fasta_iterator(fasta_file): | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 211 total += 1 | 
| 29 | 212 name = title.split(None, 1)[0] | 
| 32 | 213 match = re_rxlr.search(seq[min_rxlr_start - 1 :].upper()) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 214 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start: | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 215 # This is a potential RXLR, depending on the SignalP results. | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 216 # Might as well truncate the sequence now, makes the temp file smaller | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 217 if signalp_trunc: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 218 handle.write(">%s (truncated)\n%s\n" % (name, seq[:signalp_trunc])) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 219 else: | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 220 # Does it matter we don't line wrap? | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 221 handle.write(">%s\n%s\n" % (name, seq)) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 222 count += 1 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 223 handle.close() | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 224 # print "Running SignalP on %i/%i potentials." % (count, total) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 225 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 226 | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 227 # Run SignalP (using our wrapper script to get multi-core support etc) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 228 signalp_script = os.path.join(os.path.split(sys.argv[0])[0], "signalp3.py") | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 229 if not os.path.isfile(signalp_script): | 
| 29 | 230 sys.exit("Error - missing signalp3.py script") | 
| 32 | 231 cmd = "python '%s' 'euk' '%i' '%s' '%s' '%s'" % ( | 
| 232 signalp_script, | |
| 233 signalp_trunc, | |
| 234 threads, | |
| 235 signalp_input_file, | |
| 236 signalp_output_file, | |
| 237 ) | |
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 238 return_code = os.system(cmd) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 239 if return_code: | 
| 29 | 240 sys.exit("Error %i from SignalP:\n%s" % (return_code, cmd)) | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 241 # print "SignalP done" | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 242 | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 243 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 244 def parse_signalp(filename): | 
| 32 | 245 """Parse SignalP output, yield tuples of values. | 
| 246 | |
| 247 Returns tuples of ID, HMM_Sprob_score and NN predicted signal | |
| 248 peptide length. | |
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 249 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 250 For signal peptide length we use NN_Ymax_pos (minus one). | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 251 """ | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 252 handle = open(filename) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 253 line = handle.readline() | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 254 assert line.startswith("#ID\t"), line | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 255 for line in handle: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 256 parts = line.rstrip("\t").split("\t") | 
| 29 | 257 assert len(parts) == 20, repr(line) | 
| 258 yield parts[0], float(parts[18]), int(parts[5]) - 1 | |
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 259 handle.close() | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 260 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 261 | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 262 # Parse SignalP results and apply the strict RXLR criteria | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 263 total = 0 | 
| 32 | 264 tally = {} | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 265 handle = open(tabular_file, "w") | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 266 handle.write("#ID\t%s\n" % model) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 267 signalp_results = parse_signalp(signalp_output_file) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 268 for title, seq in fasta_iterator(fasta_file): | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 269 total += 1 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 270 rxlr = "N" | 
| 29 | 271 name = title.split(None, 1)[0] | 
| 32 | 272 match = re_rxlr.search(seq[min_rxlr_start - 1 :].upper()) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 273 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 274 del match | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 275 # This was the criteria for calling SignalP, | 
| 29 | 276 # so it will be in the SignalP results. | 
| 32 | 277 sp_id, sp_hmm_score, sp_nn_len = next(signalp_results) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 278 assert name == sp_id, "%s vs %s" % (name, sp_id) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 279 if sp_hmm_score >= min_signalp_hmm and min_sp <= sp_nn_len <= max_sp: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 280 match = re_rxlr.search(seq[sp_nn_len:].upper()) | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 281 if match and match.start() + 1 <= max_sp_rxlr: # 1-based counting | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 282 rxlr_start = sp_nn_len + match.start() + 1 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 283 if min_rxlr_start <= rxlr_start <= max_rxlr_start: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 284 rxlr = "Y" | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 285 if model == "Whisson2007": | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 286 # Combine the signalp with regular expression heuristic and the HMM | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 287 if name in hmm_hits and rxlr == "N": | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 288 rxlr = "hmm" # HMM only | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 289 elif rxlr == "N": | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 290 rxlr = "neither" # Don't use N (no) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 291 elif name not in hmm_hits and rxlr == "Y": | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 292 rxlr = "re" # Heuristic only | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 293 # Now have a four way classifier: Y, hmm, re, neither | 
| 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 294 # and count is the number of Y results (both HMM and heuristic) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 295 handle.write("%s\t%s\n" % (name, rxlr)) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 296 try: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 297 tally[rxlr] += 1 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 298 except KeyError: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 299 tally[rxlr] = 1 | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 300 handle.close() | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 301 assert sum(tally.values()) == total | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 302 | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 303 # Check the iterator is finished | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 304 try: | 
| 32 | 305 next(signalp_results) | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 306 assert False, "Unexpected data in SignalP output" | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 307 except StopIteration: | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 308 pass | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 309 | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 310 # Cleanup | 
| 6 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 311 os.remove(signalp_input_file) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 312 os.remove(signalp_output_file) | 
| 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 peterjc parents: diff
changeset | 313 | 
| 26 
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
 peterjc parents: 
18diff
changeset | 314 # Short summary to stdout for Galaxy's info display | 
| 30 
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
 peterjc parents: 
29diff
changeset | 315 print("%s for %i sequences:" % (model, total)) | 
| 32 | 316 print(", ".join("%s = %i" % kv for kv in sorted(tally.items()))) | 
