Mercurial > repos > peterjc > tmhmm_and_signalp
annotate tools/protein_analysis/rxlr_motifs.py @ 34:7a2e20baacee draft default tip
"v0.2.13 - Python 3 fix for raising StopIteration"
author | peterjc |
---|---|
date | Thu, 17 Jun 2021 17:58:23 +0000 |
parents | 20da7f48b56f |
children |
rev | line source |
---|---|
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
1 #!/usr/bin/env python |
32 | 2 """Implements assorted RXLR motif methods from the literature. |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
3 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
4 This script takes exactly four command line arguments: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
5 * Protein FASTA filename |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
6 * Number of threads |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
7 * Model name (Bhattacharjee2006, Win2007, Whisson2007) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
8 * Output tabular filename |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
9 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
10 The model names are: |
32 | 11 * Bhattacharjee2006: Simple regular expression search for RXLR |
12 with additional requirements for positioning and signal peptide. | |
13 * Win2007: Simple regular expression search for RXLR, but with | |
14 different positional requirements. | |
15 * Whisson2007: As Bhattacharjee2006 but with a more complex regular | |
16 expression to look for RXLR-EER domain, and additionally calls HMMER. | |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
17 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
18 See the help text in the accompanying Galaxy tool XML file for more |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
19 details including the full references. |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
20 |
32 | 21 Note |
22 ---- | |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
23 Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0, |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
24 which is no longer available. The current release is SignalP v3.0 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
25 (Mar 5, 2007). We have therefore opted to use the NN Ymax position for |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
26 the predicted cleavage site, as this is expected to be more accurate. |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
27 Also note that the HMM score values have changed from v2.0 to v3.0. |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
28 Whisson et al. (2007) used SignalP v3.0 anyway. |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
29 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
30 Whisson et al. (2007) used HMMER 2.3.2, and althought their HMM model |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
31 can still be used with hmmsearch from HMMER 3, sadly this does give |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
32 slightly different results. We expect the hmmsearch from HMMER 2.3.2 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
33 (the last stable release of HMMER 2) to be present on the path under |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
34 the name hmmsearch2 (allowing it to co-exist with HMMER 3). |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
35 |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
36 If using Conda, you should therefore install the special "hmmer2" |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
37 package from BioConda which provides "hmmsearch2" etc:: |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
38 |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
39 conda install -c bioconda hmmer2 |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
40 |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
41 See https://bioconda.github.io/recipes/hmmer2/README.html and |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
42 https://anaconda.org/bioconda/hmmer2 |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
43 """ |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
44 |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
45 from __future__ import print_function |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
46 |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
47 import os |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
48 import re |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
49 import subprocess |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
50 import sys |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
51 |
29 | 52 from seq_analysis_utils import fasta_iterator |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
53 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
54 if "-v" in sys.argv: |
34 | 55 print("RXLR Motifs v0.0.17") |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
56 sys.exit(0) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
57 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
58 if len(sys.argv) != 5: |
32 | 59 sys.exit( |
60 "Requires four arguments: protein FASTA filename, threads, " | |
61 "model, and output filename" | |
62 ) | |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
63 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
64 fasta_file, threads, model, tabular_file = sys.argv[1:] |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
65 hmm_output_file = tabular_file + ".hmm.tmp" |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
66 signalp_input_file = tabular_file + ".fasta.tmp" |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
67 signalp_output_file = tabular_file + ".tabular.tmp" |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
68 min_signalp_hmm = 0.9 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
69 hmmer_search = "hmmsearch2" |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
70 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
71 if model == "Bhattacharjee2006": |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
72 signalp_trunc = 70 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
73 re_rxlr = re.compile("R.LR") |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
74 min_sp = 10 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
75 max_sp = 40 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
76 max_sp_rxlr = 100 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
77 min_rxlr_start = 1 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
78 # Allow signal peptide to be at most 40aa, and want RXLR to be |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
79 # within 100aa, therefore for the prescreen the max start is 140: |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
80 max_rxlr_start = max_sp + max_sp_rxlr |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
81 elif model == "Win2007": |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
82 signalp_trunc = 70 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
83 re_rxlr = re.compile("R.LR") |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
84 min_sp = 10 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
85 max_sp = 40 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
86 min_rxlr_start = 30 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
87 max_rxlr_start = 60 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
88 # No explicit limit on separation of signal peptide clevage |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
89 # and RXLR, but shortest signal peptide is 10, and furthest |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
90 # away RXLR is 60, so effectively limit is 50. |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
91 max_sp_rxlr = max_rxlr_start - min_sp + 1 |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
92 elif model == "Whisson2007": |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
93 signalp_trunc = 0 # zero for no truncation |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
94 re_rxlr = re.compile("R.LR.{,40}[ED][ED][KR]") |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
95 min_sp = 10 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
96 max_sp = 40 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
97 max_sp_rxlr = 100 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
98 min_rxlr_start = 1 |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
99 max_rxlr_start = max_sp + max_sp_rxlr |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
100 else: |
32 | 101 sys.exit( |
102 "Did not recognise the model name %r\n" | |
103 "Use Bhattacharjee2006, Win2007, or Whisson2007" % model | |
104 ) | |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
105 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
106 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
107 def get_hmmer_version(exe, required=None): |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
108 try: |
32 | 109 child = subprocess.Popen( |
110 [exe, "-h"], | |
111 universal_newlines=True, | |
112 stdout=subprocess.PIPE, | |
113 stderr=subprocess.PIPE, | |
114 ) | |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
115 except OSError: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
116 raise ValueError("Could not run %s" % exe) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
117 stdout, stderr = child.communicate() |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
118 if required: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
119 return required in stdout |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
120 elif "HMMER 2" in stdout: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
121 return 2 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
122 elif "HMMER 3" in stdout: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
123 return 3 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
124 else: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
125 raise ValueError("Could not determine version of %s" % exe) |
29 | 126 |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
127 |
29 | 128 # Run hmmsearch for Whisson et al. (2007) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
129 if model == "Whisson2007": |
32 | 130 hmm_file = os.path.join( |
131 os.path.split(sys.argv[0])[0], "whisson_et_al_rxlr_eer_cropped.hmm" | |
132 ) | |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
133 if not os.path.isfile(hmm_file): |
29 | 134 sys.exit("Missing HMM file for Whisson et al. (2007)") |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
135 if not get_hmmer_version(hmmer_search, "HMMER 2.3.2 (Oct 2003)"): |
29 | 136 sys.exit("Missing HMMER 2.3.2 (Oct 2003) binary, %s" % hmmer_search) |
18 | 137 |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
138 hmm_hits = set() |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
139 valid_ids = set() |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
140 for title, seq in fasta_iterator(fasta_file): |
29 | 141 name = title.split(None, 1)[0] |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
142 if name in valid_ids: |
29 | 143 sys.exit("Duplicated identifier %r" % name) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
144 else: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
145 valid_ids.add(name) |
18 | 146 if not valid_ids: |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
147 # Special case, don't need to run HMMER if there are no sequences |
18 | 148 pass |
149 else: | |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
150 # I've left the code to handle HMMER 3 in situ, in case |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
151 # we revisit the choice to insist on HMMER 2. |
32 | 152 hmmer3 = 3 == get_hmmer_version(hmmer_search) |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
153 # Using zero (or 5.6?) for bitscore threshold |
18 | 154 if hmmer3: |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
155 # The HMMER3 table output is easy to parse |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
156 # In HMMER3 can't use both -T and -E |
32 | 157 cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" % ( |
158 hmmer_search, | |
159 hmm_output_file, | |
160 hmm_file, | |
161 fasta_file, | |
162 ) | |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
163 else: |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
164 # For HMMER2 we are stuck with parsing stdout |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
165 # Put 1e6 to effectively have no expectation threshold (otherwise |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
166 # HMMER defaults to 10 and the calculated e-value depends on the |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
167 # input FASTA file, and we can loose hits of interest). |
32 | 168 cmd = "%s -T 0 -E 1e6 %s %s > %s" % ( |
169 hmmer_search, | |
170 hmm_file, | |
171 fasta_file, | |
172 hmm_output_file, | |
173 ) | |
18 | 174 return_code = os.system(cmd) |
175 if return_code: | |
32 | 176 sys.stderr.write("Error %i from hmmsearch:\n%s\n" % (return_code, cmd)) |
177 sys.exit(return_code) | |
18 | 178 |
179 handle = open(hmm_output_file) | |
180 for line in handle: | |
181 if not line.strip(): | |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
182 # We expect blank lines in the HMMER2 stdout |
18 | 183 continue |
184 elif line.startswith("#"): | |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
185 # Header |
18 | 186 continue |
187 else: | |
29 | 188 name = line.split(None, 1)[0] |
189 # Should be a sequence name in the HMMER3 table output. | |
190 # Could be anything in the HMMER2 stdout. | |
18 | 191 if name in valid_ids: |
192 hmm_hits.add(name) | |
193 elif hmmer3: | |
29 | 194 sys.exit("Unexpected identifer %r in hmmsearch output" % name) |
18 | 195 handle.close() |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
196 # if hmmer3: |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
197 # print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids)) |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
198 # else: |
29 | 199 # print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids)) |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
200 # print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids)) |
18 | 201 os.remove(hmm_output_file) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
202 del valid_ids |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
203 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
204 |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
205 # Prepare short list of candidates containing RXLR to pass to SignalP |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
206 assert min_rxlr_start > 0, "Min value one, since zero based counting" |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
207 count = 0 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
208 total = 0 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
209 handle = open(signalp_input_file, "w") |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
210 for title, seq in fasta_iterator(fasta_file): |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
211 total += 1 |
29 | 212 name = title.split(None, 1)[0] |
32 | 213 match = re_rxlr.search(seq[min_rxlr_start - 1 :].upper()) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
214 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start: |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
215 # This is a potential RXLR, depending on the SignalP results. |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
216 # Might as well truncate the sequence now, makes the temp file smaller |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
217 if signalp_trunc: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
218 handle.write(">%s (truncated)\n%s\n" % (name, seq[:signalp_trunc])) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
219 else: |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
220 # Does it matter we don't line wrap? |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
221 handle.write(">%s\n%s\n" % (name, seq)) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
222 count += 1 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
223 handle.close() |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
224 # print "Running SignalP on %i/%i potentials." % (count, total) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
225 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
226 |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
227 # Run SignalP (using our wrapper script to get multi-core support etc) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
228 signalp_script = os.path.join(os.path.split(sys.argv[0])[0], "signalp3.py") |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
229 if not os.path.isfile(signalp_script): |
29 | 230 sys.exit("Error - missing signalp3.py script") |
32 | 231 cmd = "python '%s' 'euk' '%i' '%s' '%s' '%s'" % ( |
232 signalp_script, | |
233 signalp_trunc, | |
234 threads, | |
235 signalp_input_file, | |
236 signalp_output_file, | |
237 ) | |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
238 return_code = os.system(cmd) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
239 if return_code: |
29 | 240 sys.exit("Error %i from SignalP:\n%s" % (return_code, cmd)) |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
241 # print "SignalP done" |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
242 |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
243 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
244 def parse_signalp(filename): |
32 | 245 """Parse SignalP output, yield tuples of values. |
246 | |
247 Returns tuples of ID, HMM_Sprob_score and NN predicted signal | |
248 peptide length. | |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
249 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
250 For signal peptide length we use NN_Ymax_pos (minus one). |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
251 """ |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
252 handle = open(filename) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
253 line = handle.readline() |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
254 assert line.startswith("#ID\t"), line |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
255 for line in handle: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
256 parts = line.rstrip("\t").split("\t") |
29 | 257 assert len(parts) == 20, repr(line) |
258 yield parts[0], float(parts[18]), int(parts[5]) - 1 | |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
259 handle.close() |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
260 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
261 |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
262 # Parse SignalP results and apply the strict RXLR criteria |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
263 total = 0 |
32 | 264 tally = {} |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
265 handle = open(tabular_file, "w") |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
266 handle.write("#ID\t%s\n" % model) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
267 signalp_results = parse_signalp(signalp_output_file) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
268 for title, seq in fasta_iterator(fasta_file): |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
269 total += 1 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
270 rxlr = "N" |
29 | 271 name = title.split(None, 1)[0] |
32 | 272 match = re_rxlr.search(seq[min_rxlr_start - 1 :].upper()) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
273 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
274 del match |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
275 # This was the criteria for calling SignalP, |
29 | 276 # so it will be in the SignalP results. |
32 | 277 sp_id, sp_hmm_score, sp_nn_len = next(signalp_results) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
278 assert name == sp_id, "%s vs %s" % (name, sp_id) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
279 if sp_hmm_score >= min_signalp_hmm and min_sp <= sp_nn_len <= max_sp: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
280 match = re_rxlr.search(seq[sp_nn_len:].upper()) |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
281 if match and match.start() + 1 <= max_sp_rxlr: # 1-based counting |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
282 rxlr_start = sp_nn_len + match.start() + 1 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
283 if min_rxlr_start <= rxlr_start <= max_rxlr_start: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
284 rxlr = "Y" |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
285 if model == "Whisson2007": |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
286 # Combine the signalp with regular expression heuristic and the HMM |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
287 if name in hmm_hits and rxlr == "N": |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
288 rxlr = "hmm" # HMM only |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
289 elif rxlr == "N": |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
290 rxlr = "neither" # Don't use N (no) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
291 elif name not in hmm_hits and rxlr == "Y": |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
292 rxlr = "re" # Heuristic only |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
293 # Now have a four way classifier: Y, hmm, re, neither |
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
294 # and count is the number of Y results (both HMM and heuristic) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
295 handle.write("%s\t%s\n" % (name, rxlr)) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
296 try: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
297 tally[rxlr] += 1 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
298 except KeyError: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
299 tally[rxlr] = 1 |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
300 handle.close() |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
301 assert sum(tally.values()) == total |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
302 |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
303 # Check the iterator is finished |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
304 try: |
32 | 305 next(signalp_results) |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
306 assert False, "Unexpected data in SignalP output" |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
307 except StopIteration: |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
308 pass |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
309 |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
310 # Cleanup |
6
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
311 os.remove(signalp_input_file) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
312 os.remove(signalp_output_file) |
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
313 |
26
20139cb4c844
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents:
18
diff
changeset
|
314 # Short summary to stdout for Galaxy's info display |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
315 print("%s for %i sequences:" % (model, total)) |
32 | 316 print(", ".join("%s = %i" % kv for kv in sorted(tally.items()))) |