annotate tools/protein_analysis/rxlr_motifs.py @ 34:7a2e20baacee draft default tip

"v0.2.13 - Python 3 fix for raising StopIteration"
author peterjc
date Thu, 17 Jun 2021 17:58:23 +0000
parents 20da7f48b56f
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
2 """Implements assorted RXLR motif methods from the literature.
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
3
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
4 This script takes exactly four command line arguments:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
5 * Protein FASTA filename
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
6 * Number of threads
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
7 * Model name (Bhattacharjee2006, Win2007, Whisson2007)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
8 * Output tabular filename
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
9
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
10 The model names are:
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
11 * Bhattacharjee2006: Simple regular expression search for RXLR
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
12 with additional requirements for positioning and signal peptide.
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
13 * Win2007: Simple regular expression search for RXLR, but with
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
14 different positional requirements.
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
15 * Whisson2007: As Bhattacharjee2006 but with a more complex regular
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
16 expression to look for RXLR-EER domain, and additionally calls HMMER.
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
17
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
18 See the help text in the accompanying Galaxy tool XML file for more
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
19 details including the full references.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
20
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
21 Note
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
22 ----
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
23 Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0,
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
24 which is no longer available. The current release is SignalP v3.0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
25 (Mar 5, 2007). We have therefore opted to use the NN Ymax position for
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
26 the predicted cleavage site, as this is expected to be more accurate.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
27 Also note that the HMM score values have changed from v2.0 to v3.0.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
28 Whisson et al. (2007) used SignalP v3.0 anyway.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
29
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
30 Whisson et al. (2007) used HMMER 2.3.2, and althought their HMM model
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
31 can still be used with hmmsearch from HMMER 3, sadly this does give
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
32 slightly different results. We expect the hmmsearch from HMMER 2.3.2
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
33 (the last stable release of HMMER 2) to be present on the path under
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
34 the name hmmsearch2 (allowing it to co-exist with HMMER 3).
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
35
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
36 If using Conda, you should therefore install the special "hmmer2"
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
37 package from BioConda which provides "hmmsearch2" etc::
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
38
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
39 conda install -c bioconda hmmer2
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
40
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
41 See https://bioconda.github.io/recipes/hmmer2/README.html and
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
42 https://anaconda.org/bioconda/hmmer2
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
43 """
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
44
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
45 from __future__ import print_function
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
46
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
47 import os
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
48 import re
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
49 import subprocess
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
50 import sys
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
51
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
52 from seq_analysis_utils import fasta_iterator
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
53
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
54 if "-v" in sys.argv:
34
7a2e20baacee "v0.2.13 - Python 3 fix for raising StopIteration"
peterjc
parents: 32
diff changeset
55 print("RXLR Motifs v0.0.17")
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
56 sys.exit(0)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
57
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
58 if len(sys.argv) != 5:
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
59 sys.exit(
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
60 "Requires four arguments: protein FASTA filename, threads, "
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
61 "model, and output filename"
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
62 )
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
63
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
64 fasta_file, threads, model, tabular_file = sys.argv[1:]
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
65 hmm_output_file = tabular_file + ".hmm.tmp"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
66 signalp_input_file = tabular_file + ".fasta.tmp"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
67 signalp_output_file = tabular_file + ".tabular.tmp"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
68 min_signalp_hmm = 0.9
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
69 hmmer_search = "hmmsearch2"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
70
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
71 if model == "Bhattacharjee2006":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
72 signalp_trunc = 70
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
73 re_rxlr = re.compile("R.LR")
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
74 min_sp = 10
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
75 max_sp = 40
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
76 max_sp_rxlr = 100
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
77 min_rxlr_start = 1
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
78 # Allow signal peptide to be at most 40aa, and want RXLR to be
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
79 # within 100aa, therefore for the prescreen the max start is 140:
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
80 max_rxlr_start = max_sp + max_sp_rxlr
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
81 elif model == "Win2007":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
82 signalp_trunc = 70
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
83 re_rxlr = re.compile("R.LR")
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
84 min_sp = 10
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
85 max_sp = 40
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
86 min_rxlr_start = 30
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
87 max_rxlr_start = 60
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
88 # No explicit limit on separation of signal peptide clevage
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
89 # and RXLR, but shortest signal peptide is 10, and furthest
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
90 # away RXLR is 60, so effectively limit is 50.
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
91 max_sp_rxlr = max_rxlr_start - min_sp + 1
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
92 elif model == "Whisson2007":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
93 signalp_trunc = 0 # zero for no truncation
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
94 re_rxlr = re.compile("R.LR.{,40}[ED][ED][KR]")
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
95 min_sp = 10
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
96 max_sp = 40
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
97 max_sp_rxlr = 100
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
98 min_rxlr_start = 1
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
99 max_rxlr_start = max_sp + max_sp_rxlr
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
100 else:
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
101 sys.exit(
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
102 "Did not recognise the model name %r\n"
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
103 "Use Bhattacharjee2006, Win2007, or Whisson2007" % model
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
104 )
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
105
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
106
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
107 def get_hmmer_version(exe, required=None):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
108 try:
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
109 child = subprocess.Popen(
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
110 [exe, "-h"],
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
111 universal_newlines=True,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
112 stdout=subprocess.PIPE,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
113 stderr=subprocess.PIPE,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
114 )
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
115 except OSError:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
116 raise ValueError("Could not run %s" % exe)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
117 stdout, stderr = child.communicate()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
118 if required:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
119 return required in stdout
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
120 elif "HMMER 2" in stdout:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
121 return 2
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
122 elif "HMMER 3" in stdout:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
123 return 3
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
124 else:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
125 raise ValueError("Could not determine version of %s" % exe)
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
126
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
127
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
128 # Run hmmsearch for Whisson et al. (2007)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
129 if model == "Whisson2007":
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
130 hmm_file = os.path.join(
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
131 os.path.split(sys.argv[0])[0], "whisson_et_al_rxlr_eer_cropped.hmm"
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
132 )
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
133 if not os.path.isfile(hmm_file):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
134 sys.exit("Missing HMM file for Whisson et al. (2007)")
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
135 if not get_hmmer_version(hmmer_search, "HMMER 2.3.2 (Oct 2003)"):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
136 sys.exit("Missing HMMER 2.3.2 (Oct 2003) binary, %s" % hmmer_search)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
137
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
138 hmm_hits = set()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
139 valid_ids = set()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
140 for title, seq in fasta_iterator(fasta_file):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
141 name = title.split(None, 1)[0]
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
142 if name in valid_ids:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
143 sys.exit("Duplicated identifier %r" % name)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
144 else:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
145 valid_ids.add(name)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
146 if not valid_ids:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
147 # Special case, don't need to run HMMER if there are no sequences
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
148 pass
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
149 else:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
150 # I've left the code to handle HMMER 3 in situ, in case
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
151 # we revisit the choice to insist on HMMER 2.
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
152 hmmer3 = 3 == get_hmmer_version(hmmer_search)
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
153 # Using zero (or 5.6?) for bitscore threshold
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
154 if hmmer3:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
155 # The HMMER3 table output is easy to parse
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
156 # In HMMER3 can't use both -T and -E
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
157 cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" % (
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
158 hmmer_search,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
159 hmm_output_file,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
160 hmm_file,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
161 fasta_file,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
162 )
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
163 else:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
164 # For HMMER2 we are stuck with parsing stdout
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
165 # Put 1e6 to effectively have no expectation threshold (otherwise
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
166 # HMMER defaults to 10 and the calculated e-value depends on the
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
167 # input FASTA file, and we can loose hits of interest).
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
168 cmd = "%s -T 0 -E 1e6 %s %s > %s" % (
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
169 hmmer_search,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
170 hmm_file,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
171 fasta_file,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
172 hmm_output_file,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
173 )
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
174 return_code = os.system(cmd)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
175 if return_code:
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
176 sys.stderr.write("Error %i from hmmsearch:\n%s\n" % (return_code, cmd))
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
177 sys.exit(return_code)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
178
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
179 handle = open(hmm_output_file)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
180 for line in handle:
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
181 if not line.strip():
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
182 # We expect blank lines in the HMMER2 stdout
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
183 continue
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
184 elif line.startswith("#"):
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
185 # Header
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
186 continue
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
187 else:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
188 name = line.split(None, 1)[0]
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
189 # Should be a sequence name in the HMMER3 table output.
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
190 # Could be anything in the HMMER2 stdout.
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
191 if name in valid_ids:
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
192 hmm_hits.add(name)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
193 elif hmmer3:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
194 sys.exit("Unexpected identifer %r in hmmsearch output" % name)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
195 handle.close()
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
196 # if hmmer3:
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
197 # print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
198 # else:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
199 # print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
200 # print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids))
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
201 os.remove(hmm_output_file)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
202 del valid_ids
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
203
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
204
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
205 # Prepare short list of candidates containing RXLR to pass to SignalP
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
206 assert min_rxlr_start > 0, "Min value one, since zero based counting"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
207 count = 0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
208 total = 0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
209 handle = open(signalp_input_file, "w")
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
210 for title, seq in fasta_iterator(fasta_file):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
211 total += 1
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
212 name = title.split(None, 1)[0]
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
213 match = re_rxlr.search(seq[min_rxlr_start - 1 :].upper())
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
214 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
215 # This is a potential RXLR, depending on the SignalP results.
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
216 # Might as well truncate the sequence now, makes the temp file smaller
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
217 if signalp_trunc:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
218 handle.write(">%s (truncated)\n%s\n" % (name, seq[:signalp_trunc]))
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
219 else:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
220 # Does it matter we don't line wrap?
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
221 handle.write(">%s\n%s\n" % (name, seq))
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
222 count += 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
223 handle.close()
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
224 # print "Running SignalP on %i/%i potentials." % (count, total)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
225
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
226
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
227 # Run SignalP (using our wrapper script to get multi-core support etc)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
228 signalp_script = os.path.join(os.path.split(sys.argv[0])[0], "signalp3.py")
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
229 if not os.path.isfile(signalp_script):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
230 sys.exit("Error - missing signalp3.py script")
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
231 cmd = "python '%s' 'euk' '%i' '%s' '%s' '%s'" % (
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
232 signalp_script,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
233 signalp_trunc,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
234 threads,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
235 signalp_input_file,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
236 signalp_output_file,
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
237 )
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
238 return_code = os.system(cmd)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
239 if return_code:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
240 sys.exit("Error %i from SignalP:\n%s" % (return_code, cmd))
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
241 # print "SignalP done"
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
242
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
243
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
244 def parse_signalp(filename):
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
245 """Parse SignalP output, yield tuples of values.
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
246
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
247 Returns tuples of ID, HMM_Sprob_score and NN predicted signal
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
248 peptide length.
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
249
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
250 For signal peptide length we use NN_Ymax_pos (minus one).
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
251 """
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
252 handle = open(filename)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
253 line = handle.readline()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
254 assert line.startswith("#ID\t"), line
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
255 for line in handle:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
256 parts = line.rstrip("\t").split("\t")
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
257 assert len(parts) == 20, repr(line)
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
258 yield parts[0], float(parts[18]), int(parts[5]) - 1
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
259 handle.close()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
260
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
261
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
262 # Parse SignalP results and apply the strict RXLR criteria
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
263 total = 0
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
264 tally = {}
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
265 handle = open(tabular_file, "w")
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
266 handle.write("#ID\t%s\n" % model)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
267 signalp_results = parse_signalp(signalp_output_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
268 for title, seq in fasta_iterator(fasta_file):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
269 total += 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
270 rxlr = "N"
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
271 name = title.split(None, 1)[0]
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
272 match = re_rxlr.search(seq[min_rxlr_start - 1 :].upper())
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
273 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
274 del match
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
275 # This was the criteria for calling SignalP,
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
276 # so it will be in the SignalP results.
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
277 sp_id, sp_hmm_score, sp_nn_len = next(signalp_results)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
278 assert name == sp_id, "%s vs %s" % (name, sp_id)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
279 if sp_hmm_score >= min_signalp_hmm and min_sp <= sp_nn_len <= max_sp:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
280 match = re_rxlr.search(seq[sp_nn_len:].upper())
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
281 if match and match.start() + 1 <= max_sp_rxlr: # 1-based counting
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
282 rxlr_start = sp_nn_len + match.start() + 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
283 if min_rxlr_start <= rxlr_start <= max_rxlr_start:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
284 rxlr = "Y"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
285 if model == "Whisson2007":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
286 # Combine the signalp with regular expression heuristic and the HMM
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
287 if name in hmm_hits and rxlr == "N":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
288 rxlr = "hmm" # HMM only
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
289 elif rxlr == "N":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
290 rxlr = "neither" # Don't use N (no)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
291 elif name not in hmm_hits and rxlr == "Y":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
292 rxlr = "re" # Heuristic only
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
293 # Now have a four way classifier: Y, hmm, re, neither
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
294 # and count is the number of Y results (both HMM and heuristic)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
295 handle.write("%s\t%s\n" % (name, rxlr))
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
296 try:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
297 tally[rxlr] += 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
298 except KeyError:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
299 tally[rxlr] = 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
300 handle.close()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
301 assert sum(tally.values()) == total
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
302
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
303 # Check the iterator is finished
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
304 try:
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
305 next(signalp_results)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
306 assert False, "Unexpected data in SignalP output"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
307 except StopIteration:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
308 pass
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
309
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
310 # Cleanup
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
311 os.remove(signalp_input_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
312 os.remove(signalp_output_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
313
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
314 # Short summary to stdout for Galaxy's info display
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
315 print("%s for %i sequences:" % (model, total))
32
20da7f48b56f "Check this is up to date with all 2020 changes"
peterjc
parents: 30
diff changeset
316 print(", ".join("%s = %i" % kv for kv in sorted(tally.items())))