annotate tools/protein_analysis/rxlr_motifs.py @ 31:3c9bbbc6cbbe draft

v0.2.11 take two; removed tabs in README file
author peterjc
date Thu, 21 Sep 2017 11:23:01 -0400
parents 6d9d7cdf00fc
children 20da7f48b56f
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
2 """Implements assorted RXLR motif methods from the literature
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
3
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
4 This script takes exactly four command line arguments:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
5 * Protein FASTA filename
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
6 * Number of threads
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
7 * Model name (Bhattacharjee2006, Win2007, Whisson2007)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
8 * Output tabular filename
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
9
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
10 The model names are:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
11
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
12 Bhattacharjee2006: Simple regular expression search for RXLR
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
13 with additional requirements for positioning and signal peptide.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
14
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
15 Win2007: Simple regular expression search for RXLR, but with
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
16 different positional requirements.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
17
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
18 Whisson2007: As Bhattacharjee2006 but with a more complex regular
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
19 expression to look for RXLR-EER domain, and additionally calls HMMER.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
20
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
21 See the help text in the accompanying Galaxy tool XML file for more
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
22 details including the full references.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
23
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
24 Note:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
25
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
26 Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0,
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
27 which is no longer available. The current release is SignalP v3.0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
28 (Mar 5, 2007). We have therefore opted to use the NN Ymax position for
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
29 the predicted cleavage site, as this is expected to be more accurate.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
30 Also note that the HMM score values have changed from v2.0 to v3.0.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
31 Whisson et al. (2007) used SignalP v3.0 anyway.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
32
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
33 Whisson et al. (2007) used HMMER 2.3.2, and althought their HMM model
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
34 can still be used with hmmsearch from HMMER 3, sadly this does give
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
35 slightly different results. We expect the hmmsearch from HMMER 2.3.2
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
36 (the last stable release of HMMER 2) to be present on the path under
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
37 the name hmmsearch2 (allowing it to co-exist with HMMER 3).
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
38
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
39 If using Conda, you should therefore install the special "hmmer2"
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
40 package from BioConda which provides "hmmsearch2" etc::
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
41
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
42 conda install -c bioconda hmmer2
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
43
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
44 See https://bioconda.github.io/recipes/hmmer2/README.html and
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
45 https://anaconda.org/bioconda/hmmer2
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
46 """
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
47
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
48 from __future__ import print_function
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
49
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
50 import os
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
51 import re
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
52 import subprocess
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
53 import sys
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
54
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
55 from seq_analysis_utils import fasta_iterator
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
56
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
57 if "-v" in sys.argv:
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
58 print("RXLR Motifs v0.0.14")
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
59 sys.exit(0)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
60
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
61 if len(sys.argv) != 5:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
62 sys.exit("Requires four arguments: protein FASTA filename, threads, model, and output filename")
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
63
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
64 fasta_file, threads, model, tabular_file = sys.argv[1:]
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
65 hmm_output_file = tabular_file + ".hmm.tmp"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
66 signalp_input_file = tabular_file + ".fasta.tmp"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
67 signalp_output_file = tabular_file + ".tabular.tmp"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
68 min_signalp_hmm = 0.9
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
69 hmmer_search = "hmmsearch2"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
70
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
71 if model == "Bhattacharjee2006":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
72 signalp_trunc = 70
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
73 re_rxlr = re.compile("R.LR")
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
74 min_sp = 10
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
75 max_sp = 40
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
76 max_sp_rxlr = 100
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
77 min_rxlr_start = 1
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
78 # Allow signal peptide to be at most 40aa, and want RXLR to be
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
79 # within 100aa, therefore for the prescreen the max start is 140:
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
80 max_rxlr_start = max_sp + max_sp_rxlr
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
81 elif model == "Win2007":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
82 signalp_trunc = 70
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
83 re_rxlr = re.compile("R.LR")
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
84 min_sp = 10
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
85 max_sp = 40
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
86 min_rxlr_start = 30
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
87 max_rxlr_start = 60
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
88 # No explicit limit on separation of signal peptide clevage
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
89 # and RXLR, but shortest signal peptide is 10, and furthest
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
90 # away RXLR is 60, so effectively limit is 50.
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
91 max_sp_rxlr = max_rxlr_start - min_sp + 1
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
92 elif model == "Whisson2007":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
93 signalp_trunc = 0 # zero for no truncation
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
94 re_rxlr = re.compile("R.LR.{,40}[ED][ED][KR]")
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
95 min_sp = 10
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
96 max_sp = 40
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
97 max_sp_rxlr = 100
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
98 min_rxlr_start = 1
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
99 max_rxlr_start = max_sp + max_sp_rxlr
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
100 else:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
101 sys.exit("Did not recognise the model name %r\n"
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
102 "Use Bhattacharjee2006, Win2007, or Whisson2007" % model)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
103
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
104
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
105 def get_hmmer_version(exe, required=None):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
106 try:
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
107 child = subprocess.Popen([exe, "-h"],
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
108 universal_newlines=True,
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
109 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
110 except OSError:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
111 raise ValueError("Could not run %s" % exe)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
112 stdout, stderr = child.communicate()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
113 if required:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
114 return required in stdout
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
115 elif "HMMER 2" in stdout:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
116 return 2
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
117 elif "HMMER 3" in stdout:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
118 return 3
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
119 else:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
120 raise ValueError("Could not determine version of %s" % exe)
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
121
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
122
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
123 # Run hmmsearch for Whisson et al. (2007)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
124 if model == "Whisson2007":
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
125 hmm_file = os.path.join(os.path.split(sys.argv[0])[0],
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
126 "whisson_et_al_rxlr_eer_cropped.hmm")
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
127 if not os.path.isfile(hmm_file):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
128 sys.exit("Missing HMM file for Whisson et al. (2007)")
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
129 if not get_hmmer_version(hmmer_search, "HMMER 2.3.2 (Oct 2003)"):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
130 sys.exit("Missing HMMER 2.3.2 (Oct 2003) binary, %s" % hmmer_search)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
131
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
132 hmm_hits = set()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
133 valid_ids = set()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
134 for title, seq in fasta_iterator(fasta_file):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
135 name = title.split(None, 1)[0]
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
136 if name in valid_ids:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
137 sys.exit("Duplicated identifier %r" % name)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
138 else:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
139 valid_ids.add(name)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
140 if not valid_ids:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
141 # Special case, don't need to run HMMER if there are no sequences
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
142 pass
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
143 else:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
144 # I've left the code to handle HMMER 3 in situ, in case
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
145 # we revisit the choice to insist on HMMER 2.
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
146 hmmer3 = (3 == get_hmmer_version(hmmer_search))
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
147 # Using zero (or 5.6?) for bitscore threshold
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
148 if hmmer3:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
149 # The HMMER3 table output is easy to parse
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
150 # In HMMER3 can't use both -T and -E
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
151 cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" \
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
152 % (hmmer_search, hmm_output_file, hmm_file, fasta_file)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
153 else:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
154 # For HMMER2 we are stuck with parsing stdout
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
155 # Put 1e6 to effectively have no expectation threshold (otherwise
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
156 # HMMER defaults to 10 and the calculated e-value depends on the
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
157 # input FASTA file, and we can loose hits of interest).
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
158 cmd = "%s -T 0 -E 1e6 %s %s > %s" \
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
159 % (hmmer_search, hmm_file, fasta_file, hmm_output_file)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
160 return_code = os.system(cmd)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
161 if return_code:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
162 sys.exit("Error %i from hmmsearch:\n%s" % (return_code, cmd), return_code)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
163
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
164 handle = open(hmm_output_file)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
165 for line in handle:
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
166 if not line.strip():
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
167 # We expect blank lines in the HMMER2 stdout
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
168 continue
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
169 elif line.startswith("#"):
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
170 # Header
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
171 continue
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
172 else:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
173 name = line.split(None, 1)[0]
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
174 # Should be a sequence name in the HMMER3 table output.
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
175 # Could be anything in the HMMER2 stdout.
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
176 if name in valid_ids:
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
177 hmm_hits.add(name)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
178 elif hmmer3:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
179 sys.exit("Unexpected identifer %r in hmmsearch output" % name)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
180 handle.close()
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
181 # if hmmer3:
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
182 # print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
183 # else:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
184 # print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
185 # print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids))
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
186 os.remove(hmm_output_file)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
187 del valid_ids
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
188
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
189
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
190 # Prepare short list of candidates containing RXLR to pass to SignalP
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
191 assert min_rxlr_start > 0, "Min value one, since zero based counting"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
192 count = 0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
193 total = 0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
194 handle = open(signalp_input_file, "w")
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
195 for title, seq in fasta_iterator(fasta_file):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
196 total += 1
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
197 name = title.split(None, 1)[0]
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
198 match = re_rxlr.search(seq[min_rxlr_start - 1:].upper())
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
199 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
200 # This is a potential RXLR, depending on the SignalP results.
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
201 # Might as well truncate the sequence now, makes the temp file smaller
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
202 if signalp_trunc:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
203 handle.write(">%s (truncated)\n%s\n" % (name, seq[:signalp_trunc]))
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
204 else:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
205 # Does it matter we don't line wrap?
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
206 handle.write(">%s\n%s\n" % (name, seq))
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
207 count += 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
208 handle.close()
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
209 # print "Running SignalP on %i/%i potentials." % (count, total)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
210
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
211
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
212 # Run SignalP (using our wrapper script to get multi-core support etc)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
213 signalp_script = os.path.join(os.path.split(sys.argv[0])[0], "signalp3.py")
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
214 if not os.path.isfile(signalp_script):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
215 sys.exit("Error - missing signalp3.py script")
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
216 cmd = "python %s euk %i %s %s %s" % (signalp_script, signalp_trunc, threads, signalp_input_file, signalp_output_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
217 return_code = os.system(cmd)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
218 if return_code:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
219 sys.exit("Error %i from SignalP:\n%s" % (return_code, cmd))
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
220 # print "SignalP done"
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
221
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
222
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
223 def parse_signalp(filename):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
224 """Parse SignalP output, yield tuples of ID, HMM_Sprob_score and NN predicted signal peptide length.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
225
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
226 For signal peptide length we use NN_Ymax_pos (minus one).
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
227 """
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
228 handle = open(filename)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
229 line = handle.readline()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
230 assert line.startswith("#ID\t"), line
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
231 for line in handle:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
232 parts = line.rstrip("\t").split("\t")
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
233 assert len(parts) == 20, repr(line)
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
234 yield parts[0], float(parts[18]), int(parts[5]) - 1
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
235 handle.close()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
236
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
237
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
238 # Parse SignalP results and apply the strict RXLR criteria
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
239 total = 0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
240 tally = dict()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
241 handle = open(tabular_file, "w")
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
242 handle.write("#ID\t%s\n" % model)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
243 signalp_results = parse_signalp(signalp_output_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
244 for title, seq in fasta_iterator(fasta_file):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
245 total += 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
246 rxlr = "N"
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
247 name = title.split(None, 1)[0]
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
248 match = re_rxlr.search(seq[min_rxlr_start - 1:].upper())
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
249 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
250 del match
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
251 # This was the criteria for calling SignalP,
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
252 # so it will be in the SignalP results.
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
253 sp_id, sp_hmm_score, sp_nn_len = signalp_results.next()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
254 assert name == sp_id, "%s vs %s" % (name, sp_id)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
255 if sp_hmm_score >= min_signalp_hmm and min_sp <= sp_nn_len <= max_sp:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
256 match = re_rxlr.search(seq[sp_nn_len:].upper())
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
257 if match and match.start() + 1 <= max_sp_rxlr: # 1-based counting
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
258 rxlr_start = sp_nn_len + match.start() + 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
259 if min_rxlr_start <= rxlr_start <= max_rxlr_start:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
260 rxlr = "Y"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
261 if model == "Whisson2007":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
262 # Combine the signalp with regular expression heuristic and the HMM
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
263 if name in hmm_hits and rxlr == "N":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
264 rxlr = "hmm" # HMM only
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
265 elif rxlr == "N":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
266 rxlr = "neither" # Don't use N (no)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
267 elif name not in hmm_hits and rxlr == "Y":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
268 rxlr = "re" # Heuristic only
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
269 # Now have a four way classifier: Y, hmm, re, neither
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
270 # and count is the number of Y results (both HMM and heuristic)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
271 handle.write("%s\t%s\n" % (name, rxlr))
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
272 try:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
273 tally[rxlr] += 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
274 except KeyError:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
275 tally[rxlr] = 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
276 handle.close()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
277 assert sum(tally.values()) == total
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
278
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
279 # Check the iterator is finished
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
280 try:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
281 signalp_results.next()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
282 assert False, "Unexpected data in SignalP output"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
283 except StopIteration:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
284 pass
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
285
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
286 # Cleanup
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
287 os.remove(signalp_input_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
288 os.remove(signalp_output_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
289
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
290 # Short summary to stdout for Galaxy's info display
30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
291 print("%s for %i sequences:" % (model, total))
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 29
diff changeset
292 print(", ".join("%s = %i" % kv for kv in sorted(tally.iteritems())))