annotate tools/protein_analysis/rxlr_motifs.py @ 29:3cb02adf4326 draft

v0.2.9 Python style improvements
author peterjc
date Wed, 01 Feb 2017 09:46:14 -0500
parents 20139cb4c844
children 6d9d7cdf00fc
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
2 """Implements assorted RXLR motif methods from the literature
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
3
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
4 This script takes exactly four command line arguments:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
5 * Protein FASTA filename
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
6 * Number of threads
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
7 * Model name (Bhattacharjee2006, Win2007, Whisson2007)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
8 * Output tabular filename
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
9
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
10 The model names are:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
11
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
12 Bhattacharjee2006: Simple regular expression search for RXLR
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
13 with additional requirements for positioning and signal peptide.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
14
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
15 Win2007: Simple regular expression search for RXLR, but with
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
16 different positional requirements.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
17
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
18 Whisson2007: As Bhattacharjee2006 but with a more complex regular
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
19 expression to look for RXLR-EER domain, and additionally calls HMMER.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
20
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
21 See the help text in the accompanying Galaxy tool XML file for more
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
22 details including the full references.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
23
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
24 Note:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
25
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
26 Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0,
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
27 which is no longer available. The current release is SignalP v3.0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
28 (Mar 5, 2007). We have therefore opted to use the NN Ymax position for
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
29 the predicted cleavage site, as this is expected to be more accurate.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
30 Also note that the HMM score values have changed from v2.0 to v3.0.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
31 Whisson et al. (2007) used SignalP v3.0 anyway.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
32
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
33 Whisson et al. (2007) used HMMER 2.3.2, and althought their HMM model
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
34 can still be used with hmmsearch from HMMER 3 this this does give
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
35 slightly different results. We expect the hmmsearch from HMMER 2.3.2
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
36 (the last stable release of HMMER 2) to be present on the path under
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
37 the name hmmsearch2 (allowing it to co-exist with HMMER 3).
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
38 """
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
39 import os
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
40 import sys
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
41 import re
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
42 import subprocess
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
43 from seq_analysis_utils import fasta_iterator
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
44
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
45 if "-v" in sys.argv:
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
46 print("RXLR Motifs v0.0.10")
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
47 sys.exit(0)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
48
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
49 if len(sys.argv) != 5:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
50 sys.exit("Requires four arguments: protein FASTA filename, threads, model, and output filename")
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
51
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
52 fasta_file, threads, model, tabular_file = sys.argv[1:]
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
53 hmm_output_file = tabular_file + ".hmm.tmp"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
54 signalp_input_file = tabular_file + ".fasta.tmp"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
55 signalp_output_file = tabular_file + ".tabular.tmp"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
56 min_signalp_hmm = 0.9
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
57 hmmer_search = "hmmsearch2"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
58
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
59 if model == "Bhattacharjee2006":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
60 signalp_trunc = 70
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
61 re_rxlr = re.compile("R.LR")
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
62 min_sp = 10
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
63 max_sp = 40
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
64 max_sp_rxlr = 100
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
65 min_rxlr_start = 1
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
66 # Allow signal peptide to be at most 40aa, and want RXLR to be
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
67 # within 100aa, therefore for the prescreen the max start is 140:
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
68 max_rxlr_start = max_sp + max_sp_rxlr
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
69 elif model == "Win2007":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
70 signalp_trunc = 70
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
71 re_rxlr = re.compile("R.LR")
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
72 min_sp = 10
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
73 max_sp = 40
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
74 min_rxlr_start = 30
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
75 max_rxlr_start = 60
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
76 # No explicit limit on separation of signal peptide clevage
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
77 # and RXLR, but shortest signal peptide is 10, and furthest
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
78 # away RXLR is 60, so effectively limit is 50.
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
79 max_sp_rxlr = max_rxlr_start - min_sp + 1
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
80 elif model == "Whisson2007":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
81 signalp_trunc = 0 # zero for no truncation
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
82 re_rxlr = re.compile("R.LR.{,40}[ED][ED][KR]")
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
83 min_sp = 10
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
84 max_sp = 40
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
85 max_sp_rxlr = 100
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
86 min_rxlr_start = 1
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
87 max_rxlr_start = max_sp + max_sp_rxlr
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
88 else:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
89 sys.exit("Did not recognise the model name %r\n"
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
90 "Use Bhattacharjee2006, Win2007, or Whisson2007" % model)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
91
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
92
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
93 def get_hmmer_version(exe, required=None):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
94 cmd = "%s -h" % exe
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
95 try:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
96 child = subprocess.Popen([exe, "-h"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
97 except OSError:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
98 raise ValueError("Could not run %s" % exe)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
99 stdout, stderr = child.communicate()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
100 if required:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
101 return required in stdout
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
102 elif "HMMER 2" in stdout:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
103 return 2
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
104 elif "HMMER 3" in stdout:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
105 return 3
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
106 else:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
107 raise ValueError("Could not determine version of %s" % exe)
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
108
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
109
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
110 # Run hmmsearch for Whisson et al. (2007)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
111 if model == "Whisson2007":
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
112 hmm_file = os.path.join(os.path.split(sys.argv[0])[0],
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
113 "whisson_et_al_rxlr_eer_cropped.hmm")
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
114 if not os.path.isfile(hmm_file):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
115 sys.exit("Missing HMM file for Whisson et al. (2007)")
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
116 if not get_hmmer_version(hmmer_search, "HMMER 2.3.2 (Oct 2003)"):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
117 sys.exit("Missing HMMER 2.3.2 (Oct 2003) binary, %s" % hmmer_search)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
118
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
119 hmm_hits = set()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
120 valid_ids = set()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
121 for title, seq in fasta_iterator(fasta_file):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
122 name = title.split(None, 1)[0]
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
123 if name in valid_ids:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
124 sys.exit("Duplicated identifier %r" % name)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
125 else:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
126 valid_ids.add(name)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
127 if not valid_ids:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
128 # Special case, don't need to run HMMER if there are no sequences
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
129 pass
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
130 else:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
131 # I've left the code to handle HMMER 3 in situ, in case
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
132 # we revisit the choice to insist on HMMER 2.
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
133 hmmer3 = (3 == get_hmmer_version(hmmer_search))
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
134 # Using zero (or 5.6?) for bitscore threshold
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
135 if hmmer3:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
136 # The HMMER3 table output is easy to parse
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
137 # In HMMER3 can't use both -T and -E
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
138 cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" \
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
139 % (hmmer_search, hmm_output_file, hmm_file, fasta_file)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
140 else:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
141 # For HMMER2 we are stuck with parsing stdout
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
142 # Put 1e6 to effectively have no expectation threshold (otherwise
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
143 # HMMER defaults to 10 and the calculated e-value depends on the
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
144 # input FASTA file, and we can loose hits of interest).
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
145 cmd = "%s -T 0 -E 1e6 %s %s > %s" \
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
146 % (hmmer_search, hmm_file, fasta_file, hmm_output_file)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
147 return_code = os.system(cmd)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
148 if return_code:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
149 sys.exit("Error %i from hmmsearch:\n%s" % (return_code, cmd), return_code)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
150
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
151 handle = open(hmm_output_file)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
152 for line in handle:
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
153 if not line.strip():
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
154 # We expect blank lines in the HMMER2 stdout
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
155 continue
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
156 elif line.startswith("#"):
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
157 # Header
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
158 continue
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
159 else:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
160 name = line.split(None, 1)[0]
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
161 # Should be a sequence name in the HMMER3 table output.
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
162 # Could be anything in the HMMER2 stdout.
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
163 if name in valid_ids:
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
164 hmm_hits.add(name)
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
165 elif hmmer3:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
166 sys.exit("Unexpected identifer %r in hmmsearch output" % name)
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
167 handle.close()
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
168 # if hmmer3:
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
169 # print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
170 # else:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
171 # print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
172 # print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids))
18
2b35b5c4b7f4 Uploaded v0.2.5, preview 2, fixed bug in RXLR tools
peterjc
parents: 6
diff changeset
173 os.remove(hmm_output_file)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
174 del valid_ids
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
175
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
176
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
177 # Prepare short list of candidates containing RXLR to pass to SignalP
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
178 assert min_rxlr_start > 0, "Min value one, since zero based counting"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
179 count = 0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
180 total = 0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
181 handle = open(signalp_input_file, "w")
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
182 for title, seq in fasta_iterator(fasta_file):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
183 total += 1
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
184 name = title.split(None, 1)[0]
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
185 match = re_rxlr.search(seq[min_rxlr_start - 1:].upper())
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
186 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
187 # This is a potential RXLR, depending on the SignalP results.
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
188 # Might as well truncate the sequence now, makes the temp file smaller
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
189 if signalp_trunc:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
190 handle.write(">%s (truncated)\n%s\n" % (name, seq[:signalp_trunc]))
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
191 else:
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
192 # Does it matter we don't line wrap?
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
193 handle.write(">%s\n%s\n" % (name, seq))
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
194 count += 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
195 handle.close()
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
196 # print "Running SignalP on %i/%i potentials." % (count, total)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
197
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
198
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
199 # Run SignalP (using our wrapper script to get multi-core support etc)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
200 signalp_script = os.path.join(os.path.split(sys.argv[0])[0], "signalp3.py")
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
201 if not os.path.isfile(signalp_script):
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
202 sys.exit("Error - missing signalp3.py script")
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
203 cmd = "python %s euk %i %s %s %s" % (signalp_script, signalp_trunc, threads, signalp_input_file, signalp_output_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
204 return_code = os.system(cmd)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
205 if return_code:
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
206 sys.exit("Error %i from SignalP:\n%s" % (return_code, cmd))
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
207 # print "SignalP done"
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
208
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
209
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
210 def parse_signalp(filename):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
211 """Parse SignalP output, yield tuples of ID, HMM_Sprob_score and NN predicted signal peptide length.
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
212
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
213 For signal peptide length we use NN_Ymax_pos (minus one).
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
214 """
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
215 handle = open(filename)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
216 line = handle.readline()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
217 assert line.startswith("#ID\t"), line
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
218 for line in handle:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
219 parts = line.rstrip("\t").split("\t")
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
220 assert len(parts) == 20, repr(line)
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
221 yield parts[0], float(parts[18]), int(parts[5]) - 1
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
222 handle.close()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
223
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
224
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
225 # Parse SignalP results and apply the strict RXLR criteria
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
226 total = 0
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
227 tally = dict()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
228 handle = open(tabular_file, "w")
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
229 handle.write("#ID\t%s\n" % model)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
230 signalp_results = parse_signalp(signalp_output_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
231 for title, seq in fasta_iterator(fasta_file):
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
232 total += 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
233 rxlr = "N"
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
234 name = title.split(None, 1)[0]
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
235 match = re_rxlr.search(seq[min_rxlr_start - 1:].upper())
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
236 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
237 del match
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
238 # This was the criteria for calling SignalP,
29
3cb02adf4326 v0.2.9 Python style improvements
peterjc
parents: 26
diff changeset
239 # so it will be in the SignalP results.
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
240 sp_id, sp_hmm_score, sp_nn_len = signalp_results.next()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
241 assert name == sp_id, "%s vs %s" % (name, sp_id)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
242 if sp_hmm_score >= min_signalp_hmm and min_sp <= sp_nn_len <= max_sp:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
243 match = re_rxlr.search(seq[sp_nn_len:].upper())
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
244 if match and match.start() + 1 <= max_sp_rxlr: # 1-based counting
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
245 rxlr_start = sp_nn_len + match.start() + 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
246 if min_rxlr_start <= rxlr_start <= max_rxlr_start:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
247 rxlr = "Y"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
248 if model == "Whisson2007":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
249 # Combine the signalp with regular expression heuristic and the HMM
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
250 if name in hmm_hits and rxlr == "N":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
251 rxlr = "hmm" # HMM only
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
252 elif rxlr == "N":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
253 rxlr = "neither" # Don't use N (no)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
254 elif name not in hmm_hits and rxlr == "Y":
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
255 rxlr = "re" # Heuristic only
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
256 # Now have a four way classifier: Y, hmm, re, neither
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
257 # and count is the number of Y results (both HMM and heuristic)
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
258 handle.write("%s\t%s\n" % (name, rxlr))
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
259 try:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
260 tally[rxlr] += 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
261 except KeyError:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
262 tally[rxlr] = 1
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
263 handle.close()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
264 assert sum(tally.values()) == total
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
265
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
266 # Check the iterator is finished
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
267 try:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
268 signalp_results.next()
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
269 assert False, "Unexpected data in SignalP output"
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
270 except StopIteration:
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
271 pass
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
272
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
273 # Cleanup
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
274 os.remove(signalp_input_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
275 os.remove(signalp_output_file)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
276
26
20139cb4c844 planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/protein_analysis commit 221d4187992cbb993e02dc3ea0ef0150c7916a4a-dirty
peterjc
parents: 18
diff changeset
277 # Short summary to stdout for Galaxy's info display
6
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
278 print "%s for %i sequences:" % (model, total)
39a6e46cdda3 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
279 print ", ".join("%s = %i" % kv for kv in sorted(tally.iteritems()))