Mercurial > repos > peterjc > tmhmm_and_signalp
annotate tools/protein_analysis/signalp3.py @ 34:7a2e20baacee draft default tip
"v0.2.13 - Python 3 fix for raising StopIteration"
author | peterjc |
---|---|
date | Thu, 17 Jun 2021 17:58:23 +0000 |
parents | 20da7f48b56f |
children |
rev | line source |
---|---|
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
1 #!/usr/bin/env python |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
2 """Wrapper for SignalP v3.0 for use in Galaxy. |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
3 |
5
ef7ceca37e3f
Migrated tool version 0.0.8 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
4 This script takes exactly five command line arguments: |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
5 * the organism type (euk, gram+ or gram-) |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
6 * length to truncate sequences to (integer) |
7 | 7 * number of threads to use (integer, defaults to one) |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
8 * an input protein FASTA filename |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
9 * output tabular filename. |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
10 |
7 | 11 There are two further optional arguments |
12 * cut type (NN_Cmax, NN_Ymax, NN_Smax or HMM_Cmax) | |
13 * output GFF3 filename | |
14 | |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
15 It then calls the standalone SignalP v3.0 program (not the webservice) |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
16 requesting the short output (one line per protein) using both NN and HMM |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
17 for predictions. |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
18 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
19 First major feature is cleaning up the output. The raw output from SignalP |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
20 v3.0 looks like this (21 columns space separated): |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
21 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
22 # SignalP-NN euk predictions # SignalP-HMM euk predictions |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
23 # name Cmax pos ? Ymax pos ? Smax pos ? Smean ? D ? # name ! Cmax pos ? Sprob ? |
29 | 24 gi|2781234|pdb|1JLY| 0.061 17 N 0.043 17 N 0.199 1 N 0.067 N 0.055 N gi|2781234|pdb|1JLY|B Q 0.000 17 N 0.000 N |
25 gi|4959044|gb|AAD342 0.099 191 N 0.012 38 N 0.023 12 N 0.014 N 0.013 N gi|4959044|gb|AAD34209.1|AF069992_1 Q 0.000 0 N 0.000 N | |
26 gi|671626|emb|CAA856 0.139 381 N 0.020 8 N 0.121 4 N 0.067 N 0.044 N gi|671626|emb|CAA85685.1| Q 0.000 0 N 0.000 N | |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
27 gi|3298468|dbj|BAA31 0.208 24 N 0.184 38 N 0.980 32 Y 0.613 Y 0.398 N gi|3298468|dbj|BAA31520.1| Q 0.066 24 N 0.139 N |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
28 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
29 In order to make it easier to use in Galaxy, this wrapper script reformats |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
30 this to use tab separators. Also it removes the redundant truncated name |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
31 column, and assigns unique column names in the header: |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
32 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
33 #ID NN_Cmax_score NN_Cmax_pos NN_Cmax_pred NN_Ymax_score NN_Ymax_pos NN_Ymax_pred NN_Smax_score NN_Smax_pos NN_Smax_pred NN_Smean_score NN_Smean_pred NN_D_score NN_D_pred HMM_bang HMM_Cmax_score HMM_Cmax_pos HMM_Cmax_pred HMM_Sprob_score HMM_Sprob_pred |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
34 gi|2781234|pdb|1JLY|B 0.061 17 N 0.043 17 N 0.199 1 N 0.067 N 0.055 N Q 0.000 17 N 0.000 N |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
35 gi|4959044|gb|AAD34209.1|AF069992_1 0.099 191 N 0.012 38 N 0.023 12 N 0.014 N 0.013 N Q 0.000 0 N 0.000 N |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
36 gi|671626|emb|CAA85685.1| 0.139 381 N 0.020 8 N 0.121 4 N 0.067 N 0.044 N Q 0.000 0 N 0.000 N |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
37 gi|3298468|dbj|BAA31520.1| 0.208 24 N 0.184 38 N 0.980 32 Y 0.613 Y 0.398 N Q 0.066 24 N 0.139 N |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
38 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
39 The second major feature is overcoming SignalP's built in limit of 4000 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
40 sequences by breaking up the input FASTA file into chunks. This also allows |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
41 us to pre-trim the sequences since SignalP only needs their starts. |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
42 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
43 The third major feature is taking advantage of multiple cores (since SignalP |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
44 v3.0 itself is single threaded) by using the individual FASTA input files to |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
45 run multiple copies of TMHMM in parallel. I would normally use Python's |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
46 multiprocessing library in this situation but it requires at least Python 2.6 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
47 and at the time of writing Galaxy still supports Python 2.4. |
7 | 48 |
49 Note that this is somewhat redundant with job-splitting available in Galaxy | |
50 itself (see the SignalP XML file for settings). | |
51 | |
52 Finally, you can opt to have a GFF3 file produced which will describe the | |
53 predicted signal peptide and mature peptide for each protein (using one of | |
54 the predictors which gives a cleavage site). *WORK IN PROGRESS* | |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
55 """ # noqa: E501 |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
56 |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
57 from __future__ import print_function |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
58 |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
59 import os |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
60 import sys |
7 | 61 import tempfile |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
62 |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
63 from seq_analysis_utils import fasta_iterator, split_fasta |
7 | 64 from seq_analysis_utils import run_jobs, thread_count |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
65 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
66 FASTA_CHUNK = 500 |
29 | 67 MAX_LEN = 6000 # Found by trial and error |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
68 |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
69 if "-v" in sys.argv or "--version" in sys.argv: |
34 | 70 print("SignalP Galaxy wrapper version 0.0.20") |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
71 sys.exit(os.system("signalp -version")) |
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
72 |
29 | 73 if len(sys.argv) not in [6, 8]: |
32 | 74 sys.exit( |
75 "Require five (or 7) arguments, organism, truncate, threads, " | |
76 "input protein FASTA file & output tabular file (plus " | |
77 "optionally cut method and GFF3 output file). " | |
78 "Got %i arguments." % (len(sys.argv) - 1) | |
79 ) | |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
80 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
81 organism = sys.argv[1] |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
82 if organism not in ["euk", "gram+", "gram-"]: |
29 | 83 sys.exit("Organism argument %s is not one of euk, gram+ or gram-" % organism) |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
84 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
85 try: |
8 | 86 truncate = int(sys.argv[2]) |
29 | 87 except ValueError: |
8 | 88 truncate = 0 |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
89 if truncate < 0: |
29 | 90 sys.exit("Truncate argument %s is not a positive integer (or zero)" % sys.argv[2]) |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
91 |
7 | 92 num_threads = thread_count(sys.argv[3], default=4) |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
93 fasta_file = sys.argv[4] |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
94 tabular_file = sys.argv[5] |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
95 |
7 | 96 if len(sys.argv) == 8: |
8 | 97 cut_method = sys.argv[6] |
98 if cut_method not in ["NN_Cmax", "NN_Ymax", "NN_Smax", "HMM_Cmax"]: | |
29 | 99 sys.exit("Invalid cut method %r" % cut_method) |
8 | 100 gff3_file = sys.argv[7] |
7 | 101 else: |
8 | 102 cut_method = None |
103 gff3_file = None | |
7 | 104 |
105 | |
106 tmp_dir = tempfile.mkdtemp() | |
107 | |
29 | 108 |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
109 def clean_tabular(raw_handle, out_handle, gff_handle=None): |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
110 """Clean up SignalP output to make it tabular.""" |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
111 for line in raw_handle: |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
112 if not line or line.startswith("#"): |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
113 continue |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
114 parts = line.rstrip("\r\n").split() |
29 | 115 assert len(parts) == 21, repr(line) |
32 | 116 assert parts[14].startswith(parts[0]), ( |
8 | 117 "Bad entry in SignalP output, ID miss-match:\n%r" % line |
32 | 118 ) |
29 | 119 # Remove redundant truncated name column (col 0) |
120 # and put full name at start (col 14) | |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
121 parts = parts[14:15] + parts[1:14] + parts[15:] |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
122 out_handle.write("\t".join(parts) + "\n") |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
123 |
29 | 124 |
7 | 125 def make_gff(fasta_file, tabular_file, gff_file, cut_method): |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
126 """Make a GFF file.""" |
32 | 127 cut_col, score_col = { |
128 "NN_Cmax": (2, 1), | |
129 "NN_Ymax": (5, 4), | |
130 "NN_Smax": (8, 7), | |
131 "HMM_Cmax": (16, 15), | |
132 }[cut_method] | |
7 | 133 |
134 source = "SignalP" | |
29 | 135 strand = "." # not stranded |
136 phase = "." # not phased | |
7 | 137 tags = "Note=%s" % cut_method |
29 | 138 |
7 | 139 tab_handle = open(tabular_file) |
140 line = tab_handle.readline() | |
141 assert line.startswith("#ID\t"), line | |
142 | |
143 gff_handle = open(gff_file, "w") | |
144 gff_handle.write("##gff-version 3\n") | |
145 | |
146 for (title, seq), line in zip(fasta_iterator(fasta_file), tab_handle): | |
147 parts = line.rstrip("\n").split("\t") | |
148 seqid = parts[0] | |
149 assert title.startswith(seqid), "%s vs %s" % (seqid, title) | |
29 | 150 if not seq: |
151 # Is it possible to have a zero length reference in GFF3? | |
7 | 152 continue |
153 cut = int(parts[cut_col]) | |
154 if cut == 0: | |
155 assert cut_method == "HMM_Cmax", cut_method | |
29 | 156 # TODO - Why does it do this? |
7 | 157 cut = 1 |
158 assert 1 <= cut <= len(seq), "%i for %s len %i" % (cut, seqid, len(seq)) | |
159 score = parts[score_col] | |
32 | 160 gff_handle.write("##sequence-region %s %i %i\n" % (seqid, 1, len(seq))) |
29 | 161 # If the cut is at the very begining, there is no signal peptide! |
7 | 162 if cut > 1: |
29 | 163 # signal_peptide = SO:0000418 |
32 | 164 gff_handle.write( |
165 "%s\t%s\t%s\t%i\t%i\t%s\t%s\t%s\t%s\n" | |
166 % ( | |
167 seqid, | |
168 source, | |
169 "signal_peptide", | |
170 1, | |
171 cut - 1, | |
172 score, | |
173 strand, | |
174 phase, | |
175 tags, | |
176 ) | |
177 ) | |
29 | 178 # mature_protein_region = SO:0000419 |
32 | 179 gff_handle.write( |
180 "%s\t%s\t%s\t%i\t%i\t%s\t%s\t%s\t%s\n" | |
181 % ( | |
182 seqid, | |
183 source, | |
184 "mature_protein_region", | |
185 cut, | |
186 len(seq), | |
187 score, | |
188 strand, | |
189 phase, | |
190 tags, | |
191 ) | |
192 ) | |
29 | 193 tab_handle.close() |
7 | 194 gff_handle.close() |
195 | |
196 | |
32 | 197 if num_threads == 1: |
198 # Still want to call split_fasta to apply truncation, but | |
199 # no reason to make multiple files - and more chance of | |
200 # hitting file system glitches if we do. So, | |
201 FASTA_CHUNK = sys.maxsize | |
202 | |
203 fasta_files = split_fasta( | |
204 fasta_file, | |
205 os.path.join(tmp_dir, "signalp"), | |
206 n=FASTA_CHUNK, | |
207 truncate=truncate, | |
208 max_len=MAX_LEN, | |
209 ) | |
29 | 210 temp_files = [f + ".out" for f in fasta_files] |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
211 assert len(fasta_files) == len(temp_files) |
32 | 212 jobs = [ |
213 "signalp -short -t %s %s > %s" % (organism, fasta, temp) | |
214 for (fasta, temp) in zip(fasta_files, temp_files) | |
215 ] | |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
216 assert len(fasta_files) == len(temp_files) == len(jobs) |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
217 |
29 | 218 |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
219 def clean_up(file_list): |
29 | 220 """Remove temp files, and if possible the temp directory.""" |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
221 for f in file_list: |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
222 if os.path.isfile(f): |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
223 os.remove(f) |
7 | 224 try: |
225 os.rmdir(tmp_dir) | |
29 | 226 except Exception: |
7 | 227 pass |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
228 |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
229 |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
230 if len(jobs) > 1 and num_threads > 1: |
29 | 231 # A small "info" message for Galaxy to show the user. |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
232 print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs))) |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
233 results = run_jobs(jobs, num_threads) |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
234 assert len(fasta_files) == len(temp_files) == len(jobs) |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
235 for fasta, temp, cmd in zip(fasta_files, temp_files, jobs): |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
236 error_level = results[cmd] |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
237 try: |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
238 output = open(temp).readline() |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
239 except IOError: |
7 | 240 output = "(no output)" |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
241 if error_level or output.lower().startswith("error running"): |
7 | 242 clean_up(fasta_files + temp_files) |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
243 if output: |
32 | 244 sys.stderr.write( |
245 "One or more tasks failed, e.g. %i from %r gave:\n%s" | |
246 % (error_level, cmd, output) | |
247 ) | |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
248 else: |
32 | 249 sys.stderr.write( |
250 "One or more tasks failed, e.g. %i from %r with no output\n" | |
251 % (error_level, cmd) | |
252 ) | |
30
6d9d7cdf00fc
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
29
diff
changeset
|
253 sys.exit(error_level) |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
254 del results |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
255 |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
256 out_handle = open(tabular_file, "w") |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
257 fields = ["ID"] |
29 | 258 # NN results: |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
259 for name in ["Cmax", "Ymax", "Smax"]: |
29 | 260 fields.extend(["NN_%s_score" % name, "NN_%s_pos" % name, "NN_%s_pred" % name]) |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
261 fields.extend(["NN_Smean_score", "NN_Smean_pred", "NN_D_score", "NN_D_pred"]) |
29 | 262 # HMM results: |
32 | 263 fields.extend( |
264 [ | |
265 "HMM_type", | |
266 "HMM_Cmax_score", | |
267 "HMM_Cmax_pos", | |
268 "HMM_Cmax_pred", | |
269 "HMM_Sprob_score", | |
270 "HMM_Sprob_pred", | |
271 ] | |
272 ) | |
0
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
273 out_handle.write("#" + "\t".join(fields) + "\n") |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
274 for temp in temp_files: |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
275 data_handle = open(temp) |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
276 clean_tabular(data_handle, out_handle) |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
277 data_handle.close() |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
278 out_handle.close() |
a2eeeaa6f75e
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
279 |
29 | 280 # GFF3: |
7 | 281 if cut_method: |
8 | 282 make_gff(fasta_file, tabular_file, gff3_file, cut_method) |
7 | 283 |
284 clean_up(fasta_files + temp_files) |