annotate tools/protein_analysis/tmhmm2.py @ 33:4fcc441269f5 draft
"This is v0.2.12 with black formating and Python 3 next fix etc"
author | peterjc |
---|---|
date | Thu, 17 Jun 2021 08:33:07 +0000 |
parents | 20da7f48b56f |
children | 7a2e20baacee |
#!/usr/bin/env python
"""Wrapper for TMHMM v2.0 for use in Galaxy.

This script takes exactly three command line arguments - number of threads,
an input protein FASTA filename, and an output tabular filename. It then
calls the standalone TMHMM v2.0 program (not the webservice) requesting
the short output (one line per protein).

The first major feature is cleaning up the tabular output. The short form raw
output from TMHMM v2.0 looks like this (six columns tab separated):

gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o
gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o
gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o
gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i

If there are any additional 'comment' lines starting with the hash (#)
character these are ignored by this script.

In order to make it easier to use in Galaxy, this wrapper script simplifies
this to remove the redundant tags, and instead adds a comment line at the
top with the column names:

#ID len ExpAA First60 PredHel Topology
gi|2781234|pdb|1JLY|B 304 0.01 0.00 0 o
gi|4959044|gb|AAD34209.1|AF069992_1 600 0.00 0.00 0 o
gi|671626|emb|CAA85685.1| 473 0.19 0.00 0 o
gi|3298468|dbj|BAA31520.1| 107 59.37 31.17 3 o23-45i52-74o89-106i

The second major potential feature is taking advantage of multiple cores
(since TMHMM v2.0 itself is single threaded) by dividing the input FASTA file
into chunks and running multiple copies of TMHMM in parallel. I would normally
use Python's multiprocessing library in this situation but it requires at
least Python 2.6 and at the time of writing Galaxy still supports Python 2.4.

Note that this is somewhat redundant with job-splitting available in Galaxy
itself (see the SignalP XML file for settings).

Also tmhmm2 can fail without returning an error code, for example if run on a
64 bit machine with only the 32 bit binaries installed. This script will spot
when there is no output from tmhmm2, and raise an error.
"""  # noqa: E501

from __future__ import print_function

import os
import sys
import tempfile

from seq_analysis_utils import run_jobs, split_fasta, thread_count

FASTA_CHUNK = 500

if "-v" in sys.argv or "--version" in sys.argv:
    sys.exit("TMHMM wrapper version 0.0.16")

if len(sys.argv) != 4:
    sys.exit(
        "Require three arguments, number of threads (int), input protein "
        "FASTA file & output tabular file"
    )

num_threads = thread_count(sys.argv[1], default=4)
fasta_file = sys.argv[2]
tabular_file = sys.argv[3]

tmp_dir = tempfile.mkdtemp()


def clean_tabular(raw_handle, out_handle):
    """Clean up tabular TMHMM output, returns output line count."""
    count = 0
    for line in raw_handle:
        if not line.strip() or line.startswith("#"):
            # Ignore any blank lines or comment lines
            continue
        parts = line.rstrip("\r\n").split("\t")
        try:
            identifier, length, exp_aa, first60, predhel, topology = parts
        except ValueError:
            assert len(parts) != 6
            sys.exit("Bad line: %r" % line)
        assert length.startswith("len="), line
        length = length[4:]
        assert exp_aa.startswith("ExpAA="), line
        exp_aa = exp_aa[6:]
        assert first60.startswith("First60="), line
        first60 = first60[8:]
        assert predhel.startswith("PredHel="), line
        predhel = predhel[8:]
        assert topology.startswith("Topology="), line
        topology = topology[9:]
        out_handle.write(
            "%s\t%s\t%s\t%s\t%s\t%s\n"
            % (identifier, length, exp_aa, first60, predhel, topology)
        )
        count += 1
    return count


# Note that if the input FASTA file contains no sequences,
# split_fasta returns an empty list (i.e. zero temp files).
fasta_files = split_fasta(fasta_file, os.path.join(tmp_dir, "tmhmm"), FASTA_CHUNK)
temp_files = [f + ".out" for f in fasta_files]
jobs = [
    "tmhmm -short %s > %s" % (fasta, temp)
    for fasta, temp in zip(fasta_files, temp_files)
]


def clean_up(file_list):
    """Remove temp files, and if possible the temp directory."""
    for f in file_list:
        if os.path.isfile(f):
            os.remove(f)
    try:
        os.rmdir(tmp_dir)
    except Exception:
        pass


if len(jobs) > 1 and num_threads > 1:
    # A small "info" message for Galaxy to show the user.
    print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)))
results = run_jobs(jobs, num_threads)
for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
    error_level = results[cmd]
    if error_level:
        try:
            # Show the first line of any partial output to help debugging.
            with open(temp) as handle:
                output = handle.readline()
        except IOError:
            output = ""
        clean_up(fasta_files + temp_files)
        # sys.exit() accepts at most one argument, so write the message to
        # stderr and then exit with the failing task's return code.
        sys.stderr.write(
            "One or more tasks failed, e.g. %i from %r gave:\n%s\n"
            % (error_level, cmd, output)
        )
        sys.exit(error_level)
del results
del jobs

out_handle = open(tabular_file, "w")
out_handle.write("#ID\tlen\tExpAA\tFirst60\tPredHel\tTopology\n")
for temp in temp_files:
    data_handle = open(temp)
    count = clean_tabular(data_handle, out_handle)
    data_handle.close()
    if not count:
        clean_up(fasta_files + temp_files)
        sys.exit("No output from tmhmm2")
out_handle.close()

clean_up(fasta_files + temp_files)
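
As a standalone illustration of the clean-up described in the docstring, the sketch below strips the `len=`, `ExpAA=`, `First60=`, `PredHel=` and `Topology=` tags from a single raw line. The names `clean_line` and `TAGS` are hypothetical and are not part of this wrapper. Unlike `clean_tabular`, the sketch raises `ValueError` instead of calling `sys.exit`, and it assumes tab-separated input as the docstring states.

```python
# Hypothetical helper, not part of tmhmm2.py: the tag prefixes expected in
# columns two to six of the TMHMM v2.0 short output format.
TAGS = ("len=", "ExpAA=", "First60=", "PredHel=", "Topology=")


def clean_line(line):
    """Return the six fields of one raw line with the tag prefixes removed."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != len(TAGS) + 1:
        raise ValueError("Expected six tab separated columns: %r" % line)
    cleaned = [parts[0]]  # The identifier column carries no tag.
    for tag, value in zip(TAGS, parts[1:]):
        if not value.startswith(tag):
            raise ValueError("Missing %s in line: %r" % (tag, line))
        cleaned.append(value[len(tag):])
    return cleaned


# Example record taken from the docstring, with tabs written explicitly.
raw = (
    "gi|3298468|dbj|BAA31520.1|\tlen=107\tExpAA=59.37\tFirst60=31.17"
    "\tPredHel=3\tTopology=o23-45i52-74o89-106i\n"
)
print("\t".join(clean_line(raw)))
```

Driving the stripping from a tuple of tags keeps the expected column order in one place, which may be easier to maintain than one assertion per column, though it is only one possible arrangement.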