annotate tools/protein_analysis/tmhmm2.py @ 0:a2eeeaa6f75e

Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
author peterjc
date Tue, 07 Jun 2011 17:37:26 -0400
parents
children 9a8a7f680dd6
#!/usr/bin/env python
"""Wrapper for TMHMM v2.0 for use in Galaxy.

This script takes exactly three command line arguments - the number of threads
to use, an input protein FASTA filename and an output tabular filename. It then
calls the standalone TMHMM v2.0 program (not the webservice) requesting the
short output (one line per protein).

The first major feature is cleaning up the tabular output. The raw output from
TMHMM v2.0 looks like this (six columns tab separated):

gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o
gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o
gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o
gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i

In order to make it easier to use in Galaxy, this wrapper script simplifies
this to remove the redundant tags, and instead adds a comment line at the
top with the column names:

#ID len ExpAA First60 PredHel Topology
gi|2781234|pdb|1JLY|B 304 0.01 0.00 0 o
gi|4959044|gb|AAD34209.1|AF069992_1 600 0.00 0.00 0 o
gi|671626|emb|CAA85685.1| 473 0.19 0.00 0 o
gi|3298468|dbj|BAA31520.1| 107 59.37 31.17 3 o23-45i52-74o89-106i

The second major potential feature is taking advantage of multiple cores
(since TMHMM v2.0 itself is single threaded) by dividing the input FASTA file
into chunks and running multiple copies of TMHMM in parallel. I would normally
use Python's multiprocessing library in this situation but it requires at
least Python 2.6 and at the time of writing Galaxy still supports Python 2.4.
"""
import sys
import os
from seq_analysis_utils import stop_err, split_fasta, run_jobs

# Chunk size passed to split_fasta when dividing the input FASTA file.
FASTA_CHUNK = 500

if len(sys.argv) != 4:
    stop_err("Require three arguments, number of threads (int), input protein FASTA file & output tabular file")
try:
    num_threads = int(sys.argv[1])
except ValueError:
    # A non-integer value is reported by the check below.
    num_threads = 0
if num_threads < 1:
    stop_err("Threads argument %s is not a positive integer" % sys.argv[1])
fasta_file = sys.argv[2]
tabular_file = sys.argv[3]

def clean_tabular(raw_handle, out_handle):
    """Clean up tabular TMHMM output."""
    for line in raw_handle:
        if not line.strip():
            # Lines read from a handle keep their newline, so test the
            # stripped text in order to skip blank lines.
            continue
        parts = line.rstrip("\r\n").split("\t")
        try:
            identifier, length, expAA, first60, predhel, topology = parts
        except ValueError:
            # Unpacking only fails when there are not exactly six columns.
            assert len(parts) != 6
            stop_err("Bad line: %r" % line)
        # Strip the redundant "tag=" prefix from each of the five tagged columns.
        assert length.startswith("len="), line
        length = length[4:]
        assert expAA.startswith("ExpAA="), line
        expAA = expAA[6:]
        assert first60.startswith("First60="), line
        first60 = first60[8:]
        assert predhel.startswith("PredHel="), line
        predhel = predhel[8:]
        assert topology.startswith("Topology="), line
        topology = topology[9:]
        out_handle.write("%s\t%s\t%s\t%s\t%s\t%s\n" \
                         % (identifier, length, expAA, first60, predhel, topology))

fasta_files = split_fasta(fasta_file, tabular_file, FASTA_CHUNK)
temp_files = [f+".out" for f in fasta_files]
jobs = ["tmhmm %s > %s" % (fasta, temp)
        for fasta, temp in zip(fasta_files, temp_files)]

def clean_up(file_list):
    for f in file_list:
        if os.path.isfile(f):
            os.remove(f)

if len(jobs) > 1 and num_threads > 1:
    #A small "info" message for Galaxy to show the user.
    print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)))
results = run_jobs(jobs, num_threads)
for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
    error_level = results[cmd]
    if error_level:
        # Report the first line of the failed job's output, if there is any.
        try:
            output = open(temp).readline()
        except IOError:
            output = ""
        clean_up(fasta_files)
        clean_up(temp_files)
        stop_err("One or more tasks failed, e.g. %i from %r gave:\n%s" % (error_level, cmd, output),
                 error_level)
del results
del jobs

out_handle = open(tabular_file, "w")
out_handle.write("#ID\tlen\tExpAA\tFirst60\tPredHel\tTopology\n")
for temp in temp_files:
    data_handle = open(temp)
    clean_tabular(data_handle, out_handle)
    data_handle.close()
out_handle.close()

clean_up(fasta_files)
clean_up(temp_files)
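
The tag stripping done by clean_tabular can be tried in isolation, without TMHMM or the seq_analysis_utils module installed. The following is a minimal sketch written for Python 3, using in-memory handles and two of the example lines from the script's docstring. The names TAGS, strip_tmhmm_tags and clean_lines are hypothetical and exist only for this illustration; unlike the wrapper, the sketch raises ValueError on a malformed line rather than calling stop_err.

```python
import io

# Tag prefixes of the five annotated columns in TMHMM short output,
# in the order given in the wrapper's docstring.
TAGS = ("len=", "ExpAA=", "First60=", "PredHel=", "Topology=")


def strip_tmhmm_tags(line):
    """Return [ID, len, ExpAA, First60, PredHel, Topology] for one raw line.

    Hypothetical helper for illustration; it is not part of the wrapper.
    """
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != 6:
        raise ValueError("Expected 6 tab separated columns: %r" % line)
    values = [parts[0]]  # the identifier column carries no tag
    for tag, field in zip(TAGS, parts[1:]):
        if not field.startswith(tag):
            raise ValueError("Missing %r tag: %r" % (tag, line))
        values.append(field[len(tag):])
    return values


def clean_lines(raw_handle, out_handle):
    """Write the header line, then one cleaned line per non-blank raw line."""
    out_handle.write("#ID\tlen\tExpAA\tFirst60\tPredHel\tTopology\n")
    for line in raw_handle:
        if line.strip():
            out_handle.write("\t".join(strip_tmhmm_tags(line)) + "\n")


raw = io.StringIO(
    "gi|2781234|pdb|1JLY|B\tlen=304\tExpAA=0.01\tFirst60=0.00"
    "\tPredHel=0\tTopology=o\n"
    "gi|3298468|dbj|BAA31520.1|\tlen=107\tExpAA=59.37\tFirst60=31.17"
    "\tPredHel=3\tTopology=o23-45i52-74o89-106i\n"
)
cleaned = io.StringIO()
clean_lines(raw, cleaned)
print(cleaned.getvalue())
```

Driving the prefix removal from a tuple of tags keeps the slice lengths in step with the tag names, which the hard-coded offsets (4, 6, 8, 8, 9) in clean_tabular rely on by hand.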