annotate blast_report.py @ 31:11f622d60501 draft default tip

Uploaded
author dfornika
date Tue, 03 Mar 2020 10:55:25 +0000
parents 1d6a2561e05e
children
rev   line source
0
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
0        1 #!/usr/bin/env python
3        2
3        3 from __future__ import print_function
3        4
0        5 import argparse
0        6 import re
0        7 import sys
0        8
0        9 from Cheetah.Template import Template
30      10
0       11
30      12 def stop_err(msg):
0       13     sys.stderr.write("%s\n" % msg)
0       14     sys.exit(1)
0       15
0       16
0       17 class BLASTBin:
0       18     def __init__(self, label, file):
0       19         self.label = label
0       20         self.dict = {}
30      21
0       22         file_in = open(file)
0       23         for line in file_in:
0       24             self.dict[line.rstrip().split('.')[0]] = ''
0       25         file_in.close()
30      26
0       27     def __str__(self):
0       28         return "label: %s dict: %s" % (self.label, str(self.dict))
0       29
0       30
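The bin-loading step in `BLASTBin.__init__` above can be sketched on its own. This is a minimal illustration, not the tool's code: the helper name `load_bin_accessions` and the accessions in the usage example are made up, and unlike the original it uses a `with` block, returns a set rather than a dict of empty strings, and skips blank lines.

```python
import os
import tempfile


def load_bin_accessions(path):
    """Return the set of version-less accessions listed one per line in `path`.

    Hypothetical helper: mirrors the split('.')[0] logic of BLASTBin above.
    """
    accessions = set()
    with open(path) as handle:  # closed automatically, even on error
        for line in handle:
            accession = line.strip().split('.')[0]  # "NC_000913.3" -> "NC_000913"
            if accession:  # assumption: blank lines should be ignored
                accessions.add(accession)
    return accessions


# Usage with a throwaway file holding made-up accessions.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("NC_000913.3\nAB123456.1\n\n")
example = load_bin_accessions(tmp.name)
os.remove(tmp.name)
print(sorted(example))  # ['AB123456', 'NC_000913']
```

A set gives the same constant-time membership test as the dict of empty strings, which is all the binning step needs.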
0       31 class BLASTQuery:
0       32     def __init__(self, query_id):
0       33         self.query_id = query_id
0       34         self.matches = []
0       35         self.match_accessions = {}
30      36         self.bins = {}  # {bin(label):[match indexes]}
0       37         self.pident_filtered = 0
0       38         self.kw_filtered = 0
30      39         self.kw_filtered_breakdown = {}  # {kw:count}
30      40
0       41     def __str__(self):
30      42         format_string = "\t".join([
30      43             "query_id: %s",
30      44             "len(matches): %s",
30      45             "bins (labels only): %s",
30      46             "pident_filtered: %s",
30      47             "kw_filtered: %s",
30      48             "kw_filtered_breakdown: %s"
30      49         ])
30      50         return format_string \
0       51             % (self.query_id,
0       52                str(len(self.matches)),
0       53                str(list(self.bins)),
0       54                str(self.pident_filtered),
0       55                str(self.kw_filtered),
0       56                str(self.kw_filtered_breakdown))
0       57
0       58
0       59 class BLASTMatch:
0       60     def __init__(self, subject_acc, subject_descr, score, p_cov, p_ident, subject_bins):
0       61         self.subject_acc = subject_acc
0       62         self.subject_descr = subject_descr
0       63         self.score = score
0       64         self.p_cov = p_cov
0       65         self.p_ident = p_ident
0       66         self.bins = subject_bins
30      67
0       68     def __str__(self):
0       69         return "subject_acc: %s subject_descr: %s score: %s p-cov: %s p-ident: %s" \
0       70             % (self.subject_acc,
0       71                self.subject_descr,
0       72                str(self.score),
30      73                str(round(self.p_cov, 2)),
0       74                str(round(self.p_ident, 2)))
0       75
0       76
30      77 # PARSE OPTIONS AND ARGUMENTS
0       78 parser = argparse.ArgumentParser()
0       79
7       80 parser.add_argument('-f', '--filter-keywords',
7       81                     dest='filter_keywords',
7       82                     )
7       83 parser.add_argument('-i', '--min-identity',
7       84                     dest='min_identity',
0       85                     )
0       86 parser.add_argument('-b', '--bins',
23      87                     dest='bins',
23      88                     action='append',
23      89                     nargs='+'
0       90                     )
3       91 parser.add_argument('-r', '--discard-redundant',
3       92                     dest='discard_redundant',
0       93                     default=False,
0       94                     action='store_true'
0       95                     )
7       96 parser.add_argument('input_tab')
7       97 parser.add_argument('cheetah_tmpl')
7       98 parser.add_argument('output_html')
7       99 parser.add_argument('output_tab')
23     100
0      101 args = parser.parse_args()
0      102
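The `--bins` option above combines `action='append'` with `nargs='+'`, so each `-b` occurrence contributes one inner list. The standalone sketch below shows the resulting shape; the bin labels and file names are invented for illustration, and an explicit argument list is passed instead of reading `sys.argv`.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-b', '--bins', dest='bins', action='append', nargs='+')

# Made-up labels and file names; each -b takes "label file".
args = parser.parse_args(['-b', 'virus', 'virus_accs.txt',
                          '-b', 'bacteria', 'bacteria_accs.txt'])
print(args.bins)
# [['virus', 'virus_accs.txt'], ['bacteria', 'bacteria_accs.txt']]

# With no -b option at all, the attribute is None rather than an empty list.
no_bins = parser.parse_args([])
print(no_bins.bins)  # None
```

Because the default is `None`, any code that iterates over the parsed value has to handle that case first.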
30     103 # BINS
30     104 bins = []
30     105 if args.bins is not None:
26     106     for bin in args.bins:
28     107         bins.append(BLASTBin(bin[0], bin[1]))
26     108
0      109 print('database bins: %s' % str([bin.label for bin in bins]))
0      110
30     111 # FILTERS
0      112 filter_pident = float(args.min_identity) if args.min_identity else 0  # threshold from --min-identity
0      113 filter_kws = []
9      114 if args.filter_keywords:
9      115     filter_kws = args.filter_keywords.split(',')
7      116 print('minimum percent identity: %s filter_kws: %s' % (str(args.min_identity), str(filter_kws)))
0      117
3      118 if args.discard_redundant:
0      119     print('Throwing out redundant hits...')
0      120
7      121
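The two filter options handled in the FILTERS block above arrive as strings. A minimal sketch of normalising them in one place follows. The helper `parse_filters` is hypothetical and not part of the script; it assumes a missing identity threshold means "no filtering" (0.0) and that empty keywords should be dropped.

```python
def parse_filters(min_identity=None, filter_keywords=None):
    """Return (threshold, keywords) from raw option strings.

    Hypothetical helper: `min_identity` is a string such as '97.5' or None,
    `filter_keywords` is a comma-separated string or None.
    """
    threshold = float(min_identity) if min_identity else 0.0
    keywords = []
    if filter_keywords:
        # Strip surrounding whitespace and discard empty entries.
        keywords = [kw.strip() for kw in filter_keywords.split(',') if kw.strip()]
    return threshold, keywords


print(parse_filters('97.5', 'phage, synthetic ,'))  # (97.5, ['phage', 'synthetic'])
print(parse_filters())                              # (0.0, [])
```

Stripping keywords once at parse time avoids repeating the `strip()` call for every input row.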
0      122 PIDENT_COL = 2
10     123 DESCR_COL = 24
0      124 SUBJ_ID_COL = 12
0      125 SCORE_COL = 11
10     126 PCOV_COL = 25
0      127 queries = []
0      128 current_query = ''
7      129 output_tab = open(args.output_tab, 'w')
30     130
7      131 with open(args.input_tab) as input_tab:
0      132     for line in input_tab:
0      133         cols = line.split('\t')
0      134         if cols[0] != current_query:
0      135             current_query = cols[0]
0      136             queries.append(BLASTQuery(current_query))
0      137
0      138         try:
0      139             accs = cols[SUBJ_ID_COL].split('|')[1::2][1::2]
0      140         except IndexError:
0      141             stop_err("Problem with splitting: " + line.rstrip())  # row has too few columns
0      142
30     143         # keep best (first) hit only for each query and accession id.
3      144         if args.discard_redundant:
0      145             if accs[0] in queries[-1].match_accessions:
30     146                 continue  # don't save the result and skip to the next
0      147             else:
0      148                 queries[-1].match_accessions[accs[0]] = ''
0      149
0      150         p_ident = float(cols[PIDENT_COL])
30     151         # FILTER BY PIDENT
30     152         if p_ident < filter_pident:  # if we are not filtering, filter_pident == 0 and this will never evaluate to True
0      153             queries[-1].pident_filtered += 1
0      154             continue
30     155
0      156         descrs = cols[DESCR_COL]
30     157         # FILTER BY KEY WORDS
0      158         filter_by_kw = False
0      159         for kw in filter_kws:
7      160             kw = kw.strip()
0      161             if kw != '' and re.search(kw, descrs, re.IGNORECASE):
0      162                 filter_by_kw = True
0      163                 try:
0      164                     queries[-1].kw_filtered_breakdown[kw] += 1
30     165                 except KeyError:
0      166                     queries[-1].kw_filtered_breakdown[kw] = 1
30     167         if filter_by_kw:  # if we are not filtering, for loop will not be entered and this will never be True
0      168             queries[-1].kw_filtered += 1
0      169             continue
0      170         descr = descrs.split(';')[0]
30     171
30     172         # ATTEMPT BIN
0      173         subj_bins = []
30     174         for bin in bins:  # if we are not binning, bins = [] so for loop not entered
0      175             for acc in accs:
0      176                 if acc.split('.')[0] in bin.dict:
0      177                     try:
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
178 queries[-1].bins[bin.label].append(len(queries[-1].matches))
30
1d6a2561e05e Uploaded
dfornika
parents: 29
diff changeset
179 except KeyError:
0
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
180 queries[-1].bins[bin.label] = [len(queries[-1].matches)]
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
181 subj_bins.append(bin.label)
30
1d6a2561e05e Uploaded
dfornika
parents: 29
diff changeset
182 break # this result has been binned to this bin so break
0
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
183 acc = accs[0]
30
1d6a2561e05e Uploaded
dfornika
parents: 29
diff changeset
184
0
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
185 score = int(float(cols[SCORE_COL]))
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
186 p_cov = float(cols[PCOV_COL])
30
1d6a2561e05e Uploaded
dfornika
parents: 29
diff changeset
187
1d6a2561e05e Uploaded
dfornika
parents: 29
diff changeset
188 # SAVE RESULT
0
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
189 queries[-1].matches.append(
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
190 BLASTMatch(acc, descr, score, p_cov, p_ident, subj_bins)
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
191 )
30
1d6a2561e05e Uploaded
dfornika
parents: 29
diff changeset
192 output_tab.write(line)
0
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
193 input_tab.close()
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
194 output_tab.close()
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
195
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
196 '''
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
197 for query in queries:
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
198 print(query)
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
199 for match in query.matches:
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
200 print(' %s' % str(match))
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
201 for bin in query.bins:
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
202 print(' bin: %s' % bin)
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
203 for x in query.bins[bin]:
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
204 print(' %s' % str(query.matches[x]))
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
205 '''
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
206
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
207 namespace = {'queries': queries}
7
445a1923bb97 Uploaded
dfornika
parents: 5
diff changeset
208 html = Template(file=args.cheetah_tmpl, searchList=[namespace])
445a1923bb97 Uploaded
dfornika
parents: 5
diff changeset
209 out_html = open(args.output_html, 'w')
0
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
210 out_html.write(str(html))
5dfd84907521 planemo upload for repository https://github.com/public-health-bioinformatics/galaxy_tools/blob/master/tools/blast_report_basic commit bc359460bb66db7946cc68ccbd47cd479624c4a1-dirty
dfornika
parents:
diff changeset
211 out_html.close()
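
The keyword tally (lines 159-169) and the bin assignment (lines 173-182) in the listing above both use try/except to initialise a missing dictionary key. The following is a minimal sketch of the same bookkeeping with `collections.Counter` and `collections.defaultdict`, not the script's actual code. The function names `tally_keyword_hits` and `assign_bins`, and the shape of `bins` as a mapping from label to a set of unversioned accessions, are assumptions made for illustration only; the script itself uses bin objects with `.label` and `.dict` attributes.

```python
import re
from collections import Counter, defaultdict


def tally_keyword_hits(descrs, filter_kws, breakdown):
    """Count which keywords match a description; return True if any did.

    Each keyword is passed to re.search as a pattern, as in the listing,
    so regex metacharacters in a keyword keep their special meaning.
    """
    matched = False
    for kw in filter_kws:
        kw = kw.strip()
        if kw and re.search(kw, descrs, re.IGNORECASE):
            matched = True
            breakdown[kw] += 1  # Counter treats a missing key as 0
    return matched


def assign_bins(accs, bins, query_bins, match_index):
    """Record match_index once under every bin containing one of the accessions.

    `bins` maps a label to a set of unversioned accessions (hypothetical shape).
    """
    labels = []
    for label, members in bins.items():
        # Compare on the accession without its ".N" version suffix.
        if any(acc.split('.')[0] in members for acc in accs):
            query_bins[label].append(match_index)  # defaultdict creates the list
            labels.append(label)
    return labels


# Usage with made-up data.
breakdown = Counter()
hit = tally_keyword_hits(
    "Hypothetical protein; Phage tail", ["phage", " ", "plasmid"], breakdown
)
query_bins = defaultdict(list)
labels = assign_bins(
    ["NC_000001.2", "XY_9.1"],
    {"hosts": {"NC_000001"}, "vectors": {"ZZ_1"}},
    query_bins,
    0,
)
```

If keywords are meant to be matched literally rather than as patterns, wrapping each one in `re.escape` before the search would be one way to do that; whether the script intends regex semantics is not stated in the listing.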