annotate variant_effect_predictor/Bio/Search/SearchUtils.pm @ 0:1f6dce3d34e0

Uploaded
author mahtabm
date Thu, 11 Apr 2013 02:01:53 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1 =head1 NAME
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
2
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
3 Bio::Search::SearchUtils - Utility functions for Bio::Search:: objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
4
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
5 =head1 SYNOPSIS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
6
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
7 This module is just a collection of subroutines, not an object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
8
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
9 =head1 DESCRIPTION
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
10
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
11 The SearchUtils.pm module is a collection of subroutines used primarily by
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
12 Bio::Search::Hit::HitI objects for some of the additional
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
13 functionality, such as HSP tiling. Right now, the SearchUtils is just a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
14 collection of methods, not an object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
15
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
16 =head1 AUTHOR
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
17
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
18 Steve Chervitz E<lt>sac@bioperl.orgE<gt>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
19
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
20 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
21
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
22 #'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
23
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
24 package Bio::Search::SearchUtils;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
25
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
26 use strict;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
27 use vars qw($DEBUG);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
28 $DEBUG = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
29
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
30 =head2 tile_hsps
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
31
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
32 Usage : tile_hsps( $sbjct );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
33 : This is called automatically by methods in Bio::Search::Hit::GenericHit
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
34 : that rely on having tiled data.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
35 Purpose : Collect statistics about the aligned sequences in a set of HSPs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
36 : Calculates the following data across all HSPs:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
37 : -- total alignment length
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
38 : -- total identical residues
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
39 : -- total conserved residues
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
40 Returns : n/a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
41 Argument : A Bio::Search::Hit::HitI object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
42 Throws : n/a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
43 Comments :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
44 : This method performs more careful summing of data across
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
45 : all HSPs in the Sbjct object. Only HSPs that are in the same strand
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
46 : and frame are tiled. Simply summing the data from all HSPs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
47 : in the same strand and frame will overestimate the actual
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
48 : length of the alignment if there is overlap between different HSPs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
49 : (often the case).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
50 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
51 : The strategy is to tile the HSPs and sum over the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
52 : contigs, collecting data separately from overlapping and
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
53 : non-overlapping regions of each HSP. To facilitate this, the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
54 : HSP.pm object now permits extraction of data from sub-sections
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
55 : of an HSP.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
56 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
57 : Additional useful information is collected from the results
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
58 : of the tiling. It is possible that sub-sequences in
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
59 : different HSPs will overlap significantly. In this case, it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
60 : is impossible to create a single unambiguous alignment by
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
61 : concatenating the HSPs. The ambiguity may indicate the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
62 : presence of multiple, similar domains in one or both of the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
63 : aligned sequences. This ambiguity is recorded using the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
64 : ambiguous_aln() method.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
65 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
66 : This method does not attempt to discern biologically
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
67 : significant vs. insignificant overlaps. The allowable amount of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
68 : overlap can be set with the overlap() method or with the -OVERLAP
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
69 : parameter used when constructing the Hit object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
70 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
71 : For a given hit, both the query and the sbjct sequences are
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
72 : tiled independently.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
73 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
74 : -- If only query sequence HSPs overlap,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
75 : this may suggest multiple domains in the sbjct.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
76 : -- If only sbjct sequence HSPs overlap,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
77 : this may suggest multiple domains in the query.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
78 : -- If both query & sbjct sequence HSPs overlap,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
79 : this suggests multiple domains in both.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
80 : -- If neither query & sbjct sequence HSPs overlap,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
81 : this suggests either no multiple domains in either
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
82 : sequence OR that both sequences have the same
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
83 : distribution of multiple similar domains.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
84 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
85 : This method can deal with the special case of when multiple
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
86 : HSPs exactly overlap.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
87 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
88 : Efficiency concerns:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
89 : Speed will be an issue for sequences with numerous HSPs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
90 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
91 Bugs : Currently, tile_hsps() does not properly account for
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
92 : the number of non-tiled but overlapping HSPs, which becomes a problem
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
93 : as overlap() grows. Large values overlap() may thus lead to
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
94 : incorrect statistics for some hits. For best results, keep overlap()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
95 : below 5 (DEFAULT IS 2). For more about this, see the "HSP Tiling and
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
96 : Ambiguous Alignments" section in L<Bio::Search::Hit::GenericHit>.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
97
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
98 See Also : L<_adjust_contigs>(), L<Bio::Search::Hit::GenericHit|Bio::Search::Hit::GenericHit>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
99
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
100 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
101
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
102 #--------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
103 sub tile_hsps {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
104 #--------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
105 my $sbjct = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
106
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
107 $sbjct->tiled_hsps(1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
108 $sbjct->gaps('query', 0);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
109 $sbjct->gaps('hit', 0);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
110
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
111 ## Simple summation scheme. Valid if there is only one HSP.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
112 if( $sbjct->n == 1 or $sbjct->num_hsps == 1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
113 my $hsp = $sbjct->hsp;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
114 $sbjct->length_aln('query', $hsp->length('query'));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
115 $sbjct->length_aln('hit', $hsp->length('sbjct'));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
116 $sbjct->length_aln('total', $hsp->length('total'));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
117 $sbjct->matches( $hsp->matches() );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
118 $sbjct->gaps('query', $hsp->gaps('query'));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
119 $sbjct->gaps('sbjct', $hsp->gaps('sbjct'));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
120
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
121 # print "_tile_hsps(): single HSP, easy stats.\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
122 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
123 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
124 # print STDERR "Sbjct: _tile_hsps: summing multiple HSPs\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
125 $sbjct->length_aln('query', 0);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
126 $sbjct->length_aln('sbjct', 0);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
127 $sbjct->length_aln('total', 0);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
128 $sbjct->matches( 0, 0);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
129 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
130
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
131 ## More than one HSP. Must tile HSPs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
132 # print "\nTiling HSPs for $sbjct\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
133 my($hsp, $qstart, $qstop, $sstart, $sstop);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
134 my($frame, $strand, $qstrand, $sstrand);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
135 my(@qcontigs, @scontigs);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
136 my $qoverlap = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
137 my $soverlap = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
138 my $max_overlap = $sbjct->overlap;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
139 my $hit_qgaps = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
140 my $hit_sgaps = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
141 my $hit_len_aln = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
142 my %start_stop;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
143
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
144 foreach $hsp ($sbjct->hsps()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
145 # printf " HSP: %s\n%s\n",$hsp->name, $hsp->str('query');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
146 # printf " Length = %d; Identical = %d; Conserved = %d; Conserved(1-10): %d",$hsp->length, $hsp->length(-TYPE=>'iden'), $hsp->length(-TYPE=>'cons'), $hsp->length(-TYPE=>'cons',-START=>0,-STOP=>10);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
147 ($qstart, $qstop) = $hsp->range('query');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
148 ($sstart, $sstop) = $hsp->range('sbjct');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
149 $frame = $hsp->frame;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
150 $frame = -1 unless defined $frame;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
151 ($qstrand, $sstrand) = ($hsp->query->strand,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
152 $hsp->hit->strand);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
153
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
154 # Note: No correction for overlap.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
155 my ($qgaps, $sgaps) = ($hsp->gaps('query'), $hsp->gaps('hit'));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
156 $hit_qgaps += $qgaps;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
157 $hit_sgaps += $sgaps;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
158 $hit_len_aln += $hsp->length;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
159
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
160 ## Collect contigs in the query sequence.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
161 $qoverlap = &_adjust_contigs('query', $hsp, $qstart, $qstop,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
162 \@qcontigs, $max_overlap, $frame,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
163 $qstrand);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
164
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
165 ## Collect contigs in the sbjct sequence (needed for domain data and gapped Blast).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
166 $soverlap = &_adjust_contigs('sbjct', $hsp, $sstart, $sstop,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
167 \@scontigs, $max_overlap, $frame,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
168 $sstrand);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
169
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
170 ## Collect overall start and stop data for query and sbjct over all HSPs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
171 if(not defined $start_stop{'qstart'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
172 $start_stop{'qstart'} = $qstart;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
173 $start_stop{'qstop'} = $qstop;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
174 $start_stop{'sstart'} = $sstart;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
175 $start_stop{'sstop'} = $sstop;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
176 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
177 $start_stop{'qstart'} = ($qstart < $start_stop{'qstart'} ?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
178 $qstart : $start_stop{'qstart'} );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
179 $start_stop{'qstop'} = ($qstop > $start_stop{'qstop'} ?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
180 $qstop : $start_stop{'qstop'} );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
181 $start_stop{'sstart'} = ($sstart < $start_stop{'sstart'} ?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
182 $sstart : $start_stop{'sstart'} );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
183 $start_stop{'sstop'} = ($sstop > $start_stop{'sstop'} ?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
184 $sstop : $start_stop{'sstop'} );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
185 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
186 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
187
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
188 # Store the collected data in the Hit object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
189 $sbjct->gaps('query', $hit_qgaps);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
190 $sbjct->gaps('hit', $hit_sgaps);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
191 $sbjct->length_aln('total', $hit_len_aln);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
192
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
193 $sbjct->start('query',$start_stop{'qstart'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
194 $sbjct->end('query', $start_stop{'qstop'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
195 $sbjct->start('hit', $start_stop{'sstart'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
196 $sbjct->end('hit', $start_stop{'sstop'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
197
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
198 ## Collect data across the collected contigs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
199
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
200 # print "\nQUERY CONTIGS:\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
201 # print " gaps = $sbjct->{'_gaps_query'}\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
202
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
203 # Account for strand/frame.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
204 # Strategy: collect data on a per strand+frame basis and save the most significant one.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
205 my (%qctg_dat);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
206 foreach(@qcontigs) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
207 # print " query contig: $_->{'start'} - $_->{'stop'}\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
208 # print " iden = $_->{'iden'}; cons = $_->{'cons'}\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
209 ($frame, $strand) = ($_->{'frame'}, $_->{'strand'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
210 $qctg_dat{ "$frame$strand" }->{'length_aln_query'} += $_->{'stop'} - $_->{'start'} + 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
211 $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_->{'iden'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
212 $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_->{'cons'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
213 $qctg_dat{ "$frame$strand" }->{'qstrand'} = $strand;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
214 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
215
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
216 # Find longest contig.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
217 my @sortedkeys = reverse sort { $qctg_dat{ $a }->{'length_aln_query'} <=> $qctg_dat{ $b }->{'length_aln_query'} } keys %qctg_dat;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
218
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
219 # Save the largest to the sbjct:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
220 my $longest = $sortedkeys[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
221 $sbjct->length_aln('query', $qctg_dat{ $longest }->{'length_aln_query'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
222 $sbjct->matches($qctg_dat{ $longest }->{'totalIdentical'},
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
223 $qctg_dat{ $longest }->{'totalConserved'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
224 $sbjct->strand('query', $qctg_dat{ $longest }->{'qstrand'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
225
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
226 ## Collect data for sbjct contigs. Important for gapped Blast.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
227 ## The totalIdentical and totalConserved numbers will be the same
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
228 ## as determined for the query contigs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
229
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
230 # print "\nSBJCT CONTIGS:\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
231 # print " gaps = ", $sbjct->gaps('sbjct'), "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
232
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
233 my (%sctg_dat);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
234 foreach(@scontigs) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
235 # print " sbjct contig: $_->{'start'} - $_->{'stop'}\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
236 # print " iden = $_->{'iden'}; cons = $_->{'cons'}\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
237 ($frame, $strand) = ($_->{'frame'}, $_->{'strand'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
238 $sctg_dat{ "$frame$strand" }->{'length_aln_sbjct'} += $_->{'stop'} - $_->{'start'} + 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
239 $sctg_dat{ "$frame$strand" }->{'frame'} = $frame;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
240 $sctg_dat{ "$frame$strand" }->{'sstrand'} = $strand;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
241 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
242
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
243 @sortedkeys = reverse sort { $sctg_dat{ $a }->{'length_aln_sbjct'} <=> $sctg_dat{ $b }->{'length_aln_sbjct'} } keys %sctg_dat;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
244
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
245 # Save the largest to the sbjct:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
246 $longest = $sortedkeys[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
247
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
248 $sbjct->length_aln('sbjct', $sctg_dat{ $longest }->{'length_aln_sbjct'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
249 $sbjct->frame( $sctg_dat{ $longest }->{'frame'} );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
250 $sbjct->strand('hit', $sctg_dat{ $longest }->{'sstrand'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
251
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
252 if($qoverlap) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
253 if($soverlap) { $sbjct->ambiguous_aln('qs');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
254 # print "\n*** AMBIGUOUS ALIGNMENT: Query and Sbjct\n\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
255 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
256 else { $sbjct->ambiguous_aln('q');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
257 # print "\n*** AMBIGUOUS ALIGNMENT: Query\n\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
258 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
259 } elsif($soverlap) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
260 $sbjct->ambiguous_aln('s');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
261 # print "\n*** AMBIGUOUS ALIGNMENT: Sbjct\n\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
262 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
263
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
264 # Adjust length based on BLAST flavor.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
265 my $prog = $sbjct->algorithm;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
266 if($prog eq 'TBLASTN') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
267 $sbjct->length_aln('sbjct', $sbjct->length_aln('sbjct')/3);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
268 } elsif($prog eq 'BLASTX' ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
269 $sbjct->length_aln('query', $sbjct->length_aln('query')/3);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
270 } elsif($prog eq 'TBLASTX') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
271 $sbjct->length_aln('query', $sbjct->length_aln('query')/3);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
272 $sbjct->length_aln('sbjct', $sbjct->length_aln('sbjct')/3);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
273 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
274 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
275
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
276
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
277
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
278 =head2 _adjust_contigs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
279
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
280 Usage : n/a; called automatically during object construction.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
281 Purpose : Builds HSP contigs for a given BLAST hit.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
282 : Utility method called by _tile_hsps()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
283 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
284 Argument :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
285 Throws : Exceptions propagated from Bio::Search::Hit::BlastHSP::matches()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
286 : for invalid sub-sequence ranges.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
287 Status : Experimental
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
288 Comments : This method does not currently support gapped alignments.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
289 : Also, it does not keep track of the number of HSPs that
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
290 : overlap within the amount specified by overlap().
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
291 : This will lead to significant tracking errors for large
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
292 : overlap values.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
293
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
294 See Also : L<tile_hsps>(), L<Bio::Search::Hit::BlastHSP::matches|Bio::Search::Hit::BlastHSP>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
295
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
296 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
297
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
298 #-------------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
299 sub _adjust_contigs {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
300 #-------------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
301 my ($seqType, $hsp, $start, $stop, $contigs_ref,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
302 $max_overlap, $frame, $strand) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
303
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
304 my $overlap = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
305 my ($numID, $numCons);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
306
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
307 # printf STDERR "Testing $seqType data: HSP (%s); $start, $stop, strand=$strand, frame=$frame\n", $hsp->$seqType()->seq_id if $DEBUG;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
308
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
309 foreach ( @$contigs_ref) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
310 # print STDERR " Contig: $_->{'start'} - $_->{'stop'}, strand=$_->{'strand'}, frame=$_->{'frame'}, iden= $_->{'iden'}, cons= $_->{'cons'}\n" if $DEBUG;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
311 # Don't merge things unless they have matching strand/frame.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
312 next unless ($_->{'frame'} == $frame and $_->{'strand'} == $strand);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
313
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
314 ## Test special case of a nested HSP. Skip it.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
315 if($start >= $_->{'start'} and $stop <= $_->{'stop'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
316 # print STDERR "----> Nested HSP. Skipping.\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
317 $overlap = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
318 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
319 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
320
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
321 ## Test for overlap at beginning of contig.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
322 if($start < $_->{'start'} and $stop > ($_->{'start'} + $max_overlap)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
323 # print STDERR "----> Overlaps beg: existing beg,end: $_->{'start'},$_->{'stop'}, new beg,end: $start,$stop\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
324 # Collect stats over the non-overlapping region.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
325 eval {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
326 ($numID, $numCons) = $hsp->matches(-SEQ =>$seqType,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
327 -START =>$start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
328 -STOP =>$_->{'start'}-1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
329 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
330 if($@) { warn "\a\n$@\n"; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
331 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
332 $_->{'start'} = $start; # Assign a new start coordinate to the contig
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
333 $_->{'iden'} += $numID; # and add new data to #identical, #conserved.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
334 $_->{'cons'} += $numCons;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
335 $overlap = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
336 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
337 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
338
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
339 ## Test for overlap at end of contig.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
340 if($stop > $_->{'stop'} and
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
341 $start < ($_->{'stop'} - $max_overlap)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
342 # print STDERR "----> Overlaps end: existing beg,end: $_->{'start'},$_->{'stop'}, new beg,end: $start,$stop\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
343 # Collect stats over the non-overlapping region.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
344 eval {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
345 ($numID,$numCons) = $hsp->matches(-SEQ =>$seqType,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
346 -START =>$_->{'stop'},
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
347 -STOP =>$stop);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
348 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
349 if($@) { warn "\a\n$@\n"; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
350 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
351 $_->{'stop'} = $stop; # Assign a new stop coordinate to the contig
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
352 $_->{'iden'} += $numID; # and add new data to #identical, #conserved.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
353 $_->{'cons'} += $numCons;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
354 $overlap = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
355 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
356 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
357 $overlap && do {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
358 # print STDERR " New Contig data:\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
359 # print STDERR " Contig: $_->{'start'} - $_->{'stop'}, iden= $_->{'iden'}, cons= $_->{'cons'}\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
360 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
361 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
362 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
363 ## If there is no overlap, add the complete HSP data.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
364 !$overlap && do {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
365 # print STDERR "No overlap. Adding new contig.\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
366 ($numID,$numCons) = $hsp->matches(-SEQ=>$seqType);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
367 push @$contigs_ref, {'start'=>$start, 'stop'=>$stop,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
368 'iden'=>$numID, 'cons'=>$numCons,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
369 'strand'=>$strand, 'frame'=>$frame};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
370 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
371 $overlap;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
372 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
373
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
374 =head2 get_exponent
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
375
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
376 Usage : &get_exponent( number );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
377 Purpose : Determines the power of 10 exponent of an integer, float,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
378 : or scientific notation number.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
379 Example : &get_exponent("4.0e-206");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
380 : &get_exponent("0.00032");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
381 : &get_exponent("10.");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
382 : &get_exponent("1000.0");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
383 : &get_exponent("e+83");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
384 Argument : Float, Integer, or scientific notation number
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
385 Returns : Integer representing the exponent part of the number (+ or -).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
386 : If argument == 0 (zero), return value is "-999".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
387 Comments : Exponents are rounded up (less negative) if the mantissa is >= 5.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
388 : Exponents are rounded down (more negative) if the mantissa is <= -5.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
389
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
390 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
391
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
392 #------------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
393 sub get_exponent {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
394 #------------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
395 my $data = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
396
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
397 my($num, $exp) = split /[eE]/, $data;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
398
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
399 if( defined $exp) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
400 $num = 1 if not $num;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
401 $num >= 5 and $exp++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
402 $num <= -5 and $exp--;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
403 } elsif( $num == 0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
404 $exp = -999;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
405 } elsif( not $num =~ /\./) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
406 $exp = CORE::length($num) -1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
407 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
408 $exp = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
409 $num .= '0' if $num =~ /\.$/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
410 my ($c);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
411 my $rev = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
412 if($num !~ /^0/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
413 $num = reverse($num);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
414 $rev = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
415 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
416 do { $c = chop($num);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
417 $c == 0 && $exp++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
418 } while( $c ne '.');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
419
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
420 $exp = -$exp if $num == 0 and not $rev;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
421 $exp -= 1 if $rev;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
422 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
423 return $exp;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
424 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
425
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
426 =head2 collapse_nums
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
427
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
428 Usage : @cnums = collapse_nums( @numbers );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
429 Purpose : Collapses a list of numbers into a set of ranges of consecutive terms:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
430 : Useful for condensing long lists of consecutive numbers.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
431 : EXPANDED:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
432 : 1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
433 : COLLAPSED:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
434 : 1-6 10 12-15 17 18 20-22 24 26 30-32
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
435 Argument : List of numbers sorted numerically.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
436 Returns : List of numbers mixed with ranges of numbers (see above).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
437 Throws : n/a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
438
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
439 See Also : L<Bio::Search::Hit::BlastHit::seq_inds()|Bio::Search::Hit::BlastHit>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
440
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
441 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
442
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
443 #------------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
444 sub collapse_nums {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
445 #------------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
446 # This is probably not the slickest connectivity algorithm, but will do for now.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
447 my @a = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
448 my ($from, $to, $i, @ca, $consec);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
449
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
450 $consec = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
451 for($i=0; $i < @a; $i++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
452 not $from and do{ $from = $a[$i]; next; };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
453 if($a[$i] == $a[$i-1]+1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
454 $to = $a[$i];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
455 $consec++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
456 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
457 if($consec == 1) { $from .= ",$to"; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
458 else { $from .= $consec>1 ? "\-$to" : ""; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
459 push @ca, split(',', $from);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
460 $from = $a[$i];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
461 $consec = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
462 $to = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
463 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
464 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
465 if(defined $to) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
466 if($consec == 1) { $from .= ",$to"; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
467 else { $from .= $consec>1 ? "\-$to" : ""; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
468 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
469 push @ca, split(',', $from) if $from;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
470
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
471 @ca;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
472 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
473
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
474
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
475 =head2 strip_blast_html
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
476
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
477 Usage : $boolean = &strip_blast_html( string_ref );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
478 : This method is exported.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
479 Purpose : Removes HTML formatting from a supplied string.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
480 : Attempts to restore the Blast report to enable
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
481 : parsing by Bio::SearchIO::blast.pm
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
482 Returns : Boolean: true if string was stripped, false if not.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
483 Argument : string_ref = reference to a string containing the whole Blast
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
484 : report containing HTML formatting.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
485 Throws : Croaks if the argument is not a scalar reference.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
486 Comments : Based on code originally written by Alex Dong Li
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
487 : (ali@genet.sickkids.on.ca).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
488 : This method does some Blast-specific stripping
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
489 : (adds back a '>' character in front of each HSP
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
490 : alignment listing).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
491 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
492 : THIS METHOD IS VERY SENSITIVE TO BLAST FORMATTING CHANGES!
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
493 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
494 : Removal of the HTML tags and accurate reconstitution of the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
495 : non-HTML-formatted report is highly dependent on structure of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
496 : the HTML-formatted version. For example, it assumes that first
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
497 : line of each alignment section (HSP listing) starts with a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
498 : <a name=..> anchor tag. This permits the reconstruction of the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
499 : original report in which these lines begin with a ">".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
500 : This is required for parsing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
501 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
502 : If the structure of the Blast report itself is not intended to
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
503 : be a standard, the structure of the HTML-formatted version
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
504 : is even less so. Therefore, the use of this method to
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
505 : reconstitute parsable Blast reports from HTML-format versions
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
506 : should be considered a temorary solution.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
507
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
508 See Also : B<Bio::Search::Processor::BlastIO::new()>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
509
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
510 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
511
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
512 #--------------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
513 sub strip_blast_html {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
514 #--------------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
515 # This may not best way to remove html tags. However, it is simple.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
516 # it won't work under following conditions:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
517 # 1) if quoted > appears in a tag (does this ever happen?)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
518 # 2) if a tag is split over multiple lines and this method is
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
519 # used to process one line at a time.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
520
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
521 my ($string_ref) = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
522
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
523 ref $string_ref eq 'SCALAR' or
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
524 croak ("Can't strip HTML: ".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
525 "Argument is should be a SCALAR reference not a ${\ref $string_ref}\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
526
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
527 my $str = $$string_ref;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
528 my $stripped = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
529
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
530 # Removing "<a name =...>" and adding the '>' character for
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
531 # HSP alignment listings.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
532 $str =~ s/(\A|\n)<a name ?=[^>]+> ?/>/sgi and $stripped = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
533
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
534 # Removing all "<>" tags.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
535 $str =~ s/<[^>]+>|&nbsp//sgi and $stripped = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
536
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
537 # Re-uniting any lone '>' characters.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
538 $str =~ s/(\A|\n)>\s+/\n\n>/sgi and $stripped = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
539
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
540 $$string_ref = $str;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
541 $stripped;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
542 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
543
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
544
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
545 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
546
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
547