annotate variant_effect_predictor/Bio/EnsEMBL/Utils/CigarString.pm @ 2:a5976b2dce6f

changing defualt values for ensembl database
author mahtabm
date Thu, 11 Apr 2013 17:15:42 +1000
parents 1f6dce3d34e0
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1 =head1 LICENSE
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
2
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
3 Copyright (c) 1999-2012 The European Bioinformatics Institute and
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
4 Genome Research Limited. All rights reserved.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
5
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
6 This software is distributed under a modified Apache license.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
7 For license details, please see
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
8
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
9 http://www.ensembl.org/info/about/code_licence.html
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
10
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
11 =head1 CONTACT
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
12
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
13 Please email comments or questions to the public Ensembl
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
14 developers list at <dev@ensembl.org>.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
15
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
16 Questions may also be sent to the Ensembl help desk at
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
17 <helpdesk@ensembl.org>.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
18
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
19 =head1 AUTHOR
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
20
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
21 Juguang Xiao <juguang@tll.org.sg>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
22
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
23 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
24
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
25 =head1 NAME
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
26
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
27 Bio::EnsEMBL::Utils::CigarString, a utilites module to generate cigar
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
28 strings
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
29
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
30 =head1 DESCRIPTION
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
31
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
32 Sequence alignment hits were previously stored within the core database
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
33 as ungapped alignments. This imposed 2 major constraints on alignments:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
34
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
35 a) alignments for a single hit record would require multiple rows in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
36 database, and
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
37 b) it was not possible to accurately retrieve the exact original
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
38 alignment.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
39
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
40 Therefore, in the new branch sequence alignments are now stored as
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
41 ungapped alignments in the cigar line format (where CIGAR stands for
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
42 Concise Idiosyncratic Gapped Alignment Report).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
43
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
44 In the cigar line format alignments are sotred as follows:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
45
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
46 M: Match
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
47 D: Deletino
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
48 I: Insertion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
49
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
50 An example of an alignment for a hypthetical protein match is shown
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
51 below:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
52
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
53
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
54 Query: 42 PGPAGLP----GSVGLQGPRGLRGPLP-GPLGPPL...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
55 PG P G GP R PLGP
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
56 Sbjct: 1672 PGTP*TPLVPLGPWVPLGPSSPR--LPSGPLGPTD...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
57
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
58 protein_align_feature table as the following cigar line:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
59
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
60 7M4D12M2I2MD7M
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
61
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
62 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
63
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
64 package Bio::EnsEMBL::Utils::CigarString;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
65
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
66 use strict;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
67 use vars qw(@ISA);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
68 use Bio::EnsEMBL::Root;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
69
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
70 @ISA = qw(Bio::EnsEMBL::Root);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
71
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
72 =head2 split_hsp
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
73
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
74 Name : split_hsp (this name is derived from the original sub in BlastWorn)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
75 Usage : my $hsp; # a ready Bio::Search::HSP::GenericHSP object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
76 my $factory = new Bio::EnsEMBL::Utils::CigarString;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
77 my $cigar_string = $factory->split_hsp($hsp);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
78
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
79 Function: generate cigar string.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
80 Argument: a HSP object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
81 Returns : a text string.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
82
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
83 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
84
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
85 sub split_hsp {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
86 my ($self, $hsp) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
87
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
88 $self->throw("a defined object needed") unless($hsp && defined($hsp));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
89 unless(ref($hsp) && $hsp->isa('Bio::Search::HSP::GenericHSP')){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
90 $self->throw("a HSP object needed");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
91 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
92
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
93 my ($qtype, $htype) = $self->_findTypes($hsp);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
94 my ($qstrand, $hstrand) = $self->_findStrands($hsp);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
95 my ($qinc, $hinc) = $self->_findIncrements($qstrand,$hstrand,$qtype,$htype);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
96
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
97 my @gaps = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
98 my @qchars = split(//, $hsp->query_string);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
99 my @hchars = split(//, $hsp->hit_string);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
100 my $qstart;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
101 if($qstrand == 1){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
102 $qstart = $hsp->query->start;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
103 }elsif($qstart == -1){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
104 $qstart = $hsp->query->end;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
105 }else{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
106 $self->warn("[$qstart], invalid strand value on query");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
107 $qstart = $hsp->query->start;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
108 # Is this a SearchIO's bug???
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
109 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
110
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
111 my $hstart;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
112 if($hstrand == 1){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
113 $hstart = $hsp->subject->start;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
114 }elsif($hstrand != -1){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
115 $hstart = $hsp->subject->end;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
116 }else{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
117 $self->throw("[$hstart], invalid strand value on subject");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
118 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
119
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
120 my $qend = $qstart;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
121 my $hend = $hstart;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
122 my $count = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
123 my $found = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
124
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
125 my @align_coordinates = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
126 while($count <= $#qchars){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
127 if($qchars[$count] ne '-' && $hchars[$count] ne '-') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
128 $qend += $qinc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
129 $hend += $hinc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
130 $found = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
131 }else{ # gapped region
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
132 push(@align_coordinates, [$qstart, $hstart]) if($found == 1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
133
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
134 $qstart = $qend;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
135 $qstart += $qinc if($qchars[$count] ne '-');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
136
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
137 $hstart = $hend;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
138 $hstart += $hinc if($hchars[$count] ne '-');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
139
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
140 $qend = $qstart;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
141 $hend = $hstart;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
142 $found = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
143 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
144 $count++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
145 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
146
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
147 if($found){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
148 push(@align_coordinates, [$qstart, $hstart]);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
149 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
150
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
151 my $cigar_string = "";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
152 my $last = $#align_coordinates;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
153 if($last >= 0){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
154 for(my $i=0; $i<$last; $i++){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
155 my $q_this_start = $align_coordinates[$i]->[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
156 my $q_next_start = $align_coordinates[$i+1]->[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
157 my $q_length = ($q_next_start-$q_this_start-1)*$qinc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
158 $q_length = abs($q_length);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
159 my $h_this_start = $align_coordinates[$i]->[1];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
160 my $h_next_start = $align_coordinates[$i+1]->[1];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
161 my $h_length = ($h_next_start-$h_this_start-1)*$hinc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
162 $h_length = abs($h_length);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
163
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
164 my $diff = $q_length - $h_length;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
165 if($diff > 0){ # Insertion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
166 $cigar_string .= $diff unless($diff == 1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
167 $cigar_string .= 'I';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
168 }elsif($diff < 0){ # Deletion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
169 $cigar_string .= -$diff unless($diff == -1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
170 $cigar_string .= 'D';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
171 }else{ # e.g $diff == 0, Match
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
172 $cigar_string .= $q_length unless($q_length == 1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
173 $cigar_string .= 'M';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
174 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
175
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
176 } # for
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
177 } # if
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
178
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
179 return $cigar_string;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
180 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
181
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
182
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
183 sub _findStrands {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
184 my ($self,$hsp) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
185
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
186 my $qstrand = $hsp->query->strand;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
187 unless($qstrand == 1 || $qstrand == -1){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
188 $self->warn("query's strand value is neither 1 or -1");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
189 $qstrand = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
190 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
191
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
192 my $hstrand = $hsp->subject->strand;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
193 unless($hstrand == 1 || $hstrand == -1){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
194 $self->warn("subject's strand value is neither 1 or -1");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
195 $hstrand = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
196 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
197
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
198 return ( $qstrand, $hstrand);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
199 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
200
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
201 sub _findTypes {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
202 my ($self,$hsp) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
203
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
204 my $type1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
205 my $type2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
206 my $len1 = $hsp->query->length();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
207 my $len2 = $hsp->subject->length();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
208
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
209 if ($len1/$len2 > 2) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
210 $type1 = 'dna';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
211 $type2 = 'pep';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
212 } elsif ($len2/$len1 > 2) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
213 $type1 = 'pep';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
214 $type2 = 'dna';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
215 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
216 $type1 = 'dna';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
217 $type2 = 'dna';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
218 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
219
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
220 return ($type1,$type2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
221 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
222
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
223 sub _findIncrements {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
224 my ($self,$qstrand,$hstrand,$qtype,$htype) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
225
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
226 my $qinc = 1 * $qstrand;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
227 my $hinc = 1 * $hstrand;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
228
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
229 if ($qtype eq 'dna' && $htype eq 'pep') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
230 $qinc = 3 * $qinc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
231 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
232 if ($qtype eq 'pep' && $htype eq 'dna') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
233 $hinc = 3 * $hinc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
234 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
235
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
236 return ($qinc,$hinc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
237 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
238
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
239 # This is a core logic of cigar string. The finite state machine theory is
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
240 # apply. See the below table, x-axis represents the input, with 3 options:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
241 # (+/+) -- Both current query and subject bases are non-gap. Match
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
242 # (-/+) -- The current query base is gap, but subject not. Deletion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
243 # (+/-) -- The current subject base is gap, but query not. Insertion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
244 # While the y-axis means the current state with letter 'M', 'D', 'I'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
245 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
246 # The content of this table is the action taken in response of the input and
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
247 # the current state.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
248 # R remain the state, counter increment.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
249 # G;X generate the cigar line based on the current state and counter;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
250 # clear the counter to zero and change to the state X
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
251 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
252 # || +/+ | -/+ | +/- |
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
253 # -------+----------------------+
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
254 # M || R | G;D | G;I |
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
255 # ------------------------------+
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
256 # D || G;M | R | G;I |
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
257 # ------------------------------+
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
258 # I || G;M | G;D | R |
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
259 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
260
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
261 =head2 generate_cigar_string
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
262
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
263 Name : generate_cigar_string
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
264 Usage: $cigar_string = $self->generate_cigar_string(\@qchars, \@hchars);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
265 Function: generate the cigar string for a piece of alignment.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
266 Args: 2 array references. The lengths of 2 arrays are the same
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
267 Return: a cigar string
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
268
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
269 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
270
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
271 # Developer's Note: The method is originally abstracted from the concept of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
272 # cigar string. It only asks the essential information of 2 sequence characters
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
273 # of the alignment, while the BlastWorn::split_HSP asks more unused information
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
274 # for cigar string, which is useful to form align_coordinates. - Juguang
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
275
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
276 my ($count, $state); # strictly only used in the following 2 subs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
277
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
278 sub generate_cigar_string {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
279
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
280 # my ($self, $qstart, $hstart, $qinc, $hinc, $qchars_ref, $hchars_ref) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
281
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
282 my ($self, $qchars_ref, $hchars_ref) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
283
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
284 my @qchars = @{$qchars_ref};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
285 my @hchars = @{$hchars_ref};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
286
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
287 unless(scalar(@qchars) == scalar(@hchars)){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
288 $self->throw("two sequences are not equal in lengths");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
289 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
290
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
291 $count = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
292 $state = 'M'; # the current state of gaps, (M, D, I)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
293
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
294 my $cigar_string = '';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
295 for(my $i=0; $i <= $#qchars; $i++){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
296 my $qchar = $qchars[$i];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
297 my $hchar = $hchars[$i];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
298 if($qchar ne '-' && $hchar ne '-'){ # Match
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
299 $cigar_string .= $self->_sub_cigar_string('M');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
300 }elsif($qchar eq '-'){ # Deletion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
301 $cigar_string .= $self->_sub_cigar_string('D');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
302 }elsif($hchar eq '-'){ # Insertion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
303 $cigar_string .= $self->_sub_cigar_string('I');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
304 }else{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
305 $self->throw("Impossible state that 2 gaps on each seq aligned");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
306 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
307 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
308 $cigar_string .= $self->_sub_cigar_string('X'); # not forget the tail.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
309 return $cigar_string;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
310 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
311
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
312 sub _sub_cigar_string {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
313 my ($self, $new_state) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
314 my $sub_cigar_string = '';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
315 if($state eq $new_state){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
316 $count++; # Remain the state and increase the counter
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
317 }else{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
318 $sub_cigar_string .= $count unless $count == 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
319 $sub_cigar_string .= $state;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
320 $count = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
321 $state = $new_state;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
322 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
323 return $sub_cigar_string;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
324 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
325
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
326 =head2 generate_cigar_string_by_hsp
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
327
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
328 Name : generate_cigar_string_by_hsp
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
329 Usage : my $hsp; # a ready GenericHSP object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
330 my $cigar_string = $self->generate_cigar_string_by_hsp($hsp);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
331 Function: generate a cigar string by given HSP object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
332 Args : a GenericHSP object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
333 Returns: a text string of cigar string
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
334
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
335 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
336
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
337 sub generate_cigar_string_by_hsp {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
338 my ($self, $hsp) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
339
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
340 unless(ref($hsp) && $hsp->isa('Bio::Search::HSP::GenericHSP')){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
341 $self->throw("A GenericHSP object needed");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
342 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
343
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
344 my @qchars = split(//, $hsp->query_string);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
345 my @hchars = split(//, $hsp->hit_string);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
346
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
347 return $self->generate_cigar_string(\@qchars, \@hchars);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
348 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
349
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
350 1;