annotate variant_effect_predictor/Bio/Tools/pSW.pm @ 0:1f6dce3d34e0

Uploaded
author mahtabm
date Thu, 11 Apr 2013 02:01:53 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1 ## $Id: pSW.pm,v 1.21 2002/10/22 07:45:22 lapp Exp $
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
2
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
3 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
4 # BioPerl module for Bio::Tools::pSW
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
5 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
6 # Cared for by Ewan Birney <birney@sanger.ac.uk>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
7 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
8 # Copyright Ewan Birney
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
9 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
10 # You may distribute this module under the same terms as perl itself
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
11
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
12 # POD documentation - main docs before the code
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
13
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
14 =head1 NAME
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
15
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
16 Bio::Tools::pSW - pairwise Smith Waterman object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
17
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
18 =head1 SYNOPSIS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
19
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
20 use Bio::Tools::pSW;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
21 use Bio::AlignIO;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
22 my $factory = new Bio::Tools::pSW( '-matrix' => 'blosum62.bla',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
23 '-gap' => 12,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
24 '-ext' => 2,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
25 );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
26
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
27 #use the factory to make some output
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
28
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
29 $factory->align_and_show($seq1,$seq2,STDOUT);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
30
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
31 # make a Bio::SimpleAlign and do something with it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
32
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
33 my $aln = $factory->pairwise_alignment($seq1,$seq2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
34 my $alnout = new Bio::AlignIO(-format => 'msf',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
35 -fh => \*STDOUT);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
36
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
37 $alnout->write_aln($aln);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
38
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
39 =head1 INSTALLATION
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
40
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
41 This module is included with the central Bioperl distribution:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
42
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
43 http://bio.perl.org/Core/Latest
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
44 ftp://bio.perl.org/pub/DIST
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
45
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
46 Follow the installation instructions included in the INSTALL file.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
47
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
48 =head1 DESCRIPTION
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
49
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
50 pSW is an Alignment Factory for protein sequences. It builds pairwise
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
51 alignments using the Smith-Waterman algorithm. The alignment algorithm is
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
52 implemented in C and added in using an XS extension. The XS extension basically
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
53 comes from the Wise2 package, but has been slimmed down to only be the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
54 alignment part of that (this is a good thing!). The XS extension comes
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
55 from the bioperl-ext package which is distributed along with bioperl.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
56 I<Warning:> This package will not work if you have not compiled the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
57 bioperl-ext package.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
58
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
59 The mixture of C and Perl is ideal for this sort of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
60 problem. Here are some plus points for this strategy:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
61
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
62 =over 2
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
63
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
64 =item Speed and Memory
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
65
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
66 The algorithm is actually implemented in C, which means it is faster than
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
67 a pure perl implementation (I have never done one, so I have no idea
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
68 how faster) and will use considerably less memory, as it efficiently
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
69 assigns memory for the calculation.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
70
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
71 =item Algorithm efficiency
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
72
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
73 The algorithm was written using Dynamite, and so contains an automatic
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
74 switch to the linear space divide-and-conquer method. This means you
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
75 could effectively align very large sequences without killing your machine
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
76 (it could take a while though!).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
77
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
78 =back
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
79
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
80 =head1 FEEDBACK
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
81
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
82 =head2 Mailing Lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
83
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
84 User feedback is an integral part of the evolution of this and other
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
85 Bioperl modules. Send your comments and suggestions preferably to one
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
86 of the Bioperl mailing lists. Your participation is much appreciated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
87
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
88 bioperl-l@bioperl.org - General discussion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
89 http://bioperl.org/MailList.shtml - About the mailing lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
90
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
91 =head2 Reporting Bugs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
92
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
93 Report bugs to the Bioperl bug tracking system to help us keep track
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
94 the bugs and their resolution. Bug reports can be submitted via email
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
95 or the web:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
96
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
97 bioperl-bugs@bio.perl.org
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
98 http://bugzilla.bioperl.org/
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
99
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
100 =head1 AUTHOR
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
101
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
102 Ewan Birney, birney@sanger.ac.uk or birney@ebi.ac.uk
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
103
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
104 =head1 CONTRIBUTORS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
105
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
106 Jason Stajich, jason@bioperl.org
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
107
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
108 =head1 APPENDIX
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
109
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
110 The rest of the documentation details each of the object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
111 methods. Internal methods are usually preceded with an underscore "_".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
112
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
113 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
114
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
115 # Let the code begin...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
116
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
117 package Bio::Tools::pSW;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
118 use vars qw(@ISA);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
119 use strict;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
120 no strict ( 'refs');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
121
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
122 BEGIN {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
123 eval {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
124 require Bio::Ext::Align;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
125 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
126 if ( $@ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
127 die("\nThe C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.\n Please read the install the bioperl-ext package\n\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
128 exit(1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
129 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
130 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
131
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
132 use Bio::Tools::AlignFactory;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
133 use Bio::SimpleAlign;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
134
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
135
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
136 @ISA = qw(Bio::Tools::AlignFactory);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
137
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
138
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
139
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
140 sub new {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
141 my($class,@args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
142
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
143 my $self = $class->SUPER::new(@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
144
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
145 my($matrix,$gap,$ext) = $self->_rearrange([qw(MATRIX
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
146 GAP
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
147 EXT
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
148 )],@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
149
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
150 #default values - we have to load matrix into memory, so
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
151 # we need to check it out now
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
152 if( ! defined $matrix || !($matrix =~ /\w/) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
153 $matrix = 'blosum62.bla';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
154 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
155
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
156 $self->matrix($matrix); # will throw exception if it can't load it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
157 $self->gap(12) unless defined $gap;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
158 $self->ext(2) unless defined $ext;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
159
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
160 # I'm pretty sure I am not doing this right... ho hum...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
161 # This was not roght ($gap and $ext could not be 0) It is fixed now /AE
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
162 if( defined $gap ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
163 if( $gap =~ /^\d+$/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
164 $self->gap($gap);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
165 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
166 $self->throw("Gap penalty must be a number, not [$gap]");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
167 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
168 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
169 if( defined $ext ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
170 if( $ext =~ /^\d+$/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
171 $self->ext($ext);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
172 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
173 $self->throw("Extension penalty must be a number, not [$ext]");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
174 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
175 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
176
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
177 return $self;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
178 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
179
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
180
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
181 =head2 pairwise_alignment
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
182
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
183 Title : pairwise_alignment
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
184 Usage : $aln = $factory->pairwise_alignment($seq1,$seq2)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
185 Function: Makes a SimpleAlign object from two sequences
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
186 Returns : A SimpleAlign object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
187 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
188
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
189
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
190 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
191
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
192 sub pairwise_alignment{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
193 my ($self,$seq1,$seq2) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
194 my($t1,$t2,$aln,$out,@str1,@str2,@ostr1,@ostr2,$alc,$tstr,$tid,$start1,$end1,$start2,$end2,$alctemp);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
195
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
196 if( ! defined $seq1 || ! $seq1->isa('Bio::PrimarySeqI') ||
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
197 ! defined $seq2 || ! $seq2->isa('Bio::PrimarySeqI') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
198 $self->warn("Cannot call pairwise_alignment without specifing 2 sequences (Bio::PrimarySeqI objects)");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
199 return undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
200 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
201 # fix Jitterbug #1044
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
202 if( $seq1->length() < 2 ||
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
203 $seq2->length() < 2 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
204 $self->warn("cannot align sequences with length less than 2");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
205 return undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
206 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
207 $self->set_memory_and_report();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
208 # create engine objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
209 $seq1->display_id('seq1') unless ( defined $seq1->id() );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
210 $seq2->display_id('seq2') unless ( defined $seq2->id() );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
211
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
212 $t1 = &Bio::Ext::Align::new_Sequence_from_strings($seq1->id(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
213 $seq1->seq());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
214 $t2 = &Bio::Ext::Align::new_Sequence_from_strings($seq2->id(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
215 $seq2->seq());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
216 $aln = &Bio::Ext::Align::Align_Sequences_ProteinSmithWaterman($t1,$t2,$self->{'matrix'},-$self->gap,-$self->ext);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
217 if( ! defined $aln || $aln == 0 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
218 $self->throw("Unable to build an alignment");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
219 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
220
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
221 # free sequence engine objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
222
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
223 $t1 = $t2 = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
224
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
225 # now we have to get into the AlnBlock structure and
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
226 # figure out what is aligned to what...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
227
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
228 # we are going to need the sequences as arrays for convience
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
229
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
230 @str1 = split(//, $seq1->seq());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
231 @str2 = split(//, $seq2->seq());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
232
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
233 # get out start points
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
234
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
235 # The alignment is in alignment coordinates - ie the first
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
236 # residues starts at -1 and ends at 0. (weird I know).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
237 # bio-coordinates are +2 from this...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
238
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
239 $start1 = $aln->start()->alu(0)->start +2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
240 $start2 = $aln->start()->alu(1)->start +2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
241
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
242 # step along the linked list of alc units...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
243
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
244 for($alc = $aln->start();$alc->at_end() != 1;$alc = $alc->next()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
245 if( $alc->alu(0)->text_label eq 'SEQUENCE' ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
246 push(@ostr1,$str1[$alc->alu(0)->start+1]);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
247 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
248 # assumme it is in insert!
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
249 push(@ostr1,'-');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
250 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
251
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
252 if( $alc->alu(1)->text_label eq 'SEQUENCE' ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
253 push(@ostr2,$str2[$alc->alu(1)->start+1]);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
254 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
255 # assumme it is in insert!
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
256 push(@ostr2,'-');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
257 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
258 $alctemp = $alc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
259 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
260
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
261 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
262 # get out end points
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
263 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
264
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
265 # end points = real residue end in 'C' coordinates = residue
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
266 # end in biocoordinates. Oh... the wonder of coordinate systems!
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
267
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
268 $end1 = $alctemp->alu(0)->end+1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
269 $end2 = $alctemp->alu(1)->end+1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
270
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
271 # get rid of the alnblock
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
272 $alc = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
273 $aln = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
274
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
275 # new SimpleAlignment
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
276 $out = Bio::SimpleAlign->new(); # new SimpleAlignment
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
277
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
278 $tstr = join('',@ostr1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
279 $tid = $seq1->id();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
280 $out->add_seq(Bio::LocatableSeq->new( -seq=> $tstr,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
281 -start => $start1,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
282 -end => $end1,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
283 -id=>$tid ));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
284
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
285 $tstr = join('',@ostr2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
286 $tid = $seq2->id();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
287 $out->add_seq(Bio::LocatableSeq->new( -seq=> $tstr,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
288 -start => $start2,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
289 -end => $end2,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
290 -id=> $tid ));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
291
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
292 # give'm back the alignment
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
293
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
294 return $out;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
295 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
296
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
297 =head2 align_and_show
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
298
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
299 Title : align_and_show
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
300 Usage : $factory->align_and_show($seq1,$seq2,STDOUT)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
301
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
302 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
303
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
304 sub align_and_show {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
305 my($self,$seq1,$seq2,$fh) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
306 my($t1,$t2,$aln,$id,$str);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
307
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
308 if( ! defined $seq1 || ! $seq1->isa('Bio::PrimarySeqI') ||
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
309 ! defined $seq2 || ! $seq2->isa('Bio::PrimarySeqI') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
310 $self->warn("Cannot call align_and_show without specifing 2 sequences (Bio::PrimarySeqI objects)");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
311 return undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
312 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
313 # fix Jitterbug #1044
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
314 if( $seq1->length() < 2 ||
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
315 $seq2->length() < 2 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
316 $self->warn("cannot align sequences with length less than 2");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
317 return undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
318 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
319 if( ! defined $fh ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
320 $fh = \*STDOUT;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
321 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
322 $self->set_memory_and_report();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
323 $seq1->display_id('seq1') unless ( defined $seq1->id() );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
324 $seq2->display_id('seq2') unless ( defined $seq2->id() );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
325
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
326 $t1 = &Bio::Ext::Align::new_Sequence_from_strings($seq1->id(),$seq1->seq());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
327
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
328 $t2 = &Bio::Ext::Align::new_Sequence_from_strings($seq2->id(),$seq2->seq());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
329 $aln = &Bio::Ext::Align::Align_Sequences_ProteinSmithWaterman($t1,$t2,$self->{'matrix'},-$self->gap,-$self->ext);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
330 if( ! defined $aln || $aln == 0 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
331 $self->throw("Unable to build an alignment");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
332 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
333
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
334 &Bio::Ext::Align::write_pretty_seq_align($aln,$t1,$t2,12,50,$fh);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
335
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
336 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
337
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
338 =head2 matrix
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
339
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
340 Title : matrix()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
341 Usage : $factory->matrix('blosum62.bla');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
342 Function : Reads in comparison matrix based on name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
343 :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
344 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
345 Argument : comparison matrix
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
346
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
347 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
348
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
349 sub matrix {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
350 my($self,$comp) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
351 my $temp;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
352
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
353 if( !defined $comp ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
354 $self->throw("You must have a comparison matrix to set!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
355 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
356
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
357 # talking to the engine here...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
358
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
359 $temp = &Bio::Ext::Align::CompMat::read_Blast_file_CompMat($comp);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
360
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
361 if( !(defined $temp) || $temp == 0 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
362 $self->throw("$comp cannot be read as a BLAST comparison matrix file");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
363 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
364
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
365 $self->{'matrix'} = $temp;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
366 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
367
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
368
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
369
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
370 =head2 gap
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
371
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
372 Title : gap
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
373 Usage : $gap = $factory->gap() #get
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
374 : $factory->gap($value) #set
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
375 Function : the set get for the gap penalty
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
376 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
377 Returns : gap value
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
378 Arguments : new value
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
379
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
380 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
381
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
382 sub gap {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
383 my ($self,$val) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
384
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
385
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
386 if( defined $val ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
387 if( $val < 0 ) { # Fixed so that gap==0 is allowed /AE
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
388 $self->throw("Can't have a gap penalty less than 0");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
389 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
390 $self->{'gap'} = $val;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
391 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
392 return $self->{'gap'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
393 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
394
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
395
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
396 =head2 ext
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
397
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
398 Title : ext
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
399 Usage : $ext = $factory->ext() #get
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
400 : $factory->ext($value) #set
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
401 Function : the set get for the ext penalty
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
402 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
403 Returns : ext value
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
404 Arguments : new value
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
405
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
406 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
407
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
408 sub ext {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
409 my ($self,$val) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
410
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
411 if( defined $val ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
412 if( $val < 0 ) { # Fixed so that gap==0 is allowed /AE
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
413 $self->throw("Can't have a gap penalty less than 0");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
414 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
415 $self->{'ext'} = $val;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
416 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
417 return $self->{'ext'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
418 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
419
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
420
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
421