annotate variant_effect_predictor/Bio/Tools/pSW.pm @ 1:d6778b5d8382 draft default tip

Deleted selected files
author willmclaren
date Fri, 03 Aug 2012 10:05:43 -0400
parents 21066c0abaf5
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
1 ## $Id: pSW.pm,v 1.21 2002/10/22 07:45:22 lapp Exp $
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
2
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
3 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
4 # BioPerl module for Bio::Tools::pSW
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
5 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
6 # Cared for by Ewan Birney <birney@sanger.ac.uk>
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
7 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
8 # Copyright Ewan Birney
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
9 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
10 # You may distribute this module under the same terms as perl itself
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
11
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
12 # POD documentation - main docs before the code
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
13
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
14 =head1 NAME
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
15
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
16 Bio::Tools::pSW - pairwise Smith Waterman object
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
17
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
18 =head1 SYNOPSIS
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
19
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
20 use Bio::Tools::pSW;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
21 use Bio::AlignIO;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
22 my $factory = new Bio::Tools::pSW( '-matrix' => 'blosum62.bla',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
23 '-gap' => 12,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
24 '-ext' => 2,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
25 );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
26
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
27 #use the factory to make some output
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
28
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
29 $factory->align_and_show($seq1,$seq2,STDOUT);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
30
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
31 # make a Bio::SimpleAlign and do something with it
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
32
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
33 my $aln = $factory->pairwise_alignment($seq1,$seq2);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
34 my $alnout = new Bio::AlignIO(-format => 'msf',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
35 -fh => \*STDOUT);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
36
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
37 $alnout->write_aln($aln);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
38
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
39 =head1 INSTALLATION
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
40
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
41 This module is included with the central Bioperl distribution:
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
42
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
43 http://bio.perl.org/Core/Latest
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
44 ftp://bio.perl.org/pub/DIST
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
45
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
46 Follow the installation instructions included in the INSTALL file.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
47
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
48 =head1 DESCRIPTION
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
49
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
50 pSW is an Alignment Factory for protein sequences. It builds pairwise
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
51 alignments using the Smith-Waterman algorithm. The alignment algorithm is
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
52 implemented in C and added in using an XS extension. The XS extension basically
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
53 comes from the Wise2 package, but has been slimmed down to only be the
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
54 alignment part of that (this is a good thing!). The XS extension comes
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
55 from the bioperl-ext package which is distributed along with bioperl.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
56 I<Warning:> This package will not work if you have not compiled the
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
57 bioperl-ext package.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
58
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
59 The mixture of C and Perl is ideal for this sort of
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
60 problem. Here are some plus points for this strategy:
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
61
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
62 =over 2
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
63
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
64 =item Speed and Memory
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
65
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
66 The algorithm is actually implemented in C, which means it is faster than
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
67 a pure perl implementation (I have never done one, so I have no idea
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
68 how faster) and will use considerably less memory, as it efficiently
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
69 assigns memory for the calculation.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
70
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
71 =item Algorithm efficiency
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
72
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
73 The algorithm was written using Dynamite, and so contains an automatic
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
74 switch to the linear space divide-and-conquer method. This means you
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
75 could effectively align very large sequences without killing your machine
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
76 (it could take a while though!).
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
77
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
78 =back
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
79
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
80 =head1 FEEDBACK
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
81
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
82 =head2 Mailing Lists
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
83
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
84 User feedback is an integral part of the evolution of this and other
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
85 Bioperl modules. Send your comments and suggestions preferably to one
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
86 of the Bioperl mailing lists. Your participation is much appreciated.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
87
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
88 bioperl-l@bioperl.org - General discussion
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
89 http://bioperl.org/MailList.shtml - About the mailing lists
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
90
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
91 =head2 Reporting Bugs
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
92
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
93 Report bugs to the Bioperl bug tracking system to help us keep track
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
94 the bugs and their resolution. Bug reports can be submitted via email
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
95 or the web:
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
96
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
97 bioperl-bugs@bio.perl.org
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
98 http://bugzilla.bioperl.org/
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
99
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
100 =head1 AUTHOR
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
101
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
102 Ewan Birney, birney@sanger.ac.uk or birney@ebi.ac.uk
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
103
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
104 =head1 CONTRIBUTORS
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
105
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
106 Jason Stajich, jason@bioperl.org
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
107
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
108 =head1 APPENDIX
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
109
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
110 The rest of the documentation details each of the object
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
111 methods. Internal methods are usually preceded with an underscore "_".
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
112
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
113 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
114
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
115 # Let the code begin...
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
116
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
117 package Bio::Tools::pSW;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
118 use vars qw(@ISA);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
119 use strict;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
120 no strict ( 'refs');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
121
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
122 BEGIN {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
123 eval {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
124 require Bio::Ext::Align;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
125 };
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
126 if ( $@ ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
127 die("\nThe C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.\n Please read the install the bioperl-ext package\n\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
128 exit(1);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
129 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
130 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
131
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
132 use Bio::Tools::AlignFactory;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
133 use Bio::SimpleAlign;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
134
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
135
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
136 @ISA = qw(Bio::Tools::AlignFactory);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
137
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
138
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
139
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
140 sub new {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
141 my($class,@args) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
142
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
143 my $self = $class->SUPER::new(@args);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
144
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
145 my($matrix,$gap,$ext) = $self->_rearrange([qw(MATRIX
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
146 GAP
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
147 EXT
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
148 )],@args);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
149
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
150 #default values - we have to load matrix into memory, so
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
151 # we need to check it out now
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
152 if( ! defined $matrix || !($matrix =~ /\w/) ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
153 $matrix = 'blosum62.bla';
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
154 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
155
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
156 $self->matrix($matrix); # will throw exception if it can't load it
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
157 $self->gap(12) unless defined $gap;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
158 $self->ext(2) unless defined $ext;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
159
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
160 # I'm pretty sure I am not doing this right... ho hum...
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
161 # This was not roght ($gap and $ext could not be 0) It is fixed now /AE
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
162 if( defined $gap ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
163 if( $gap =~ /^\d+$/ ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
164 $self->gap($gap);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
165 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
166 $self->throw("Gap penalty must be a number, not [$gap]");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
167 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
168 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
169 if( defined $ext ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
170 if( $ext =~ /^\d+$/ ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
171 $self->ext($ext);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
172 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
173 $self->throw("Extension penalty must be a number, not [$ext]");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
174 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
175 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
176
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
177 return $self;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
178 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
179
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
180
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
181 =head2 pairwise_alignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
182
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
183 Title : pairwise_alignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
184 Usage : $aln = $factory->pairwise_alignment($seq1,$seq2)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
185 Function: Makes a SimpleAlign object from two sequences
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
186 Returns : A SimpleAlign object
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
187 Args :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
188
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
189
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
190 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
191
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
192 sub pairwise_alignment{
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
193 my ($self,$seq1,$seq2) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
194 my($t1,$t2,$aln,$out,@str1,@str2,@ostr1,@ostr2,$alc,$tstr,$tid,$start1,$end1,$start2,$end2,$alctemp);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
195
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
196 if( ! defined $seq1 || ! $seq1->isa('Bio::PrimarySeqI') ||
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
197 ! defined $seq2 || ! $seq2->isa('Bio::PrimarySeqI') ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
198 $self->warn("Cannot call pairwise_alignment without specifing 2 sequences (Bio::PrimarySeqI objects)");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
199 return undef;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
200 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
201 # fix Jitterbug #1044
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
202 if( $seq1->length() < 2 ||
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
203 $seq2->length() < 2 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
204 $self->warn("cannot align sequences with length less than 2");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
205 return undef;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
206 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
207 $self->set_memory_and_report();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
208 # create engine objects
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
209 $seq1->display_id('seq1') unless ( defined $seq1->id() );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
210 $seq2->display_id('seq2') unless ( defined $seq2->id() );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
211
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
212 $t1 = &Bio::Ext::Align::new_Sequence_from_strings($seq1->id(),
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
213 $seq1->seq());
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
214 $t2 = &Bio::Ext::Align::new_Sequence_from_strings($seq2->id(),
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
215 $seq2->seq());
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
216 $aln = &Bio::Ext::Align::Align_Sequences_ProteinSmithWaterman($t1,$t2,$self->{'matrix'},-$self->gap,-$self->ext);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
217 if( ! defined $aln || $aln == 0 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
218 $self->throw("Unable to build an alignment");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
219 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
220
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
221 # free sequence engine objects
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
222
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
223 $t1 = $t2 = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
224
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
225 # now we have to get into the AlnBlock structure and
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
226 # figure out what is aligned to what...
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
227
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
228 # we are going to need the sequences as arrays for convience
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
229
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
230 @str1 = split(//, $seq1->seq());
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
231 @str2 = split(//, $seq2->seq());
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
232
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
233 # get out start points
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
234
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
235 # The alignment is in alignment coordinates - ie the first
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
236 # residues starts at -1 and ends at 0. (weird I know).
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
237 # bio-coordinates are +2 from this...
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
238
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
239 $start1 = $aln->start()->alu(0)->start +2;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
240 $start2 = $aln->start()->alu(1)->start +2;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
241
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
242 # step along the linked list of alc units...
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
243
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
244 for($alc = $aln->start();$alc->at_end() != 1;$alc = $alc->next()) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
245 if( $alc->alu(0)->text_label eq 'SEQUENCE' ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
246 push(@ostr1,$str1[$alc->alu(0)->start+1]);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
247 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
248 # assumme it is in insert!
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
249 push(@ostr1,'-');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
250 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
251
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
252 if( $alc->alu(1)->text_label eq 'SEQUENCE' ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
253 push(@ostr2,$str2[$alc->alu(1)->start+1]);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
254 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
255 # assumme it is in insert!
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
256 push(@ostr2,'-');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
257 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
258 $alctemp = $alc;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
259 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
260
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
261 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
262 # get out end points
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
263 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
264
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
265 # end points = real residue end in 'C' coordinates = residue
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
266 # end in biocoordinates. Oh... the wonder of coordinate systems!
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
267
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
268 $end1 = $alctemp->alu(0)->end+1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
269 $end2 = $alctemp->alu(1)->end+1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
270
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
271 # get rid of the alnblock
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
272 $alc = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
273 $aln = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
274
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
275 # new SimpleAlignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
276 $out = Bio::SimpleAlign->new(); # new SimpleAlignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
277
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
278 $tstr = join('',@ostr1);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
279 $tid = $seq1->id();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
280 $out->add_seq(Bio::LocatableSeq->new( -seq=> $tstr,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
281 -start => $start1,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
282 -end => $end1,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
283 -id=>$tid ));
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
284
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
285 $tstr = join('',@ostr2);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
286 $tid = $seq2->id();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
287 $out->add_seq(Bio::LocatableSeq->new( -seq=> $tstr,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
288 -start => $start2,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
289 -end => $end2,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
290 -id=> $tid ));
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
291
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
292 # give'm back the alignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
293
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
294 return $out;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
295 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
296
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
297 =head2 align_and_show
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
298
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
299 Title : align_and_show
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
300 Usage : $factory->align_and_show($seq1,$seq2,STDOUT)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
301
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
302 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
303
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
304 sub align_and_show {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
305 my($self,$seq1,$seq2,$fh) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
306 my($t1,$t2,$aln,$id,$str);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
307
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
308 if( ! defined $seq1 || ! $seq1->isa('Bio::PrimarySeqI') ||
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
309 ! defined $seq2 || ! $seq2->isa('Bio::PrimarySeqI') ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
310 $self->warn("Cannot call align_and_show without specifing 2 sequences (Bio::PrimarySeqI objects)");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
311 return undef;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
312 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
313 # fix Jitterbug #1044
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
314 if( $seq1->length() < 2 ||
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
315 $seq2->length() < 2 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
316 $self->warn("cannot align sequences with length less than 2");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
317 return undef;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
318 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
319 if( ! defined $fh ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
320 $fh = \*STDOUT;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
321 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
322 $self->set_memory_and_report();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
323 $seq1->display_id('seq1') unless ( defined $seq1->id() );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
324 $seq2->display_id('seq2') unless ( defined $seq2->id() );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
325
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
326 $t1 = &Bio::Ext::Align::new_Sequence_from_strings($seq1->id(),$seq1->seq());
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
327
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
328 $t2 = &Bio::Ext::Align::new_Sequence_from_strings($seq2->id(),$seq2->seq());
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
329 $aln = &Bio::Ext::Align::Align_Sequences_ProteinSmithWaterman($t1,$t2,$self->{'matrix'},-$self->gap,-$self->ext);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
330 if( ! defined $aln || $aln == 0 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
331 $self->throw("Unable to build an alignment");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
332 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
333
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
334 &Bio::Ext::Align::write_pretty_seq_align($aln,$t1,$t2,12,50,$fh);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
335
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
336 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
337
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
338 =head2 matrix
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
339
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
340 Title : matrix()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
341 Usage : $factory->matrix('blosum62.bla');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
342 Function : Reads in comparison matrix based on name
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
343 :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
344 Returns :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
345 Argument : comparison matrix
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
346
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
347 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
348
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
349 sub matrix {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
350 my($self,$comp) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
351 my $temp;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
352
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
353 if( !defined $comp ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
354 $self->throw("You must have a comparison matrix to set!");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
355 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
356
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
357 # talking to the engine here...
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
358
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
359 $temp = &Bio::Ext::Align::CompMat::read_Blast_file_CompMat($comp);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
360
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
361 if( !(defined $temp) || $temp == 0 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
362 $self->throw("$comp cannot be read as a BLAST comparison matrix file");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
363 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
364
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
365 $self->{'matrix'} = $temp;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
366 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
367
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
368
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
369
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
370 =head2 gap
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
371
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
372 Title : gap
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
373 Usage : $gap = $factory->gap() #get
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
374 : $factory->gap($value) #set
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
375 Function : the set get for the gap penalty
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
376 Example :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
377 Returns : gap value
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
378 Arguments : new value
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
379
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
380 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
381
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
382 sub gap {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
383 my ($self,$val) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
384
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
385
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
386 if( defined $val ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
387 if( $val < 0 ) { # Fixed so that gap==0 is allowed /AE
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
388 $self->throw("Can't have a gap penalty less than 0");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
389 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
390 $self->{'gap'} = $val;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
391 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
392 return $self->{'gap'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
393 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
394
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
395
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
396 =head2 ext
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
397
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
398 Title : ext
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
399 Usage : $ext = $factory->ext() #get
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
400 : $factory->ext($value) #set
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
401 Function : the set get for the ext penalty
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
402 Example :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
403 Returns : ext value
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
404 Arguments : new value
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
405
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
406 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
407
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
408 sub ext {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
409 my ($self,$val) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
410
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
411 if( defined $val ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
412 if( $val < 0 ) { # Fixed so that gap==0 is allowed /AE
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
413 $self->throw("Can't have a gap penalty less than 0");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
414 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
415 $self->{'ext'} = $val;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
416 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
417 return $self->{'ext'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
418 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
419
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
420
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
421