Mercurial > repos > mahtabm > ensembl
comparison variant_effect_predictor/Bio/Tools/Run/StandAloneBlast.pm @ 0:1f6dce3d34e0
Uploaded
| author | mahtabm |
|---|---|
| date | Thu, 11 Apr 2013 02:01:53 -0400 |
| parents | |
| children |
comparison
equal
deleted
inserted
replaced
| -1:000000000000 | 0:1f6dce3d34e0 |
|---|---|
| 1 # $Id: StandAloneBlast.pm,v 1.23.2.3 2003/03/29 20:18:51 jason Exp $ | |
| 2 # | |
| 3 # BioPerl module for Bio::Tools::StandAloneBlast | |
| 4 # | |
| 5 # Cared for by Peter Schattner | |
| 6 # | |
| 7 # Copyright Peter Schattner | |
| 8 # | |
| 9 # You may distribute this module under the same terms as perl itself | |
| 10 | |
| 11 # POD documentation - main docs before the code | |
| 12 | |
| 13 =head1 NAME | |
| 14 | |
| 15 Bio::Tools::Run::StandAloneBlast - Object for the local execution of the | |
| 16 NCBI Blast program suite (blastall, blastpgp, bl2seq) | |
| 17 | |
| 18 =head1 SYNOPSIS | |
| 19 | |
| 20 Local-blast "factory object" creation and blast-parameter initialization: | |
| 21 | |
| 22 @params = ('database' => 'swissprot','outfile' => 'blast1.out', | |
| 23 '_READMETHOD' => 'Blast'); | |
| 24 | |
| 25 $factory = Bio::Tools::Run::StandAloneBlast->new(@params); | |
| 26 | |
| 27 Blast a sequence against a database: | |
| 28 | |
| 29 $str = Bio::SeqIO->new(-file=>'t/amino.fa' , '-format' => 'Fasta' ); | |
| 30 $input = $str->next_seq(); | |
| 31 $input2 = $str->next_seq(); | |
| 32 $blast_report = $factory->blastall($input); | |
| 33 | |
| 34 Run an iterated Blast (psiblast) of a sequence against a database: | |
| 35 | |
| 36 $factory->j(3); # 'j' is blast parameter for # of iterations | |
| 37 $factory->outfile('psiblast1.out'); | |
| 38 $factory = Bio::Tools::Run::StandAloneBlast->new(@params); | |
| 39 $blast_report = $factory->blastpgp($input); | |
| 40 | |
| 41 Use blast to align 2 sequences against each other: | |
| 42 | |
| 43 $factory = Bio::Tools::Run::StandAloneBlast->new('outfile' => 'bl2seq.out'); | |
| 44 $factory->bl2seq($input, $input2); | |
| 45 | |
| 46 Various additional options and input formats are available. See the | |
| 47 DESCRIPTION section for details. | |
| 48 | |
| 49 =head1 DESCRIPTION | |
| 50 | |
| 51 This DESCRIPTION only documents Bio::Tools::Run::StandAloneBlast: - a | |
| 52 Bioperl object for running the NCBI standAlone BLAST package. Blast, | |
| 53 itself, is a large & complex program - for more information regarding | |
| 54 BLAST, please see the BLAST documentation which accompanies the BLAST | |
| 55 distribution. BLAST is available from ftp://ncbi.nlm.nih.gov/blast/. | |
| 56 | |
| 57 (A source of confusion in documenting a BLAST interface is that the | |
| 58 term "program" is used in - at least - three different ways in the | |
| 59 BLAST documentation. In this DESCRIPTION, "program" will refer to the | |
| 60 BLAST routine set by BLAST's C<-p> parameter that can be set to blastn, | |
| 61 blastp, tblastx etc. We will use the term Blast "executable" to refer | |
| 62 to the various different executable files that may be called - ie | |
| 63 blastall, blastpgp or bl2seq. In addition, there are several BLAST | |
| 64 capabilities (which are also referred to as "programs") and are | |
| 65 implemented by using specific combinations of BLAST executables, | |
| 66 programs and parameters. They will be referred by their specific | |
| 67 names - eg PSIBLAST and PHIBLAST. ) | |
| 68 | |
| 69 StandAloneBlast has been tested so far only under Linux. I expect | |
| 70 that it should also work under other Unix systems. However, since the | |
| 71 module is implemented using (unix) system calls, modification may be | |
| 72 necessary before StandAloneBlast would work under non-Unix | |
| 73 operating systems (eg Windows, MacOS). Before running | |
| 74 StandAloneBlast it is necessary: to install BLAST on your system, | |
| 75 to edit set the environmental variable $BLASTDIR or your $PATH | |
| 76 variable to point to the BLAST directory, and to ensure that users | |
| 77 have execute privileges for the BLAST program. If the databases | |
| 78 which will be searched by BLAST are located in the data subdirectory | |
| 79 of the blast program directory (the default installation location), | |
| 80 StandAloneBlast will find them; however, if the database files are | |
| 81 located in any other location, environmental variable $BLASTDATADIR | |
| 82 will need to be set to point to that directory. | |
| 83 | |
| 84 The use of the StandAloneBlast module is as follows: Initially, a | |
| 85 local blast "factory object" is created. The constructor may be passed | |
| 86 an optional array of (non-default) parameters to be used by the | |
| 87 factory, eg: | |
| 88 | |
| 89 @params = ('program' => 'blastn', 'database' => 'ecoli.nt'); | |
| 90 $factory = Bio::Tools::Run::StandAloneBlast->new(@params); | |
| 91 | |
| 92 Any parameters not explicitly set will remain as the defaults of the | |
| 93 BLAST executable. Note each BLAST executable has somewhat different | |
| 94 parameters and options. See the BLAST Documentation for a description | |
| 95 or run the BLAST executable from the command line followed solely with | |
| 96 a "-" to see a list of options and default values for that executable; | |
| 97 eg E<gt>blastall -. | |
| 98 | |
| 99 BLAST parameters can be changed and/or examined at any time after the | |
| 100 factory has been created. The program checks that any | |
| 101 parameter/switch being set/read is valid. Except where specifically | |
| 102 noted, StandAloneBlast uses the same single-letter, case-sensitive | |
| 103 parameter names as the actual blast program. Currently no checks are | |
| 104 included to verify that parameters are of the proper type (eg string | |
| 105 or numeric) or that their values are within the proper range. | |
| 106 | |
| 107 As an example, to change the value of the Blast parameter 'e' ('e' is | |
| 108 the parameter for expectation-value cutoff) | |
| 109 | |
| 110 $expectvalue = 0.01; | |
| 111 $factory->e($expectvalue); | |
| 112 | |
| 113 Note that for improved script readibility one can modify the name of | |
| 114 the BLAST parameters as desired as long as the initial letter (and | |
| 115 case) of the parameter are preserved, eg | |
| 116 $factory-E<gt>expectvalue($expectvalue); Unfortunately, some of the BLAST | |
| 117 parameters are not the single letter one might expect (eg "iteration | |
| 118 round" in blastpgp is 'j'). Again one can check by using (eg) | |
| 119 | |
| 120 > blastpgp - . | |
| 121 | |
| 122 Once the factory has been created and the appropriate parameters set, | |
| 123 one can call one of the supported blast executables. The input | |
| 124 sequence(s) to these executables may be fasta file(s) as described in | |
| 125 the BLAST documentation. | |
| 126 | |
| 127 $inputfilename = 't/testquery.fa'; | |
| 128 $blast_report = $factory->blastall($inputfilename); | |
| 129 | |
| 130 In addition, sequence input may be in the form of either a Bio::Seq | |
| 131 object or or an array of Bio::Seq objects, eg | |
| 132 | |
| 133 $input = Bio::Seq->new(-id=>"test query",-seq=>"ACTACCCTTTAAATCAGTGGGGG"); | |
| 134 $blast_report = $factory->blastall($input); | |
| 135 | |
| 136 For blastall and non-psiblast blastpgp runs, report object is either a | |
| 137 BPlite.pm or Bio::SearchIO object, selected by the user with the | |
| 138 parameter _READMETHOD. (The leading underscore is needed to | |
| 139 distinguish this option from options which are passed to the BLAST | |
| 140 executable.) The default parser is Bio::SearchIO::blast. For | |
| 141 (multiple iteration) psiblast and bl2seq runs the report is | |
| 142 automatically parsed by the BPpsilite.pm and BPbl2seq.pm parsers | |
| 143 respectively, since neither Blast.pm nor BPlite can parse these | |
| 144 reports. In any case, the "raw" blast report is also available. The | |
| 145 filename is set by the in the 'outfile' parameter and has the default | |
| 146 value of "blastreport.out". | |
| 147 | |
| 148 For psiblast execution in BLAST's "jumpstart" mode, the program must | |
| 149 be passed (in addition to the query sequence itself) an alignment | |
| 150 containing the query sequence (in the form of a SimpleAlign object) as | |
| 151 well as a "mask" specifying at what residues position-specific scoring | |
| 152 matrices (PSSMs) are to used and at what residues default scoring | |
| 153 matrices (eg BLOSUM) are to be used. See psiblast documentation for | |
| 154 more details. The mask itself is a string of 0's and 1's which is the | |
| 155 same length as each sequence in the alignment and has a "1" at | |
| 156 locations where (PSSMs) are to be used and a "0" at all other | |
| 157 locations. So for example: | |
| 158 | |
| 159 $str = Bio::AlignIO->new(-file=> "cysprot.msf", '-format' => 'msf' ); | |
| 160 $aln = $str->next_aln(); | |
| 161 $len = $aln->length_aln(); | |
| 162 $mask = '1' x $len; # simple case where PSSM's to be used at all residues | |
| 163 $report = $factory->blastpgp("cysprot1.fa", $aln, $mask); | |
| 164 | |
| 165 For bl2seq execution, StandAloneBlast.pm can be combined with | |
| 166 AlignIO.pm to directly produce a SimpleAlign object from the alignment | |
| 167 of the two sequences produced by bl2seq as in: | |
| 168 | |
| 169 #Get 2 sequences | |
| 170 $str = Bio::SeqIO->new(-file=>'t/amino.fa' , '-format' => 'Fasta', ); | |
| 171 my $seq3 = $str->next_seq(); | |
| 172 my $seq4 = $str->next_seq(); | |
| 173 | |
| 174 # Run bl2seq on them | |
| 175 $factory = Bio::Tools::Run::StandAloneBlast->new('outfile' => 'bl2seq.out'); | |
| 176 my $bl2seq_report = $factory->bl2seq($seq3, $seq4); | |
| 177 | |
| 178 # Use AlignIO.pm to create a SimpleAlign object from the bl2seq report | |
| 179 $str = Bio::AlignIO->new(-file=> 'bl2seq.out','-format' => 'bl2seq'); | |
| 180 $aln = $str->next_aln(); | |
| 181 | |
| 182 For more examples of syntax and use of Blast.pm, the user is | |
| 183 encouraged to run the scripts standaloneblast.pl in the bioperl | |
| 184 /examples directory and StandAloneBlast.t in the bioperl /t directory. | |
| 185 | |
| 186 Note: There is a similar (but older) perl object interface offered by | |
| 187 nhgri. The nhgri module only supports blastall and does not support | |
| 188 blastpgp, psiblast, phiblast, bl2seq etc. This module can be found at | |
| 189 http://genome.nhgri.nih.gov/blastall/. | |
| 190 | |
| 191 =head1 DEVELOPERS NOTES | |
| 192 | |
| 193 B<STILL TO BE WRITTEN> | |
| 194 | |
| 195 Note: This module is still under development. If you would like that a | |
| 196 specific BLAST feature be added to this perl interface, let me know. | |
| 197 | |
| 198 =head1 FEEDBACK | |
| 199 | |
| 200 =head2 Mailing Lists | |
| 201 | |
| 202 User feedback is an integral part of the evolution of this and other | |
| 203 Bioperl modules. Send your comments and suggestions preferably to one | |
| 204 of the Bioperl mailing lists. Your participation is much appreciated. | |
| 205 | |
| 206 bioperl-l@bioperl.org - General discussion | |
| 207 http://bio.perl.org/MailList.html - About the mailing lists | |
| 208 | |
| 209 =head2 Reporting Bugs | |
| 210 | |
| 211 Report bugs to the Bioperl bug tracking system to help us keep track | |
| 212 the bugs and their resolution. Bug reports can be submitted via email | |
| 213 or the web: | |
| 214 | |
| 215 bioperl-bugs@bio.perl.org | |
| 216 http://bio.perl.org/bioperl-bugs/ | |
| 217 | |
| 218 =head1 AUTHOR - Peter Schattner | |
| 219 | |
| 220 Email schattner@alum.mit.edu | |
| 221 | |
| 222 =head1 APPENDIX | |
| 223 | |
| 224 The rest of the documentation details each of the object | |
| 225 methods. Internal methods are usually preceded with a _ | |
| 226 | |
| 227 =cut | |
| 228 | |
| 229 package Bio::Tools::Run::StandAloneBlast; | |
| 230 | |
| 231 use vars qw($AUTOLOAD @ISA $PROGRAMDIR $DATADIR | |
| 232 @BLASTALL_PARAMS @BLASTPGP_PARAMS | |
| 233 @BL2SEQ_PARAMS @OTHER_PARAMS %OK_FIELD | |
| 234 ); | |
| 235 use strict; | |
| 236 use Bio::Root::Root; | |
| 237 use Bio::Root::IO; | |
| 238 use Bio::Seq; | |
| 239 use Bio::SeqIO; | |
| 240 use Bio::Tools::BPbl2seq; | |
| 241 use Bio::Tools::BPpsilite; | |
| 242 use Bio::SearchIO; | |
| 243 use Bio::Tools::Run::WrapperBase; | |
| 244 use Bio::Factory::ApplicationFactoryI; | |
| 245 | |
| 246 BEGIN { | |
| 247 | |
| 248 @BLASTALL_PARAMS = qw( p d i e m o F G E X I q r v b f g Q | |
| 249 D a O J M W z K L Y S T l U y Z); | |
| 250 @BLASTPGP_PARAMS = qw(d i A f e m o y P F G E X N g S H a I h c | |
| 251 j J Z O M v b C R W z K L Y p k T Q B l U); | |
| 252 @BL2SEQ_PARAMS = qw(i j p g o d a G E X W M q r F e S T m); | |
| 253 | |
| 254 | |
| 255 # Non BLAST parameters start with underscore to differentiate them | |
| 256 # from BLAST parameters | |
| 257 @OTHER_PARAMS = qw(_READMETHOD); | |
| 258 | |
| 259 # _READMETHOD = 'BPlite' (default) or 'Blast' | |
| 260 # my @other_switches = qw(QUIET); | |
| 261 | |
| 262 | |
| 263 # Authorize attribute fields | |
| 264 foreach my $attr (@BLASTALL_PARAMS, @BLASTPGP_PARAMS, | |
| 265 @BL2SEQ_PARAMS, @OTHER_PARAMS ) | |
| 266 { $OK_FIELD{$attr}++; } | |
| 267 | |
| 268 # You will need to enable Blast to find the Blast program. This can be done | |
| 269 # in (at least) two different ways: | |
| 270 # 1. define an environmental variable blastDIR: | |
| 271 # export BLASTDIR=/home/peter/blast or | |
| 272 # 2. include a definition of an environmental variable BLASTDIR in every script that will | |
| 273 # use StandAloneBlast.pm. | |
| 274 # BEGIN {$ENV{BLASTDIR} = '/home/peter/blast/'; } | |
| 275 $PROGRAMDIR = $ENV{'BLASTDIR'} || ''; | |
| 276 | |
| 277 # If local BLAST databases are not stored in the standard | |
| 278 # /data directory, the variable BLASTDATADIR will need to be set explicitly | |
| 279 $DATADIR = $ENV{'BLASTDATADIR'} || $ENV{'BLASTDB'} || ''; | |
| 280 } | |
| 281 | |
| 282 @ISA = qw(Bio::Root::Root | |
| 283 Bio::Tools::Run::WrapperBase | |
| 284 Bio::Factory::ApplicationFactoryI); | |
| 285 | |
| 286 =head1 BLAST parameters | |
| 287 | |
| 288 Essentially all BLAST parameter can be set via StandAloneBlast.pm. | |
| 289 Some of the most commonly used parameters are listed below. All | |
| 290 parameters have defaults and are optional (I think.) For a complete | |
| 291 listing of settable parameters, run the relevant executable BLAST | |
| 292 program with the option "-" as in blastall - | |
| 293 | |
| 294 =head2 Blastall | |
| 295 | |
| 296 -p Program Name [String] | |
| 297 Input should be one of "blastp", "blastn", "blastx", | |
| 298 "tblastn", or "tblastx". | |
| 299 -d Database [String] default = nr | |
| 300 The database specified must first be formatted with formatdb. | |
| 301 Multiple database names (bracketed by quotations) will be accepted. | |
| 302 An example would be -d "nr est" | |
| 303 -i Query File [File In] Set by StandAloneBlast.pm from script. | |
| 304 default = stdin. The query should be in FASTA format. If multiple FASTA entries are in the input | |
| 305 file, all queries will be searched. | |
| 306 -e Expectation value (E) [Real] default = 10.0 | |
| 307 -o BLAST report Output File [File Out] Optional, | |
| 308 default = ./blastreport.out ; set by StandAloneBlast.pm | |
| 309 -S Query strands to search against database (for blast[nx], and tblastx). 3 is both, 1 is top, 2 is bottom [Integer] | |
| 310 default = 3 | |
| 311 | |
| 312 =head2 Blastpgp (including Psiblast) | |
| 313 | |
| 314 -j is the maximum number of rounds (default 1; i.e., regular BLAST) | |
| 315 -h is the e-value threshold for including sequences in the | |
| 316 score matrix model (default 0.001) | |
| 317 -c is the "constant" used in the pseudocount formula specified in the paper (default 10) | |
| 318 -B Multiple alignment file for PSI-BLAST "jump start mode" Optional | |
| 319 -Q Output File for PSI-BLAST Matrix in ASCII [File Out] Optional | |
| 320 | |
| 321 =head2 Bl2seq | |
| 322 | |
| 323 -i First sequence [File In] | |
| 324 -j Second sequence [File In] | |
| 325 -p Program name: blastp, blastn, blastx. For blastx 1st argument should be nucleotide [String] | |
| 326 default = blastp | |
| 327 -o alignment output file [File Out] default = stdout | |
| 328 -e Expectation value (E) [Real] default = 10.0 | |
| 329 -S Query strands to search against database (blastn only). 3 is both, 1 is top, 2 is bottom [Integer] | |
| 330 default = 3 | |
| 331 | |
| 332 =cut | |
| 333 | |
| 334 sub new { | |
| 335 my ($caller, @args) = @_; | |
| 336 # chained new | |
| 337 my $self = $caller->SUPER::new(@args); | |
| 338 | |
| 339 # to facilitiate tempfile cleanup | |
| 340 my ($tfh,$tempfile) = $self->io->tempfile(); | |
| 341 close($tfh); # we don't want the filehandle, just a temporary name | |
| 342 $self->outfile($tempfile); | |
| 343 $self->_READMETHOD('Blast'); | |
| 344 while (@args) { | |
| 345 my $attr = shift @args; | |
| 346 my $value = shift @args; | |
| 347 next if( $attr eq '-verbose'); | |
| 348 # the workaround to deal with initializing | |
| 349 $attr = 'p' if $attr =~ /^\s*program\s*$/; | |
| 350 $self->$attr($value); | |
| 351 } | |
| 352 return $self; | |
| 353 } | |
| 354 | |
| 355 sub AUTOLOAD { | |
| 356 my $self = shift; | |
| 357 my $attr = $AUTOLOAD; | |
| 358 $attr =~ s/.*:://; | |
| 359 my $attr_letter = substr($attr, 0, 1) ; | |
| 360 | |
| 361 # actual key is first letter of $attr unless first attribute | |
| 362 # letter is underscore (as in _READMETHOD), the $attr is a BLAST | |
| 363 # parameter and should be truncated to its first letter only | |
| 364 $attr = ($attr_letter eq '_') ? $attr : $attr_letter; | |
| 365 $self->throw("Unallowed parameter: $attr !") unless $OK_FIELD{$attr}; | |
| 366 # $self->throw("Unallowed parameter: $attr !") unless $ok_field{$attr_letter}; | |
| 367 $self->{$attr_letter} = shift if @_; | |
| 368 return $self->{$attr_letter}; | |
| 369 } | |
| 370 | |
| 371 =head1 Methods | |
| 372 | |
| 373 =head2 executable | |
| 374 | |
| 375 Title : executable | |
| 376 Usage : my $exe = $blastfactory->executable('blastall'); | |
| 377 Function: Finds the full path to the 'codeml' executable | |
| 378 Returns : string representing the full path to the exe | |
| 379 Args : [optional] name of executable to set path to | |
| 380 [optional] boolean flag whether or not warn when exe is not found | |
| 381 | |
| 382 | |
| 383 =cut | |
| 384 | |
| 385 sub executable { | |
| 386 my ($self, $exename, $exe,$warn) = @_; | |
| 387 $exename = 'blastall' unless defined $exename; | |
| 388 | |
| 389 if( defined $exe && -x $exe ) { | |
| 390 $self->{'_pathtoexe'}->{$exename} = $exe; | |
| 391 } | |
| 392 unless( defined $self->{'_pathtoexe'}->{$exename} ) { | |
| 393 my $f = $self->program_path($exename); | |
| 394 $exe = $self->{'_pathtoexe'}->{$exename} = $f if(-e $f && -x $f ); | |
| 395 | |
| 396 # This is how I meant to split up these conditionals --jason | |
| 397 # if exe is null we will execute this (handle the case where | |
| 398 # PROGRAMDIR pointed to something invalid) | |
| 399 unless( $exe ) { # we didn't find it in that last conditional | |
| 400 if( ($exe = $self->io->exists_exe($exename)) && -x $exe ) { | |
| 401 $self->{'_pathtoexe'}->{$exename} = $exe; | |
| 402 } else { | |
| 403 $self->warn("Cannot find executable for $exename") if $warn; | |
| 404 $self->{'_pathtoexe'}->{$exename} = undef; | |
| 405 } | |
| 406 } | |
| 407 } | |
| 408 return $self->{'_pathtoexe'}->{$exename}; | |
| 409 } | |
| 410 | |
| 411 | |
| 412 =head2 program_path | |
| 413 | |
| 414 Title : program_path | |
| 415 Usage : my $path = $factory->program_path(); | |
| 416 Function: Builds path for executable | |
| 417 Returns : string representing the full path to the exe | |
| 418 Args : none | |
| 419 | |
| 420 =cut | |
| 421 | |
| 422 sub program_path { | |
| 423 my ($self,$program_name) = @_; | |
| 424 my @path; | |
| 425 push @path, $self->program_dir if $self->program_dir; | |
| 426 push @path, $program_name .($^O =~ /mswin/i ?'.exe':''); | |
| 427 | |
| 428 return Bio::Root::IO->catfile(@path); | |
| 429 } | |
| 430 | |
| 431 =head2 program_dir | |
| 432 | |
| 433 Title : program_dir | |
| 434 Usage : my $dir = $factory->program_dir(); | |
| 435 Function: Abstract get method for dir of program. To be implemented | |
| 436 by wrapper. | |
| 437 Returns : string representing program directory | |
| 438 Args : none | |
| 439 | |
| 440 =cut | |
| 441 | |
| 442 sub program_dir { | |
| 443 $PROGRAMDIR; | |
| 444 } | |
| 445 | |
| 446 sub program { | |
| 447 my $self = shift; | |
| 448 if( wantarray ) { | |
| 449 return ($self->executable, $self->p()); | |
| 450 } else { | |
| 451 return $self->executable(@_); | |
| 452 } | |
| 453 } | |
| 454 | |
| 455 =head2 blastall | |
| 456 | |
| 457 Title : blastall | |
| 458 Usage : $blast_report = $factory->blastall('t/testquery.fa'); | |
| 459 or | |
| 460 $input = Bio::Seq->new(-id=>"test query", | |
| 461 -seq=>"ACTACCCTTTAAATCAGTGGGGG"); | |
| 462 $blast_report = $factory->blastall($input); | |
| 463 or | |
| 464 $seq_array_ref = \@seq_array; # where @seq_array is an array of Bio::Seq objects | |
| 465 $blast_report = $factory->blastall(\@seq_array); | |
| 466 Returns : Reference to a Blast object or BPlite object | |
| 467 containing the blast report. | |
| 468 Args : Name of a file or Bio::Seq object or an array of | |
| 469 Bio::Seq object containing the query sequence(s). | |
| 470 Throws an exception if argument is not either a string | |
| 471 (eg a filename) or a reference to a Bio::Seq object | |
| 472 (or to an array of Seq objects). If argument is string, | |
| 473 throws exception if file corresponding to string name can | |
| 474 not be found. | |
| 475 | |
| 476 =cut | |
| 477 | |
| 478 sub blastall { | |
| 479 my ($self,$input1) = @_; | |
| 480 $self->io->_io_cleanup(); | |
| 481 my $executable = 'blastall'; | |
| 482 my $input2; | |
| 483 # Create input file pointer | |
| 484 my $infilename1 = $self->_setinput($executable, $input1); | |
| 485 if (! $infilename1) {$self->throw(" $input1 ($infilename1) not Bio::Seq object or array of Bio::Seq objects or file name!");} | |
| 486 | |
| 487 $self->i($infilename1); # set file name of sequence to be blasted to inputfilename1 (-i param of blastall) | |
| 488 | |
| 489 my $blast_report = &_generic_local_blast($self, $executable, | |
| 490 $input1, $input2); | |
| 491 } | |
| 492 | |
| 493 =head2 blastpgp | |
| 494 | |
| 495 Title : blastpgp | |
| 496 Usage : $blast_report = $factory-> blastpgp('t/testquery.fa'); | |
| 497 or | |
| 498 $input = Bio::Seq->new(-id=>"test query", | |
| 499 -seq=>"ACTADDEEQQPPTCADEEQQQVVGG"); | |
| 500 $blast_report = $factory->blastpgp ($input); | |
| 501 or | |
| 502 $seq_array_ref = \@seq_array; # where @seq_array is an array of Bio::Seq objects | |
| 503 $blast_report = $factory-> blastpgp(\@seq_array); | |
| 504 Returns : Reference to a Blast object or BPlite object containing | |
| 505 the blast report. | |
| 506 Args : Name of a file or Bio::Seq object. In psiblast jumpstart | |
| 507 mode two additional arguments are required: a SimpleAlign | |
| 508 object one of whose elements is the query and a "mask" to | |
| 509 determine how BLAST should select scoring matrices see | |
| 510 DESCRIPTION above for more details. | |
| 511 | |
| 512 Throws an exception if argument is not either a string | |
| 513 (eg a filename) or a reference to a Bio::Seq object | |
| 514 (or to an array of Seq objects). If argument is string, | |
| 515 throws exception if file corresponding to string name can | |
| 516 not be found. | |
| 517 Returns : Reference to either a BPlite.pm, Blast.pm or BPpsilite.pm | |
| 518 object containing the blast report. | |
| 519 | |
| 520 =cut | |
| 521 | |
| 522 sub blastpgp { | |
| 523 my $self = shift; | |
| 524 my $executable = 'blastpgp'; | |
| 525 my $input1 = shift; | |
| 526 my $input2 = shift; | |
| 527 my $mask = shift; # used by blastpgp's -B option to specify which residues are position aligned | |
| 528 | |
| 529 my ($infilename1, $infilename2 ) = $self->_setinput($executable, | |
| 530 $input1, $input2, | |
| 531 $mask); | |
| 532 if (!$infilename1) {$self->throw(" $input1 not Bio::Seq object or array of Bio::Seq objects or file name!");} | |
| 533 $self->i($infilename1); # set file name of sequence to be blasted to inputfilename1 (-i param of blastpgp) | |
| 534 if ($input2) { | |
| 535 unless ($infilename2) {$self->throw("$input2 not SimpleAlign Object in pre-aligned psiblast\n");} | |
| 536 $self->B($infilename2); # set file name of partial alignment to inputfilename2 (-B param of blastpgp) | |
| 537 } | |
| 538 my $blast_report = &_generic_local_blast($self, $executable, $input1, $input2); | |
| 539 } | |
| 540 | |
| 541 =head2 bl2seq | |
| 542 | |
| 543 Title : bl2seq | |
| 544 Usage : $factory-> blastpgp('t/seq1.fa', 't/seq2.fa'); | |
| 545 or | |
| 546 $input1 = Bio::Seq->new(-id=>"test query1", | |
| 547 -seq=>"ACTADDEEQQPPTCADEEQQQVVGG"); | |
| 548 $input2 = Bio::Seq->new(-id=>"test query2", | |
| 549 -seq=>"ACTADDEMMMMMMMDEEQQQVVGG"); | |
| 550 $blast_report = $factory->bl2seq ($input1, $input2); | |
| 551 Returns : Reference to a BPbl2seq object containing the blast report. | |
| 552 Args : Names of 2 files or 2 Bio::Seq objects containing the | |
| 553 sequences to be aligned by bl2seq. | |
| 554 | |
| 555 Throws an exception if argument is not either a pair of | |
| 556 strings (eg filenames) or references to Bio::Seq objects. | |
| 557 If arguments are strings, throws exception if files | |
| 558 corresponding to string names can not be found. | |
| 559 | |
| 560 =cut | |
| 561 | |
| 562 sub bl2seq { | |
| 563 my $self = shift; | |
| 564 my $executable = 'bl2seq'; | |
| 565 my $input1 = shift; | |
| 566 my $input2 = shift; | |
| 567 | |
| 568 # Create input file pointer | |
| 569 my ($infilename1, $infilename2 ) = $self->_setinput($executable, | |
| 570 $input1, $input2); | |
| 571 if (!$infilename1){$self->throw(" $input1 not Seq Object or file name!");} | |
| 572 if (!$infilename2){$self->throw("$input2 not Seq Object or file name!");} | |
| 573 | |
| 574 $self->i($infilename1); # set file name of first sequence to | |
| 575 # be aligned to inputfilename1 | |
| 576 # (-i param of bl2seq) | |
| 577 $self->j($infilename2); # set file name of first sequence to | |
| 578 # be aligned to inputfilename2 | |
| 579 # (-j param of bl2seq) | |
| 580 | |
| 581 my $blast_report = &_generic_local_blast($self, $executable); | |
| 582 } | |
| 583 ################################################# | |
| 584 | |
| 585 =head2 _generic_local_blast | |
| 586 | |
| 587 Title : _generic_local_blast | |
| 588 Usage : internal function not called directly | |
| 589 Returns : Blast or BPlite object | |
| 590 Args : Reference to calling object and name of BLAST executable | |
| 591 | |
| 592 =cut | |
| 593 | |
| 594 sub _generic_local_blast { | |
| 595 my $self = shift; | |
| 596 my $executable = shift; | |
| 597 | |
| 598 # Create parameter string to pass to Blast program | |
| 599 my $param_string = $self->_setparams($executable); | |
| 600 | |
| 601 # run Blast | |
| 602 my $blast_report = &_runblast($self, $executable, $param_string); | |
| 603 } | |
| 604 | |
| 605 | |
| 606 =head2 _runblast | |
| 607 | |
| 608 Title : _runblast | |
| 609 Usage : Internal function, not to be called directly | |
| 610 Function: makes actual system call to Blast program | |
| 611 Example : | |
| 612 Returns : Report object in the appropriate format (BPlite, | |
| 613 BPpsilite, Blast, or BPbl2seq) | |
| 614 Args : Reference to calling object, name of BLAST executable, | |
| 615 and parameter string for executable | |
| 616 | |
| 617 =cut | |
| 618 | |
| 619 sub _runblast { | |
| 620 my ($self,$executable,$param_string) = @_; | |
| 621 my ($blast_obj,$exe); | |
| 622 if( ! ($exe = $self->executable($executable)) ) { | |
| 623 $self->warn("cannot find path to $executable"); | |
| 624 return undef; | |
| 625 } | |
| 626 my $commandstring = $exe. $param_string; | |
| 627 | |
| 628 # next line for debugging | |
| 629 $self->debug( "$commandstring \n"); | |
| 630 | |
| 631 my $status = system($commandstring); | |
| 632 | |
| 633 $self->throw("$executable call crashed: $? $commandstring\n") unless ($status==0) ; | |
| 634 my $outfile = $self->o() ; # get outputfilename | |
| 635 my $signif = $self->e() || 1e-5 ; | |
| 636 | |
| 637 # set significance cutoff to set expectation value or default value | |
| 638 # (may want to make this value vary for different executables) | |
| 639 | |
| 640 # If running bl2seq or psiblast (blastpgp with multiple iterations), | |
| 641 # the specific parsers for these programs must be used (ie BPbl2seq or | |
| 642 # BPpsilite). Otherwise either the Blast parser or the BPlite | |
| 643 # parsers can be selected. | |
| 644 | |
| 645 if ($executable =~ /bl2seq/i) { | |
| 646 if( $self->verbose > 0 ) { | |
| 647 open(OUT, $outfile) || $self->throw("cannot open $outfile"); | |
| 648 while(<OUT>) { $self->debug($_)} | |
| 649 close(OUT); | |
| 650 } | |
| 651 # Added program info so BPbl2seq can compute strand info | |
| 652 $blast_obj = Bio::Tools::BPbl2seq->new(-file => $outfile, | |
| 653 -REPORT_TYPE => $self->p ); | |
| 654 # $blast_obj = Bio::Tools::BPbl2seq->new(-file => $outfile); | |
| 655 } | |
| 656 elsif ($executable =~ /blastpgp/i && defined $self->j() && | |
| 657 $self->j() > 1) { | |
| 658 print "using psilite parser\n"; | |
| 659 $blast_obj = Bio::Tools::BPpsilite->new(-file => $outfile); | |
| 660 } | |
| 661 elsif ($self->_READMETHOD =~ /^Blast/i ) { | |
| 662 $blast_obj = Bio::SearchIO->new(-file=>$outfile, | |
| 663 -format => 'blast' ) ; | |
| 664 } | |
| 665 elsif ($self->_READMETHOD =~ /^BPlite/i ) { | |
| 666 $blast_obj = Bio::Tools::BPlite->new(-file=>$outfile); | |
| 667 } else { | |
| 668 $self->warn("Unrecognized readmethod ".$self->_READMETHOD. " or executable $executable\n"); | |
| 669 } | |
| 670 return $blast_obj; | |
| 671 } | |
| 672 | |
| 673 =head2 _setinput | |
| 674 | |
| 675 Title : _setinput | |
| 676 Usage : Internal function, not to be called directly | |
| 677 Function: Create input file(s) for Blast executable | |
| 678 Example : | |
| 679 Returns : name of file containing Blast data input | |
| 680 Args : Seq object reference or input file name | |
| 681 | |
| 682 =cut | |
| 683 | |
| 684 sub _setinput { | |
| 685 my ($self, $executable, $input1, $input2) = @_; | |
| 686 my ($seq, $temp, $infilename1, $infilename2,$fh ) ; | |
| 687 # If $input1 is not a reference it better be the name of a file with | |
| 688 # the sequence/ alignment data... | |
| 689 $self->io->_io_cleanup(); | |
| 690 | |
| 691 SWITCH: { | |
| 692 unless (ref $input1) { | |
| 693 $infilename1 = (-e $input1) ? $input1 : 0 ; | |
| 694 last SWITCH; | |
| 695 } | |
| 696 # $input may be an array of BioSeq objects... | |
| 697 if (ref($input1) =~ /ARRAY/i ) { | |
| 698 ($fh,$infilename1) = $self->io->tempfile(); | |
| 699 $temp = Bio::SeqIO->new(-fh=> $fh, '-format' => 'Fasta'); | |
| 700 foreach $seq (@$input1) { | |
| 701 unless ($seq->isa("Bio::PrimarySeqI")) {return 0;} | |
| 702 $temp->write_seq($seq); | |
| 703 } | |
| 704 close $fh; | |
| 705 $fh = undef; | |
| 706 last SWITCH; | |
| 707 } | |
| 708 # $input may be a single BioSeq object... | |
| 709 elsif ($input1->isa("Bio::PrimarySeqI")) { | |
| 710 ($fh,$infilename1) = $self->io->tempfile(); | |
| 711 | |
| 712 # just in case $input1 is taken from an alignment and has spaces (ie | |
| 713 # deletions) indicated within it, we have to remove them - otherwise | |
| 714 # the BLAST programs will be unhappy | |
| 715 | |
| 716 my $seq_string = $input1->seq(); | |
| 717 $seq_string =~ s/\W+//g; # get rid of spaces in sequence | |
| 718 $input1->seq($seq_string); | |
| 719 $temp = Bio::SeqIO->new(-fh=> $fh, '-format' => 'Fasta'); | |
| 720 $temp->write_seq($input1); | |
| 721 close $fh; | |
| 722 undef $fh; | |
| 723 # $temp->write_seq($input1); | |
| 724 last SWITCH; | |
| 725 } | |
| 726 $infilename1 = 0; # Set error flag if you get here | |
| 727 } # End SWITCH | |
| 728 unless ($input2) { return $infilename1; } | |
| 729 SWITCH2: { | |
| 730 unless (ref $input2) { | |
| 731 $infilename2 = (-e $input2) ? $input2 : 0 ; | |
| 732 last SWITCH2; | |
| 733 } | |
| 734 if ($input2->isa("Bio::PrimarySeqI") && $executable eq 'bl2seq' ) { | |
| 735 ($fh,$infilename2) = $self->io->tempfile(); | |
| 736 | |
| 737 $temp = Bio::SeqIO->new(-fh=> $fh, '-format' => 'Fasta'); | |
| 738 $temp->write_seq($input2); | |
| 739 close $fh; | |
| 740 undef $fh; | |
| 741 last SWITCH2; | |
| 742 } | |
| 743 # Option for using psiblast's pre-alignment "jumpstart" feature | |
| 744 elsif ($input2->isa("Bio::SimpleAlign") && | |
| 745 $executable eq 'blastpgp' ) { | |
| 746 # a bit of a lie since it won't be a fasta file | |
| 747 ($fh,$infilename2) = $self->io->tempfile(); | |
| 748 | |
| 749 # first we retrieve the "mask" that determines which residues should | |
| 750 # by scored according to their position and which should be scored | |
| 751 # using the non-position-specific matrices | |
| 752 | |
| 753 my @mask = split("", shift ); # get mask | |
| 754 | |
| 755 # then we have to convert all the residues in every sequence to upper | |
| 756 # case at the positions that we want psiblast to use position specific | |
| 757 # scoring | |
| 758 | |
| 759 foreach $seq ( $input2->each_seq() ) { | |
| 760 my @seqstringlist = split("",$seq->seq()); | |
| 761 for (my $i = 0; $i < scalar(@mask); $i++) { | |
| 762 unless ( $seqstringlist[$i] =~ /[a-zA-Z]/ ) {next} | |
| 763 $seqstringlist[$i] = $mask[$i] ? uc $seqstringlist[$i]: lc $seqstringlist[$i] ; | |
| 764 } | |
| 765 my $newseqstring = join("", @seqstringlist); | |
| 766 $seq->seq($newseqstring); | |
| 767 } | |
| 768 # Now we need to write out the alignment to a file | |
| 769 # in the "psi format" which psiblast is expecting | |
| 770 $input2->map_chars('\.','-'); | |
| 771 $temp = Bio::AlignIO->new(-fh=> $fh, '-format' => 'psi'); | |
| 772 $temp->write_aln($input2); | |
| 773 close $fh; | |
| 774 undef $fh; | |
| 775 last SWITCH2; | |
| 776 } | |
| 777 $infilename2 = 0; # Set error flag if you get here | |
| 778 } # End SWITCH2 | |
| 779 return ($infilename1, $infilename2); | |
| 780 } | |
| 781 | |
| 782 =head2 _setparams | |
| 783 | |
| 784 Title : _setparams | |
| 785 Usage : Internal function, not to be called directly | |
| 786 Function: Create parameter inputs for Blast program | |
| 787 Example : | |
| 788 Returns : parameter string to be passed to Blast | |
| 789 Args : Reference to calling object and name of BLAST executable | |
| 790 | |
| 791 =cut | |
| 792 | |
| 793 sub _setparams { | |
| 794 my ($self,$executable) = @_; | |
| 795 my ($attr, $value, @execparams); | |
| 796 | |
| 797 if ($executable eq 'blastall') {@execparams = @BLASTALL_PARAMS; } | |
| 798 if ($executable eq 'blastpgp') {@execparams = @BLASTPGP_PARAMS; } | |
| 799 if ($executable eq 'bl2seq') {@execparams = @BL2SEQ_PARAMS; } | |
| 800 | |
| 801 my $param_string = ""; | |
| 802 for $attr ( @execparams ) { | |
| 803 $value = $self->$attr(); | |
| 804 next unless (defined $value); | |
| 805 # Need to prepend datadirectory to database name | |
| 806 if ($attr eq 'd' && ($executable ne 'bl2seq')) { | |
| 807 # This is added so that you can specify a DB with a full path | |
| 808 if (! (-e $value.".nin" || -e $value.".pin")){ | |
| 809 $value = File::Spec->catdir($DATADIR,$value); | |
| 810 } | |
| 811 } | |
| 812 # put params in format expected by Blast | |
| 813 $attr = '-'. $attr ; | |
| 814 $param_string .= " $attr $value "; | |
| 815 } | |
| 816 | |
| 817 # if ($self->quiet()) { $param_string .= ' >/dev/null';} | |
| 818 | |
| 819 return $param_string; | |
| 820 } | |
| 821 | |
| 822 | |
| 823 =head1 Bio::Tools::Run::Wrapper methods | |
| 824 | |
| 825 =cut | |
| 826 | |
| 827 =head2 no_param_checks | |
| 828 | |
| 829 Title : no_param_checks | |
| 830 Usage : $obj->no_param_checks($newval) | |
| 831 Function: Boolean flag as to whether or not we should | |
| 832 trust the sanity checks for parameter values | |
| 833 Returns : value of no_param_checks | |
| 834 Args : newvalue (optional) | |
| 835 | |
| 836 | |
| 837 =cut | |
| 838 | |
| 839 =head2 save_tempfiles | |
| 840 | |
| 841 Title : save_tempfiles | |
| 842 Usage : $obj->save_tempfiles($newval) | |
| 843 Function: | |
| 844 Returns : value of save_tempfiles | |
| 845 Args : newvalue (optional) | |
| 846 | |
| 847 | |
| 848 =cut | |
| 849 | |
| 850 =head2 outfile_name | |
| 851 | |
| 852 Title : outfile_name | |
| 853 Usage : my $outfile = $tcoffee->outfile_name(); | |
| 854 Function: Get/Set the name of the output file for this run | |
| 855 (if you wanted to do something special) | |
| 856 Returns : string | |
| 857 Args : [optional] string to set value to | |
| 858 | |
| 859 | |
| 860 =cut | |
| 861 | |
| 862 | |
| 863 =head2 tempdir | |
| 864 | |
| 865 Title : tempdir | |
| 866 Usage : my $tmpdir = $self->tempdir(); | |
| 867 Function: Retrieve a temporary directory name (which is created) | |
| 868 Returns : string which is the name of the temporary directory | |
| 869 Args : none | |
| 870 | |
| 871 | |
| 872 =cut | |
| 873 | |
| 874 =head2 cleanup | |
| 875 | |
| 876 Title : cleanup | |
| 877 Usage : $tcoffee->cleanup(); | |
| 878 Function: Will cleanup the tempdir directory after a PAML run | |
| 879 Returns : none | |
| 880 Args : none | |
| 881 | |
| 882 | |
| 883 =cut | |
| 884 | |
| 885 =head2 io | |
| 886 | |
| 887 Title : io | |
| 888 Usage : $obj->io($newval) | |
| 889 Function: Gets a L<Bio::Root::IO> object | |
| 890 Returns : L<Bio::Root::IO> | |
| 891 Args : none | |
| 892 | |
| 893 | |
| 894 =cut | |
| 895 | |
| 896 sub DESTROY { | |
| 897 my $self= shift; | |
| 898 unless ( $self->save_tempfiles ) { | |
| 899 $self->cleanup(); | |
| 900 } | |
| 901 $self->SUPER::DESTROY(); | |
| 902 } | |
| 903 | |
| 904 1; | |
| 905 __END__ |
